idlebox / 2007 / flex-bison-cpp-example
Funny Drawing with 'C++' 'FLEX' and a Bison

Flex Bison C++ Template/Example

ShoutBox
Sergey: thanks. it is a good work!
wilm: Thanks, this is great. I had dealt with this problem quite a while back. I came up with a similar design using a node like structure with a virtual method. It looks like your driver class is a cleaner implementation than mine. My edges were not nearly as clean, plus I was using flex/bison in C mode. Thanks again this should save me lots of headache.
Timo: Thanks for the notice. Version 0.1.4 fixes that.
brucer42: doesn't work with bison-2.4.1, does work with bison-2.3. botheration
djeauh: Very good tutorial. Thanks!
Wolfgang: Hi, really fine explanation, and exactly what I was searching for! I need to extend C-code file-parser to C++-code iostream and string parsing :-) Thanks for that!    
Rgds,    
  Wolfgang
Jason: Hey, great tutorial.  Builds just great on Mac, but I can't find where it dumps the executable (exprtest).  Any ideas?
Timo: The example builds well in Ubuntu Hardy, both without flex / bison and with the two tools installed. Regenerated parser and scanner source code also compiles ok.

flex and bison are not called directly in the Makefiles, they are triggered by automake's magic. It works by simply putting the scanner.ll and parser.yy files into the program_SOURCES rule inside Makefile.am. Calling flex and bison yourself in a custom makefile is straight-forward parameter fiddeling.
Name: Dude, I cant build your project in Ubuntu? I cant find bison anywhere in your makefile



 

Summary

This code template can be used to integrate a Flex scanner and Bison parser pair into a modern C++ program. These two universal tools are very difficult to incorporate into a good C++ design. The template utilizes both Flex and Bison in C++ mode and their output are encapsulate into classes. Some tricky code is used to bind the two classes together. Thus the lexer and parser become fully re-entrant, as all state variables are contained in the class objects. Furthermore multiple different lexer-parser pairs can easily be linked into one binary, because they have different class names and/or are located in a different namespace.

Why use these old tools? Well, they are here to stay and they work well. But most important, the code generated by Flex and Bison requires no compile-time dependencies, because they generate fully autonomous source code. So far I have not found any modern parser generator which outputs independent code. It is even possible to compile the generated source on Windows with Visual C++ 2005.

The implemented grammar is a standard infix-notation calculator, which even supports some variables. This application is not the focus of the template, it is there as a starting-point for you to insert your own grammar.

See the README below for more information and pointers into the code.

For the changes between versions see the corresponding weblog entries for 0.1.3 and 0.1.4 or the ChangeLog below.

Downloads

Flex Bison C++ Example 0.1.4 (current) released 2009-09-05
Source code archive: Download flex-bison-cpp-example-0.1.4.tar.bz2 (245kb)
MD5: 9138d558a6af8de798947f3d86e2430f
  Browse doxygen documentation online
Browse source online

See bottom of this page for older downloads.

License

The parts of the example code written by myself are released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL). There are some special GPL license exceptions for the included source files from the Bison and Flex distributions. All in all you can just copy all the code and don't have to credit me at all.

Subversion

The subversion repository containing all sources and packages is available at http://idlebox.net/2007/flex-bison-cpp-example/svn/.

README

Website / License

The current example package can be downloaded from http://idlebox.net/2007/flex-bison-cpp-example/

The following just mean you can copy the example code into your program or use it for whatever purpose without crediting me (though I would really like it if you did):

The parts of the example code written by myself are released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL), which can be found in the file COPYING. There are some special GPL license exceptions for the included source files from the Bison and Flex distributions.

The idea and method of this example is based on code from http://ioctl.org/jan/bison/

Why Use These Old Tools?

Well, they are here to stay and they work well. These days there are much more sophisticated C++ parser generation frameworks around:

All these libraries do good jobs when you need to generate parsers for more difficult grammars.

However if you write a program with one of the frameworks above, then your users need that parser framework installed to compile your program. But Flex and Bison require no compile-time dependencies, because they generate fully autonomous source code. (And Flex and Bison are installed almost everywhere.) So far I have not found any modern parser generator which outputs independent code.

It is even possible to compile the generated source with Visual C++ on Windows (worked with 8.0 aka 2005). Flex and Bison need not be installed on the windows machine. The source package includes a VC++ solution and two project files.

Source and Generated Files

The src directory contains the following source files. Note that some of them are automatically generated from others.





  • expression.h defines the example's calculator node classes.
  • exprtest.cc contains a main function to run the example calculator.

  • readme.dox doxygen explanation text, which you are reading right now.

So if you wish to create a program using a C++ Flex lexer and Bison parser, you need to copy the following files:

Namespace and Library

The scanner, parser and driver classes are located within the example namespace. When coding a larger program, I believe it is most convenient to put all scanner/parser source files into a separate directory and build a static library. Then all parts of the parser are located in a separate namespace and directory.

Code Overview

This is a brief overview of the code's structure. Further detailed information is contained in the doxygen documentation, comments in the source and ultimately in the code itself.

Scanner

The input stream is first converted by the lexical scanner into tokens. The scanner is defined by the list of regular expressions in scanner.ll . From this file Flex generates the file scanner.cc, which mainly contains a function called yylex(). This function returns the next token for the parser. For a C++ scanner the yylex() function is contained in a class, which is named yyFlexLexer by default. It is declared in the FlexLexer.h and is derived from the abstract FlexLexer class.

By defining the macro yyFlexLexer => ExampleFlexLexer in scanner.h, the default name of the scanner class is changed. Furthermore to extend yylex()'s parameter list, the class example::Scanner is derived from the ExampleFlexLexer class. It is mainly a forwarding class. By defining the macro YY_DECL, the yylex() function generated by Flex is renamed to example::Scanner::lex().

Another change to the default Flex code is that the token type is changed from int to the enum example::Parser::token defined by parser.

Parser

Bison's support for C++ is much more sophisticated. In C++ mode it generates a class named example::Parser, which is located in parser.cc and declared in parser.h . The header file also defines the scanner tokens, which must be returned by the Flex scanner's regular expression rules. Bison's C++ skeleton also installs the three .hh files, which contain utility classes required by the parser.

In the example calculator the Bison code constructs a calculation node tree. The tree's nodes are derived from CalcNode and are evaluated to output the parsed expression's result.

Driver

The example::Driver class brings the two components scanner and parser classes together. It is the context parameter of the parser class. The hook between scanner object and parser object is done by defining the yylex() macro to be "driver.lexer->lex". This way the Bison parser requests the next token from the scanner object contained within the driver.

The example::Driver object can be accessed by the Bison actions. Therefore it will contain a reference to the data classes filled by the parser's rules. In the example it contains a reference to the CalcContext. Thus a refernce to a CalcContext must be given to the constructor of example::Driver. This CalcContext object will be filled with the parsed data.

To initiate parsing the example::Driver class contains the three functions example::Driver::parse_stream(), example::Driver::parse_file() and example::Driver::parse_string().

Example Calculator

The example lexer and grammar is a simple floating point arithmetic calculator. It follows the usual operator precedence rules (sometimes called BODMAS or PEMDAS): Parentheses, Exponentation, Multiplication/Division, Addition/Subtraction.

Besides these simple arithmetic operators, the program also supports variables. These can be assigned a value and used in subsequent expressions.

It can be started interactively and will process expressions entered on the console. The expression's parse tree is printed and then evaluated. Here some examples:

$ ./exprtest
Reading expressions from stdin
input: 4 * 1.5 + 3 * (2 ^ 4 - 4)
tree:
+ add
  * multiply
    4
    1.5
  * multiply
    3
    - subtract
      ^ power
        2
        4
      4
evaluated: 42
input: v = (2 ^ 4 - 4)
Setting variable v = 12
input: 3.5 * a
input:1.6: Unknown variable "a"
input: 3.5 * v
tree:
* multiply
  3.5
  12
evaluated: 42
input: 5 + * 6
input:1.4: syntax error, unexpected '*'
input: 5 + (4 * 4
input:1.10-9: syntax error, unexpected end of file, expecting ')'

The exprtest can also be used to process text files containing expressions. Within the file each line is parsed as an expression. Multiple expressions can be put into one line by terminating them with a semicolon ';'. The exprtest outputs a parse tree for each parsed non-assignment line.


v = (2 ^ 4 - 4); e = 2.71828

4 * 1.5 + 3 * v

6 * (2 * 2) ^ 2 / 2

The above example file (included as exprtest.txt) can be processed by calling ./exprtest exprtest.txt. The program outputs the following evaluation:

Setting variable v = 12
Setting variable e = 2.71828
Expressions:
[0]:
tree:
+ add
  * multiply
    4
    1.5
  * multiply
    3
    12
evaluated: 42
[1]:
tree:
/ divide
  * multiply
    6
    ^ power
      * multiply
        2
        2
      2
  2
evaluated: 48

ChangeLog

2009-09-05 - Timo Bingmann - v0.1.4
  • Two small fix-up lines to get it working with bison-2.4.1. Added one standard header include and making CalcNode a opaque class pointer in %union due to changed include file order.
  • Correcting associativity of the power operator: right.
2008-10-23 - Timo Bingmann - v0.1.4
  • FlexLexer.h, scanner.ll: Corrected a very subtle bug with the newly introduced virtual yywrap() function in the FlexLexer class. Depending on how the header was included, the class contained the virtual yywrap() function or not. These differing class declarations lead to very strange NULL pointer exceptions, because the different compiled objects assume different class memory layouts. Ultimately the exprtest program always segfaulted.
2008-08-03 - Timo Bingmann - v0.1.2
  • scanner.ll, y.tab.h:
    Fixing compilation problem which was reported to happen when no %union directive is used: in this case the include headers order is changed around by bison and breaks compilation. This was fixed by never including parser.h directly, but always using scanner.h.
  • FlexLexer.h:
    updated file to work with new flex version 2.5.35. The new version adds a virtual function yywrap() to the yyFlexLexer class. This function is automatically defined in any lexer source file generated by flex. However because I copied FlexLexer.h from an older flex distribution, the function definition throughs a "no member function" compiler error.

Older Downloads

Flex Bison C++ Example 0.1.3 released 2008-10-23
Source code archive: Download flex-bison-cpp-example-0.1.3.tar.bz2 (242kb)
MD5: 80ebfaa321aef9c871670f1823b071b3
  Browse doxygen documentation online
Browse source online

Flex Bison C++ Example 0.1.2 released 2008-08-03
Source code archive: Download flex-bison-cpp-example-0.1.2.tar.bz2 (241kb)
MD5: 265c45d4e47818bc368ea765f9841da6
  Browse doxygen documentation online
Browse source online

Flex Bison C++ Example 0.1 released 2007-08-20
Source code archive: Download flex-bison-cpp-example-0.1.tar.bz2 (248kb)
MD5: abba99e37c6790a4f1a914fc13611258
  Browse doxygen documentation online
Browse source online
RSS 2.0 Weblog Feed Atom 1.0 Weblog Feed Valid XHTML 1.1 Valid CSS (2.1)