10.1 C++ Parsers: The interface to generate C++ parser classes
10.2 A Complete C++ Example: Demonstrating their use
10.1.1 C++ Bison Interface: Asking for C++ parser generation
10.1.2 C++ Semantic Values: %union vs. C++
10.1.3 C++ Location Values: The position and location classes
10.1.4 C++ Parser Interface: Instantiating and running the parser
10.1.5 C++ Scanner Interface: Exchanges between yylex and parse
The C++ LALR(1) parser skeleton is named `lalr1.cc'. To
select it, either pass the option `--skeleton=lalr1.cc'
to Bison, or include the directive `%skeleton "lalr1.cc"' in the
grammar preamble. When run, bison creates several
entities in the `yy' namespace. Use the `%name-prefix'
directive to change the namespace name (see Bison Declaration Summary). The
various classes are generated in the following files:
`position.hh', `location.hh'
    The definition of the classes position and location, used for location tracking. See section C++ Location Values.
`stack.hh'
    An auxiliary class stack used by the parser.
`file.hh', `file.cc'
    (Assuming the extension of the input file was `.yy'.) The declaration and implementation of the C++ parser class. The basename and extension of these two files follow the same rules as with regular C parsers (see section Invoking Bison).
    The header is mandatory; you must either pass `-d'/`--defines' to bison, or use the `%defines' directive.
All these files are documented using Doxygen; run doxygen
for complete and accurate documentation.
The %union directive works as for C (see The Collection of Value Types). In particular it produces a genuine
union, which has a few specific features in C++.
YYSTYPE is defined but its use is discouraged: rather
you should refer to the parser's encapsulated type
yy::parser::semantic_type.
Because objects have to be stored via pointers, memory is not
reclaimed automatically: using the %destructor directive is the
only means to avoid leaks. See section Freeing Discarded Symbols.
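As a rough illustration of why pointers stored in a union leak unless they are released explicitly, here is a minimal sketch in plain C++. The union `value', the counter `live_strings' and the two helper functions are hypothetical names made up for this illustration; they are not generated by Bison. `discard_identifier' plays the role that a `%destructor { delete $$; }' clause would play for a discarded symbol.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the union generated from %union.
// A plain union cannot own a std::string here, hence the pointer.
union value
{
  int ival;
  std::string* sval;
};

// Count live strings so that leaks are observable in this sketch.
static int live_strings = 0;

// What a scanner action would do: allocate the object on the heap.
value
make_identifier (const char* text)
{
  value v;
  v.sval = new std::string (text);
  ++live_strings;
  return v;
}

// What `%destructor { delete $$; }' would do for a discarded symbol.
void
discard_identifier (value& v)
{
  assert (live_strings > 0);
  delete v.sval;
  v.sval = 0;
  --live_strings;
}
```

Nothing in the union itself releases the string: if `discard_identifier' is never called, the counter stays above zero, which is the leak the text warns about.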
When the directive %locations is used, the C++ parser supports
location tracking (see Locations Overview). Two
auxiliary classes define a position, a single point in a file,
and a location, a range composed of a pair of
positions (possibly spanning several files).
Members of the position class:
filename
    The name of the file. It is always handled as a pointer: the parser never duplicates or deallocates it. As an experimental feature you may change its type to `type*' using `%define "filename_type" "type"'.
line
    The line, starting at 1.
lines (height)
    Advance by height lines, resetting the column number.
column
    The column, starting at 0.
columns (width)
    Advance by width columns, without changing the line number.
operator+=, operator+, operator-=, operator-
    Various forms of syntactic sugar for columns.
operator<< (o, p)
    Report p on o like this: `file:line.column', or `line.column' if file is null.
Members of the location class:
begin, end
    The first, inclusive, position of the range, and the first beyond.
columns (width), lines (height)
    Advance the end position.
operator+, operator+=
    Various forms of syntactic sugar.
step ()
    Move begin onto end.
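The behaviour described above can be approximated with a small stand-alone sketch. The structures `pos' and `loc' and the helper `show' are simplified, hypothetical stand-ins written for this illustration; the generated position and location classes are richer, so consult the generated headers for the real interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified, hypothetical stand-in for a position.
struct pos
{
  const std::string* filename; // never owned, never copied
  unsigned line;               // starts at 1
  unsigned column;             // starts at 0

  pos () : filename (0), line (1), column (0) {}

  // Advance by height lines, resetting the column number.
  void lines (int height = 1)
  {
    assert (height >= 0); // this sketch only moves forward
    line += height;
    column = 0;
  }

  // Advance by width columns, without changing the line number.
  void columns (int width = 1) { column += width; }
};

// Report p as `file:line.column', or `line.column' without a file.
std::ostream&
operator<< (std::ostream& o, const pos& p)
{
  if (p.filename)
    o << *p.filename << ':';
  return o << p.line << '.' << p.column;
}

// Simplified, hypothetical stand-in for a location.
struct loc
{
  pos begin; // first position of the range, inclusive
  pos end;   // first position beyond the range

  void columns (int width = 1) { end.columns (width); }
  void lines (int height = 1) { end.lines (height); }
  void step () { begin = end; } // move begin onto end
};

// Helper for experiments: render a position as text.
std::string
show (const pos& p)
{
  std::ostringstream s;
  s << p;
  return s.str ();
}
```

Only the end position ever advances; `step' is what lets the next token start where the previous one stopped.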
The output files `output.hh' and `output.cc'
declare and define the parser class in the namespace yy. The
class name defaults to parser, but may be changed using
`%define "parser_class_name" "name"'. The interface of
this class is detailed below. It can be extended using the
%parse-param feature: its semantics is slightly changed since
it describes an additional member of the parser class, and an
additional argument for its constructor.
semantic_type, location_type
    The types for semantic values and locations.
parser (type1 arg1, ...)
    Build a new parser object. There are no arguments by default, unless `%parse-param {type1 arg1}' was used.
int parse ()
    Run the syntactic analysis, and return 0 on success, 1 otherwise.
debug_stream (), set_debug_stream (o)
    Get or set the stream used for tracing the parsing. It defaults to std::cerr.
debug_level (), set_debug_level (l)
    Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing.
error (l, m)
    The definition for this member function must be supplied by the user: the parser uses it to report a parser error occurring at l, described by m.
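To make the shape of this interface more concrete, here is a hand-written imitation. It is only a sketch: the class `mini_parser', the structure `context' and the `input' argument of `parse' are hypothetical and exist only for this illustration (a generated `parse' takes no argument and calls yylex instead). It shows how a `%parse-param' becomes both a constructor argument and a member, and how the user-supplied error function is invoked on failure.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical parsing context, playing the role of `type1'.
struct context
{
  int errors;
  context () : errors (0) {}
};

// Hand-written imitation of the shape of a generated parser class.
class mini_parser
{
public:
  // `%parse-param { context& ctx }' would add this constructor argument...
  explicit mini_parser (context& ctx_arg)
    : ctx (ctx_arg), debug_ (0), stream_ (&std::cerr) {}

  int debug_level () const { return debug_; }
  void set_debug_level (int l) { debug_ = l; }

  std::ostream& debug_stream () const
  {
    assert (stream_ != 0);
    return *stream_;
  }
  void set_debug_stream (std::ostream& o) { stream_ = &o; }

  // Pretend analysis: return 1 on empty input, 0 otherwise.
  int parse (const std::string& input)
  {
    if (debug_)
      debug_stream () << "Starting parse\n";
    if (input.empty ())
      {
        error ("syntax error");
        return 1;
      }
    return 0;
  }

private:
  // In a real parser the user must supply this definition.
  void error (const std::string& m)
  {
    (void) m; // this sketch only counts errors
    ++ctx.errors;
  }

  context& ctx; // ...and this member.
  int debug_;
  std::ostream* stream_;
};
```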
The parser invokes the scanner by calling yylex. Contrary to C
parsers, C++ parsers are always pure: there is no point in using the
%pure-parser directive. Therefore the interface is as follows.
yylex (yylval, yylloc, arg1, ...)
    Return the next token. Its type is the return value, its semantic value and location being yylval and yylloc. Invocations of `%lex-param {type1 arg1}' yield additional arguments.
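A hand-written scanner following this calling convention might look like the sketch below. All the types here (`token_type', `semantic_type', `location_type' and `source') are hypothetical stand-ins declared locally so that the example is self-contained; with a real grammar the first three would come from the generated parser class, and `source' corresponds to an extra argument requested with `%lex-param'.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins for the parser's inner types.
enum token_type { END = 0, NUMBER = 258 };
union semantic_type { int ival; };
struct location_type { std::string::size_type begin_column, end_column; };

// Hypothetical extra argument, as `%lex-param' would add.
struct source
{
  std::string text;
  std::string::size_type cursor;
  explicit source (const std::string& t) : text (t), cursor (0) {}
};

// Pure scanner: all of its state comes in through the arguments.
token_type
yylex (semantic_type* yylval, location_type* yylloc, source& src)
{
  assert (yylval && yylloc);
  const std::string& s = src.text;

  // Skip blanks preceding the token.
  while (src.cursor < s.size ()
         && std::isspace (static_cast<unsigned char> (s[src.cursor])))
    ++src.cursor;

  yylloc->begin_column = yylloc->end_column = src.cursor;
  if (src.cursor == s.size ())
    return END;

  unsigned char c = s[src.cursor];
  if (std::isdigit (c))
    {
      std::string::size_type start = src.cursor;
      while (src.cursor < s.size ()
             && std::isdigit (static_cast<unsigned char> (s[src.cursor])))
        ++src.cursor;
      yylval->ival = std::atoi (s.substr (start, src.cursor - start).c_str ());
      yylloc->end_column = src.cursor;
      return NUMBER;
    }

  // Single-character tokens use their character code as token type.
  ++src.cursor;
  yylloc->end_column = src.cursor;
  return token_type (c);
}
```

The token type is the return value, while the semantic value and the location travel through the two pointer arguments, so no global variable is involved.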
This section demonstrates the use of a C++ parser with a simple but complete example. This example should be available on your system, ready to compile, in the directory ../bison/examples/calc++. It focuses on the use of Bison, therefore the design of the various C++ classes is very naive: no accessors, no encapsulation of members, etc. We will use a Lex scanner, more precisely a Flex scanner, to demonstrate the various interactions. A hand-written scanner is actually easier to interface with.
10.2.1 Calc++ -- C++ Calculator: The specifications
10.2.2 Calc++ Parsing Driver: An active parsing context
10.2.3 Calc++ Parser: A parser class
10.2.4 Calc++ Scanner: A pure C++ Flex scanner
10.2.5 Calc++ Top Level: Conducting the band
Of course the grammar is dedicated to arithmetic: a single
expression, possibly preceded by variable assignments. An
environment containing possibly predefined variables, such as
one and two, is exchanged with the parser. An example
of valid input follows.
three := 3
seven := one + two * three
seven * seven
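To see what this input should evaluate to, the three statements can be worked through by hand. The sketch below does so in plain C++, assuming that one and two are predefined as 1 and 2 (as the driver's constructor in this example does) and that `*' binds tighter than `+'. The helper names `lookup' and `evaluate_sample' are made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, int> environment;

// Fetch a variable that is expected to be defined already.
int
lookup (const environment& env, const std::string& name)
{
  environment::const_iterator i = env.find (name);
  assert (i != env.end ());
  return i->second;
}

// Evaluate the sample input by hand, statement by statement.
int
evaluate_sample ()
{
  environment env;
  env["one"] = 1; // predefined
  env["two"] = 2; // predefined

  // three := 3
  env["three"] = 3;
  // seven := one + two * three
  env["seven"] = lookup (env, "one") + lookup (env, "two") * lookup (env, "three");
  // seven * seven
  return lookup (env, "seven") * lookup (env, "seven");
}
```

Under these assumptions seven is 1 + 2 * 3 = 7 and the final expression evaluates to 49.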
To support a pure interface with the parser (and the scanner) the technique of the "parsing context" is convenient: a structure containing all the data to exchange. Since, in addition to simply launching the parsing, there are several auxiliary tasks to execute (open the file for parsing, instantiate the parser, etc.), we recommend transforming the simple parsing context structure into a full-blown parsing driver class.
The declaration of this driver class, `calc++-driver.hh', is as follows. The first part includes the CPP guard, imports the required standard library components, and includes the declaration of the parser class.
#ifndef CALCXX_DRIVER_HH
# define CALCXX_DRIVER_HH
# include <string>
# include <map>
# include "calc++-parser.hh"
Then comes the declaration of the scanning function. Flex expects
the signature of yylex to be defined in the macro
YY_DECL, and the C++ parser expects it to be declared. We can
factor both as follows.
// Announce to Flex the prototype we want for lexing function, ...
# define YY_DECL                                        \
  yy::calcxx_parser::token_type                         \
  yylex (yy::calcxx_parser::semantic_type* yylval,      \
         yy::calcxx_parser::location_type* yylloc,      \
         calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
The calcxx_driver class is then declared with its most obvious
members.
// Conducting the whole scanning and parsing of Calc++.
class calcxx_driver
{
public:
  calcxx_driver ();
  virtual ~calcxx_driver ();

  std::map<std::string, int> variables;

  int result;
To encapsulate the coordination with the Flex scanner, it is useful to have two member functions to open and close the scanning phase.
  // Handling the scanner.
  void scan_begin ();
  void scan_end ();
  bool trace_scanning;
Similarly for the parser itself.
  // Handling the parser.
  void parse (const std::string& f);
  std::string file;
  bool trace_parsing;
To demonstrate pure handling of parse errors, instead of simply dumping them on the standard error output, we will pass them to the parsing driver using the following two member functions. Finally, we close the class declaration and CPP guard.
  // Error handling.
  void error (const yy::location& l, const std::string& m);
  void error (const std::string& m);
};
#endif // ! CALCXX_DRIVER_HH
The implementation of the driver is straightforward. The parse
member function deserves some attention. The error functions
are simple stubs; they should actually register the located error
messages and set the error state.
#include <iostream> // std::cerr, used by the error functions
#include "calc++-driver.hh"
#include "calc++-parser.hh"

calcxx_driver::calcxx_driver ()
  : trace_scanning (false), trace_parsing (false)
{
  variables["one"] = 1;
  variables["two"] = 2;
}

calcxx_driver::~calcxx_driver ()
{
}

void
calcxx_driver::parse (const std::string &f)
{
  file = f;
  scan_begin ();
  yy::calcxx_parser parser (*this);
  parser.set_debug_level (trace_parsing);
  parser.parse ();
  scan_end ();
}

void
calcxx_driver::error (const yy::location& l, const std::string& m)
{
  std::cerr << l << ": " << m << std::endl;
}

void
calcxx_driver::error (const std::string& m)
{
  std::cerr << m << std::endl;
}
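As noted, real error functions should register the messages rather than print them. One possible shape for that is sketched below. The class `error_log' and its members are hypothetical and are not part of the calc++ example; the location is passed as an already formatted string so that the sketch does not depend on the generated location class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical replacement for the stub error handlers:
// collect the messages instead of printing them.
class error_log
{
public:
  // Located error: `where' is already formatted, e.g. "input:1.2".
  void error (const std::string& where, const std::string& m)
  {
    assert (!m.empty ()); // an empty message would be a caller bug
    messages_.push_back (where + ": " + m);
  }

  // Unlocated error.
  void error (const std::string& m)
  {
    assert (!m.empty ());
    messages_.push_back (m);
  }

  // The error state: true once at least one error was registered.
  bool failed () const { return !messages_.empty (); }

  const std::vector<std::string>& messages () const { return messages_; }

private:
  std::vector<std::string> messages_;
};
```

A driver built this way can decide after parsing whether to report the messages, and a caller can test `failed ()' before trusting the result.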
The parser definition file `calc++-parser.yy' starts by asking for the C++ LALR(1) skeleton, the creation of the parser header file, and specifies the name of the parser class. Because the C++ skeleton changed several times, it is safer to require the version you designed the grammar for.
%skeleton "lalr1.cc"                          /*  -*- C++ -*- */
%require "2.1a"
%defines
%define "parser_class_name" "calcxx_parser"
Then come the declarations/inclusions needed to define the
%union. Because the parser uses the parsing driver and
vice versa, both cannot include the header of the other. Because the
driver's header needs detailed knowledge about the parser class (in
particular its inner types), it is the parser's header which will simply
use a forward declaration of the driver.
%{
# include <string>
class calcxx_driver;
%}
The driver is passed by reference to the parser and to the scanner. This provides a simple but effective pure interface, not relying on global variables.
// The parsing context.
%parse-param { calcxx_driver& driver }
%lex-param { calcxx_driver& driver }
Then we request the location tracking feature, and initialize the first location's file name. Afterwards new locations are computed relative to the previous locations: the file name will be automatically propagated.
%locations
%initial-action
{
  // Initialize the initial location.
  @$.begin.filename = @$.end.filename = &driver.file;
};
Use the two following directives to enable parser tracing and verbose error messages.
%debug
%error-verbose
Semantic values cannot use "real" objects, but only pointers to them.
// Symbols.
%union
{
  int          ival;
  std::string *sval;
};
The code between `%{' and `%}' after the introduction of the `%union' is output in the `*.cc' file; it needs detailed knowledge about the driver.
%{
# include "calc++-driver.hh"
%}
The token numbered as 0 corresponds to end of file; the following line
allows for nicer error messages referring to "end of file" instead
of "$end". Similarly, user-friendly names are provided for each
symbol. Note that in C++ the token names are members of the parser
class's token type (for instance token::NUMBER), which avoids name clashes.
%token        END      0 "end of file"
%token        ASSIGN     ":="
%token <sval> IDENTIFIER "identifier"
%token <ival> NUMBER     "number"
%type  <ival> exp        "expression"
To enable memory deallocation during error recovery, use
%destructor.
%printer    { debug_stream () << *$$; } "identifier"
%destructor { delete $$; } "identifier"

%printer    { debug_stream () << $$; } "number" "expression"
The grammar itself is straightforward. Since %destructor only covers discarded symbols, the actions that consume an identifier free its string themselves.
%%
%start unit;
unit: assignments exp  { driver.result = $2; };

assignments: assignments assignment {}
           | /* Nothing.  */        {};

assignment: "identifier" ":=" exp
              { driver.variables[*$1] = $3; delete $1; };

%left '+' '-';
%left '*' '/';
exp: exp '+' exp   { $$ = $1 + $3; }
   | exp '-' exp   { $$ = $1 - $3; }
   | exp '*' exp   { $$ = $1 * $3; }
   | exp '/' exp   { $$ = $1 / $3; }
   | "identifier"  { $$ = driver.variables[*$1]; delete $1; }
   | "number"      { $$ = $1; };
%%
Finally the error member function registers the errors to the
driver.
void
yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
                          const std::string& m)
{
  driver.error (l, m);
}
The Flex scanner first includes the driver declaration, then the parser's to get the set of defined tokens.
%{                                            /* -*- C++ -*- */
# include <cstdlib>
# include <errno.h>
# include <limits.h>
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"

/* Work around an incompatibility in flex (at least versions
   2.5.31 through 2.5.33): it generates code that does
   not conform to C89.  See Debian bug 333231
   <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.  */
# undef yywrap
# define yywrap() 1

/* By default yylex returns int, we use token_type.
   Unfortunately yyterminate by default returns 0, which is
   not of token_type.  */
#define yyterminate() return token::END
%}
Because there is no #include-like feature, we don't need
yywrap. We don't need unput either. Since we parse an
actual file rather than run an interactive session with the user,
we ask for a batch scanner. Finally we enable the scanner tracing features.
%option noyywrap nounput batch debug
Abbreviations allow for more readable rules.
id    [a-zA-Z][a-zA-Z_0-9]*
int   [0-9]+
blank [ \t]
The following paragraph suffices to track locations accurately. Each
time yylex is invoked, the begin position is moved onto the end
position. Then when a pattern is matched, the end position is
advanced by its width. In case it matched ends of lines, the end
cursor is adjusted, and each time blanks are matched, the begin cursor
is moved onto the end cursor to effectively ignore the blanks
preceding tokens. Comments would be treated equally.
%{
# define YY_USER_ACTION  yylloc->columns (yyleng);
%}
%%
%{
  yylloc->step ();
%}
{blank}+   yylloc->step ();
[\n]+      yylloc->lines (yyleng); yylloc->step ();
The rules are simple; just note the use of the driver to report errors.
It is convenient to use a typedef to shorten
yy::calcxx_parser::token::IDENTIFIER into
token::IDENTIFIER for instance.
%{
  typedef yy::calcxx_parser::token token;
%}
           /* Convert ints to the actual type of tokens.  */
[-+*/]     return yy::calcxx_parser::token_type (yytext[0]);
":="       return token::ASSIGN;
{int}      {
  errno = 0;
  long n = strtol (yytext, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    driver.error (*yylloc, "integer is out of range");
  yylval->ival = n;
  return token::NUMBER;
}
{id}       yylval->sval = new std::string (yytext); return token::IDENTIFIER;
.          driver.error (*yylloc, "invalid character");
%%
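The range check used in the {int} rule can be tried in isolation. The sketch below wraps the same strtol test in a stand-alone function; the name `parse_int' and its boolean return convention are made up for this illustration and are not part of the calc++ sources.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Convert decimal text to int.  Return false, leaving `out'
// untouched, when the value does not fit in an int.
bool
parse_int (const char* text, int& out)
{
  assert (text != NULL);
  errno = 0; // strtol only sets errno on failure, so clear it first
  long n = std::strtol (text, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    return false;
  out = static_cast<int> (n);
  return true;
}
```

Both conditions matter: the errno test catches values too large for a long, and the explicit comparison catches values that fit in a long but not in an int on platforms where long is wider.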
Finally, because the scanner-related member functions of the driver depend on the scanner's data, it is simpler to implement them in this file.
void
calcxx_driver::scan_begin ()
{
  yy_flex_debug = trace_scanning;
  if (!(yyin = fopen (file.c_str (), "r")))
    error (std::string ("cannot open ") + file);
}

void
calcxx_driver::scan_end ()
{
  // yyin is null if fopen failed in scan_begin.
  if (yyin)
    fclose (yyin);
}
The top level file, `calc++.cc', poses no problem.
#include <iostream>
#include "calc++-driver.hh"

int
main (int argc, char *argv[])
{
  calcxx_driver driver;
  for (++argv; argv[0]; ++argv)
    if (*argv == std::string ("-p"))
      driver.trace_parsing = true;
    else if (*argv == std::string ("-s"))
      driver.trace_scanning = true;
    else
      {
        driver.parse (*argv);
        std::cout << driver.result << std::endl;
      }
}
This document was generated
using texi2html 1.76.