
10. C++ Language Interface



10.1 C++ Parsers



10.1.1 C++ Bison Interface

The C++ parser LALR(1) skeleton is named `lalr1.cc'. To select it, you may either pass the option `--skeleton=lalr1.cc' to Bison, or include the directive `%skeleton "lalr1.cc"' in the grammar preamble. When run, Bison creates several entities in the `yy' namespace. Use the `%name-prefix' directive to change the namespace name (see section Bison Declaration Summary). The various classes are generated in the following files:

`position.hh'
`location.hh'

The definition of the classes position and location, used for location tracking. See section C++ Location Values.

`stack.hh'

An auxiliary class stack used by the parser.

`file.hh'
`file.cc'

(Assuming the extension of the input file was `.yy'.) The declaration and implementation of the C++ parser class. The basename and extension of these two files follow the same rules as with regular C parsers (see section Invoking Bison).

The header is mandatory; you must either pass `-d'/`--defines' to Bison, or use the `%defines' directive.

All these files are documented using Doxygen; run doxygen for complete and accurate documentation.



10.1.2 C++ Semantic Values

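The %union directive works as for C, see The Collection of Value Types. In particular it produces a genuine union, which has a few specific restrictions in C++: under the C++98 standard, a union member cannot be of a class type with a non-trivial constructor (such as std::string), so only pointers to such objects can be stored.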

Because objects have to be stored via pointers, memory is not reclaimed automatically: the actions must delete the objects they consume, and the %destructor directive is the only means to reclaim the symbols the parser discards. See section Freeing Discarded Symbols.
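To make the restriction concrete, here is a minimal hand-written sketch of roughly what the calc++ `%union' amounts to; it is not Bison's actual output. The helper discard_identifier is hypothetical: it stands for the work a `%destructor { delete $$; }' action has to do, since a union cannot tell which of its members is active.

```cpp
#include <string>

// Hand-written analogue (hypothetical, not Bison output) of
// `%union { int ival; std::string *sval; }'.  Under C++98 rules a
// member of type std::string is not allowed here, because std::string
// has a non-trivial constructor; a pointer to it is.
union semantic_value
{
  int          ival;
  std::string *sval;
};

// What a `%destructor { delete $$; }' action boils down to: nothing
// frees the pointed-to object automatically, so the code discarding
// the symbol must delete it explicitly.
inline void discard_identifier (semantic_value& v)
{
  delete v.sval;
  v.sval = 0;
}
```

Without such a call, every std::string allocated by the scanner for a discarded "identifier" would simply leak.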



10.1.3 C++ Location Values

When the directive %locations is used, the C++ parser supports location tracking, see Locations Overview. Two auxiliary classes define a position, a single point in a file, and a location, a range composed of a pair of positions (possibly spanning several files).

Method on position: std::string* filename

The name of the file. It is always handled as a pointer: the parser never duplicates or deallocates it. As an experimental feature you may change its type to `type*' using `%define "filename_type" "type"'.

Method on position: unsigned int line

The line, starting at 1.

Method on position: unsigned int lines (int height = 1)

Advance by height lines, resetting the column number.

Method on position: unsigned int column

The column, starting at 0.

Method on position: unsigned int columns (int width = 1)

Advance by width columns, without changing the line number.

Method on position: position& operator+= (position& pos, int width)
Method on position: position operator+ (const position& pos, int width)
Method on position: position& operator-= (position& pos, int width)
Method on position: position operator- (const position& pos, int width)

Various forms of syntactic sugar for columns.

Method on position: std::ostream& operator<< (std::ostream& o, const position& p)

Report p on o like this: `file:line.column', or `line.column' if the file name is null.

Method on location: position begin
Method on location: position end

The first position of the range (inclusive), and the first position beyond it.

Method on location: unsigned int columns (int width = 1)
Method on location: unsigned int lines (int height = 1)

Advance the end position.

Method on location: location operator+ (const location& begin, const location& end)
Method on location: location operator+ (const location& begin, int width)
Method on location: location& operator+= (location& loc, int width)

Various forms of syntactic sugar: the first joins two ranges into the location running from begin.begin to end.end; the others advance the end position by width columns.

Method on location: void step ()

Move begin onto end.
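The following sketch re-implements a simplified subset of the two classes, only to illustrate how the members documented above interact; the real definitions live in the generated `position.hh' and `location.hh'. The helper to_text is hypothetical, and columns count from 0 as stated above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Simplified, hypothetical stand-in for the generated class position.
struct position
{
  std::string* filename;  // not owned: never copied nor deleted
  unsigned int line;      // starts at 1
  unsigned int column;    // starts at 0

  position () : filename (0), line (1), column (0) {}

  // Advance by HEIGHT lines, resetting the column number.
  void lines (int height = 1) { column = 0; line += height; }
  // Advance by WIDTH columns, without changing the line number.
  void columns (int width = 1) { column += width; }
};

// Print `file:line.column', or `line.column' if there is no file name.
inline std::ostream& operator<< (std::ostream& o, const position& p)
{
  if (p.filename)
    o << *p.filename << ':';
  return o << p.line << '.' << p.column;
}

// Simplified, hypothetical stand-in for the generated class location.
struct location
{
  position begin;  // first position of the range, inclusive
  position end;    // first position beyond the range

  // Both advance the end position only.
  void columns (int width = 1) { end.columns (width); }
  void lines (int height = 1) { end.lines (height); }
  // Move begin onto end, e.g. before scanning the next token.
  void step () { begin = end; }
};

// Hypothetical helper: render a position the way operator<< does.
inline std::string to_text (const position& p)
{
  std::ostringstream o;
  o << p;
  return o.str ();
}
```

For instance, on a fresh location whose file name is input.txt, calling columns (5) leaves begin at input.txt:1.0 and end at input.txt:1.5; a subsequent step () followed by lines (2) moves begin to input.txt:1.5 and end to input.txt:3.0.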



10.1.4 C++ Parser Interface

The output files `output.hh' and `output.cc' declare and define the parser class in the namespace yy. The class name defaults to parser, but may be changed using `%define "parser_class_name" "name"'. The interface of this class is detailed below. It can be extended using the %parse-param feature, whose semantics are slightly different from C: it declares an additional member of the parser class, and an additional argument for its constructor.

Type of parser: semantic_type
Type of parser: location_type

The types of semantic values and locations.

Method on parser: parser (type1 arg1, ...)

Build a new parser object. By default the constructor takes no arguments; each `%parse-param {type1 arg1}' adds one.

Method on parser: int parse ()

Run the syntactic analysis, and return 0 on success, 1 otherwise.

Method on parser: std::ostream& debug_stream ()
Method on parser: void set_debug_stream (std::ostream& o)

Get or set the stream used for tracing the parsing. It defaults to std::cerr.

Method on parser: debug_level_type debug_level ()
Method on parser: void set_debug_level (debug_level_type l)

Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing.

Method on parser: void error (const location_type& l, const std::string& m)

The definition for this member function must be supplied by the user: the parser uses it to report a parse error occurring at l, described by m.
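As an illustration of the %parse-param semantics and of the user-supplied error function, the sketch below mimics the shape of a parser class by hand; it is not Bison's generated code. The names driver_type and driver_yyarg, the simplified location_type (a plain string), and the toy parse body are assumptions made only to keep the example self-contained.

```cpp
#include <sstream>
#include <string>

// Hypothetical driver type, as if passed with
// `%parse-param {driver_type& driver}'.
struct driver_type
{
  std::ostringstream errors;  // collects the reported errors
};

// Hand-written mock of the shape of a generated parser class (not
// Bison output): the %parse-param becomes both a reference member and
// a constructor argument; error() is declared but left to the user.
class parser
{
public:
  typedef std::string location_type;  // simplified: really a location class

  explicit parser (driver_type& driver_yyarg) : driver (driver_yyarg) {}

  int parse ();
  void error (const location_type& l, const std::string& m);

private:
  driver_type& driver;
};

// The user-supplied definition: forward the error to the driver.
void parser::error (const location_type& l, const std::string& m)
{
  driver.errors << l << ": " << m << '\n';
}

// Toy parse(): always fails, reporting through error() and returning
// 1, as the real parse() does on failure.
int parser::parse ()
{
  error ("1.0-1.1", "syntax error");
  return 1;
}
```

Routing errors through a member reference rather than a global is what keeps the interface pure; the calc++ example follows the same pattern with its driver.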



10.1.5 C++ Scanner Interface

The parser invokes the scanner by calling yylex. Unlike C parsers, C++ parsers are always pure, so there is no point in using the %pure-parser directive. The interface is therefore as follows.

Method on parser: int yylex (semantic_type* yylval, location_type* yylloc, type1 arg1, ...)

Return the next token. Its type is the return value; its semantic value and location are stored through yylval and yylloc. Each `%lex-param {type1 arg1}' adds one more argument.
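A hand-written scanner only has to honor this signature. The sketch below assumes simplified, hypothetical stand-ins for semantic_type, location_type, the token numbers, and a lexer_input structure passed through `%lex-param'; with a real parser these types come from the generated header.

```cpp
#include <cctype>
#include <string>

// Hypothetical, simplified stand-ins for the types the generated
// header would provide.
union semantic_type { int ival; std::string* sval; };
struct location_type { std::string::size_type begin, end; };  // columns
enum token_type { END = 0, NUMBER = 258 };

// Hypothetical scanner state, as if passed with
// `%lex-param {lexer_input& in}'.
struct lexer_input
{
  std::string text;
  std::string::size_type pos;
};

// Token type as return value; semantic value and location written
// through the pointers; %lex-param arguments come last.
int yylex (semantic_type* yylval, location_type* yylloc, lexer_input& in)
{
  // Skip blanks.
  while (in.pos < in.text.size () && in.text[in.pos] == ' ')
    ++in.pos;
  yylloc->begin = in.pos;
  if (in.pos == in.text.size ())
    {
      yylloc->end = in.pos;
      return END;
    }
  unsigned char c = in.text[in.pos];
  if (std::isdigit (c))
    {
      // No overflow check here, unlike the calc++ scanner.
      int n = 0;
      while (in.pos < in.text.size ()
             && std::isdigit (static_cast<unsigned char> (in.text[in.pos])))
        n = n * 10 + (in.text[in.pos++] - '0');
      yylval->ival = n;
      yylloc->end = in.pos;
      return NUMBER;
    }
  // Any other character is its own token, e.g. '+'.
  ++in.pos;
  yylloc->end = in.pos;
  return c;
}
```

On the input "12 + 3", successive calls return NUMBER (value 12, columns 0 to 2), '+', NUMBER (value 3), and finally END.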



10.2 A Complete C++ Example

This section demonstrates the use of a C++ parser with a simple but complete example. This example should be available on your system, ready to compile, in the directory ../bison/examples/calc++. It focuses on the use of Bison, therefore the design of the various C++ classes is very naive: no accessors, no encapsulation of members, etc. We will use a Lex scanner, and more precisely a Flex scanner, to demonstrate the various interactions. A hand-written scanner is actually easier to interface with.



10.2.1 Calc++ -- C++ Calculator

The grammar is dedicated to arithmetic: a single expression, possibly preceded by variable assignments. An environment containing variables, possibly predefined ones such as one and two, is exchanged with the parser. An example of valid input follows.

 
three := 3
seven := one + two * three
seven * seven
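Assuming the usual precedence (multiplication binds tighter than addition, as the grammar declares), this input should evaluate to 49: three is 3, seven is 1 + 2 * 3 = 7, and 7 * 7 = 49. The hypothetical function evaluate_sample spells out that computation with the same kind of environment the driver uses; it is not part of calc++.

```cpp
#include <map>
#include <string>

// Hand evaluation of the sample input, one statement per line.
int evaluate_sample ()
{
  std::map<std::string, int> variables;  // the environment
  variables["one"] = 1;                  // predefined
  variables["two"] = 2;                  // predefined
  variables["three"] = 3;                // three := 3
  variables["seven"] = variables["one"]  // seven := one + two * three
    + variables["two"] * variables["three"];
  return variables["seven"] * variables["seven"];  // seven * seven
}
```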


10.2.2 Calc++ Parsing Driver

To support a pure interface with the parser (and the scanner), the technique of the "parsing context" is convenient: a structure containing all the data to exchange. Since, in addition to simply launching the parsing, there are several auxiliary tasks to execute (opening the file for parsing, instantiating the parser, etc.), we recommend transforming the simple parsing context structure into a full-blown parsing driver class.

The declaration of this driver class, `calc++-driver.hh', is as follows. The first part opens the CPP guard and includes the required standard library headers and the declaration of the parser class.

 
#ifndef CALCXX_DRIVER_HH
# define CALCXX_DRIVER_HH
# include <string>
# include <map>
# include "calc++-parser.hh"

Then comes the declaration of the scanning function. Flex expects the signature of yylex to be defined in the macro YY_DECL, and the C++ parser expects it to be declared. We can factor both as follows.

 
// Announce to Flex the prototype we want for lexing function, ...
# define YY_DECL					\
  yy::calcxx_parser::token_type                         \
  yylex (yy::calcxx_parser::semantic_type* yylval,      \
         yy::calcxx_parser::location_type* yylloc,      \
         calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;

The calcxx_driver class is then declared with its most obvious members.

 
// Conducting the whole scanning and parsing of Calc++.
class calcxx_driver
{
public:
  calcxx_driver ();
  virtual ~calcxx_driver ();

  std::map<std::string, int> variables;

  int result;

To encapsulate the coordination with the Flex scanner, it is useful to have two member functions to open and close the scanning phase.

 
  // Handling the scanner.
  void scan_begin ();
  void scan_end ();
  bool trace_scanning;

Similarly for the parser itself.

 
  // Handling the parser.
  void parse (const std::string& f);
  std::string file;
  bool trace_parsing;

To demonstrate pure handling of parse errors, instead of simply dumping them on the standard error output, we will pass them to the compiler driver using the following two member functions. Finally, we close the class declaration and CPP guard.

 
  // Error handling.
  void error (const yy::location& l, const std::string& m);
  void error (const std::string& m);
};
#endif // ! CALCXX_DRIVER_HH

The implementation of the driver is straightforward. The parse member function deserves some attention. The error functions are simple stubs; they should actually register the located error messages and set an error state.

 
#include <iostream>
#include "calc++-driver.hh"
#include "calc++-parser.hh"

calcxx_driver::calcxx_driver ()
  : trace_scanning (false), trace_parsing (false)
{
  variables["one"] = 1;
  variables["two"] = 2;
}

calcxx_driver::~calcxx_driver ()
{
}

void
calcxx_driver::parse (const std::string &f)
{
  file = f;
  scan_begin ();
  yy::calcxx_parser parser (*this);
  parser.set_debug_level (trace_parsing);
  parser.parse ();
  scan_end ();
}

void
calcxx_driver::error (const yy::location& l, const std::string& m)
{
  std::cerr << l << ": " << m << std::endl;
}

void
calcxx_driver::error (const std::string& m)
{
  std::cerr << m << std::endl;
}


10.2.3 Calc++ Parser

The parser definition file `calc++-parser.yy' starts by asking for the C++ LALR(1) skeleton, the creation of the parser header file, and specifies the name of the parser class. Because the C++ skeleton changed several times, it is safer to require the version you designed the grammar for.

 
%skeleton "lalr1.cc"                          /*  -*- C++ -*- */
%require "2.1a"
%defines
%define "parser_class_name" "calcxx_parser"

Then come the declarations/inclusions needed to define the %union. Because the parser uses the parsing driver and vice versa, they cannot both include each other's header. Because the driver's header needs detailed knowledge about the parser class (in particular its inner types), it is the parser's header which will simply use a forward declaration of the driver.

 
%{
# include <string>
class calcxx_driver;
%}

The driver is passed by reference to the parser and to the scanner. This provides a simple but effective pure interface, not relying on global variables.

 
// The parsing context.
%parse-param { calcxx_driver& driver }
%lex-param   { calcxx_driver& driver }

Then we request the location tracking feature, and initialize the first location's file name. Afterwards, new locations are computed relative to the previous ones, so the file name is propagated automatically.

 
%locations
%initial-action
{
  // Initialize the initial location.
  @$.begin.filename = @$.end.filename = &driver.file;
};

Use the two following directives to enable parser tracing and verbose error messages.

 
%debug
%error-verbose

Semantic values cannot use "real" objects, but only pointers to them.

 
// Symbols.
%union
{
  int          ival;
  std::string *sval;
};

The code between `%{' and `%}' after the introduction of the `%union' is output in the `*.cc' file; it needs detailed knowledge about the driver.

 
%{
# include "calc++-driver.hh"
%}

The token numbered 0 corresponds to end of file; the following line allows for nicer error messages referring to "end of file" instead of "$end". Similarly, user-friendly names are provided for each symbol. Note that the token names need no prefix such as TOKEN_ to avoid name clashes: they are defined within the parser class, in yy::calcxx_parser::token.

 
%token        END      0 "end of file"
%token        ASSIGN     ":="
%token <sval> IDENTIFIER "identifier"
%token <ival> NUMBER     "number"
%type  <ival> exp        "expression"

To enable memory deallocation during error recovery, use %destructor. The %printer directives specify how symbol values are displayed when parser tracing is enabled.

 
%printer    { debug_stream () << *$$; } "identifier"
%destructor { delete $$; } "identifier"

%printer    { debug_stream () << $$; } "number" "expression"

The grammar itself is straightforward.

 
%%
%start unit;
unit: assignments exp  { driver.result = $2; };

assignments: assignments assignment {}
           | /* Nothing.  */        {};

assignment: "identifier" ":=" exp { driver.variables[*$1] = $3; delete $1; };

%left '+' '-';
%left '*' '/';
exp: exp '+' exp   { $$ = $1 + $3; }
   | exp '-' exp   { $$ = $1 - $3; }
   | exp '*' exp   { $$ = $1 * $3; }
   | exp '/' exp   { $$ = $1 / $3; }
   | "identifier"  { $$ = driver.variables[*$1]; delete $1; }
   | "number"      { $$ = $1; };
%%

Finally, the error member function reports the errors to the driver.

 
void
yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
                          const std::string& m)
{
  driver.error (l, m);
}


10.2.4 Calc++ Scanner

The Flex scanner first includes the driver declaration, then the parser's to get the set of defined tokens.

 
%{                                            /* -*- C++ -*- */
# include <cstdlib>
# include <errno.h>
# include <limits.h>
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"

/* Work around an incompatibility in flex (at least versions
   2.5.31 through 2.5.33): it generates code that does
   not conform to C89.  See Debian bug 333231
   <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.  */
# undef yywrap
# define yywrap() 1

/* By default yylex returns int, we use token_type.
   Unfortunately yyterminate by default returns 0, which is
   not of token_type.  */
#define yyterminate() return token::END
%}

Because there is no #include-like feature, we don't need yywrap; we don't need unput either; and since we parse an actual file rather than an interactive session with the user, we ask for a batch scanner. Finally, we enable the scanner tracing features.

 
%option noyywrap nounput batch debug

Abbreviations allow for more readable rules.

 
id    [a-zA-Z][a-zA-Z_0-9]*
int   [0-9]+
blank [ \t]

The following fragment suffices to track locations accurately. Each time yylex is invoked, the begin position is moved onto the end position. Then, when a pattern is matched, the end position is advanced by its width. If it matched newlines, the end cursor is adjusted accordingly, and each time blanks are matched, the begin cursor is moved onto the end cursor to effectively ignore the blanks preceding tokens. Comments would be treated the same way.

 
%{
# define YY_USER_ACTION  yylloc->columns (yyleng);
%}
%%
%{
  yylloc->step ();
%}
{blank}+   yylloc->step ();
[\n]+      yylloc->lines (yyleng); yylloc->step ();
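To check that these few actions really yield accurate locations, the sketch below replays them outside Flex on a pre-split input. The types pos, loc and lexeme and the function track are hypothetical simplifications, not the generated location classes; columns start at 0 as in the C++ Location Values section.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal location types, just enough to replay the
// scanner actions (not the generated location.hh).
struct pos { unsigned int line, column; };
struct loc
{
  pos begin, end;
  void columns (std::size_t n) { end.column += n; }
  void lines (std::size_t n) { end.line += n; end.column = 0; }
  void step () { begin = end; }
};

// What each matched lexeme is, from the scanner's point of view.
enum kind { BLANKS, NEWLINES, TOKEN };
struct lexeme { kind k; std::string text; };

// Replay the bookkeeping: step () on entry to yylex, YY_USER_ACTION's
// columns (yyleng) on every match, then the blank and newline rules.
// Returns the location of every TOKEN handed to the parser.
std::vector<loc> track (const std::vector<lexeme>& input)
{
  std::vector<loc> result;
  loc l = { {1, 0}, {1, 0} };
  l.step ();                          // first entry into yylex
  for (std::size_t i = 0; i < input.size (); ++i)
    {
      const lexeme& x = input[i];
      l.columns (x.text.size ());     // YY_USER_ACTION
      if (x.k == BLANKS)
        l.step ();                    // {blank}+
      else if (x.k == NEWLINES)
        {
          l.lines (x.text.size ());   // [\n]+
          l.step ();
        }
      else
        {
          result.push_back (l);       // token returned...
          l.step ();                  // ...then yylex is re-entered
        }
    }
  return result;
}
```

Fed with the lexemes of "three := 3" followed by a newline and "seven", it locates ":=" at columns 6 to 8 of line 1 and "seven" at columns 0 to 5 of line 2: the blanks never leak into a token's range.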

The rules are simple; just note the use of the driver to report errors. It is convenient to use a typedef to shorten yy::calcxx_parser::token::IDENTIFIER into token::IDENTIFIER, for instance.

 
%{
  typedef yy::calcxx_parser::token token;
%}
           /* Convert ints to the actual type of tokens.  */
[-+*/]     return yy::calcxx_parser::token_type (yytext[0]);
":="       return token::ASSIGN;
{int}      {
  errno = 0;
  long n = strtol (yytext, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    driver.error (*yylloc, "integer is out of range");
  yylval->ival = n;
  return token::NUMBER;
}
{id}       yylval->sval = new std::string (yytext); return token::IDENTIFIER;
.          driver.error (*yylloc, "invalid character");
%%
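The range check in the {int} rule can be exercised on its own. The helper scan_int below is a hypothetical extraction of that logic, not part of calc++: errno is cleared first because strtol only sets it on failure, and the result is accepted only if it fits in an int.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical extraction of the {int} rule's conversion.  Unlike the
// rule, which stores n even after reporting an error, it stores 0 on
// failure.  Returns false when TEXT does not fit in an int.
bool scan_int (const char* text, int& value)
{
  errno = 0;  // strtol sets errno only on failure, so clear it first
  long n = std::strtol (text, NULL, 10);
  bool ok = INT_MIN <= n && n <= INT_MAX && errno != ERANGE;
  value = ok ? static_cast<int> (n) : 0;
  return ok;
}
```

Both tests in the condition matter: where long is wider than int, "3000000000" converts without ERANGE but exceeds INT_MAX; where they have the same width, strtol itself overflows and sets ERANGE.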

Finally, because the driver's scanner-related member functions depend on the scanner's data, it is simpler to implement them in this file.

 
void
calcxx_driver::scan_begin ()
{
  yy_flex_debug = trace_scanning;
  if (!(yyin = fopen (file.c_str (), "r")))
    {
      error (std::string ("cannot open ") + file);
      exit (EXIT_FAILURE);
    }
}

void
calcxx_driver::scan_end ()
{
  fclose (yyin);
}


10.2.5 Calc++ Top Level

The top level file, `calc++.cc', poses no problem.

 
#include <iostream>
#include "calc++-driver.hh"

int
main (int argc, char *argv[])
{
  calcxx_driver driver;
  for (++argv; argv[0]; ++argv)
    if (*argv == std::string ("-p"))
      driver.trace_parsing = true;
    else if (*argv == std::string ("-s"))
      driver.trace_scanning = true;
    else
      {
	driver.parse (*argv);
	std::cout << driver.result << std::endl;
      }
}


This document was generated using texi2html 1.76.