[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8. Debugging Your Parser

Developing a parser can be a challenge, especially if you don't understand the algorithm (see section The Bison Parser Algorithm). Even so, sometimes a detailed description of the automaton can help (see section Understanding Your Parser), or tracing the execution of the parser can give some insight on why it behaves improperly (see section Tracing Your Parser).


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.1 Understanding Your Parser

As documented elsewhere (see section The Bison Parser Algorithm) Bison parsers are shift/reduce automata. In some cases (much more frequent than one would hope), looking at this automaton is required to tune or simply fix a parser. Bison provides two different representation of it, either textually or graphically (as a DOT file).

The textual file is generated when the options `--report' or `--verbose' are specified, see See section Invoking Bison. Its name is made by removing `.tab.c' or `.c' from the parser output file name, and adding `.output' instead. Therefore, if the input file is `foo.y', then the parser file is called `foo.tab.c' by default. As a consequence, the verbose output file is called `foo.output'.

The following grammar file, `calc.y', will be used in the sequel:

 
%token NUM STR
%left '+' '-'
%left '*'
%%
exp: exp '+' exp
   | exp '-' exp
   | exp '*' exp
   | exp '/' exp
   | NUM
   ;
useless: STR;
%%

bison reports:

 
calc.y: warning: 1 nonterminal and 1 rule useless in grammar
calc.y:11.1-7: warning: nonterminal useless in grammar: useless
calc.y:11.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce

When given `--report=state', in addition to `calc.tab.c', it creates a file `calc.output' with contents detailed below. The order of the output and the exact presentation might vary, but the interpretation is the same.

The first section includes details on conflicts that were solved thanks to precedence and/or associativity:

 
Conflict in state 8 between rule 2 and token '+' resolved as reduce.
Conflict in state 8 between rule 2 and token '-' resolved as reduce.
Conflict in state 8 between rule 2 and token '*' resolved as shift.
…

The next section lists states that still have conflicts.

 
State 8 conflicts: 1 shift/reduce
State 9 conflicts: 1 shift/reduce
State 10 conflicts: 1 shift/reduce
State 11 conflicts: 4 shift/reduce

The next section reports useless tokens, nonterminal and rules. Useless nonterminals and rules are removed in order to produce a smaller parser, but useless tokens are preserved, since they might be used by the scanner (note the difference between "useless" and "unused" below):

 
Nonterminals useless in grammar:
   useless

Terminals unused in grammar:
   STR

Rules useless in grammar:
#6     useless: STR;

The next section reproduces the exact grammar that Bison used:

 
Grammar

  Number, Line, Rule
    0   5 $accept -> exp $end
    1   5 exp -> exp '+' exp
    2   6 exp -> exp '-' exp
    3   7 exp -> exp '*' exp
    4   8 exp -> exp '/' exp
    5   9 exp -> NUM

and reports the uses of the symbols:

 
Terminals, with rules where they appear

$end (0) 0
'*' (42) 3
'+' (43) 1
'-' (45) 2
'/' (47) 4
error (256)
NUM (258) 5

Nonterminals, with rules where they appear

$accept (8)
    on left: 0
exp (9)
    on left: 1 2 3 4 5, on right: 0 1 2 3 4

Bison then proceeds onto the automaton itself, describing each state with it set of items, also known as pointed rules. Each item is a production rule together with a point (marked by `.') that the input cursor.

 
state 0

    $accept  ->  . exp $   (rule 0)

    NUM         shift, and go to state 1

    exp         go to state 2

This reads as follows: "state 0 corresponds to being at the very beginning of the parsing, in the initial rule, right before the start symbol (here, exp). When the parser returns to this state right after having reduced a rule that produced an exp, the control flow jumps to state 2. If there is no such transition on a nonterminal symbol, and the lookahead is a NUM, then this token is shifted on the parse stack, and the control flow jumps to state 1. Any other lookahead triggers a syntax error."

Even though the only active rule in state 0 seems to be rule 0, the report lists NUM as a lookahead token because NUM can be at the beginning of any rule deriving an exp. By default Bison reports the so-called core or kernel of the item set, but if you want to see more detail you can invoke bison with `--report=itemset' to list all the items, include those that can be derived:

 
state 0

    $accept  ->  . exp $   (rule 0)
    exp  ->  . exp '+' exp   (rule 1)
    exp  ->  . exp '-' exp   (rule 2)
    exp  ->  . exp '*' exp   (rule 3)
    exp  ->  . exp '/' exp   (rule 4)
    exp  ->  . NUM   (rule 5)

    NUM         shift, and go to state 1

    exp         go to state 2

In the state 1...

 
state 1

    exp  ->  NUM .   (rule 5)

    $default    reduce using rule 5 (exp)

the rule 5, `exp: NUM;', is completed. Whatever the lookahead token (`$default'), the parser will reduce it. If it was coming from state 0, then, after this reduction it will return to state 0, and will jump to state 2 (`exp: go to state 2').

 
state 2

    $accept  ->  exp . $   (rule 0)
    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    $           shift, and go to state 3
    '+'         shift, and go to state 4
    '-'         shift, and go to state 5
    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

In state 2, the automaton can only shift a symbol. For instance, because of the item `exp -> exp . '+' exp', if the lookahead if `+', it will be shifted on the parse stack, and the automaton control will jump to state 4, corresponding to the item `exp -> exp '+' . exp'. Since there is no default action, any other token than those listed above will trigger a syntax error.

The state 3 is named the final state, or the accepting state:

 
state 3

    $accept  ->  exp $ .   (rule 0)

    $default    accept

the initial rule is completed (the start symbol and the end of input were read), the parsing exits successfully.

The interpretation of states 4 to 7 is straightforward, and is left to the reader.

 
state 4

    exp  ->  exp '+' . exp   (rule 1)

    NUM         shift, and go to state 1

    exp         go to state 8

state 5

    exp  ->  exp '-' . exp   (rule 2)

    NUM         shift, and go to state 1

    exp         go to state 9

state 6

    exp  ->  exp '*' . exp   (rule 3)

    NUM         shift, and go to state 1

    exp         go to state 10

state 7

    exp  ->  exp '/' . exp   (rule 4)

    NUM         shift, and go to state 1

    exp         go to state 11

As was announced in beginning of the report, `State 8 conflicts: 1 shift/reduce':

 
state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)

Indeed, there are two actions associated to the lookahead `/': either shifting (and going to state 7), or reducing rule 1. The conflict means that either the grammar is ambiguous, or the parser lacks information to make the right decision. Indeed the grammar is ambiguous, as, since we did not specify the precedence of `/', the sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which corresponds to shifting `/', or as `(NUM + NUM) / NUM', which corresponds to reducing rule 1.

Because in LALR(1) parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see Shift/Reduce Conflicts. Discarded actions are reported in between square brackets.

Note that all the previous states had a single possible action: either shifting the next token and going to the corresponding state, or reducing a single rule. In the other cases, i.e., when shifting and reducing is possible or when several reductions are possible, the lookahead is required to select the action. State 8 is one such state: if the lookahead is `*' or `/' then the action is shifting, otherwise the action is reducing rule 1. In other words, the first two items, corresponding to rule 1, are not eligible when the lookahead token is `*', since we specified that `*' has higher precedence than `+'. More generally, some items are eligible only with some set of possible lookahead tokens. When run with `--report=lookahead', Bison specifies these lookahead tokens:

 
state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .  [$, '+', '-', '/']   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)

The remaining states are similar:

 
state 9

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp '-' exp .   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 2 (exp)]
    $default    reduce using rule 2 (exp)

state 10

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp '*' exp .   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '/'         shift, and go to state 7

    '/'         [reduce using rule 3 (exp)]
    $default    reduce using rule 3 (exp)

state 11

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)
    exp  ->  exp '/' exp .   (rule 4)

    '+'         shift, and go to state 4
    '-'         shift, and go to state 5
    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '+'         [reduce using rule 4 (exp)]
    '-'         [reduce using rule 4 (exp)]
    '*'         [reduce using rule 4 (exp)]
    '/'         [reduce using rule 4 (exp)]
    $default    reduce using rule 4 (exp)

Observe that state 11 contains conflicts not only due to the lack of precedence of `/' with respect to `+', `-', and `*', but also because the associativity of `/' is not specified.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

8.2 Tracing Your Parser

If a Bison grammar compiles properly but doesn't do what you want when it runs, the yydebug parser-trace feature can help you figure out why.

There are several means to enable compilation of trace facilities:

the macro YYDEBUG

Define the macro YYDEBUG to a nonzero value when you compile the parser. This is compliant with POSIX Yacc. You could use `-DYYDEBUG=1' as a compiler option or you could put `#define YYDEBUG 1' in the prologue of the grammar file (see section The Prologue).

the option `-t', `--debug'

Use the `-t' option when you run Bison (see section Invoking Bison). This is POSIX compliant too.

the directive `%debug'

Add the %debug directive (see section Bison Declaration Summary). This is a Bison extension, which will prove useful when Bison will output parsers for languages that don't use a preprocessor. Unless POSIX and Yacc portability matter to you, this is the preferred solution.

We suggest that you always enable the debug option so that debugging is always possible.

The trace facility outputs messages with macro calls of the form YYFPRINTF (stderr, format, args) where format and args are the usual printf format and variadic arguments. If you define YYDEBUG to a nonzero value but do not define YYFPRINTF, <stdio.h> is automatically included and YYFPRINTF is defined to fprintf.

Once you have compiled the program with trace facilities, the way to request a trace is to store a nonzero value in the variable yydebug. You can do this by making the C code do it (in main, perhaps), or you can alter the value with a C debugger.

Each step taken by the parser when yydebug is nonzero produces a line or two of trace information, written on stderr. The trace messages tell you these things:

To make sense of this information, it helps to refer to the listing file produced by the Bison `-v' option (see section Invoking Bison). This file shows the meaning of each state in terms of positions in various rules, and also what each state will do with each possible input token. As you read the successive trace messages, you can see that the parser is functioning according to its specification in the listing file. Eventually you will arrive at the place where something undesirable happens, and you will see which parts of the grammar are to blame.

The parser file is a C program and you can use C debuggers on it, but it's not easy to interpret what it is doing. The parser function is a finite-state machine interpreter, and aside from the actions it executes the same code over and over. Only the values of variables show where in the grammar it is working.

The debugging information normally gives the token type of each token read, but not its semantic value. You can optionally define a macro named YYPRINT to provide a way to print the value. If you define YYPRINT, it should take three arguments. The parser will pass a standard I/O stream, the numeric code for the token type, and the token value (from yylval).

Here is an example of YYPRINT suitable for the multi-function calculator (see section Declarations for mfcalc):

 
%{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(file, type, value) print_token_value (file, type, value)
%}

… %% … %% …

static void
print_token_value (FILE *file, int type, YYSTYPE value)
{
  if (type == VAR)
    fprintf (file, "%s", value.tptr->name);
  else if (type == NUM)
    fprintf (file, "%d", value.val);
}

[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated on February, 15 2009 using texi2html 1.76.