The Bison parser is actually a C function named yyparse. Here we
describe the interface conventions of yyparse and the other
functions that it needs to use.
Keep in mind that the parser uses many C identifiers starting with `yy' and `YY' for internal purposes. If you use such an identifier (aside from those in this manual) in an action or in the epilogue of the grammar file, you are likely to run into trouble.
| 4.1 The Parser Function yyparse | How to call yyparse and what it returns. |
| 4.2 The Lexical Analyzer Function yylex | You must supply a function yylex which reads tokens. |
| 4.3 The Error Reporting Function yyerror | You must supply a function yyerror. |
| 4.4 Special Features for Use in Actions | Special features for use in actions. |
4.1 The Parser Function yyparse
You call the function yyparse to cause parsing to occur. This
function reads tokens, executes actions, and ultimately returns when it
encounters end-of-input or an unrecoverable syntax error. You can also
write an action which directs yyparse to return immediately
without reading further.
The value returned by yyparse is 0 if parsing was successful (return
is due to end-of-input).
The value is 1 if parsing failed (return is due to a syntax error).
In an action, you can cause immediate return from yyparse by using
these macros:
Macro: YYACCEPT;
Return immediately with value 0 (to report success).
Macro: YYABORT;
Return immediately with value 1 (to report failure).
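To make the return-value contract concrete, here is a small hand-written stand-in for a parser. It is not Bison output: it assumes a hypothetical grammar `input: INT | input '+' INT ;`, a hypothetical token code 258 for INT, and hypothetical helpers set_tokens and yyerror_calls. It only mirrors the conventions described above: tokens come from yylex, errors go through yyerror, and the result is 0 on success or 1 on a syntax error.

```c
#include <stdio.h>

/* Hypothetical token code; a Bison-generated parser #defines INT itself. */
#define INT 258

/* Hypothetical token source: a 0-terminated array of token codes. */
static const int *tokens;
static int pos;

void
set_tokens (const int *t)
{
  tokens = t;
  pos = 0;
}

/* Return the next token, or 0 (end-of-input) once the array is exhausted. */
int
yylex (void)
{
  return tokens[pos] > 0 ? tokens[pos++] : 0;
}

int yyerror_calls = 0;   /* hypothetical counter, handy for checking */

void
yyerror (char const *s)
{
  yyerror_calls++;
  fprintf (stderr, "%s\n", s);
}

/* Hand-written stand-in for the hypothetical grammar
     input: INT | input '+' INT ;
   It returns 0 or 1 exactly as a Bison-generated yyparse would. */
int
yyparse (void)
{
  int tok = yylex ();
  for (;;)
    {
      if (tok != INT)            /* every operand must be an INT */
        {
          yyerror ("syntax error");
          return 1;              /* the value YYABORT would produce */
        }
      tok = yylex ();
      if (tok <= 0)              /* end-of-input after a complete input */
        return 0;                /* the value YYACCEPT would produce */
      if (tok != '+')
        {
          yyerror ("syntax error");
          return 1;
        }
      tok = yylex ();
    }
}
```

A caller would typically treat a nonzero result as failure, for example by turning it into a nonzero exit status.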
4.2 The Lexical Analyzer Function yylex
The lexical analyzer function, yylex, recognizes tokens from
the input stream and returns them to the parser. Bison does not create
this function automatically; you must write it so that yyparse can
call it. The function is sometimes referred to as a lexical scanner.
In simple programs, yylex is often defined at the end of the Bison
grammar file. If yylex is defined in a separate source file, you
need to arrange for the token-type macro definitions to be available there.
To do this, use the `-d' option when you run Bison, so that it will
write these macro definitions into a separate header file
`name.tab.h' which you can include in the other source files
that need it. See section Invoking Bison.
| 4.2.1 Calling Convention for yylex | How yyparse calls yylex. |
| 4.2.2 Semantic Values of Tokens | How yylex must return the semantic value of the token it has read. |
| 4.2.3 Textual Positions of Tokens | How yylex must return the text position (line number, etc.) of the token, if the actions want that. |
| 4.2.4 Calling Conventions for Pure Parsers | How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser). |
4.2.1 Calling Convention for yylex
The value that yylex returns must be the positive numeric code
for the type of token it has just found; a zero or negative value
signifies end-of-input.
When a token is referred to in the grammar rules by a name, that name
in the parser file becomes a C macro whose definition is the proper
numeric code for that token type. So yylex can use the name
to indicate that type. See section Symbols, Terminal and Nonterminal.
When a token is referred to in the grammar rules by a character literal,
the numeric code for that character is also the code for the token type.
So yylex can simply return that character code, possibly converted
to unsigned char to avoid sign-extension. The null character
must not be used this way, because its code is zero and that
signifies end-of-input.
Here is an example showing these things:
int
yylex (void)
{
…
if (c == EOF) /* Detect end-of-input. */
return 0;
…
if (c == '+' || c == '-')
return c; /* Assume token type for `+' is '+'. */
…
return INT; /* Return the type of the token. */
…
}
This interface has been designed so that the output from the lex
utility can be used without change as the definition of yylex.
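Here is a complete, self-contained version of that pattern, as a sketch only. It assumes a hypothetical input string installed by a hypothetical set_input helper and a hypothetical code of 258 for INT (Bison normally supplies the real macro). It returns 0 at end-of-input, INT for a run of digits, and any other character as its own code, converted to unsigned char so that a byte such as 0xE9 cannot come out negative and be mistaken for end-of-input.

```c
#include <ctype.h>

/* Hypothetical: Bison normally #defines named token codes such as INT;
   258 is used here only for illustration. */
#define INT 258

static const char *cursor = "";   /* hypothetical input source */

void
set_input (const char *s)
{
  cursor = s;
}

int
yylex (void)
{
  while (*cursor == ' ' || *cursor == '\t')   /* skip blanks */
    cursor++;

  if (*cursor == '\0')        /* end-of-input is signalled by 0 */
    return 0;

  if (isdigit ((unsigned char) *cursor))
    {
      while (isdigit ((unsigned char) *cursor))
        cursor++;
      return INT;             /* named token: return its macro */
    }

  /* Character-literal token: the character code is the token type.
     Converting to unsigned char avoids sign-extension to a negative value. */
  return (unsigned char) *cursor++;
}
```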
If the grammar uses literal string tokens, there are two ways that
yylex can determine the token type codes for them:
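If the grammar defines symbolic token names as aliases for the
literal string tokens, yylex can use these symbolic names like
all others. In this case, the use of the literal string tokens in
the grammar file has no effect on yylex.
Alternatively, yylex can find the multicharacter token in the yytname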
table. The index of the token in the table is the token type's code.
The name of a multicharacter token is recorded in yytname with a
double-quote, the token's characters, and another double-quote. The
token's characters are not escaped in any way; they appear verbatim in
the contents of the string in the table.
Here's code for looking up a token in yytname, assuming that the
characters of the token are stored in token_buffer.
for (i = 0; i < YYNTOKENS; i++)
{
if (yytname[i] != 0
&& yytname[i][0] == '"'
&& ! strncmp (yytname[i] + 1, token_buffer,
strlen (token_buffer))
&& yytname[i][strlen (token_buffer) + 1] == '"'
&& yytname[i][strlen (token_buffer) + 2] == 0)
break;
}
The yytname table is generated only if you use the
%token-table declaration. See section Bison Declaration Summary.
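The lookup loop above can be packaged as a function and exercised without running Bison. This sketch substitutes a hypothetical table, demo_tname, whose entries are invented for illustration but follow the format described above (literal string tokens wrapped in double quotes, stored verbatim); a real parser would use the generated yytname and YYNTOKENS instead.

```c
#include <string.h>

/* Hypothetical stand-in for a generated yytname table.  The entries are
   made up; only the quoting convention matches the description above. */
static const char *const demo_tname[] = {
  "$end", "error", "$undefined", "NUM", "\"<=\"", "\">=\"", 0
};
#define DEMO_NTOKENS 6

/* Return the index of the entry that is exactly token_buffer wrapped in
   double quotes, or -1 if no multicharacter literal token matches. */
int
lookup_literal_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  int i;
  for (i = 0; i < DEMO_NTOKENS; i++)
    {
      const char *name = demo_tname[i];
      if (name != 0
          && name[0] == '"'
          && strncmp (name + 1, token_buffer, len) == 0
          && name[len + 1] == '"'      /* closing quote right after */
          && name[len + 2] == '\0')    /* and nothing after that */
        return i;
    }
  return -1;
}
```

The final two checks reject entries that merely start with the buffer's characters, so "<" does not match "<=".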
4.2.2 Semantic Values of Tokens
In an ordinary (non-reentrant) parser, the semantic value of the token must
be stored into the global variable yylval. When you are using
just one data type for semantic values, yylval has that type.
Thus, if the type is int (the default), you might write this in
yylex:
…
yylval = value;  /* Put value onto Bison stack. */
return INT;      /* Return the type of the token. */
…
When you are using multiple data types, yylval's type is a union
made from the %union declaration (see section The Collection of Value Types). So when you store a token's value, you
must use the proper member of the union. If the %union
declaration looks like this:
%union {
int intval;
double val;
symrec *tptr;
}
then the code in yylex might look like this:
…
yylval.intval = value;  /* Put value onto Bison stack. */
return INT;             /* Return the type of the token. */
…
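As a self-contained sketch of the same idea, the code below declares by hand a YYSTYPE union mirroring the %union shown above, along with the global yylval. In a real parser Bison generates both; here they, the token code 258 for INT, and the set_input helper are assumptions for illustration. The scanner stores an integer's value in the intval member before returning INT.

```c
#include <ctype.h>

#define INT 258                   /* hypothetical token code */

typedef struct symrec symrec;     /* left incomplete: only pointers are stored */

/* Hand-written mirror of the %union above; Bison generates the real one. */
typedef union {
  int intval;
  double val;
  symrec *tptr;
} YYSTYPE;

YYSTYPE yylval;

static const char *cursor = "";   /* hypothetical input source */

void
set_input (const char *s)
{
  cursor = s;
}

int
yylex (void)
{
  while (*cursor == ' ')
    cursor++;
  if (*cursor == '\0')
    return 0;                     /* end-of-input */

  if (isdigit ((unsigned char) *cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *cursor))
        value = value * 10 + (*cursor++ - '0');
      yylval.intval = value;      /* use the member matching INT's type */
      return INT;
    }
  return (unsigned char) *cursor++;
}
```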
4.2.3 Textual Positions of Tokens
If you are using the `@n'-feature (see section Tracking Locations) in actions to keep track of the
textual locations of tokens and groupings, then you must provide this
information in yylex. The function yyparse expects to
find the textual location of a token just parsed in the global variable
yylloc. So yylex must store the proper data in that
variable.
By default, the value of yylloc is a structure and you need only
initialize the members that are going to be used by the actions. The
four members are called first_line, first_column,
last_line and last_column. Note that the use of this
feature makes the parser noticeably slower.
The data type of yylloc has the name YYLTYPE.
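The following sketch shows one way yylex can fill in yylloc. It declares a YYLTYPE with the four default members by hand (Bison would normally provide it), uses a hypothetical ID token code and set_input helper, and adopts its own convention of 1-based columns with last_column naming the token's final character. That convention is an assumption of the sketch, not something the parser imposes.

```c
#include <ctype.h>

/* Hand-written mirror of the default YYLTYPE members. */
typedef struct YYLTYPE {
  int first_line, first_column;
  int last_line, last_column;
} YYLTYPE;

YYLTYPE yylloc;

#define ID 258                    /* hypothetical token code */

static const char *cursor = "";
static int cur_line = 1, cur_column = 1;   /* position of *cursor, 1-based */

void
set_input (const char *s)
{
  cursor = s;
  cur_line = 1;
  cur_column = 1;
}

int
yylex (void)
{
  /* Skip blanks and newlines, keeping the position up to date. */
  while (*cursor == ' ' || *cursor == '\n')
    {
      if (*cursor == '\n')
        {
          cur_line++;
          cur_column = 1;
        }
      else
        cur_column++;
      cursor++;
    }
  if (*cursor == '\0')
    return 0;

  yylloc.first_line = cur_line;
  yylloc.first_column = cur_column;

  int token;
  if (isalpha ((unsigned char) *cursor))
    {
      while (isalpha ((unsigned char) *cursor))
        {
          cursor++;
          cur_column++;
        }
      token = ID;
    }
  else
    {
      token = (unsigned char) *cursor++;
      cur_column++;
    }

  /* The last character consumed sits one column before the cursor. */
  yylloc.last_line = cur_line;
  yylloc.last_column = cur_column - 1;
  return token;
}
```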
4.2.4 Calling Conventions for Pure Parsers
When you use the Bison declaration %pure-parser to request a
pure, reentrant parser, the global communication variables yylval
and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by
pointers passed as arguments to yylex. You must declare them as
shown here, and pass the information back by storing it through those
pointers.
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
…
*lvalp = value; /* Put value onto Bison stack. */
return INT; /* Return the type of the token. */
…
}
If the grammar file does not use the `@' constructs to refer to
textual positions, then the type YYLTYPE will not be defined. In
this case, omit the second argument; yylex will be called with
only one argument.
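Filled out, the two-argument form might look like the sketch below. It assumes a single int semantic type, a hand-written YYLTYPE with the default members, a hypothetical token code for INT, and a hypothetical single-line input set by set_input; in a real pure parser Bison supplies the types and the macro. Note that the scanner's own cursor is still file-scope here, so only the value and location traffic goes through the pointers; the sketch as a whole is not reentrant.

```c
#include <ctype.h>

typedef int YYSTYPE;              /* assumption: one int semantic type */
typedef struct YYLTYPE {
  int first_line, first_column;
  int last_line, last_column;
} YYLTYPE;
#define INT 258                   /* hypothetical token code */

/* Hypothetical single-line input; still global, see the note above. */
static const char *input_start = "";
static const char *cursor = "";

void
set_input (const char *s)
{
  input_start = cursor = s;
}

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  while (*cursor == ' ')
    cursor++;
  if (*cursor == '\0')
    return 0;

  llocp->first_line = llocp->last_line = 1;
  llocp->first_column = (int) (cursor - input_start) + 1;   /* 1-based */

  int token;
  if (isdigit ((unsigned char) *cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *cursor))
        value = value * 10 + (*cursor++ - '0');
      *lvalp = value;             /* store through the pointer, not yylval */
      token = INT;
    }
  else
    token = (unsigned char) *cursor++;

  /* Column of the token's last character, 1-based. */
  llocp->last_column = (int) (cursor - input_start);
  return token;
}
```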
4.3 The Error Reporting Function yyerror
The Bison parser detects a syntax error or parse error
whenever it reads a token which cannot satisfy any syntax rule. An
action in the grammar can also explicitly proclaim an error, using the
macro YYERROR (see section Special Features for Use in Actions).
The Bison parser expects to report the error by calling an error
reporting function named yyerror, which you must supply. It is
called by yyparse whenever a syntax error is found, and it
receives one argument: a string describing the error. For a syntax
error, the string is normally "syntax error".
If you invoke the directive %error-verbose in the Bison
declarations section (see section The Bison Declarations Section), then Bison provides a more verbose and specific error message
string instead of just plain "syntax error".
The parser can detect one other kind of error: stack overflow. This
happens when the input contains constructions that are very deeply
nested. It isn't likely you will encounter this, since the Bison
parser extends its stack automatically up to a very large limit. But
if overflow happens, yyparse calls yyerror in the usual
fashion, except that the argument string is "parser stack
overflow".
The following definition suffices in simple programs:
#include <stdio.h>  /* for fprintf and stderr */
void
yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
After yyerror returns to yyparse, the latter will attempt
error recovery if you have written suitable error recovery grammar rules
(see section Error Recovery). If recovery is impossible, yyparse will
immediately return 1.
Obviously, in location tracking pure parsers, yyerror should have
access to the current location. This is indeed the case for GLR
parsers, but not for Yacc parsers, for historical reasons. That is, if
`%locations %pure-parser' is passed, then the prototypes for
yyerror are:
void yyerror (char const *msg);                 /* Yacc parsers.  */
void yyerror (YYLTYPE *locp, char const *msg);  /* GLR parsers.  */
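For the GLR prototype, a yyerror that actually uses its location argument could look like the sketch below. The YYLTYPE is declared by hand with the default members (Bison would normally supply it), the last_error buffer is a hypothetical addition that keeps the formatted text for inspection, and the "line.column-line.column: message" layout is simply this sketch's choice.

```c
#include <stdio.h>

/* Hand-written mirror of the default YYLTYPE members. */
typedef struct YYLTYPE {
  int first_line, first_column;
  int last_line, last_column;
} YYLTYPE;

/* Hypothetical: the most recent message, kept for callers or tests. */
char last_error[256];

void
yyerror (YYLTYPE *locp, char const *msg)
{
  /* Prefix the message with the span of the offending construct. */
  snprintf (last_error, sizeof last_error, "%d.%d-%d.%d: %s",
            locp->first_line, locp->first_column,
            locp->last_line, locp->last_column, msg);
  fprintf (stderr, "%s\n", last_error);
}
```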
The prototypes are only indications of how the code produced by Bison
uses yyerror. Bison-generated code always ignores the returned
value, so yyerror can return any type, including void.
Also, yyerror can be a variadic function; that is why the
message is always passed last.
Traditionally yyerror returns an int that is always
ignored, but this is purely for historical reasons, and void is
preferable since it more accurately describes the return type for
yyerror.
The variable yynerrs contains the number of syntax errors
encountered so far. Normally this variable is global; but if you
request a pure parser (see section A Pure (Reentrant) Parser)
then it is a local variable which only the actions can access.
4.4 Special Features for Use in Actions
Here is a table of Bison constructs, variables and macros that are useful in actions.
Variable: $$
Acts like a variable that contains the semantic value for the grouping made by the current rule. See section Actions.
Variable: $n
Acts like a variable that contains the semantic value for the nth component of the current rule. See section Actions.
Variable: $<typealt>$
Like $$ but specifies alternative typealt in the union
specified by the %union declaration. See section Data Types of Values in Actions.
Variable: $<typealt>n
Like $n but specifies alternative typealt in the
union specified by the %union declaration.
See section Data Types of Values in Actions.
Macro: YYABORT;
Return immediately from yyparse, indicating failure.
See section The Parser Function yyparse.
Macro: YYACCEPT;
Return immediately from yyparse, indicating success.
See section The Parser Function yyparse.
Macro: YYBACKUP (token, value);
Unshift a token. This macro is allowed only for rules that reduce a single value, and only when there is no look-ahead token. It is also disallowed in GLR parsers. It installs a look-ahead token with token type token and semantic value value; then it discards the value that was going to be reduced by this rule.
If the macro is used when it is not valid, such as when there is a look-ahead token already, then it reports a syntax error with a message `cannot back up' and performs ordinary error recovery.
In either case, the rest of the action is not executed.
Macro: YYEMPTY
Value stored in yychar when there is no look-ahead token.
Macro: YYERROR;
Cause an immediate syntax error. This statement initiates error
recovery just as if the parser itself had detected an error; however, it
does not call yyerror, and does not print any message. If you
want to print an error message, call yyerror explicitly before
the `YYERROR;' statement. See section Error Recovery.
Macro: YYRECOVERING
This macro stands for an expression that has the value 1 when the parser is recovering from a syntax error, and 0 the rest of the time. See section Error Recovery.
Variable: yychar
Variable containing the current look-ahead token. (In a pure parser,
this is actually a local variable within yyparse.) When there is
no look-ahead token, the value YYEMPTY is stored in the variable.
See section Look-Ahead Tokens.
Macro: yyclearin;
Discard the current look-ahead token. This is useful primarily in error rules. See section Error Recovery.
Macro: yyerrok;
Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. See section Error Recovery.
Value: @$
Acts like a structure variable containing information on the textual position of the grouping made by the current rule. See section Tracking Locations.
Value: @n
Acts like a structure variable containing information on the textual position of the nth component of the current rule. See section Tracking Locations.
This document was generated
using texi2html 1.76.