NAME
yacc - an LALR(1) parser generator
SYNOPSIS
yacc [-BdgilLPrtvVy] [-b file_prefix] [-o output_file] [-p symbol_prefix] filename
DESCRIPTION
yacc reads the grammar specification in
the file filename and generates an LALR(1) parser for
it. The parsers consist of a set of LALR(1) parsing tables and a driver
routine written in the C programming language. yacc
normally writes the parse tables and the driver routine to the file
y.tab.c.
The following options are available:
-b file_prefix
The -b option changes the prefix prepended to the output file names to the string denoted by file_prefix. The default prefix is the character y.
-B
Create a backtracking parser (compile-time configuration for yacc).
-d
The -d option causes the header file y.tab.h to be written. It contains #define's for the token identifiers.
-g
The -g option causes a graphical description of the generated LALR(1) parser to be written to the file y.dot in graphviz format, ready to be processed by dot(1).
-i
The -i option causes a supplementary header file y.tab.i to be written. It contains extern declarations and supplementary #define's as needed to map the conventional yacc yy-prefixed names to whatever the -p option may specify. The code file, e.g., y.tab.c, is modified to #include this file as well as the y.tab.h file, enforcing consistent usage of the symbols defined in those files. The supplementary header file makes it simpler to compile lex- and yacc-files separately.
-l
If the -l option is not specified, yacc will insert #line directives in the generated code. The #line directives let the C compiler relate errors in the generated code to the user's original code. If the -l option is specified, yacc will not insert the #line directives. #line directives specified by the user will be retained.
-L
Enable position processing, e.g., “%locations” (compile-time configuration for yacc).
-o output_file
Specify the filename for the parser file. If this option is not given, the output filename is the file prefix concatenated with the file suffix, e.g., y.tab.c. This overrides the -b option.
-P
The -P option instructs yacc to create a reentrant parser, like “%pure-parser” does.
-p symbol_prefix
The -p option changes the prefix prepended to yacc-generated symbols to the string denoted by symbol_prefix. The default prefix is the string yy.
-r
The -r option causes yacc to produce separate files for code and tables. The code file is named y.code.c, and the tables file is named y.tab.c. The prefix “y” can be overridden using the -b option.
-s
Suppress “#define” statements generated for string literals in a “%token” statement, to more closely match original yacc behavior.
Normally when yacc sees a line such as “%token OP_ADD "ADD"” it notices that the quoted “ADD” is a valid C identifier, and generates a #define not only for OP_ADD, but for ADD as well, e.g.,
    #define OP_ADD 257
    #define ADD 258
The original yacc does not generate the second “#define”. The -s option suppresses this “#define”.
IEEE Std 1003.1 (“POSIX.1”) documents only names and numbers for “%token”, though the original yacc and bison(1) also accept string literals.
-t
The -t option changes the preprocessor directives generated by yacc so that debugging statements will be incorporated in the compiled code.
-V
The -V option prints the version number to the standard output.
-v
The -v option causes a human-readable description of the generated parser to be written to the file y.output.
-y
yacc ignores this option, which bison(1) supports for ostensible POSIX compatibility.
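The output file names described above are formed by joining a prefix (default y, changed with -b) to a fixed suffix such as .tab.c, .tab.h, .code.c or .output. A minimal sketch of that rule in C follows. The helper name output_name and its signature are hypothetical and are not part of yacc; the sketch only illustrates how the names are composed.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of yacc): compose an output file name
 * from the -b prefix and a fixed suffix such as ".tab.c" or ".output".
 * A NULL prefix stands for the default prefix "y".
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
output_name(char *buf, size_t size, const char *prefix, const char *suffix)
{
    int n;

    if (prefix == NULL)
        prefix = "y";
    n = snprintf(buf, size, "%s%s", prefix, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a NULL prefix and the suffix ".tab.c" the helper yields y.tab.c; with the prefix "calc" and the suffix ".tab.h" it yields calc.tab.h. A name given with -o would bypass this composition for the parser file.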
EXTENSIONS
yacc provides some extensions for
compatibility with
bison(1) and other implementations of yacc. The
“%destructor” and “%locations” features are
available only if yacc has been configured and
compiled to support the back-tracking functionality. The remaining features
are always available:
%destructor { code } symbol+
Defines code that is invoked when a symbol is automatically discarded during error recovery. This code can be used to reclaim dynamically allocated memory associated with the corresponding semantic value for cases where user actions cannot manage the memory explicitly.
On encountering a parse error, the generated parser discards symbols on the stack and input tokens until it reaches a state that will allow parsing to continue. This error recovery approach results in a memory leak if the “YYSTYPE” value is, or contains, pointers to dynamically allocated memory.
The bracketed code is invoked whenever the
parser discards one of the symbols. Within code,
“$$” or “$<tag>$” designates the semantic
value associated with the discarded symbol, and “@$”
designates its location (see “%locations” directive).
A per-symbol destructor is defined by listing a grammar symbol in
symbol+. A per-type destructor is defined by listing
a semantic type tag (e.g., “<some_tag>”) in
symbol+; in this case, the parser will invoke
code whenever it discards any grammar symbol that
has that semantic type tag, unless that symbol has its own per-symbol
destructor.
Two categories of default destructor are supported that are invoked when discarding any grammar symbol that has no per-symbol and no per-type destructor:
The code for “<*>” is used for grammar symbols that have an explicitly declared semantic type tag (via “%type”);
the code for “<>” is used for grammar symbols that have no declared semantic type tag.
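The leak that “%destructor” addresses can be modeled in plain C. The sketch below is not generated parser code: the type value_t, the counter destroyed, and the functions destroy_value and discard_symbols are hypothetical names chosen for illustration. It assumes a semantic value that owns one heap-allocated string, and shows a destructor being run on every value popped during error recovery.

```c
#include <stdlib.h>

/* Hypothetical semantic value type; in a grammar this would be YYSTYPE. */
typedef struct {
    char *text;   /* heap-allocated, owned by the value; may be NULL */
} value_t;

/* Count of destructor invocations, kept only so the sketch can be checked. */
static int destroyed;

/*
 * Plays the role of the bracketed %destructor code, with v standing in
 * for "$$". It releases the memory owned by a discarded value.
 */
static void
destroy_value(value_t *v)
{
    free(v->text);
    v->text = NULL;
    destroyed++;
}

/*
 * Model of error recovery: pop entries off the value stack until only
 * `keep` remain, running the destructor on each discarded entry.
 * Returns the new stack depth.
 */
static size_t
discard_symbols(value_t *stack, size_t depth, size_t keep)
{
    while (depth > keep)
        destroy_value(&stack[--depth]);
    return depth;
}
```

Without the call to destroy_value, each popped entry would lose the only pointer to its string, which is the memory leak described above.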
%expect number
Tell yacc the expected number of shift/reduce conflicts. That makes it only report the number if it differs.
%expect-rr number
Tell yacc the expected number of reduce/reduce conflicts. That makes it only report the number if it differs. This is (unlike bison(1)) allowable in LALR(1) parsers.
%locations
Tell yacc to enable management of position information associated with each token, provided by the lexer in the global variable yylloc, similar to management of semantic value information provided in yylval.
As for semantic values, locations can be referenced within actions using @$ to refer to the location of the left hand side symbol, and @N (N an integer) to refer to the location of one of the right hand side symbols. Also as for semantic values, when a rule is matched, a default action is used to compute the location represented by @$ as the beginning of the first symbol and the end of the last symbol in the right hand side of the rule. This default computation can be overridden by explicit assignment to @$ in a rule action.
The type of yylloc is YYLTYPE, which is defined by default as:
    typedef struct YYLTYPE {
        int first_line;
        int first_column;
        int last_line;
        int last_column;
    } YYLTYPE;
YYLTYPE can be redefined by the user (YYLTYPE_IS_DEFINED must be defined, to inhibit the default) in the declarations section of the specification file. As in bison(1), the macro YYLLOC_DEFAULT is invoked each time a rule is matched to calculate a position for the left hand side of the rule, before the associated action is executed; this macro can be redefined by the user.
This directive adds a YYLTYPE parameter to yyerror(). If the “%pure-parser” directive is present, a YYLTYPE parameter is added to yylex() calls.
%lex-param { argument-declaration }
By default, the lexer accepts no parameters, e.g., yylex(). Use this directive to add parameter declarations for your customized lexer.
%parse-param { argument-declaration }
By default, the parser accepts no parameters, e.g., yyparse(). Use this directive to add parameter declarations for your customized parser.
%pure-parser
Most variables (other than yydebug and yynerrs) are allocated on the stack within yyparse(), making the parser reasonably reentrant.
%token-table
Make the parser's names for tokens available in the yytname array. However, yacc does not predefine “$end”, “$error” or “$undefined” in this array.
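The default location rule described under “%locations” (the left hand side spans from the beginning of the first right hand side symbol to the end of the last) can be sketched as an ordinary C function. The struct below has the same fields as the default YYLTYPE, but the names loc_t and default_location are hypothetical, and this is not the YYLLOC_DEFAULT macro itself. Rules with an empty right hand side are not modeled.

```c
/*
 * Same fields as the default YYLTYPE; given a different name so that it
 * is not mistaken for generated code.
 */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} loc_t;

/*
 * Hypothetical helper: compute the location of a rule's left hand side
 * from the locations of its n right hand side symbols (n >= 1).
 * The result starts where rhs[0] starts and ends where rhs[n - 1] ends.
 */
static loc_t
default_location(const loc_t *rhs, int n)
{
    loc_t lhs;

    lhs.first_line   = rhs[0].first_line;
    lhs.first_column = rhs[0].first_column;
    lhs.last_line    = rhs[n - 1].last_line;
    lhs.last_column  = rhs[n - 1].last_column;
    return lhs;
}
```

An action that assigns to @$ explicitly would replace the value computed this way.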
PORTABILITY
According to Robert Corbett:
The rationale in http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html documents some features of AT&T yacc which are no longer required for POSIX compliance.
That said, you may be interested in reusing grammar files with some other implementation which is not strictly compatible with AT&T yacc. For instance, there is bison(1). Here are a few differences:
- yacc accepts an equals sign preceding the left curly brace of an action (as in the original grammar file ftp.y):
        | STAT CRLF
            = {
                statcmd();
            }
- yacc and bison(1) emit code in different order, and in particular bison(1) makes forward references to common functions such as yylex(), yyparse() and yyerror() without providing prototypes.
- bison(1) support for “%expect” is broken in more than one release. For best results using bison(1), delete that directive.
- bison(1) has no equivalent for some of yacc's command-line options, relying on directives embedded in the grammar file.
- The bison(1) -y option does not affect bison's lack of support for features of AT&T yacc which were deemed obsolescent.
- yacc accepts multiple parameters with “%lex-param” and “%parse-param” in two forms:
        {type1 name1} {type2 name2} ...
        {type1 name1, type2 name2 ...}
  bison(1) accepts the latter (though undocumented), but depending on the release may generate bad code.
- Like bison(1), yacc will add parameters specified via “%parse-param” to yyparse(), yyerror() and (if configured for back-tracking) to the destructor declared using “%destructor”. bison(1) puts the additional parameters first for yyparse() and yyerror() but last for destructors. yacc matches this behavior.
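The parameter ordering for “%parse-param” can be illustrated with hand-written stand-ins. The sketch assumes a grammar that declares “%parse-param { struct parse_ctx *ctx }”; the type struct parse_ctx and its fields are hypothetical, and the yyparse() shown here is a stub written for this example, not code produced by yacc.

```c
#include <stdio.h>

/* Hypothetical user context passed through %parse-param. */
struct parse_ctx {
    const char *filename;
    int errors;
};

/*
 * With "%parse-param { struct parse_ctx *ctx }" the user-supplied error
 * routine receives ctx ahead of the message.
 */
static void
yyerror(struct parse_ctx *ctx, const char *msg)
{
    ctx->errors++;
    fprintf(stderr, "%s: %s\n", ctx->filename, msg);
}

/*
 * Stand-in for the generated parser, which would take the same extra
 * parameter. This stub only reports one error so that the call chain
 * can be exercised without running yacc.
 */
static int
yyparse(struct parse_ctx *ctx)
{
    yyerror(ctx, "syntax error");
    return 1;
}
```

A caller would fill in a struct parse_ctx, pass its address to yyparse(), and inspect the errors field afterwards.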
ENVIRONMENT
The following environment variable is referenced by
yacc:
TMPDIR
If the environment variable TMPDIR is set, the string denoted by TMPDIR will be used as the name of the directory where the temporary files are created.
TABLES
The names of the tables generated by this version of
yacc are “yylhs”,
“yylen”, “yydefred”, “yydgoto”,
“yysindex”, “yyrindex”,
“yygindex”, “yytable”, and
“yycheck”. Two additional tables, “yyname” and
“yyrule”, are created if YYDEBUG is
defined and non-zero.
FILES
- y.code.c
- y.tab.c
- y.tab.h
- y.output
- /tmp/yacc.aXXXXXX
- /tmp/yacc.tXXXXXX
- /tmp/yacc.uXXXXXX
DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is written to the standard error. If there are any LALR(1) conflicts, the number of conflicts is also written to the standard error.
STANDARDS
The yacc utility conforms to
IEEE Std 1003.2 (“POSIX.2”).