（轉）Understanding C parsers generated by GNU Bison

時間 2019-11-19

標籤 understanding parsers generated gnu bison 欄目 C&C++ 简体版

原文原文鏈接

原文連接：https://www.cs.uic.edu/~spopuri/cparser.htmlhtml

Satya Kiran Popuri
Graduate Student
University of Illinois at Chicago
Chicago, IL 60607
spopur2 [at] uic [dot] edu
Wed Sep 13 12:24:25 CDT 2006redis

Introduction

Bison is a general-purpose parser generator that converts an annotated context-free grammar into an LALR(1) or GLR parser for that grammar. The generated parser is implemented as a C or C++ program with a parsing function that can be called from application programs.express

It can be useful to understand the generated parser code in some situations. You may want to hand optimize the parser, or split the big yyparse() function into several smaller functions, or you want to understand how a real world parser is implemented as compared to the `theoretical' ways of compiler design books.app

This document is an attempt to describe the implementation of an LALR(1) parser in C as generated by Bison 2.3. I used a simple grammar to demonstrate the working of the parser and the nature of parsing tables. You will also find a comparison of these tables to the uncompressed tabular scheme given in the popular book "Compilers - Principles, Techniques and Tools" by Aho, Sethi and Ullman, also called as the "Dragon Book" and many other books on compiler design.less

Prerequisites

I am assuming that you are familiar with context free grammars and LR parsing techniques; a little hands on experience with Bison will also help. You must be familiar with the theory and the usual terminology like shifts, reductions, parse stacks and error handling. If you are not familiar with Bison at all, I strongly recommend that you read the Bison user manual before reading this document.dom

The LR parser

An LR parser is essentially a shift-reduce (bottom up) parsing algorithm driven by parsing tables and a stack. For an overview of the LR parsing algorithm, you can refer to the Dragon book or this excellent wikipedia entry.electron

The parsing algorithm is same (at least in theory) for all LR parsers. They differ only in the tables generated by the parser generator. In our case, Bison generates LALR(1) parsing tables in the form of C/C++ arrays. The generated yyparse() function implements a parsing routine that uses these tables and a stack to guide the parse.ide

An example grammar

I will use the following little grammar as an example to demonstrate the internals of a C parser generated by Bison. It will also be used to show the traditional "theoretical" tables and the real world parsing tables and the relationship between them.oop

(1) L → L;E

(2) L → E

(3) E → E,P

(4) E → P

(5) P → a

(6) P → (M)

(7) M → ε

(8) M → L

A Bison grammar specification for this grammar is like this:ui

%%
L : L ';' E
  | E
;
E : E ',' P
  | P
;
P : 'a'
  | (M)
;
M : /* nothing */
  | L
;
%%

Dragon Book style tables

The (uncompressed) tabular scheme described in the popular Dragon book and many other books on compiler theory serves well to gain a theoretical understanding of the working of an LR parser. Let us examine this scheme for the grammar given in the previous section.

The action and GOTO parts of the table are shown below. This LALR(1) table is calculated by hand.

	action						GOTO
state	;	,	a	(	)	$	L	E	P	M
0			s12	s6			1	8	9
1	s2					acc
2			s12	s6				3	9
3	r1	s4			r1	r1
4			s12	s6					5
5	r3	r3			r3	r3
6			s12	s6	r7		7	8	9	10
7	s2				r8
8	r2	s4			r2	r2
9	r4	r4			r4	r4
10					s11
11	r6	r6				r6
12	r5	r5			r5	r5

The resulting automaton has 13 states. A string belongs to the language generated by this grammar if we encounter the end of string while in state 1. The working of an LR parsing algorithm based on this type of table is fairly simple and is clearly explained in the wikipedia entry and in standard compiler theory books;

We can see that most of the table is either empty or filled with repeated entries. The empty slots in the action part of the table are errors; The GOTO part of the table has only 4 valid entries with the rest of the cells empty. These empty slots are not errors.. they are simply unreachable; It is impossible to detect an error on a non-terminal symbol because of the correctness of the LR algorithm.

A real world useful grammar would end up with hundreds of states and thousands of empty slots in such a table. There is a lot of scope for optimization in a practical parser generator. In fact many books including the Dragon book suggest some scheme for compression of parsing tables. Bison takes a different approach to table compression and so far I haven't come across this method in any of the standard text books.

Tables generated by Bison

Instead of generating a big sparse matrix like the one shown above, Bison generates several smaller one dimensional tables as C arrays. Not all tables are used directly by the parsing routine - some are used to print debugging information and error recovery. Before diving into the essential tables, I must describe some unique concepts utilized by Bison that are not used in theoretical table computing methods.

Bison's Augmented Grammar

Bison augments your input grammar with the following rule:

$accept : <start-symbol> $end

$accept and $end are fictitious symbols. This is somewhat similar to augmenting the grammar with a new start symbol as described by most books. In addition to the new start symbol $accept, Bison also has a terminal symbol $end which will help generate a "final" state of the automaton. This final state always has just one production:

$accept : <start-symbol> $end .

with the dot at the end of the augmented rule. This helps the parsing routine recognize when to accept the input string just by looking at the current state. Also it obviates the necessity to represent "accept" as an entry in the LALR tables. There is no notion of "final state" in theoretical LALR parsing (although it exists in an LR(0) characteristic FSM), but Bison uses one state as the final state by introducing the $end symbol.

One important implication of this augmentation is that the "Rule numbering" will now change for Bison generated tables. The $accept rule is considered rule #1, so in the tables produced by Bison, an r6 will be denoted by r7 and so on.

Default Reductions

In states where there is a reduction, the action for many symbols will be the same. For example in the table above, states 5, 8, 9, and 11 have a common reduce action for different terminal symbols. Some substantial space saving is made by making the most common reduction in that state a "default" action. For example state 5 can have the default action as r3. The parser generated by Bison will do a reduction using rule #3 if it finds itself in state 5 regardless of the look-ahead symbol! This might lead to some illegal reductions. But it will not matter because the error will be caught ultimately before another shift can take place.

Modified traditional tables

Taking the above two aspects into consideration, our dragon book style table will now change considerably. There are now 14 states due to the additional $end terminal symbol. And there is no "accept" entry in the table any more. State 8 simply is an accepting state.

The table given below is constructed directly from the .output report file generated by Bison 2.3. You can generate this report file by using the command line option --report=all. This is not the table that will be used for parsing in a Bison generated parser. This table here is only to gain a perspective on how the final tables will look like.

Table constructed from Bison report

Hand calculated table - rearranged

	action						GOTO
state	;	,	a	(	)	$end	L	E	P	M
0			s1	s2			3	4	5
1	r5	r5	r5	r5	r5	r5
2	r7	r7	s1	s2	r7	r7	6	4	5	7
3	s9					s8
4	r2	s10	r2	r2	r2	r2
5	r4	r4	r4	r4	r4	r4
6	s9	r8	r8	r8	r8	r8
7					s11
8*	acc	acc	acc	acc	acc	acc
9			s1	s2				12	5
10			s1	s2	s11				13
11	r6	r6	r6	r6	r6	r6
12	r1	s10	r1	r1	r1	r1
13	r3	r3	r3	r3	r3	r3

	action						GOTO
state	;	,	a	(	)	$	L	E	P	M
0			s12	s6			1	8	9
12	r5	r5			r5	r5
6			s12	s6	r7		7	8	9	10
1	s2					acc
8	r2	s4			r2	r2
9	r4	r4			r4	r4
7	s2				r8
10					s11
-
2			s12	s6				3	9
4			s12	s6					5
11	r6	r6				r6
3	r1	s4			r1	r1
5	r3	r3			r3	r3

The table constructed from the Bison report file is matched with our hand computed table. State 1 generated by Bison corresponds to state 12 in our table and so on. Of course there is no state equivalent to state 8 in the hand computed table because we didn't have the $end symbol and the notion of a final state.

Symbol numbers

Bison assigns a symbol number to each grammar symbol (terminal or non-terminal). Of course Bison has its own symbol table like any other compiler (Bison is a compiler compiler remember). As a rule, Bison always assigns numbers to terminal symbols first and then proceeds to non-terminals. i.e Non-terminals symbols always have higher numbers than terminals. Also, Bison introduces 4 new symbols in the symbol table:

$accept : The new start (non-terminal) symbol of the grammar.
$end : The fictitious end symbol. Symbol number 0 is reserved for this symbol.
error : The 'built-in' error symbol used for error productions. Symbol number 1 is reserved for it.
$undefined : A symbol that represents a token that Bison cannot recognize. (It is not a part of the grammar or not defined using %token). Symbol number 2 is reserved for this symbol.

Numbers for terminals begin from 3 onwards. You can check the symbol numbers assigned to various symbols by looking at the yytname array generated in the output parser. For our example grammar, the symbol numbers look like this:

Symbol	Number
$end	0
error	1
$undefined	2
';'	3
','	4
'a'	5
'('	6
')'	7
$accept	8
L	9
E	10
P	11
M	12

Symbols 0 through 7 are terminals and the rest are non-terminals. These symbol numbers are used to index various tables as we will see in a moment.

Compressing the parsing table

Let us look at how we can optimally represent our 'modified traditional table'. The table is still a sparse matrix with a lot of white space and repeated information. We can start by collecting all repeated data at one place. The default reductions are our prime targets. They can all be listed in an array indexed by state number:

default_reductions[] = {none,r5,r7,none,r2,r4,r8,none,none,none,none,r6,r1,r3}

We did waste some space for states where there are no default reductions, but that is far less than all the locations used up for repeating reduce actions. Some of the states have only reduce actions and nothing else. So, these rows are pretty much "taken care of" by the above array. Observe that we have listed a default reduction for state 2 (r7) even though it had shift actions on a couple of symbols. Hence, this array may be used only after ensuring that there are no shift actions for the current state on the current look-ahead symbol.

The next obvious target would be the GOTO part of the table which is mostly empty. Since most states do not have a GOTO part, the best space saving scheme here would be to have an array indexed by non-terminal (column) instead of state (row). But each non-terminal can have a different GOTO state depending on the current state of the automaton. So, we choose to list out the most popular GOTO state of each non-terminal in an array like this:

default_gotos[] = { 3, 4, 5, 7 }

That's one entry each for non-terminals L, E, P, M. We still have to take care of the other GOTO states for L, E and P. For now this table has saved us a lot of wasted GOTO white space.

The task now has been reduced to representing the shift actions properly for each state and some GOTOs. This is what is left of the action and GOTO tables. I have zapped default reductions and also states that have only default reductions. Also the terminals and non-terminals are now replaced by their symbol numbers and instead of representing shifts as s1, s2 and so on, I have used only the state numbers as there are no reduction entries now:

	action
state	3	4	5	6	7	0
0			1	2
2			1	2
3	9					8
4		10
6	9
7					11
9			1	2
10			1	2	11
12		10

	GOTO
state	9	10	11
2	6
9		12
10			13

To compress this part of the table, Bison follows a method described by Tarjan and Yao [3]. It is a fairly complex method combining the idea of Trie data structures and "double displacement" with some of the authors' own ideas thrown in to improve time and space complexity results.

Double displacement is very straight forward. We flatten out the above 2-D table into a simple one dimensional array by mapping all non-white-space entries into some location of the array. We keep the order of elements in each row intact and displace each row by some amount such that no two non-white-space entries in each row occupy the same position in the one-dimensional array. Our mapping will be defined by a set of displacements D(i) for each row i; position (i,j) in the above table is mapped to position D(i) + j in our one-dimensional array. Here is a sample displacement table and the corresponding one dimensional table:

D = {4,6,0,0,12,9,8,0,13} and
T = {9,10,1,2,11,8,1,2,1,2,1,2,9,11,10}

The displacement table D will act as a directory to one dimensional table T that contains the actual entries from the 2-D table; the offset into T for element (i,j) is calculated by D[i] + j. e.g., to find the entry at (3,1)(action for state 4 on symbol 4) in the 2-D table above, we compute D[3] + 1 = 0+1 = 1 and hence T[1] = 10 is the entry we really want.

Even with the above scheme we have a lot of repeated entries in the table T which are really the same states (e.g, states 1 and 2). So this method is combined with column displacements and "Tries" to obtain a more compressed table organization. If this discussion has whet your appetite for more, you can refer to [3] for more details.

The final objective of Bison is to build a one-dimensional table T that specifies what to do in each state. This table will be indexed by a (one dimensional) directory table D. In the above discussion, displacements into T are indexed by the 2-D array index (i,j). But we would rather want to index into T by state number and symbol number instead (since the rows and columns of the 2D table are headed by states and symbols). Bison indexes D by state number and displacement into T by symbol number. If the next look-ahead symbol has number k, table entry for current state can be retrieved as follows:

Action for state n on symbol y = T[ D[n] + k ]

As an example, the action for state 0 is s1 on symbol number 5 ('a'). So if T[0] = 1 then D[0] = -5; to account for action s2 on symbol number 6, we would have T[ D[0] + 6] = T[1] = 2 and so on;

But there is a special case that we need to take care of. There can be some 'explicit' errors in the action table that cannot be over written by default reductions. These errors are due to any %nonassoc declarations in the grammar. In that case, we would have some reductions in the action part "not taken care of" by the default_reductions[] array.

To represent these reductions in the same table T, Bison generates negated rule numbers in T. The negative sign is just to differentiate the shifts (which are positive and represent state numbers) from the reductions. The explicit errors will be represented by another negative number which is out of range of rule numbers and defined explicitly by a macro (see YYTABLE_NINF below). We do not have this situation in our grammar.

Also D will have a specially defined negative value that will indicate that the current state has only a default reduction (like state 1 for example). The parser would always go for a default reduction if this value is encountered in D.

A similar directory table ND (say) can be built for the non-terminals symbols that have non-default GOTO states. This table will contain one reference entry for each non-terminal symbol. These entries are displacements in T, but indexed by state number.

If ND[P] = 2, and the current state is 10, we refer to T[12] to get the GOTO of state 10 on P. But given that current state S, the parser needs to choose between ND and default_gotos to pick the GOTO state. Here we can make ND[S] to be a value that is out of bounds of table T and hence force the parser to pick the GOTO from default_gotos.

Bison also has a guard table that is checked to see if we are within legal bounds of table T. Read on for the actual descriptions of these tables to connect all the dots.

Table descriptions

Bison generates parsing tables as linear arrays as seen in previous sections. All variable names in the output parser begin with 'yy' to avoid naming conflicts with user programs. What follows is a brief description of each table with some examples referring to the sample grammar.

yytranslate

This table maps lexical token numbers to their symbol numbers. If you have %token declarations in your grammar, Bison assigns token numbers to the different tokens; If you just use character representations like we have done in our grammar above, Bison just maps their ASCII values to the symbol numbers. So, in our case, yytranslate[97] = 5 which is 'a' (see symbol table above). The full listing of yytranslate is given below.

/* YYTRANSLATE[YYLEX] -- Bison symbol number corresponding to YYLEX.  */
static const yytype_uint8 yytranslate[] =
{
       0,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       6,     7,     2,     2,     4,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     3,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     5,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     2,     2,     2,     2,
       2,     2,     2,     2,     2,     2,     1,     2
};

Most entries point to symbol #2 which is $undefined. Only the symbols defined in our grammar have been given valid symbol numbers. Whenever yyparse() requires a token, it calls yylex and then translates the returned token number into its internal symbol number using this table.

yydefact

This table lists default reductions for each state. yydefact[state] = rule number to use for a default reduction in that state. Here is the yydefact table produced for our grammar:

/* YYDEFACT[STATE-NAME] -- Default rule to reduce with in state
   STATE-NUM when YYTABLE doesn't specify something else to do.  Zero
   means the default is an error.  */
static const yytype_uint8 yydefact[] =
{
       0,     6,     8,     0,     3,     5,     9,     0,     1,     0,
       0,     7,     2,     4
};

I will get to YYTABLE (mentioned in the comment) in a moment. Note that a rule number 0 in this table means error. Rule number 0 is not used internally by Bison because of some limitations of the internal representation of the input grammar.

If you take a look back at our modified dragon book style table, it is easy see what yydefact is trying to say. One important observation is in order here - rule numbers are incremented by 1 in this table because of the additional rule ($accept → L $end). So what was really r5 for state 1, has become r6 in yydefact.

yydefgoto

This is a compressed form of the GOTO part of our traditional table. It has as many entries as there are non-terminals in the grammar. Each entry specifies the stateto transition to on each non-terminal. So here goes:

static const yytype_int8 yydefgoto[] =
{
      -1,     3,     4,     5,     7
};

yydefgoto[n^th non-terminal] = most common GOTO state for the n^th non-terminal.

n starts from zero. An index into this array is obtained by subtracting the number of terminal symbols from the symbol number of the non-terminal. For example, the symbol number for E is 10 and number of tokens in the grammar is 8, so:

Thus yydefgoto[E] = yydefgoto[10-8] = state 4.

yydefgoto is consulted whenever the parser reduces stack contents using a rule. Later we will see that yydefgoto is consulted only after checking with another goto table, and that will explain how we manage to go to state 6 from state 2 on L instead of going to state 3 as this table specifies for L.

As a final note observe that the entry for the zeroth non-terminal ($accept) is -1. The stack will never be reduced with the $accept rule.

yyr1 and yyr2

yyr1 specifies the symbol number of the LHS of each rule. Remember that 0 is never used as a rule number, so this table has NRULES + 1 entries, where NRULES is the number of rules in the grammar (Which is 9 in our case). Here is the listing:

/* YYR1[YYN] -- Symbol number of symbol that rule YYN derives.  */
static const yytype_uint8 yyr1[] =
{
       0,     8,     9,     9,    10,    10,    11,    11,    12,    12
};

So, rule #1 has $accept as LHS, and hence yyr1[1] = 8 (see symbol table given previously) and so on. When a reduction takes place, We need to know the LHS symbol of the rule used for reduction to transition to an appropriate state. That is where this table comes into use.

yyr2 specifies the length (number of symbols) of the right hand side of each rule. Here is a listing produced by Bison:

/* YYR2[YYN] -- Number of symbols composing right hand side of rule YYN.  */
static const yytype_uint8 yyr2[] =
{
       0,     2,     3,     1,     3,     1,     1,     3,     0,     1
};

Rule #2 (L → L;E) has 3 symbols on the RHS, and hence yyr2[2] = 3. This table is also used at the time of a reduction. The number of states to be popped off the stack is same as the number of symbols on the right hand side of the reducing rule.

yytable

This table is a mixed bag of state numbers and rule numbers in some pre-calculated order. This is the table T we discussed in the previous section. yytable works closely with yycheck, yypact and yypgoto tables (described below) to indicate to the parser either the state to pushed next on to the parse stack or a rule to use for reduction. I will just dump the table listing here. The entries won't make sense until we look at yypact and yypgoto.

/* YYTABLE[YYPACT[STATE-NUM]].  What to do in state STATE-NUM.  If
   positive, shift that token.  If negative, reduce the rule which
   number is the opposite.  If zero, do what YYDEFACT says.
   If YYTABLE_NINF, syntax error.  */
#define YYTABLE_NINF -1
static const yytype_uint8 yytable[] =
{
       8,     1,     2,     9,    11,    10,     9,     6,    12,     0,
       0,     0,    13
};

One thing worth noting here is the definition of YYTABLE_NINF - the "negative infinity" value for yytable is the highest negative entry which in our case is -1 (since there are no negative values in yytable). This value is used to determine explicit error situations.

yypgoto

This table gives a reference to GOTO entries for non-terminals that can transition the automaton to different states based on previous state. For example, the symbol L can take the automaton to state 3 if the present state is 0, but the GOTO on L for state 2 is state 6. Similarly for symbols E and P there are different GOTO states based on the current state. The most common GOTO is already defined in YYDEFGOTO. The job of YYPGOTO is to indicate the anomalies. Lets take a look:

/* YYPGOTO[NTERM-NUM].  */
static const yytype_int8 yypgoto[] =
{
      -5,     5,    -1,     2,    -5
};

That's our ND table that we discussed in the compressing parsing tables section. It is indexed by non-terminal symbol number; One entry each for $accept, L, E, P, M.

Let us say the current state on top of the stack after reduction by rule #5 (P → a) is state 10 (see example parse in the appendix). Now the GOTO transition for state 10 on P is really state 13 (according to our traditional tables), which is not the yydefgoto entry for P.

The parser adds yypgoto[P] i.e yypgoto[3] to the current exposed state number (state 10). The result is 12 (since yypgoto[3]=2). Now yytable[12] happens to be 13, so this is the new state to be pushed on to the stack.

How does the parser know that it has to pick the state value from yytable (via yypgoto) and not from yydefgoto? For this, there is a guard table called yycheck that will indicate this fact. It is only after checking with this table that a proper pick is made. We will see this operation in the section describing the parsing routine yyparse().

yypact

This table specifies the portion of yytable that describes what to do in state S. It is indexed by the symbol number of the token symbols. This is like the directory table D that was described in the previous section on compressing parsing tables.

#define YYPACT_NINF -5
static const yytype_int8 yypact[] =
{
      -4,    -5,    -4,     0,     1,    -5,     3,    -3,    -5,    -4,
      -4,    -5,     1,    -5
};

This is the first table consulted by the parsing loop. If yypact[cur-state] = YYPACT_NINF, its time for a default reduction using yydefact. This means that the state has only reductions as in state 1 (r5); otherwise the entry in yypact[cur-state] is to be added to the symbol number of the current look-ahead token and the resulting number is used as an index into yytable to get the next action (see comments in yytable code).An example:

Suppose we are in state zero and the look-ahead token is 'a'(symbol number 5). Now yypact[0] is -4, so the index into yytable is 5-4 = 1. yytable[1] is 1 which means "shift this symbol and go to state 1". See yycheck below for some more checking information. If the yytable entry specifies a negative value say -3, it means that we should be reducing the stack with rule #3.

yycheck

This is like a guard table. This table is used for various checks. Again this table is another mixed bag - of symbol numbers and state numbers. There is a very good explanation for this table inside Bison source code (src/tables.h). So I will just quote that here:

   YYCHECK = a vector indexed in parallel with YYTABLE.  It indicates,
   in a roundabout way, the bounds of the portion you are trying to
   examine.

   Suppose that the portion of YYTABLE starts at index P and the index
   to be examined within the portion is I.  Then if YYCHECK[P+I] != I,
   I is outside the bounds of what is actually allocated, and the
   default (from YYDEFACT or YYDEFGOTO) should be used.  Otherwise,
   YYTABLE[P+I] should be used.

Here is the table listing:

static const yytype_int8 yycheck[] =
{
       0,     5,     6,     3,     7,     4,     3,     2,     9,    -1,
      -1,    -1,    10
};

Examples of usage:

Suppose we are in state 0 and the look-ahead token in 'a'. yypact[0] = -4 and this, added to symbol number of 'a' (5) gives the index in yytable as we have seen before. Before accessing yytable[1] and proceeding to shift the token, the parser must check that yycheck[1] has the symbol number for the current token. If the test fails, it is time for a default reduction! But in our case, yycheck[1] contains 5 which really is the symbol number 'a', so we can safely consult yytable for what to do next.

yycheck also helps with special case reductions where the GOTO on a non-terminal for the current state is not the most common state (specified by yydefgoto); Consider the same example given in the yypgoto section: we were in state #10 and the reduction rule was rule #5 (P → a). We added yypgoto[P]=2 to current state number (10) to get 12. Before consulting yytable the parser checks with yycheck[12]. If this value is 10, then we know that state# 10 is a special case, otherwise, we use yydefgoto to decide the transition.

Other helper tables

There are other tables that Bison will output to help with printing debug information and parsing error recovery and verbose output. Here is a list with brief descriptions.

yyrhs - A -1 separated list of RHS symbol numbers of all rules. yyrhs[n] = first symbol on the RHS of rule #n.

yyprhs[n] - Index in yyrhs of the first RHS symbol of rule n.

yyrline[n] - Line # in the .y grammar source file where rule n is defined.

yytname[n] - A string specifying the symbol for symbol number n. yytoknum[n] - Token number of token n.

Summary of tables

Here is a one line summary of the main tables:

yytranslate: maps token numbers returned by yylex() to Bison's internal symbol numbers.
yydefact: lists default reduction rules for each state. 0 represents an error.
yydefgoto: lists default GOTOs for each non-terminal symbol. It is only used after checking with yypgoto.
yyr1: symbol number of lhs of each rule. Used at the time of a reduction to find the next state.
yyr2: length of rhs of each rule. Used at the time of reduction to pop the stack.
yytable: a highly compressed representation of the actions in each state. Negative entries represent reductions. There is a negative infinity to detect errors.
yypgoto: accounts for non-default GOTOs for all non-terminal symbols.
yypact: directory into yytable indexed by state number. The displacements in yytable are indexed by symbol number.
yycheck: guard table used to check legal bounds within portions of yytable.

Bison generated parsing routine - yyparse()

Lets get down to the operation of yyparse - the LR parsing routine based on the tables just described. To demystify the routine, I have copied here snippets of code from the generated C parser file and altered them slightly. The most important change here is that I have cut out error handling to make the basic operation very clear. Many checks and macro calls have been removed to expose the bare parsing algorithm. All explanation of the code is found in comments.

/*These are global variables*/

/* The look-ahead symbol.  */
int yychar;

/* The semantic value of the look-ahead symbol.  */
YYSTYPE yylval;

int 
yyparse()
{
	int yystate; 	/* current state */
	int yyn;	/* This is an all purpose variable! One time it may represent a state, the next time, it might be a rule! */
	
	int yyresult;	/* Result of parse to be returned to the called */
	
	int yytoken=0;	/* current token */
	
	/* The state stack: This parser does not shift symbols on to the stack. Only a stack of states is maintained. */	
	
	 int yyssa[YYINITDEPTH];	/*YYINITDEPTH is 200 */
	 int *yyss = yyssa	/* Bottom of state stack */
	 int *yyssp;	/* Top of state stack */
	 
	 /* The semantic value stack: This stack grows parallel to the state stack. At each reduction, semantic values are popped 
	  * off this stack and the semantic action is executed
	  */
	  YYSTYPE yyvsa[YYINITDEPTH];
	  YYSTYPE *yyvs = yyvsa;	/* Bottom of semantic stack */
	  YYSTYPE *yyvsp;	/* Top of semantic stack */
	
	  /* POP the state and semantic stacks by N symbols - useful for reduce actions */
	  #define YYPOPSTACK(N)   (yyvsp -= (N), yyssp -= (N))
	  
	  
	  int yylen = 0;	/* This variable is used in reduce actions to store the length of RHS of a rule; */
	  
	  /* Ok done declaring variables. Set the ball rolling */
	  
	  yystate = 0;	/* Initial state */
	  yychar = YYEMPTY /* YYEMPTY is -2 */
	  
	  yyssp = yyss; 	/* Top = bottom for state stack */
	  yyvsp = yyvs;	/* Same for semantic stack */
	  
	  goto yysetstate; 	/* Well, gotos are used for extracting maximum performance. */
	  
	  
	  /* Each label can be thought of as a function */
	  
	  yynewstate:  /* Push a new state on the stack */
	  
	  		yyssp ++;	/*Just increment the stack top; actual 'pushing' will happen in yysetstate */
	  		
	  		
	  yysetstate:
	  		
	  		*yyssp = yystate;	/* Ok pushed state on state stack top */
	  		
	  		goto yybackup;	/* This is where you will find some action */
	  		
	  		
	  yybackup:	/* The main parsing code starts here */
	  	
	  		yyn = yypact[yystate];	/* Refer to what yypact is saying about the current state */
	  		
	  		if ( yyn == YYPACT_NINF) 	/* If negative infinity its time for a default reduction */
	  		
	  			goto yydefault;	/* This label implements default reductions; see below */
	  			
	  		
	  		/* Check if we have a look-ahead token ready. This is LALR(1) parsing */
	  		
	  		if (yychar == YYEMPTY)
	  			
	  			yychar = YYLEX; /* Macro YYLEX is defined as yylex() */
	  				
	  		if (yychar <= YYEOF) 	/* YYEOF is 0 - the token returned by lexer at end of input */
	  			
	  			yychar = yytoken = YYEOF; /* set all to EOF */
	  				
	  		else
	  		
	  			yytoken = yytranslate[yychar];	/* Translate the lexer token into internal symbol number */
	  				
	  		
	  		/* Now we have a look-ahead token. Let the party begin ! */
	  		
	  		
	  		yyn = yyn + yytoken;	/* This is yypact[yystate] + yytoken */
	  		
	  		
	  		/* Observe this check carefully. We are checking that yyn is within the bounds of yytable
	  		 * and also if yycheck contains the current token number.
	  		 */
	  		if ( yyn < 0 || YYLAST < yyn  || yycheck[yyn] != yytoken )	/* YYLAST is the highest index in yytable */
	  			  			 
	  			goto yydefault; 	/* Its time for a default reduction */
	  				
	  		/* Ok, yyn is within bounds of yytable */
	  		
	  		yyn = yytable[yyn];	/* This is yytable[ yypact[yystate] + yytoken ] */
	  		
	  		if (yyn <= 0)	/* If yytable happens to contain a -ve value, its not a shift - its a reduce */
	  		{
	  			if (yyn == 0 || yyn == YYTABLE_NINF)	/* But check for out of bounds condition*/
	  				goto yyerrlab;	/* Label to handle errors */
	  		
	  			/* Other wise reduce with rule # -yyn */
	  			
	  			yyn = -yyn;
	  			goto yyreduce; 	/* Label to implement reductions */
	  		}
	  		
	  		/* Last check: See if we reached final state! */
	  		if (yyn == YYFINAL)	/* YYFINAL is 8 in our case */
	  			YYACCEPT;	/* macro deined as 'goto acceptlab - a label to finish up */
	  			
	  		/* That completes all checks; If we reached here, there is no other option but to shift */
	  		
	  		yystate = yyn;	/* Now, yyn (= yytable[ yypact[yystate] + yytoken ]) is a state that has to be pushed */
	  		
	  		*++yyvsp = yylval; /* Push the semantic value of the symbol on the semantic stack */
	  		
	  		goto yynewstate;	/* This will increment state stack top and the following yysetstate that will do the pushing */
	  		
	  
	  yydefault:	/* A label to implement default reductions */
	  	
	  		yyn = yydefact[yystate];	/* Get the default reduction rule for this state */
	  		
	  		if ( yyn == 0 )	/* This state has no default reduction. Something is wrong */
	  		
	  			goto yyerrlab;
	  			
	  		goto yyreduce;	/* Ok, got the default reduction rule # in yyn; go ahead and reduce the stack */
	  		
	  	
	  
	  yyreduce:	/* A lablel that implements reductions on stack. */
	  	
	  		/* By the time we are here, yyn contains the rule# to use for reducing the stack. */
	  		
	  		/* Steps for reduction:
	  		 * 1. Find the length of RHS of rule #yyn
	  		 * 2. Execute any semantic actions by taking the values from the semantic stack
	  		 * 3. POP 'length' symbols from the state stack and 'length' values from semantic stack
	  		 * 4. Find the LHS of rule #yyn
	  		 * 5. Find the GOTO of state currently on top of stack on LHS symbol
	  		 * 6. Push that state on top of stack
	  		 * 
	  		 */
	  		 
	  		 yylen = yyr2[yyn];	/* Get length of RHS */
	  		 
	  		 /* Default semantic action - $$=$1 */
	  		 yyval = yyvsp[1-yylen];
	  		 
	  		 /* Execute semantic actions */
	  		 switch ( yyn )	/* Each rule has its own semantic action */
	  		 {
	  		 
	  		 	default:	break;	/* We didn't have any semantic actions in the grammar.*/
	  		 	
	  		 }
	  		 
	  		 YYPOPSTACK (yylen);	/* This will pop both state and semantic stacks. See definition of this macro above */
	  		 
	  		 yylen = 0;	/* re-initialize yylen */
	  		 
	  		 *++yyvsp  = yyval;	/* Push the result of semantic evaluation on top of semantic stack */
	  		 
	  		 
	  		 /* Now shift the result of reduction (steps 4 - 6) */
	  		 
	  		 yyn = yyr1[yyn];	/* Reuse yyn at every opportunity.  For now, yyn is the LHS symbol (number) of the rule */
	  		 
			 /* First check for anomalous GOTOs, otherwise use Default GOTO (YYDEFGOTO)
			  * 
			  * Observe that if we subtract no. of terminals (YYNTOKENS) from symbol number of a nonterminal, we get
			  * an index into yypgoto or yydefgoto for that non-terminal.
			  */
			  
	  		 yystate = yypgoto[yyn - YYNTOKENS] + *yyssp;
	  		 
	  		 /* A couple of checks are needed before we know this is not a default GOTO
	  		  * 1. yystate must be within bounds of yytable. ( 0 to YYLAST )
	  		  * 2. yycheck must contain the state currently on top of the stack
	  		  */	  		 
	  		 if ( 0 <= yystate && yystate <= YYLAST && yycheck[yystate] = *yyssp)
	  		 
	  		 	yystate = yytable[yystate];	/* Take the GOTO from yytable */
	  		 	
	  		 else
	  		 	
	  		 	yystate = yydefgoto[yyn - YYNTOKENS];	/* Otherwise use the default GOTO */
	  		 	
	  		 goto yynewstate;	/* Simply push the newly found state on top of stack and continue */
	  		 
}	/* End of yyparse() */

That completes a very basic and simplified parsing routine based on the tables produced by Bison. The actual generated parser does other things like checking the stack for overflows, relocating the stacks when necessary and a lot of error checking. If you use error productions, the parser has to get back to a sane state where the error token can be shifted next. Also, it has to print appropriate verbose output and be able to call yyerror(); There is also cleanup code for symbols that are popped off the semantic stack to avoid memory leaks. I have ignored all this code here to focus on the outline of the parsing routine yyparse().

An example parse

Bison has a facility to enable parse tracing. If you set yydebug=1 before calling yyparse(), the parser will produce a trace of shifts and reductions as the input string is being processed. I produced such a trace for a sample input string in the language generated by our example grammar. Following this trace with the parsing routine and the tables can help in appreciating the table scheme and the operation of yyparse() function.

Sample string: a,a;a,a$
$ is my EOF character. Here is the trace listing.

Starting parse
Entering state 0
Reading a token: Next token is token 'a' ()
Shifting token 'a' ()
Entering state 1
Reducing stack by rule 5 (line 26):
   $1 = token 'a' ()
-> $$ = nterm P ()
Stack now 0
Entering state 5
Reducing stack by rule 4 (line 24):
   $1 = nterm P ()
-> $$ = nterm E ()
Stack now 0
Entering state 4
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 10
Reading a token: Next token is token 'a' ()
Shifting token 'a' ()
Entering state 1
Reducing stack by rule 5 (line 26):
   $1 = token 'a' ()
-> $$ = nterm P ()
Stack now 0 4 10
Entering state 13
Reducing stack by rule 3 (line 23):
   $1 = nterm E ()
   $2 = token ',' ()
   $3 = nterm P ()
-> $$ = nterm E ()
Stack now 0
Entering state 4
Reading a token: Next token is token ';' ()
Reducing stack by rule 2 (line 21):
   $1 = nterm E ()
-> $$ = nterm L ()
Stack now 0
Entering state 3
Next token is token ';' ()
Shifting token ';' ()
Entering state 9
Reading a token: Next token is token 'a' ()
Shifting token 'a' ()
Entering state 1
Reducing stack by rule 5 (line 26):
   $1 = token 'a' ()
-> $$ = nterm P ()
Stack now 0 3 9
Entering state 5
Reducing stack by rule 4 (line 24):
   $1 = nterm P ()
-> $$ = nterm E ()
Stack now 0 3 9
Entering state 12
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 10
Reading a token: Next token is token 'a' ()
Shifting token 'a' ()
Entering state 1
Reducing stack by rule 5 (line 26):
   $1 = token 'a' ()
-> $$ = nterm P ()
Stack now 0 3 9 12 10
Entering state 13
Reducing stack by rule 3 (line 23):
   $1 = nterm E ()
   $2 = token ',' ().
   $3 = nterm P ()
-> $$ = nterm E ()
Stack now 0 3 9
Entering state 12
Reading a token: Now at end of input.
Reducing stack by rule 1 (line 20):
   $1 = nterm L ()
   $2 = token ';' ()
   $3 = nterm E ()
-> $$ = nterm L ()
Stack now 0
Entering state 3
Now at end of input.
Stack now 0 3
Cleanup: popping nterm L ()

References

[1] Nigel P. Chapman "LR Parsing Theory and practice". Cambridge University Press 1987. This is an excellent book on the theory and implementation of LR parsers. Unfortunately this book is out of print now. The example grammar used through out this article is taken from this book.

[2] Aho, Sethi, Ullman "Compilers - Principles Techniques and Tools". Pearson Education. The "red Dragon book" is a classic. It is the best book I have read on compiler design.

[3] Robert Endre Tarjan and Andrew Chi-Chih Yao in their paper "Storing a Sparse Table", Communications of the ACM, Volume 22, issue 11, pages 606-611. Availablehere.

[4] GNU Bison source code. http://ftp.gnu.org/pub/gnu/bison/

[5] GNU Bison user manual. http://www.gnu.org/software/bison/manual

Copying this document

Copyright ©2006 Satya Kiran Popuri.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''.

GNU Free Documentation License

		GNU Free Documentation License
		  Version 1.2, November 2002


 Copyright (C) 2000,2001,2002  Free Software Foundation, Inc.
     51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.


0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document "free" in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.


1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License.  Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein.  The "Document", below,
refers to any such manual or work.  Any member of the public is a
licensee, and is addressed as "you".  You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject.  (Thus, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.  If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant.  The Document may contain zero
Invariant Sections.  If the Document does not identify any Invariant
Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.  A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text.  A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML, PostScript or PDF designed for human modification.  Examples of
transparent image formats include PNG, XCF and JPG.  Opaque formats
include proprietary formats that can be read and edited only by
proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language.  (Here XYZ stands for a
specific section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".)  To "Preserve the Title"
of such a section when you modify the Document means that it remains a
section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document.  These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.


2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.


4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
   from that of the Document, and from those of previous versions
   (which should, if there were any, be listed in the History section
   of the Document).  You may use the same title as a previous version
   if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
   responsible for authorship of the modifications in the Modified
   Version, together with at least five of the principal authors of the
   Document (all of its principal authors, if it has fewer than five),
   unless they release you from this requirement.
C. State on the Title page the name of the publisher of the
   Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
   adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
   giving the public permission to use the Modified Version under the
   terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
   and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add
   to it an item stating at least the title, year, new authors, and
   publisher of the Modified Version as given on the Title Page.  If
   there is no section Entitled "History" in the Document, create one
   stating the title, year, authors, and publisher of the Document as
   given on its Title Page, then add an item describing the Modified
   Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
   public access to a Transparent copy of the Document, and likewise
   the network locations given in the Document for previous versions
   it was based on.  These may be placed in the "History" section.
   You may omit a network location for a work that was published at
   least four years before the Document itself, or if the original
   publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications",
   Preserve the Title of the section, and preserve in the section all
   the substance and tone of each of the contributor acknowledgements
   and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
   unaltered in their text and in their titles.  Section numbers
   or the equivalent are not considered part of the section titles.
M. Delete any section Entitled "Endorsements".  Such a section
   may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements"
   or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.


5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History"
in the various original documents, forming one section Entitled
"History"; likewise combine any sections Entitled "Acknowledgements",
and any sections Entitled "Dedications".  You must delete all sections
Entitled "Endorsements".


6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.


7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an "aggregate" if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.


8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers.  In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements",
"Dedications", or "History", the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.


9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.


10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.


ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

    Copyright (c)  YEAR  YOUR NAME.
    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2
    or any later version published by the Free Software Foundation;
    with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
    A copy of the license is included in the section entitled "GNU
    Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the "with...Texts." line with this:

    with the Invariant Sections being LIST THEIR TITLES, with the
    Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.