Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first, as discussed in Section 2.3.4). Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
Example 4.27: The sequence of parse trees in Fig. 4.12 for the input id + id * id is a top-down parse according to grammar (4.2), repeated here:
E → T E’
E’ → + T E’ | ϵ
T → F T’
T’ → * F T’ | ϵ
F → ( E ) | id          (4.28)
This sequence of trees corresponds to a leftmost derivation of the input. □
At each step of a top-down parse, the key problem is that of determining the production to be applied for a nonterminal, say A. Once an A-production is chosen, the rest of the parsing process consists of "matching" the terminal symbols in the production body with the input string.
This section begins with a general form of top-down parsing, called recursive-descent parsing, which may require backtracking to find the correct A-production to be applied. Section 2.4.2 introduced predictive parsing, a special case of recursive-descent parsing, where no backtracking is required. Predictive parsing chooses the correct A-production by looking ahead at the input a fixed number of symbols; typically we look at only one (that is, the next input symbol).
Figure 4.12: Top-down parse for id + id * id
For example, consider the top-down parse in Fig. 4.12, which constructs a tree with two nodes labeled E’. At the first E’ node (in preorder), the production E’ → +T E’ is chosen; at the second E’ node, the production E’ → ϵ is chosen. A predictive parser can choose between E’-productions by looking at the next input symbol.
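The choice between E’-productions described above can be sketched as a small recursive-descent parser with one procedure per nonterminal of grammar (4.28). This is only an illustrative sketch: the class name `PredictiveParser`, the helper `parse`, the token spelling (`"id"`, `"+"`, and so on) and the `trace` list are assumptions made for the example, not part of the text.

```python
class PredictiveParser:
    """Recursive-descent parser for grammar (4.28); one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0
        self.trace = []                      # productions applied, in preorder

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):
        self.trace.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":               # lookahead selects E' -> + T E'
            self.trace.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:                                # any other lookahead: E' -> epsilon
            self.trace.append("E' -> epsilon")

    def T(self):
        self.trace.append("T -> F T'")
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":
            self.trace.append("T' -> * F T'")
            self.match("*")
            self.F()
            self.T_prime()
        else:
            self.trace.append("T' -> epsilon")

    def F(self):
        if self.peek() == "(":
            self.trace.append("F -> ( E )")
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.trace.append("F -> id")
            self.match("id")


def parse(tokens):
    parser = PredictiveParser(tokens)
    parser.E()
    parser.match("$")                        # the whole input must be consumed
    return parser.trace
```

Calling `parse(["id", "+", "id", "*", "id"])` records E’ → + T E’ at the first E’ node and E’ → ϵ at the second, as in Fig. 4.12. In this sketch the ϵ-production is simply the default when the lookahead is not + (or *), so some errors are reported slightly later than they could be.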
The class of grammars for which we can construct predictive parsers looking k symbols ahead in the input is sometimes called the LL(k) class. We discuss the LL(1) class in Section 4.4.3, but introduce certain computations, called FIRST and FOLLOW, in a preliminary Section 4.4.2. From the FIRST and FOLLOW sets for a grammar, we shall construct "predictive parsing tables," which make explicit the choice of production during top-down parsing. These sets are also useful during bottom-up parsing, as we shall see.
In Section 4.4.4 we give a non-recursive parsing algorithm that maintains a stack explicitly, rather than implicitly via recursive calls. Finally, in Section 4.4.5 we discuss error recovery during top-down parsing.
void A() {
1)      Choose an A-production, A → X1 X2 … Xk;
2)      for ( i = 1 to k ) {
3)          if ( Xi is a nonterminal )
4)              call procedure Xi();
5)          else if ( Xi equals the current input symbol a )
6)              advance the input to the next symbol;
7)          else /* an error has occurred */;
        }
}
Figure 4.13: A typical procedure for a nonterminal in a top-down parser |
A recursive-descent parsing program consists of a set of procedures, one for each nonterminal. Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string. Pseudocode for a typical nonterminal appears in Fig. 4.13. Note that this pseudocode is nondeterministic, since it begins by choosing the A-production to apply in a manner that is not specified.
General recursive-descent may require backtracking; that is, it may require repeated scans over the input. However, backtracking is rarely needed to parse programming language constructs, so backtracking parsers are not seen frequently. Even for situations like natural language parsing, backtracking is not very efficient, and tabular methods such as the dynamic programming algorithm of Exercise 4.4.9 or the method of Earley (see the bibliographic notes) are preferred.
To allow backtracking, the code of Fig. 4.13 needs to be modified. First, we cannot choose a unique A-production at line (1), so we must try each of several productions in some order. Then, failure at line (7) is not ultimate failure, but suggests only that we need to return to line (1) and try another A-production. Only if there are no more A-productions to try do we declare that an input error has been found. In order to try another A-production, we need to be able to reset the input pointer to where it was when we first reached line (1). Thus, a local variable is needed to store this input pointer for future use.
Example 4.29: Consider the grammar
S → c A d
A → a b | a
To construct a parse tree top-down for the input string w = cad, begin with a tree consisting of a single node labeled S, and the input pointer pointing to c, the first symbol of w. S has only one production, so we use it to expand S and obtain the tree of Fig. 4.14(a). The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Figure 4.14: Steps in a top-down parse, panels (a), (b), and (c)
Now, we expand A using the first alternative A → a b to obtain the tree of Fig. 4.14(b). We have a match for the second input symbol, a, so we advance the input pointer to d, the third input symbol, and compare d against the next leaf, labeled b. Since b does not match d, we report failure and go back to A to see whether there is another alternative for A that has not been tried, but that might produce a match.
In going back to A, we must reset the input pointer to position 2, the position it had when we first came to A, which means that the procedure for A must store the input pointer in a local variable.
The second alternative for A produces the tree of Fig. 4.14(c). The leaf a matches the second symbol of w and the leaf d matches the third symbol. Since we have produced a parse tree for w, we halt and announce successful completion of parsing. □
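The walk-through of Example 4.29 can be mirrored in a few lines of code. The sketch below is one possible rendering, under the assumption that each input symbol is a single character; the function name `parse` and the `tried` list (which records the A-alternatives attempted) are made up for illustration. As in the example, the procedure for A saves the input pointer in a local variable and resets it before each alternative.

```python
def parse(w):
    """Backtracking recursive descent for S -> c A d, A -> a b | a."""
    pos = 0                       # input pointer, shared by the procedures
    tried = []                    # A-alternatives attempted, for illustration

    def match(terminal):
        nonlocal pos
        if pos < len(w) and w[pos] == terminal:
            pos += 1
            return True
        return False

    def A():
        nonlocal pos
        saved = pos               # input position when A was first reached
        for body in ("ab", "a"):  # try the A-productions in order
            tried.append(body)
            pos = saved           # reset the input pointer before each attempt
            if all(match(t) for t in body):
                return True
        pos = saved
        return False              # no more A-productions: a genuine failure

    def S():
        return match("c") and A() and match("d")

    ok = S() and pos == len(w)
    return ok, tried
```

For the input `"cad"` the sketch returns `(True, ["ab", "a"])`: the first alternative fails on d, the pointer is reset, and the second alternative succeeds. Note that, like the procedure described above, A commits to the first alternative that matches; it does not revisit its choice if a later symbol of S fails.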
A left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop. That is, when we try to expand a nonterminal A, we may eventually find ourselves again trying to expand A without having consumed any input.
The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G. During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol. During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.
Define FIRST (α), where α is any string of grammar symbols, to be the set of terminals that begin strings derived from α. If α *⇒ϵ, then ϵ is also in FIRST (α). For example, in Fig. 4.15, A *⇒cγ, so c is in FIRST (A).
For a preview of how FIRST can be used during predictive parsing, consider two A-productions A → α|β, where FIRST (α) and FIRST (β) are disjoint sets. We can then choose between these A-productions by looking at the next input symbol a, since a can be in at most one of FIRST (α) and FIRST (β), not both. For instance, if a is in FIRST (β) choose the production A → β. This idea will be explored when LL(1) grammars are defined in Section 4.4.3.
Figure 4.15: Terminal c is in FIRST(A) and a is in FOLLOW(A)
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S *⇒ αAaβ, for some α and β, as in Fig. 4.15. Note that there may have been symbols between A and a, at some time during the derivation, but if so, they derived ϵ and disappeared. In addition, if A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A); recall that $ is a special "endmarker" symbol that is assumed not to be a symbol of any grammar.
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ϵ can be added to any FIRST set.
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X → Y1 Y2 … Yk is a production for some k≥1, then place a in FIRST(X) if for some i, a is in FIRST(Yi), and ϵ is in all of FIRST(Y1), … , FIRST(Yi-1); that is, Y1 … Yi-1 *⇒ϵ. If ϵ is in FIRST(Yj) for all j = 1, 2, … , k, then add ϵ to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ϵ, then we add nothing more to FIRST(X), but if Y1 *⇒ϵ, then we add FIRST(Y2), and so on.
3. If X → ϵ is a production, then add ϵ to FIRST(X).
Now, we can compute FIRST for any string X1 X2 … Xn as follows. Add to FIRST(X1 X2 … Xn) all non-ϵ symbols of FIRST(X1). Also add the non-ϵ symbols of FIRST(X2), if ϵ is in FIRST(X1); the non-ϵ symbols of FIRST(X3), if ϵ is in FIRST(X1) and FIRST(X2); and so on. Finally, add ϵ to FIRST(X1 X2 … Xn) if, for all i, ϵ is in FIRST(Xi).
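The rules for FIRST amount to a fixed-point iteration, which the following sketch applies to grammar (4.28). The dictionary encoding of the grammar (an empty list stands for an ϵ-body) and the names `GRAMMAR`, `first_of_string` and `compute_first` are assumptions chosen for this example.

```python
EPS = "ϵ"

GRAMMAR = {                       # grammar (4.28); an empty body stands for ϵ
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string X1 X2 ... Xn, given FIRST sets for the nonterminals."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})    # rule 1: a terminal's FIRST set is itself
        result |= fx - {EPS}
        if EPS not in fx:         # Xi cannot vanish, so later symbols add nothing
            return result
    result.add(EPS)               # every Xi derives ϵ (or the string is empty)
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:                # repeat until no FIRST set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:   # rules 2 and 3 (an empty body yields {ϵ})
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first
```

Here `compute_first(GRAMMAR)["E'"]` evaluates to `{"+", "ϵ"}`, and `first_of_string` doubles as the computation of FIRST for an arbitrary string of grammar symbols.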
To compute FOLLOW (A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
1. Place $ in FOLLOW (S), where S is the start symbol, and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST (β) except ϵ is in FOLLOW (B).
3. If there is a production A →αB, or a production A →αBβ, where FIRST (β) contains ϵ, then everything in FOLLOW (A) is in FOLLOW (B).
Example 4.30: Consider again the non-left-recursive grammar (4.28). Then:
1. FIRST (F) = FIRST (T) = FIRST (E) = {(, id}. To see why, note that the two productions for F have bodies that start with these two terminal symbols, id and the left parenthesis. T has only one production, and its body starts with F. Since F does not derive ϵ, FIRST (T) must be the same as FIRST (F). The same argument covers FIRST (E).
2. FIRST (E’) = {+, ϵ}. The reason is that one of the two productions for E’ has a body that begins with terminal +, and the other's body is ϵ. Whenever a nonterminal derives ϵ, we place ϵ in FIRST for that nonterminal.
3. FIRST (T’) = {*, ϵ}. The reasoning is analogous to that for FIRST (E’).
4. FOLLOW (E) = FOLLOW (E’) = {), $}. Since E is the start symbol, FOLLOW (E) must contain $. The production body (E) explains why the right parenthesis is in FOLLOW (E). For E’, note that this nonterminal appears only at the ends of bodies of E-productions. Thus, FOLLOW (E’) must be the same as FOLLOW (E).
5. FOLLOW (T) = FOLLOW (T’) = {+, ), $}. Notice that T appears in bodies only followed by E’. Thus, everything except ϵ that is in FIRST (E’) must be in FOLLOW (T); that explains the symbol +. However, since FIRST (E’) contains ϵ (i.e., E’ *⇒ϵ), and E’ is the entire string following T in the bodies of the E-productions, everything in FOLLOW (E) must also be in FOLLOW (T). That explains the symbols $ and the right parenthesis. As for T’, since it appears only at the ends of the T-productions, it must be that FOLLOW (T’) = FOLLOW (T).
6. FOLLOW (F) = {+, *, ), $}. The reasoning is analogous to that for T in point (5).
□
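The FOLLOW sets of Example 4.30 can be reproduced mechanically from the three FOLLOW rules. The sketch below takes the FIRST sets of the example as given input rather than recomputing them; the grammar encoding and the names `GRAMMAR`, `FIRST`, `first_of_string` and `compute_follow` are assumptions made for illustration.

```python
EPS, END = "ϵ", "$"

GRAMMAR = {                       # grammar (4.28); an empty body stands for ϵ
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

FIRST = {                         # copied from Example 4.30
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})    # a terminal's FIRST set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)                    # rule 1
    changed = True
    while changed:                            # repeat until nothing can be added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:      # FOLLOW is defined for nonterminals only
                        continue
                    beta = first_of_string(body[i + 1:])
                    new = beta - {EPS}        # rule 2
                    if EPS in beta:           # rule 3 (also covers an empty beta)
                        new |= follow[head]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

With `compute_follow(GRAMMAR, "E")`, the set for F comes out as `{"+", "*", ")", "$"}`, in agreement with point (6) of the example.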
Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.
Transition Diagrams for Predictive Parsers
Transition diagrams are useful for visualizing predictive parsers. For example, the transition diagrams for nonterminals E and E’ of grammar (4.28) appear in Fig. 4.16(a). To construct the transition diagram from a grammar, first eliminate left recursion and then left factor the grammar. Then, for each nonterminal A,
1. Create an initial and final (return) state.
2. For each production A → X1 X2 … Xk, create a path from the initial to the final state, with edges labeled X1, X2, …, Xk. If A → ϵ, the path is an edge labeled ϵ.
Transition diagrams for predictive parsers differ from those for lexical analyzers. Parsers have one diagram for each nonterminal. The labels of edges can be tokens or nonterminals. A transition on a token (terminal) means that we take that transition if that token is the next input symbol. A transition on a nonterminal A is a call of the procedure for A. With an LL(1) grammar, the ambiguity of whether or not to take an ϵ-edge can be resolved by making ϵ-transitions the default choice.
Transition diagrams can be simplified, provided the sequence of grammar symbols along paths is preserved. We may also substitute the diagram for a nonterminal A in place of an edge labeled A. The diagrams in Fig. 4.16(a) and (b) are equivalent: if we trace paths from E to an accepting state and substitute for E’, then, in both sets of diagrams, the grammar symbols along the paths make up strings of the form T + T + … + T. The diagram in (b) can be obtained from (a) by transformations akin to those in Section 2.5.4, where we used tail-recursion removal and substitution of procedure bodies to optimize the procedure for a nonterminal.
The class of LL(1) grammars is rich enough to cover most programming constructs, although care is needed in writing a suitable grammar for the source language. For example, no left-recursive or ambiguous grammar can be LL(1).
A grammar G is LL(1) if and only if whenever A → α|β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β *⇒ϵ, then α does not derive any string beginning with a terminal in FOLLOW (A). Likewise, if α *⇒ϵ, then β does not derive any string beginning with a terminal in FOLLOW (A).
Figure 4.16: Transition diagrams for nonterminals E and E’ of grammar (4.28), panels (a) and (b)
The first two conditions are equivalent to the statement that FIRST (α) and FIRST (β) are disjoint sets. The third condition is equivalent to stating that if ϵ is in FIRST (β), then FIRST (α) and FOLLOW(A) are disjoint sets, and likewise if ϵ is in FIRST(α).
Predictive parsers can be constructed for LL(1) grammars since the proper production to apply for a nonterminal can be selected by looking only at the current input symbol. Flow-of-control constructs, with their distinguishing keywords, generally satisfy the LL(1) constraints. For instance, if we have the productions
stmt → if ( expr ) stmt else stmt
     | while ( expr ) stmt
     | { stmt_list }
then the keywords if, while, and the symbol { tell us which alternative is the only one that could possibly succeed if we are to find a statement.
The next algorithm collects the information from FIRST and FOLLOW sets into a predictive parsing table M [A, a], a two-dimensional array, where A is a nonterminal, and a is a terminal or the symbol $, the input endmarker. The algorithm is based on the following idea: the production A → α is chosen if the next input symbol a is in FIRST (α). The only complication occurs when α = ϵ or, more generally, α *⇒ϵ. In this case, we should again choose A → α, if the current input symbol is in FOLLOW (A), or if the $ on the input has been reached and $ is in FOLLOW (A).
Algorithm 4.31: Construction of a predictive parsing table.
INPUT: Grammar G.
OUTPUT: Parsing table M.
METHOD: For each production A → α of the grammar, do the following:
1. For each terminal a in FIRST (α), add A → α to M [A, a].
2. If ϵ is in FIRST (α), then for each terminal b in FOLLOW (A), add A → α to M [A, b]. If ϵ is in FIRST (α) and $ is in FOLLOW (A), add A → α to M [A, $] as well.
If, after performing the above, there is no production at all in M [A, a], then set M [A, a] to error (which we normally represent by an empty entry in the table). □
Example 4.32: For the expression grammar (4.28), Algorithm 4.31 produces the parsing table in Fig. 4.17. Blanks are error entries; non-blanks indicate a production with which to expand a nonterminal.
| NONTERMINAL / INPUT SYMBOL | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T E’ | | | E → T E’ | | |
| E’ | | E’ → + T E’ | | | E’ → ϵ | E’ → ϵ |
| T | T → F T’ | | | T → F T’ | | |
| T’ | | T’ → ϵ | T’ → * F T’ | | T’ → ϵ | T’ → ϵ |
| F | F → id | | | F → ( E ) | | |
Figure 4.17: Parsing table M for Example 4.32
Consider production E → T E’. Since
FIRST (T E’) = FIRST (T) = {(, id}
this production is added to M [E, (] and M [E, id]. Production E’→ +T E’ is added to M [E’, +] since FIRST (+T E’) = {+}. Since FOLLOW (E’) = {), $}, production E’→ ϵ is added to M [E’, )] and M [E’, $]. □
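Algorithm 4.31 is short enough to sketch directly. The version below assumes the FIRST and FOLLOW sets of Example 4.30 are supplied as plain dictionaries and stores every table entry as a list of production bodies, so that multiply defined entries would be visible; the names `GRAMMAR`, `FIRST`, `FOLLOW`, `first_of_string` and `build_table` are illustrative choices, not notation from the text.

```python
EPS = "ϵ"

GRAMMAR = {                       # grammar (4.28); an empty body stands for ϵ
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {                         # from Example 4.30
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {                        # from Example 4.30
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})    # a terminal's FIRST set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    """Map (A, a) to the list of bodies alpha with A -> alpha in M[A, a]."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body)
            targets = f - {EPS}               # step 1: terminals in FIRST(alpha)
            if EPS in f:
                targets |= FOLLOW[head]       # step 2: FOLLOW(A), which may include $
            for a in targets:
                table.setdefault((head, a), []).append(body)
    return table                              # a missing key is an error entry
```

For grammar (4.28), `build_table(GRAMMAR)` has 13 entries, each holding exactly one body, which matches the non-blank cells of Fig. 4.17.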
Algorithm 4.31 can be applied to any grammar G to produce a parsing table M. For every LL(1) grammar, each parsing-table entry uniquely identifies a production or signals an error. For some grammars, however, M may have some entries that are multiply defined. For example, if G is left-recursive or ambiguous, then M will have at least one multiply defined entry. Although left-recursion elimination and left factoring are easy to do, there are some grammars for which no amount of alteration will produce an LL(1) grammar.
The language in the following example has no LL(1) grammar at all.
Example 4.33: The following grammar, which abstracts the dangling-else problem, is repeated here from Example 4.22:
S → iEtSS’ | a
S’→ eS | ϵ
E → b
The parsing table for this grammar appears in Fig. 4.18. The entry for M [S’, e] contains both S’→ eS and S’→ ϵ.
| NONTERMINAL / INPUT SYMBOL | a | b | e | i | t | $ |
|---|---|---|---|---|---|---|
| S | S → a | | | S → iEtSS’ | | |
| S’ | | | S’ → ϵ, S’ → eS | | | S’ → ϵ |
| E | | E → b | | | | |
Figure 4.18: Parsing table M for Example 4.33
The grammar is ambiguous and the ambiguity is manifested by a choice in what production to use when an e (else) is seen. We can resolve this ambiguity by choosing S’→ eS. This choice corresponds to associating an else with the closest previous then. Note that the choice S’→ ϵ would prevent e from ever being put on the stack or removed from the input, and is surely wrong. □
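The multiply defined entry of Fig. 4.18 falls out of the table construction as soon as entries are kept as lists. In the sketch below, the FIRST set of each production body and the FOLLOW sets were worked out by hand for this small grammar and are hard-coded; these sets, the string encoding of bodies, and the helper names `build_table` and `resolve` are all assumptions of the example.

```python
EPS = "ϵ"

# (head, body, FIRST(body)) for the grammar of Example 4.33, computed by hand
PRODUCTIONS = [
    ("S",  "iEtSS'", {"i"}),
    ("S",  "a",      {"a"}),
    ("S'", "eS",     {"e"}),
    ("S'", EPS,      {EPS}),
    ("E",  "b",      {"b"}),
]
# FOLLOW sets, also computed by hand from the FOLLOW rules
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def build_table():
    """Apply the two table-construction steps, keeping every candidate body."""
    table = {}
    for head, body, first in PRODUCTIONS:
        targets = first - {EPS}
        if EPS in first:
            targets |= FOLLOW[head]
        for a in targets:
            table.setdefault((head, a), []).append(body)
    return table


def resolve(table):
    """Prefer S' -> eS wherever it competes, binding each else to the closest then."""
    return {key: ("eS" if len(bodies) > 1 and "eS" in bodies else bodies[0])
            for key, bodies in table.items()}
```

Filtering `build_table()` for entries with more than one body leaves only M [S’, e], and `resolve` keeps S’→ eS there while leaving every other entry unchanged.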
A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If ω is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that
S *lm⇒ ωα
The table-driven parser in Fig. 4.19 has an input buffer, a stack containing a sequence of grammar symbols, a parsing table constructed by Algorithm 4.31, and an output stream. The input buffer contains the string to be parsed, followed by the endmarker $. We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar on top of $.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting entry M [X, a] of the parsing table M. (Additional code could be executed here, for example, code to construct a node in a parse tree.) Otherwise, it checks for a match between the terminal X and current input symbol a.
The behavior of the parser can be described in terms of its configurations, which give the stack contents and the remaining input. The next algorithm describes how configurations are manipulated.
Algorithm 4.34: Table-driven predictive parsing.
INPUT: A string w and a parsing table M for grammar G.
OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Figure 4.19: Model of a table-driven predictive parser
METHOD: Initially, the parser is in a configuration with w$ in the input buffer and the start symbol S of G on top of the stack, above $. The program in Fig. 4.20 uses the predictive parsing table M to produce a predictive parse for the input. □
let a be the first symbol of w;
let X be the top stack symbol;
while ( X ≠ $ ) { /* stack is not empty */
    if ( X = a ) pop the stack and let a be the next symbol of w;
    else if ( X is a terminal ) error();
    else if ( M [X, a] is an error entry ) error();
    else if ( M [X, a] = X → Y1 Y2 … Yk ) {
        output the production X → Y1 Y2 … Yk;
        pop the stack;
        push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
    }
    let X be the top stack symbol;
}
Figure 4.20: Predictive parsing algorithm
Example 4.35: Consider grammar (4.28); we have already seen its parsing table in Fig. 4.17. On input id + id * id, the nonrecursive predictive parser of Algorithm 4.34 makes the sequence of moves in Fig. 4.21. These moves correspond to a leftmost derivation (see Fig. 4.12 for the full derivation):
E lm⇒ T E’ lm⇒ F T’E’ lm⇒ id T’E’ lm⇒ id E’ lm⇒ id + T E’ lm⇒ …
| MATCHED | STACK | INPUT | ACTION |
|---|---|---|---|
| | E $ | id + id * id$ | |
| | T E’$ | id + id * id$ | output E → T E’ |
| | F T’E’$ | id + id * id$ | output T → F T’ |
| | id T’E’$ | id + id * id$ | output F → id |
| id | T’E’$ | + id * id$ | match id |
| id | E’$ | + id * id$ | output T’ → ϵ |
| id | + T E’$ | + id * id$ | output E’ → + T E’ |
| id + | T E’$ | id * id$ | match + |
| id + | F T’E’$ | id * id$ | output T → F T’ |
| id + | id T’E’$ | id * id$ | output F → id |
| id + id | T’E’$ | * id$ | match id |
| id + id | * F T’E’$ | * id$ | output T’ → * F T’ |
| id + id * | F T’E’$ | id$ | match * |
| id + id * | id T’E’$ | id$ | output F → id |
| id + id * id | T’E’$ | $ | match id |
| id + id * id | E’$ | $ | output T’ → ϵ |
| id + id * id | $ | $ | output E’ → ϵ |
Figure 4.21: Moves made by a predictive parser on input id + id * id
Note that the sentential forms in this derivation correspond to the input that has already been matched (in column MATCHED) followed by the stack contents. The matched input is shown only to highlight the correspondence. For the same reason, the top of the stack is to the left; when we consider bottom-up parsing, it will be more natural to show the top of the stack to the right. The input pointer points to the leftmost symbol of the string in the INPUT column. □
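The program of Fig. 4.20 translates almost line for line into code. The sketch below hard-codes the table of Fig. 4.17 as a dictionary in which a missing key plays the role of an error entry; the names `TABLE`, `NONTERMINALS` and `predictive_parse` are assumptions of the example. One check is added that Fig. 4.20 leaves implicit: after the stack is reduced to $, the sketch also requires the input to be at $.

```python
TABLE = {                         # Fig. 4.17; an empty list stands for an ϵ-body
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    ip = 0                        # index of the current input symbol a
    stack = ["$", start]          # the top of the stack is the end of the list
    output = []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x == a:                # terminal on top matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:
            raise SyntaxError(f"expected {x!r}, found {a!r}")
        elif (x, a) not in TABLE:
            raise SyntaxError(f"error entry M[{x}, {a}]")
        else:
            body = TABLE[(x, a)]
            output.append(f"{x} -> {' '.join(body) or 'ϵ'}")
            stack.pop()
            stack.extend(reversed(body))      # push Yk, ..., Y1 with Y1 on top
    if tokens[ip] != "$":         # added check: no input may remain
        raise SyntaxError(f"unexpected {tokens[ip]!r} after the parse")
    return output
```

On `["id", "+", "id", "*", "id"]` the function returns eleven productions, in the same order as the "output" actions of Fig. 4.21.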
This discussion of error recovery refers to the stack of a table-driven predictive parser, since it makes explicit the terminals and nonterminals that the parser hopes to match with the remainder of the input; the techniques can also be used with recursive-descent parsing.
An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol or when nonterminal A is on top of the stack, a is the next input symbol, and M [A, a] is error (i.e., the parsing-table entry is empty).
Panic-mode error recovery is based on the idea of skipping over symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of synchronizing set. The sets should be chosen so that the parser recovers quickly from errors that are likely to occur in practice. Some heuristics are as follows:
1. As a starting point, place all symbols in FOLLOW (A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW (A) is seen and pop A from the stack, it is likely that parsing can continue.
2. It is not enough to use FOLLOW (A) as the synchronizing set for A. For example, if semicolons terminate statements, as in C, then keywords that begin statements may not appear in the FOLLOW set of the nonterminal representing expressions. A missing semicolon after an assignment may therefore result in the keyword beginning the next statement being skipped. Often, there is a hierarchical structure on constructs in a language; for example, expressions appear within statements, which appear within blocks, and so on. We can add to the synchronizing set of a lower-level construct the symbols that begin higher-level constructs. For example, we might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.
3. If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST (A) appears in the input.
4. If a nonterminal can generate the empty string, then the production deriving ϵ can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery.
5. If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing. In effect, this approach takes the synchronizing set of a token to consist of all other tokens.
Example 4.36: Using FIRST and FOLLOW symbols as synchronizing tokens works reasonably well when expressions are parsed according to the usual grammar (4.28). The parsing table for this grammar in Fig. 4.17 is repeated in Fig. 4.22, with "synch" indicating synchronizing tokens obtained from the FOLLOW set of the nonterminal in question. The FOLLOW sets for the nonterminals are obtained from Example 4.30.
The table in Fig. 4.22 is to be used as follows. If the parser looks up entry M [A, a] and finds that it is blank, then the input symbol a is skipped. If the entry is "synch," then the nonterminal on top of the stack is popped in an attempt to resume parsing. If a token on top of the stack does not match the input symbol, then we pop the token from the stack, as mentioned above.
| NONTERMINAL / INPUT SYMBOL | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → T E’ | | | E → T E’ | synch | synch |
| E’ | | E’ → + T E’ | | | E’ → ϵ | E’ → ϵ |
| T | T → F T’ | synch | | T → F T’ | synch | synch |
| T’ | | T’ → ϵ | T’ → * F T’ | | T’ → ϵ | T’ → ϵ |
| F | F → id | synch | synch | F → ( E ) | synch | synch |
Figure 4.22: Synchronizing tokens added to the parsing table of Fig. 4.17
On the erroneous input ) id * + id, the parser and error recovery mechanism of Fig. 4.22 behave as in Fig. 4.23. □
| STACK | INPUT | REMARK |
|---|---|---|
| E $ | ) id * + id $ | error, skip ) |
| E $ | id * + id $ | id is in FIRST (E) |
| T E’$ | id * + id $ | |
| F T’E’$ | id * + id $ | |
| id T’E’$ | id * + id $ | |
| T’E’$ | * + id $ | |
| * F T’E’$ | * + id $ | |
| F T’E’$ | + id $ | error, M [F, +] = synch |
| T’E’$ | + id $ | F has been popped |
| E’$ | + id $ | |
| + T E’$ | + id $ | |
| T E’$ | id $ | |
| F T’E’$ | id $ | |
| id T’E’$ | id $ | |
| T’E’$ | $ | |
| E’$ | $ | |
| $ | $ | |
Figure 4.23: Parsing and error recovery moves made by a predictive parser
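The recovery moves of Fig. 4.23 can be replayed with a small variation of the table-driven parser. The sketch below encodes Fig. 4.22 as a dictionary (a missing key is a blank entry) and returns the list of error remarks. Two points are assumptions of this sketch rather than rules stated in the text. First, M [E, )] is synch in Fig. 4.22, so a literal reading of the rules would pop E at the very first step and empty the stack; to reproduce the first move of Fig. 4.23 the sketch skips the input symbol instead whenever the nonterminal is the only symbol above $. Second, the endmarker $ is never skipped; the nonterminal is popped instead, so that every recovery step either consumes input or shortens the stack. The names `TABLE`, `SYNCH` and `parse_with_recovery` are hypothetical.

```python
SYNCH = "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {                         # Fig. 4.22; a missing key is a blank entry
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}


def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery and return the error remarks issued."""
    tokens = list(tokens) + ["$"]
    ip, stack, remarks = 0, ["$", start], []
    while stack[-1] != "$":
        x, a = stack[-1], tokens[ip]
        if x == a:                            # terminal matches the input
            stack.pop()
            ip += 1
        elif x not in NONTERMINALS:           # unmatched terminal: pop it
            remarks.append(f"error, inserted missing {x}")
            stack.pop()
        else:
            entry = TABLE.get((x, a))
            if isinstance(entry, list):       # ordinary production
                stack.pop()
                stack.extend(reversed(entry))
            elif a != "$" and (entry is None or len(stack) == 2):
                # blank entry, or synch with x alone above $ (assumption): skip a
                remarks.append(f"error, skip {a}")
                ip += 1
            else:                             # synch, or blank at end of input
                remarks.append(f"error, pop {x} on {a}")
                stack.pop()
    # anything left once the stack is empty is reported as skipped input
    remarks += [f"error, skip {a}" for a in tokens[ip:-1]]
    return remarks
```

For the erroneous input `[")", "id", "*", "+", "id"]` the sketch reports two errors, skipping ) and popping F on +, which are the two error rows of Fig. 4.23.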
The above discussion of panic-mode recovery does not address the important issue of error messages. The compiler designer must supply informative error messages that not only describe the error but also draw attention to where the error was discovered.
Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack. Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several reasons. First, the steps carried out by the parser might then not correspond to the derivation of any word in the language at all. Second, we must ensure that there is no possibility of an infinite loop. Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.