next up previous contents
Next: A Top-Down Parser Up: Parsing Previous: Context Free Grammars

Simple Parsing Methods

Parsing methods can be divided into top-down methods and bottom-up methods. These are illustrated schematically in figure 2.2, for a tiny grammar of English. The illustration for bottom up parsing starts with the sentence itself, and uses the grammar rules to determine how combinations of words can be rewritten as more abstract grammatical categories (e.g., ``The dog'' is a noun phrase). The parse succeeds if this rewriting process concludes with the whole sentence rewritten to the start symbol (sentence). Top down parsing, on the other hand, starts with this start symbol, and rewrites/expands symbols until it finally gets down to a sequence matching the sequence being parsed.

  figure1848
Figure 5.2: Top Down and Bottom up Parses, for a fragment of English

Both top-down and bottom-up parsing are based on the idea of repeatedly modifying a sequence of symbols (based on allowed rewrites in the grammar rules), with the aim of getting from a sequence consisting of just the start symbol, to our sequence to be parsed. This can be illustrated as follows:

Top Down

(Sentence)                       apply rule 1
(NounPhrase VerbPhrase)          apply rule 2
(Article Noun VerbPhrase)        apply rule 4
(the Noun VerbPhrase)            apply rule 5
(the dog VerbPhrase)             apply rule 3
(the dog Verb)                   apply rule 6
(the dog jumps)                  SUCCEEDS

Bottom up

(the dog jumps)                  apply rule 6
(the dog Verb)                   apply rule 3
(the dog VerbPhrase)             apply rule 5
(the Noun VerbPhrase)            apply rule 4
(Article Noun VerbPhrase)        apply rule 2
(NounPhrase VerbPhrase)          apply rule 1
(Sentence)                       SUCEEDS

Now, of course, the grammar in the figure is a bit limited! In fact, it can ONLY recognise the sentence 'The dog jumps'. There are no options or alternatives. In any real grammar there will be many alternative structures to explore, and that's where the difficulty in parsing arises. This is particularly a problem when parsing natural language. For artificial languages such as programming languages it is not so bad, but we still need ways of choosing between alternatives. For example, if we have the rule:

<factor>     ::= ( <expression> ) | v

and we are doing a top-down parse, how do we know whether to rewrite <factor> to ( <expression> ) or to v (letter/digit). For programming languages, it is often possible to do this by looking ahead one character. In the above rule, if the next character in the sequence you are parsing is '(' then the first option is correct, otherwise it is the second option. Often, in order to make this one-char lookahead work, the grammar must be modified into an equivalent, but easier to process form.





Alison Cawsey
Fri Aug 28 16:25:58 BST 1998