The Bison -v/--verbose option

Franz Klein

Rate me:

3.75/5 (4 votes)

14 Oct 20035 min read

46.1K

419

Use the -v option to aid in debugging shift/reduce and reduce/reduce conflicts - An introduction.

Download source files - 20.8 Kb

Introduction

With Bison, the GNU version of Yacc, one is able to generate a C program that parses a given LALR(1) context-free grammar. Often when creating the grammar and running it through Bison, one gets warnings about shift/reduce and reduce/reduce conflicts. Although a functional parser is created, there is a possibility that any given text is parsed incorrectly as the parser may make incorrect choices/operations when it comes to resolving the conflicts. If you are an experienced user of Flex and Bison then stop reading and click on the Back button. If you are a beginner then continue reading.

To help resolve these conflicts, one can make use of the -v or --verbose option when invoking Bison. This results in a *.output file being created, which can be used to find and resolve these conflicts. I have looked for documentation that describes the output file in more detail but haven't been able to find anything decent. In this article I am putting forward my interpretation of the contents of the *.output file.

Just a warning!!! - The grammar found in the structgrammar.y file is very badly written but is done so that the output file can contain all the relevant sections, especially those dealing with conflicts.

Background

This wasn't what I was hoping to be my first article for Code Project. I was hoping my first article would be an introduction to one of the applications that I have been working on and off for the past three years. But at least this way I start writing. OK, enough procrastination.

I have been trying on and off to create a parser with the help of GNU tools Flex (Lex) and Bison (Yacc). My attempts have all failed so far, that is until I bought the book Lex & Yacc. Die Profitools zur lexikalischen und syntaktischen Textanalyse by Helmut Herold.

Contents of the output file

The output file provides descriptions of the parser states and can consist of up to 7 sections.

Useless non-terminals
Useless rules
Conflicts
The Grammar
List of terminals
List of non-terminals
The various parser states

Useless non-terminals

This section lists all of the non-terminals that have been defined but that are not used nor referred to in any other rules. That is to say that the non-terminal doesn't occur on the right side of any of the other rules.

Useless nonterminals:

   template

Useless rules

This section lists the rules that are not used in the definition of any other rules and is similar to the Useless non-terminals section above.

Useless rules:

#45    template :  TEMPLATE;
#46    template :  epsilon;

Conflicts

The conflicts section says where the shift/reduce conflicts and reduce/reduce conflicts occur in the various parser states, e.g.

State 43 contains 4 reduce/reduce conflicts.

The Grammar

The fourth section contains the grammar which is divided up into rules, e.g.

rule 1    class_specifier -> class_head OPENBRACE 
                                  member_specification CLOSEBRACE

while the corresponding line from the context-free grammar is as follows:

class_specifier : class_head OPENBRACE member_specification CLOSEBRACE

List of terminals

The fifth section provides a list of all terminals that have been defined in the grammar. As can be seen in the example below, the name of each terminal is followed by two numbers, one that is enclosed in brackets and one that is not.

STRUCT (258) 9

The first number, 258, refers to the number generated by the Bison program and is used in the C definition found in the structgrammar.tab.c file.

#define STRUCT = 258

while the second number, 9, is the number of the rule where the terminal is used.

rule 9    class_key -> STRUCT

List of non-terminals

The sixth section provides a list of all the non-terminals that have been defined in the grammar.

class_head (50)
    on left: 3 4 5 6 7 8, on right: 1 2

Here again there are a few numbers that are displayed. I think the number enclosed in brackets directly after the non-terminal is the index into the yytname array found in the structgrammar.tab.c file. The numbers found after on left indicate the rules where the non-terminal occurs on the left side of the rule, while the numbers found after on right indicate the rules where the non-terminal occurs on the right side of the rule.

on left:
rule 3    class_head -> class_key
rule 4    class_head -> class_key IDENTIFIER
rule 5    class_head -> class_key base_clause
rule 6    class_head -> class_key IDENTIFIER base_clause
rule 7    class_head -> class_key nested_name_specifier IDENTIFIER
rule 8    class_head -> class_key nested_name_specifier 
                                              IDENTIFIER base_clause

on right:
rule 1    class_specifier -> class_head OPENBRACE 
                                   member_specification CLOSEBRACE
rule 2    class_specifier -> class_head 
                                   OPENBRACE CLOSEBRACE

The various parser states

The seventh section contains details about each parser state. The structgrammar.output file contains a total of 219 states. A lot of information is contained in each state description. First, the rules that the parser has worked through to get to the current state are listed. Some rules listed haven't been parsed totally. How far the parser has worked through each rule is indicated by the full stop (.). The rest of the state description shows the states to which the parser will proceed or which operations will occur, depending on the value of the lookahead token. Conflicts are demarcated by square brackets.

state 43

    class_name  ->  IDENTIFIER .   (rule 17)
    template_id  ->  IDENTIFIER . LESSTHAN 
                           template_argument_list GREATERTHAN   (rule 59)
    type_name  ->  IDENTIFIER .   (rule 92)

    LESSTHAN shift, and go to state 76

    COMMA reduce using rule 17 (class_name)
    COMMA [reduce using rule 92 (type_name)]
    OPENBRACKET reduce using rule 17 (class_name)
    OPENBRACKET [reduce using rule 92 (type_name)]
    SEMICOLON reduce using rule 17 (class_name)
    SEMICOLON [reduce using rule 92 (type_name)]
    OPERATOR reduce using rule 17 (class_name)
    OPERATOR [reduce using rule 92 (type_name)]
    $default reduce using rule 92 (type_name)

As an example, state 43 (the conflicts section above, refers to state 43 having 4 reduce/reduce conflicts) is used. To reach state 43 the token IDENTIFIER has been parsed and is situated on top of the stack, as is seen in all three rules by the position of the full stop after the IDENTIFIER token. If the lookahead token is LESSTHAN then a shift operation occurs and the parser goes to state 76. If the lookahead token is COMMA then a reduce operation will occur, and it is here that a reduce/reduce conflict arises. The COMMA lookahead token can be reduced either by rule 17 or rule 92 and this is where the conflict arises, the parser doesn't know which rule to use in the reduction. The following is a guess but I think that the line beginning with $default indicates the action taken when the lookahead token doesn't match any of the given tokens. An example of a shift/reduction conflict as it appears in the state description is as follows:

state 79

    ...

    IDENTIFIER shift, and go to state 70

    IDENTIFIER [reduce using rule 14 (nested_name_specifier)]

If the lookahead token is IDENTIFIER then either a shift or reduce operation can occur. The conflict here again is that the parser doesn't know which operation to perform.

Comments

I will gladly appreciate any comments on this article.

History

October 2003 - First public release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Written By

Franz Klein

United Kingdom

I am a qualified Veterinary Surgeon who prefers treating computers with viruses than animals with viruses. I have recently completed a MEng German Informatics degree at the University of Reading with a 2:1. I also have the ISEB Foundation Certificate in Software Testing.

Currently I am umemployed and desparately looking for a job in the IT industry.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.