[climacs-devel] grammar definition

Robert Strandh strandh at labri.fr
Thu Jan 12 05:18:00 UTC 2006


Hello Nicolas, 

Sorry about the late answer.  I am trying to cope with some email
backlog, and I am now just staring to catch up. 

Nicolas Sceaux writes:
 > 
 > I've been experimenting a bit with various ways of defining a grammar
 > like lisp-syntax' one for the LR parser, but I'm not really satisfied and
 > I'm sure gurus here have some better thoughts about it.
 > 
 >   (define-grammar lisp-syntax (some options)

You may want to start with a protocol (in the sense of Keene) for
manipulating grammars; then you can define syntax to simplify common
cases.  For instance, you most likely would want the grammar to be
able to evolve programmatically.  There should be some examples of
`add-rule' in the other syntax modules that show this idea. 

 >     ;; Expand to define-parser-state
 >     ;; (<state> => <specialized-state>)
 > 
 >     ;; Expand to define-new-parser-state:
 >     ;; (<new-state> => <state> <parse-tree>)
 >   
 >     ;; Expand to defclass of nonterminals:
 >     ;; (<nonterminal> -> <specialized-nonterminal>)
 > 
 >     ;; Expand to define-action:
 >     ;; (<nonterminal> -> <state> <lexeme> <how-to-reduce>)
 >     
 >     ;;;;;;; List
 >     (form               -> list-form)
 >     (list-form          -> complete-list-form)
 >     (complete-list-form -> |( form* ) | t
 >      :reduce-until-type left-parenthesis-lexeme)
 > 
 >     (lexer-list-state     => |( form* |)
 >     (form-may-follow      => |( form* |)
 >     (parser-state         => |( form* ) |)
 >     (lexer-toplevel-state => |( form* ) |)
 >     (|( form* |   => form-may-follow left-parenthesis-lexeme)
 >     (|( form* |   => |( form* | form)
 >     (|( form* |   => |( form* | comment)
 >     (|( form* ) | => |( form* | right-parenthesis-lexeme)
 > 
 >     (list-form             -> incomplete-list-form)
 >     (incomplete-form-mixin -> incomplete-list-form)
 >     (incomplete-list-form  -> |( form* | nil 
 >      :reduce-until-type left-parenthesis-lexeme))
 >   
 > But I'm not sure what kind of utility people expect. I'd be glad to be
 > put on the right track.

I am actually not too sure myself what to do about this, and I am not
sure I follow your notation.  On the one hand, there is standard
notation for specifying grammar rules (that are pretty much followed
in the syntax modules that use the Earley parser, but it has been
extended).  On the other hand, that notation does not allow for
specifying lexer states and parser states.

Perhaps a compromise would be to specify a rule pretty much the
standard way, but to allow for a state name to be inserted between two
parser symbols, for instance:

   (list -> left-parenthesis-lexeme {the-state-we-want-to-be-in} ...)

or something like that.  

This kind of practice will sometimes create conflicts when you submit
such rules to an LR parser generator.  For that reason, I would like
to use a Tomita parser instead, which can deal with such conflicts.
Again, on the other hand, I don't know how to make a Tomita parser
incremental, but I guess it is a straightforward (albeit messy)
extension of the incremental LR algorithm.  

I am sorry that I can't be of much more help here and now.  Parsing is
still an important issue that is on my back burner (and I have a
near-finished implementation of a Tomita parser), but I make very slow
progress.

Take care, 
-- 
Robert Strandh

---------------------------------------------------------------------
Greenspun's Tenth Rule of Programming: any sufficiently complicated C
or Fortran program contains an ad hoc informally-specified bug-ridden
slow implementation of half of Common Lisp.
---------------------------------------------------------------------



More information about the climacs-devel mailing list