[climacs-devel] prolog syntax

Mon Mar 28 16:18:45 UTC 2005

Robert Strandh <strandh at labri.fr> writes:

>  > This is not really true in Prolog's case; the obvious case is the
>  > bracketed /* */ comment, but quoted tokens likewise can contain
>  > whitespace.  (These cases are analogous, I think, to Common
>  > Lisp's #| |# comment and |Symbol With Spaces| -- one might argue
>  > over whether the comment is one lexeme or several -- or none --
>  > but I think it's clear that
>  > |Symbol With Spaces| is exactly one token).
>
> The obvious solution to this problem is to have grammar rules that
> make language tokens (that may contain whitespace) out of lexemes
> (that may not). 

It may be obvious to you, but I'm not sure it's obvious to me... how
do you get the whitespace into the language tokens if they're not in
the lexemes?  You'd have to go back to the buffer contents?

>  > The incremental lexer that I, erm, cut'n'pasted from html-syntax.lisp
>  > doesn't cut it for Prolog, for related reasons: the damaged region
>  > from deleting a quoting character is extensive.  I think I can fix
>  > this -- it's similar to a problem I've already dealt with in my
>  > Tabcode parser.
>
> That is a problem only if one sees a literal character string as a
> token.  A better view would be to see the quote character as a lexeme
> and the character string as a token produced by grammar rules. 
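A small Python illustration of how large the damaged region can get. This is not the html-syntax.lisp lexer, and the patterns and names are made up for the example. With a lexer that treats a whole quoted atom as one lexeme, deleting the opening quote changes every lexeme after the edit point. With a lexer that treats the quote character as a lexeme of its own, the rest of the lexeme stream is unchanged, and the reinterpretation moves into the grammar.

```python
import re

# QUOTED: a whole quoted atom is one lexeme, running to the next single
# quote (or to end of input if unterminated). Whitespace is skipped.
QUOTED = re.compile(r"'[^']*'?|\w+|[^\s\w']")
# SPLIT: the quote character is a lexeme of its own. Whitespace is skipped.
SPLIT = re.compile(r"'|\w+|[^\s\w']")

def lex(pattern, text):
    """Return the list of lexeme strings that pattern finds in text."""
    return pattern.findall(text)

before = "'a b' c 'd e' f."
after = before[1:]  # the user deletes the opening quote

print(lex(QUOTED, before))  # ["'a b'", 'c', "'d e'", 'f', '.']
print(lex(QUOTED, after))   # ['a', 'b', "' c '", 'd', 'e', "' f."]
# With SPLIT, only the deleted quote lexeme disappears:
print(lex(SPLIT, after) == lex(SPLIT, before)[1:])  # True
```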

Possibly.  I'm slightly reluctant to go down this route for a number
of reasons, most importantly that of maintainability: at the moment,
my Prolog grammar and lexer are more-or-less direct transcriptions of
the grammar and token syntax in the Prolog standard (modulo the fact
that I've been lazy in the lexer with respect to some of the details).

As you know, this Prolog work doesn't exist in a vacuum; I am hoping
to hook this work into an adaptation of Peter Norvig's Prolog
implementation from _Paradigms of Artificial Intelligence
Programming_.  This means that the design of the lexer and parser
can't be motivated simply by getting glyphs on the screen.  In
particular, I think that keeping the whitespace characters of language
tokens separate will cause more problems than it solves, given the
multiple uses of this parser.  Consider a line comment (introduced by
#\% in Prolog and by #\; in Common Lisp): if whitespace is ignored by
the lexer, how will the grammar be able to tell where a line-oriented
comment ends?
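A toy Python illustration of the line-comment problem. The names are hypothetical, and the lexer is deliberately simplified and does not follow the token syntax of the Prolog standard. If the lexer discards all whitespace, a buffer with a newline inside a % comment and one without it produce the same lexemes. Nothing downstream can then tell where the comment ended.

```python
import re

def lex(text, keep_newlines):
    """Toy lexer: names and single punctuation characters. Whitespace is
    discarded, except that newlines become lexemes when keep_newlines is true."""
    pattern = r"\n|\w+|[^\s\w]" if keep_newlines else r"\w+|[^\s\w]"
    return re.findall(pattern, text)

def strip_line_comments(lexemes):
    """Grammar-level rule: drop lexemes from a '%' up to the next newline.
    Newline lexemes themselves are dropped too."""
    out, in_comment = [], False
    for lx in lexemes:
        if lx == "%":
            in_comment = True
        elif lx == "\n":
            in_comment = False
        elif not in_comment:
            out.append(lx)
    return out

two_lines = "foo. % a comment\nbar."
one_line = "foo. % a comment bar."

# With whitespace discarded, the two buffers are indistinguishable...
print(lex(two_lines, False) == lex(one_line, False))  # True
# ...whereas keeping newlines lets a grammar rule find the comment's end.
print(strip_line_comments(lex(two_lines, True)))  # ['foo', '.', 'bar', '.']
print(strip_line_comments(lex(one_line, True)))   # ['foo', '.']
```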

In any case, if there is no absolutely hardwired assumption that
whitespace cannot be part of tokens, I shall see which of the
temporary problems I can fix.

>  > For a variety of reasons, I think, the incremental redisplay stuff
>  > doesn't work.  I'm not convinced I understand why. :-/ I encourage
>  > interested parties to have a play with Set Syntax SPC Prolog, and see
>  > if they can characterize bugs, problems.
>
> Are you talking about McCLIM here?  There are known problems in McCLIM
> incremental redisplay.  For one thing, it displays each output record
> twice, once with a clipping region and once without. 

Well, the problems I was getting seemed to be more climacs-related:
errors relating to *CURSOR-POSITIONS* array access in
HANDLE-WHITESPACE, where something in the display code wanted to index
off the end of that array.  I couldn't reproduce this in HTML syntax,
though; nor could I reproduce there the observation that, with all the
updating-output enabled, some of my tokens disappeared at various
points. :-/ Also, I believe that one of the problems I saw was the
result of a lexeme starting in the displayed window and ending outside
it, but I haven't investigated further.
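One way that a lexeme starting inside the displayed window and ending outside it can be mishandled is a visibility test that assumes containment rather than overlap. The Python sketch below is purely hypothetical. It is not climacs's display code, and whether climacs does anything like this has not been investigated. Lexemes are reduced to (start, end) buffer offsets, with end exclusive.

```python
from typing import List, Tuple

Lexeme = Tuple[int, int]  # (start, end) buffer offsets, end exclusive

def contained(lexemes: List[Lexeme], top: int, bot: int) -> List[Lexeme]:
    """Lexemes lying entirely inside the window [top, bot)."""
    return [(s, e) for s, e in lexemes if top <= s and e <= bot]

def overlapping(lexemes: List[Lexeme], top: int, bot: int) -> List[Lexeme]:
    """Lexemes sharing at least one character with the window [top, bot)."""
    return [(s, e) for s, e in lexemes if s < bot and e > top]

lexemes = [(0, 10), (11, 40), (41, 50)]
print(contained(lexemes, 0, 30))    # [(0, 10)]
print(overlapping(lexemes, 0, 30))  # [(0, 10), (11, 40)]
```

With the containment test, the lexeme at (11, 40) is silently dropped even though its first part is on screen.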

Cheers,

Christophe