[cl-unification-devel] Questions and patches involving cl-unification's MATCH* macros

Mon Jan 18 06:53:09 UTC 2010

Hello there.

thanks a lot for all the work.  I have read through the messages and I  
agree with your analysis with a few comments.

I think all the changes you propose are worth including.

Let me comment on a few of these.

On Jan 16, 2010, at 06:43 , Pixel // pinterface wrote:

> Greetings!  I was looking into improving the efficiency of MATCH- 
> CASE and
> while working on that I've ended up with a few questions regarding  
> some
> implementation decisions, and drafted up some patches both to help  
> you see
> my thoughts and for consideration of inclusion if you think they're  
> in the
> right direction.
>
> Patches are numbered in the order created, and generally depend on
> lower-numbered patches (not always intentionally--it's hard to avoid
> inter-patch dependencies when making multiple changes to a small  
> area).  I
> thought I made the patches in a reasonable order while I was making  
> them,
> but I find myself referring to them in a different sequence than  
> what I made
> them in, so apologies if that causes any confusion.
>
> I've also put up a darcs mirror (in darcs2 format) of the CVS  
> repository
> plus the attached patches, in case you find that more convenient.   
> You can
> find it at
>  http://repo.kepibu.org/cl-unification/
>
>
>
> ** Duplicate code
>
> There's a fair bit of code duplication between MATCH, MATCHF, and
> MATCHING in the form of GENERATE-VAR-BINDINGS and the production of
> the let forms that surround FORMS in those macros.  (Nevermind that
> MATCH and MATCHF are identical except in what they do with template.)

Yep.  Most of the code in the MATCH-BLOCK file is cut'n'paste.

>
> Patches 0 and 2 help somewhat.
>
> 0) Extract template handling of MATCH[ING] into %TEMPLATE-FOR-MATCH
>   Should be fairly self-explanatory; not much of a win in lines.
>
> 2) Extract the bits that wrap forms with bindings for template  
> variables
>   This patch extracts the assorted flet GENERATE-VAR-BINDINGS into a
>   single common function.  It's slightly less straightforward than  
> that,
>   however, because it also extracts the (let ...) forms which actually
>   used those bindings.
>
>   This results in a decent win for code size, but in some cases it
>   swaps the order of execution of %TEMPLATE-FOR-MATCH and
>   COLLECT-TEMPLATE-VARS.  I'm pretty sure this doesn't have any
>   noticable effect, but thorough testing is probably wise.

This is good.  However, I would set up a bunch of regression tests  
before moving on and change things.  I usually use the test suite from  
Franz, but I'd be willing to change to another test suite.

>
> ** (unify* ...) and (ignore-errors (unify ...))
>
> The MATCHING macro produces the form (ignore-errors (unify ...)),  
> presumably
> because it existed before the convenience function UNIFY*.  But I'm  
> curious
> about UNIFY* and/or the ignore-errors form in MATCHING.  Is ignore- 
> errors
> the proper form in either of these places, or was it just easier to  
> type
> than
>  (handler-case (unify ...)
>    (unification-failure () nil))
> ?

Most implementations expand IGNORE-ERRORS in something like that.  But  
I agree that (as you point out later on) that UNIFY* should be written  
as

(defmethod unify* (a b &optional (env (make-empty-environment)))
      (handler-case (unify a b env)
           (unification-failure () nil))

In this way other conditions are passed on.

>
> Looking at some of the assorted UNIFY methods, I see UNIFICATION- 
> FAILUREs
> aren't the only possible condition which might be thrown during  
> unification
> (e.g., LAMBDA-LIST-PARSING-ERROR is also a possibility).  However,  
> while
> MATCHING treats any error as a failure to unify, MATCH, MATCHF,  
> MATCH-CASE,
> and MATCHF-CASE all limit themselves to only UNIFICATION-FAILUREs.   
> This
> inconsistency leads me to wonder which is correct.
>
> I assume only UNIFICATION-FAILUREs should be treated as such in  
> patch 5
> because all ERRORs seems a bit heavy-handed in a language with such an
> excellent condition system, but can see the argument that any error / 
> is/ a
> failure to unify.
>
> 1) Use (unify* ...) rather than (ignore-errors (unify ...))
>   Same thing, so might as well use the convenience function.
>
>   Not particularly exciting.  And this bit is changed again in patch 5
>   to ignore only UNIFICATION-FAILUREs.

Yep.  See above.

> 5) Make MATCHING agree with MATCH[F][-CASE] about the conditions of  
> failure
>   Rather than skipping to the next clause on any error,
>   UNIFICATION-FAILUREs--and /only/ UNIFICATION-FAILUREs--skip to the  
> next
>   clause.
>
>   Under the assumption that UNIFY* is implemented as intended, this  
> creates
>   a new utility function UNIFY** (terrible name, I know!) which is  
> like
>   UNIFY* but ignores /only/ UNIFICATION-FAILUREs rather than all  
> ERRORs.
>
>   This of course results in a library-user-visible change to how  
> MATCHING
>   operates, which may or may not be considered acceptable.

Yep.

>
>
> ** (matching (otherwise ...))
>
> 3) Fix (matching (otherwise ...))
>   (matching (otherwise ...)) expands into (cond (otherwise ...)),  
> which
>   generates an unbound-variable error when executed, because COND does
>   not special-case OTHERWISE as CASE does.
>
>   Pretty self-explanatory and not very interesting.

Good catch.

> ** &body vs. &rest
>
> 4) Use &body instead of &rest for (arguably) prettier auto-indentation
>   I prefer the smaller indentation provided with using &body instead
>   of &rest.  Not a big deal either way, but I was in there so I
>   figured I'd throw it in the mix.  Fortunately, if you disagree,
>   this patch shouldn't affect any of the others. :)

I agree.  It is better self documentation as well.

> ** Multiple #:UNIFICATION-ENVs in MATCHING
>
> 6) Only use one variable to store the unification environment in  
> MATCHING
>   Because of the way MATCHING expands, and what UNIFY** returns, each
>     (setf #:env (unify** ...))
>   call will do one of two things: it will set #:env to NIL or it  
> will set
>   #:env to an ENVIRONMENT structure.
>
>   If #:env is set to NIL--the same value it entered the (setf)  
> with!--the
>   COND will continue on to the next clause.
>
>   If #:env is set to an ENVIRONMENT structure, none of the remaining  
> (setf)
>   clauses will be evaluated.
>
>   Thus, because the variable will only ever be set to a non-nil  
> value once,
>   this is perfectly safe.

I think this is ok as well.

>
> ** MATCH-CASE
>
> MATCH-CASE has the surprising behavior of
>  (ignore-errors
>    (match-case ("foo")
>      ("foo" (error 'unification-failure))
>      (t :default)))
>  => :default
> Hardly CASE-like behavior!
>
> I think it sensible to redefine MATCH-CASE in terms of MATCHING,  
> rather than
> MATCH.  E.g., so the above would expand into something like
>  (ignore-errors
>    (let ((#:object-var "foo"))
>      (matching (:errorp nil :default-substitution nil :matching- 
> named nil)
>        (("foo" #:object-var) (error 'unification-failure))
>        (t :default))))
> which greatly simplifies the definition of MATCH-CASE and makes it  
> much more
> CASE-like in behavior.
>
> 7) Redefine MATCH-CASE in terms of MATCHING
>   This both greatly simplifies the MATCH-CASE macro as well as its
>   expansion.
>
>   HOWEVER, this version is *NOT* 100% compatible with the previous
>   version.  Specifically, UNIFICATION-FAILUREs signalled from within
>   clause-forms will /not/ cause the next unification clause to be
>   attempted, but will instead propogate outward as the -case name
>   suggests they should.
>
>   That is,
>     (ignore-errors
>       (match-case ("foo")
>         ("foo" (error 'unification-failure ...))
>         (t :default)))
>     => :default                    ;; before patch
>     => nil, #<unification-failure> ;; after patch
>
> Patch 7 suggests that MATCHING, MATCH-CASE, MATCHF-CASE, and the  
> possible
> future MATCH-COND and MATCHF-COND are all fairly trivial variations  
> on each
> other.  I'll be happy to explore that possibility in future patches  
> if it's
> of interest, but I'm already asking you to look at too many patches  
> at once,
> so I'll refrain for now. :)

I agree with this as well.  The behavior you suggest is the one that's  
needed.

> ** MATCH
>
> Though documented, MATCH's behavior when its body forms signal
> UNIFICATION-FAILURE seems at odds with expectations, for pretty much  
> the
> same reason the behavior seems nonsensical in MATCH-CASE.  Was it a
> deliberate design decision, or a side-effect of how MATCH was  
> implemented?

Never attribute to malice what can be attributed to incompetence.  Of  
course it is a side-effect of the implementation :)  I knew I needed a  
test suite :)

>
> If a side effect of implementation, would it make sense to redefine  
> MATCH to
> have an expansion more like
>  (match ("foo" "foo" :errorp <e?> :error-value <ev>) <body>) =>
>  (m-v-b (#:env #:error) (unify** "foo" "foo")
>    (cond
>      ((and #:error <e?>) (error #:error))
>      (#:error <ev>)
>      (t (let* (#| template vars |#)
>           <body>))))
> so UNIFICATION-FAILUREs inside <body> can be independent of errorp?
>
> (No patch for this one yet.)
>
>
> ** :MATCH-NAMED, :MATCHING-NAMED, :MATCH-CASE-NAMED
>
> It seems a bit redundant to have <macro-name>-named arguments for  
> block
> names rather than simply having a :named argument as in, say, LOOP.   
> Any
> particular reason it wasn't done that way?

Having the :NAMED option would be preferable, with NIL or the macro  
name as fall-back.

> (No patch for this one, either.)
>
>
> ** Apologies
>
> Sorry about the massive brain dump.  I /may/ have gotten carried  
> away. :)

Absolutely not.  All of this is most welcome.  I am just a little  
swamped (what an euphemism!) and maybe not all that reactive.  But  
let's keep the discussion going.

all the best

Marco