[Ecls-list] Fwd: [ecls:bugs] #206 string-lessp does not behave as string< ignoring case

Fri Sep 21 21:47:50 UTC 2012

Juan Jose Garcia-Ripoll
<jjgarcia at users.sourceforge.net> writes:

> Could you people have a look at the bug report and my answer? I
> believe I am right, though SBCL is reporting a different order and
> this (once more) confuses users
>
>     bugs:206 string-lessp does not behave as string< ignoring case
>    
>     Status: open Created: Thu Sep 20, 2012 09:43 PM UTC by Rafael
>     Jesús Alcántara Pérez Last Updated: Thu Sep 20, 2012 09:43 PM
>     UTC Owner: nobody
>    
>     Ignoring case, these two function calls should behave the same
>     («string-lessp and string-greaterp are exactly like string< and
>     string>, respectively, except that distinctions between uppercase
>     and lowercase letters are ignored. It is as if char-lessp were
>     used instead of char< for comparing characters.»):
>    
>     (string< "a_" "aa") => 1
>     
>     (string-lessp "a_" "aa") => nil
>     
>     But this is not the case. Am I missing something?
>
> STRING-LESSP is indeed implemented using CHAR-LESSP but your
> expectations about how this function should behave are wrong. Note
> that the Common Lisp standard states that CHAR-LESSP ignores case but
> it must do so not only when both characters are alphabetic, but also
> when comparing alphabetic and non-alphabetic ones.
>
> An ordering must be transitive: (A < B) and (B < X) must imply (A <
> X). Take for instance #\_, #\a and #\A. We have (char< #\A #\_ #\a).
> Now you only want (CHAR-LESSP #\A #\a) to change value, but this
> destroys transitivity.
>
> The only way to achieve this with CHAR-LESSP is to convert the
> character first to one case and then perform the comparison, which is
> what has been done for a long time with Common Lisps.
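
To make the transitivity point concrete, here is a small sketch.
NAIVE-CHAR-LESSP is a hypothetical function, not anything from ECL or
another implementation: it ignores case only when both characters are
alphabetic, which is what the bug report seems to expect.  The results
in the comments assume ASCII-compatible character codes (#\A = 65,
#\_ = 95, #\a = 97); the standard does not guarantee them.

```lisp
;; Hypothetical comparison: fold case only for alphabetic pairs,
;; otherwise fall back to plain CHAR<.
(defun naive-char-lessp (a b)
  (if (and (alpha-char-p a) (alpha-char-p b))
      (char< (char-upcase a) (char-upcase b))
      (char< a b)))

;; Assuming ASCII-compatible codes:
;;   (naive-char-lessp #\A #\_)  is true   (65 < 95)
;;   (naive-char-lessp #\_ #\a)  is true   (95 < 97)
;;   (naive-char-lessp #\A #\a)  is false  (#\A against #\A)
;; So A < _ and _ < a hold, but A < a does not: no longer an ordering.
```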

And let me add that it may be done one way or another, depending on the
implementation.  The only thing that the CLHS entry for char-lessp says
about it is in the final note:

    The manner in which case is used by char-equal, char-not-equal,
    char-lessp, char-greaterp, char-not-greaterp, and char-not-lessp
    implies an ordering for standard characters such that A=a, B=b, and
    so on, up to Z=z, and furthermore either 9<A or Z<0.

So (char-lessp #\a #\_) can be true in one implementation
and (char-lessp #\_ #\a) true in another.

$ clall -r '(char-lessp #\a #\_)'

International Allegro CL Free Express Edition --> T
Clozure Common Lisp            --> T
CLISP                          --> T
CMU Common Lisp                --> NIL
ECL                            --> T
SBCL                           --> NIL

CMU CL and its fork SBCL seem to fold to lower case before comparing,
while the other implementations seem to fold to upper case.
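
A rough way to picture the two behaviours, as a sketch only and not
the actual source of any of these implementations: both helpers below
are hypothetical names, and the results in the comments assume
ASCII-compatible character codes.

```lisp
;; Hypothetical helpers illustrating the two folding strategies.
(defun lessp-via-upcase (a b)
  "Compare A and B after folding both to upper case."
  (char< (char-upcase a) (char-upcase b)))

(defun lessp-via-downcase (a b)
  "Compare A and B after folding both to lower case."
  (char< (char-downcase a) (char-downcase b)))

;; Assuming ASCII-compatible codes (#\A = 65, #\_ = 95, #\a = 97):
;;   (lessp-via-upcase   #\a #\_)  is true:  compares #\A with #\_
;;   (lessp-via-downcase #\a #\_)  is false: compares #\a with #\_
```

Both are transitive and both satisfy A=a, B=b, and so on; they only
disagree about where the characters between the two alphabetic ranges
end up.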

If the correctness of your program depends on one result or another from
(string-lessp "a_" "aa"), then your program is not conforming.

Similarly for (string< "a_" "aa"), by the way.  It's not specified
whether (char< #\a #\_) is true or not.  It's entirely up to the
implementation what it returns there.
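
If a program really needs one particular order, one option is to spell
the order out instead of relying on char< or char-lessp.  The following
is a minimal sketch under that assumption; *COLLATION*, COLLATION-KEY
and MY-STRING-LESSP are made-up names, and the collation string (which
puts #\_ before digits and letters) is only an example choice.

```lisp
(defparameter *collation*
  "_0123456789abcdefghijklmnopqrstuvwxyz"
  "Hypothetical collation order: earlier characters sort first.")

(defun collation-key (char)
  "Position of CHAR, folded to lower case, in *COLLATION*."
  (or (position (char-downcase char) *collation*)
      (error "~S is not in *COLLATION*" char)))

(defun my-string-lessp (s1 s2)
  "Return the mismatch index if S1 sorts before S2, else NIL."
  (let ((end (min (length s1) (length s2))))
    ;; If no character differs, S1 is less only when it is a proper prefix.
    (dotimes (i end (and (< (length s1) (length s2)) end))
      (let ((k1 (collation-key (char s1 i)))
            (k2 (collation-key (char s2 i))))
        (cond ((< k1 k2) (return i))
              ((> k1 k2) (return nil)))))))

;; (my-string-lessp "a_" "aa") => 1
;; (my-string-lessp "aa" "a_") => NIL
```

Since the order comes from the position in *COLLATION* and not from
character codes, the result should be the same in every implementation.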

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
A bad day in () is better than a good day in {}.