[slime-devel] CMUCL unicode strings breaks slime

Helmut Eller heller at common-lisp.net
Fri Oct 1 10:35:25 UTC 2010


* Raymond Toy [2010-10-01 10:20] writes:

>> What is the length of *s* or (prin1-to-string *s*) now?
>> Should it be 3 not 4?
>
> Good question.  The answer now is 4, not 3.  There are 4 code units in
> the string, so that is the length.  Length would be really slow if it
> had to scan the whole string looking for surrogate pairs and counting
> them as one instead of two.
>
> Is that the reason for the problem?  Confusion between emacs and lisp on
> the length of the string?  It does appear that the string only has 3
> characters, as displayed by emacs.

Very likely.  Emacs uses something like utf-8 internally and counts code
points, not code units (except for line endings, which is probably a
different issue).

> Doesn't acl have this problem too?  It also uses 16-bit strings like
> cmucl.

Allegro has no lisp:codepoint function, and (code-char #x10000)
returns nil.  ABCL is in a similar situation, except that it returns #\null.

In Java, strings have a length method, which returns the number of code
units, and a codePointCount method, which returns the number of code
points.  Maybe CMUCL has something like that and we should use it in SWANK.
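
For illustration, here is a minimal Java sketch of the two counts.  The
class name StringLengths and the sample string are made up for this
example; nothing here is taken from SWANK or CMUCL.  The sample holds 'a',
the code point U+10000 written as a surrogate pair, and 'b', so it has
4 code units but only 3 code points.

```java
// Sketch: code units vs. code points in a UTF-16 string.
// StringLengths is a hypothetical name used only for this example.
final class StringLengths {
    // Number of UTF-16 code units; this is what String.length reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Sample: 'a', U+10000 (encoded as a surrogate pair), 'b'.
        String s = "a\uD800\uDC00b";
        System.out.println(codeUnits(s));   // prints 4
        System.out.println(codePoints(s));  // prints 3
    }
}
```

Note that codePointCount has to scan the string, which is the cost
Raymond mentions for a code-point-based length.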

Helmut




