[slime-devel] New wire format

Mon Nov 7 08:11:50 UTC 2011

* Hugo Duncan [2011-11-07 04:04] writes:

> On Sun, 06 Nov 2011 12:13:07 -0500, Helmut Eller
> <heller at common-lisp.net> wrote:
>
>> Counting characters was problematic, especially with Lisps that use
>> UTF16 internally (Allegro, CMUCL, JVM based Lisps).  Emacs counts the
>> length of strings in Unicode code points, while in UTF16 a single code
>> point may occupy either 1 or 2 indexes (code units) and so CL:LENGTH may
>> return something different than Emacs expected.  For the same reason we
>> can't use READ-SEQUENCE to read a specified number of code points.
>>
>> The new format looks like this:
>>
>>   | byte0 | 3 bytes length |
>>   |    ... payload ...     |
>>
>> The 3-byte length header specifies the length of the payload in bytes.
>
> Is there a reason to start using a binary encoding of the message
> length?  

No deep reason.  We actually used binary encoding before we used
hex-strings.  That worked fine with latin-1 but not with utf-8.  I guess
it's just instinct; now that we explicitly work on a byte stream it's
even more natural.  Should probably have used network byte order.

> This makes the messages less easy to inspect, and less easy
> to write integration tests for.

Only marginally.  Shifting 3 bytes together is not exactly rocket science.
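
To give a rough idea of the amount of work involved, here is a minimal
Python sketch of framing and unframing one message in the layout quoted
above.  Several things in it are assumptions, not taken from the actual
implementation: the function names are made up, byte0 is treated as an
opaque value because its meaning is not described here, and the byte order
of the length field is a parameter (defaulting to big-endian) because the
order actually used on the wire is not stated.

```python
def encode_frame(sexp: str, byte0: int = 0, byteorder: str = "big") -> bytes:
    """Build | byte0 | 3 bytes length | payload | from an s-exp string."""
    payload = sexp.encode("utf-8")
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit in a 3-byte length field")
    # The length counts payload *bytes*, not characters.
    return bytes([byte0]) + len(payload).to_bytes(3, byteorder) + payload


def decode_frame(frame: bytes, byteorder: str = "big"):
    """Return (byte0, s-exp string) from one complete frame."""
    if len(frame) < 4:
        raise ValueError("frame shorter than the 4-byte header")
    byte0 = frame[0]
    # "Shifting 3 bytes together", written out for both byte orders.
    if byteorder == "big":
        length = (frame[1] << 16) | (frame[2] << 8) | frame[3]
    else:
        length = frame[1] | (frame[2] << 8) | (frame[3] << 16)
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return byte0, payload.decode("utf-8")
```

An integration test would need a helper of about this size to build or
inspect messages, compared with parsing a textual length prefix.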

>> The payload is an s-exp encoded as UTF8 text.
>
> Normalising on utf-8 and counting bytes sounds like it would solve the
> original issue without changing to a binary encoding of the message
> length.

Right.  It would not be backward compatible, though.
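
The counting problem that motivates measuring bytes can be seen in a small
Python sketch.  The helper name and the sample string are only
illustrative; the point is that one string has three different "lengths"
depending on what is counted.

```python
def lengths(s: str) -> dict:
    """Count one string three ways."""
    return {
        # What Emacs counts: Unicode code points.
        "code_points": len(s),
        # What a Lisp with UTF-16 strings counts: 16-bit code units.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # What a byte-oriented header counts: UTF-8 bytes.
        "utf8_bytes": len(s.encode("utf-8")),
    }


# "a" followed by U+1D11E, a character outside the Basic Multilingual Plane.
sample = "a\U0001D11E"
counts = lengths(sample)
```

For this sample the three counts are 2, 3 and 5, so a header that counts
characters is ambiguous between the two sides, while a header that counts
UTF-8 bytes is not.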

Helmut