[armedbear-ticket] [armedbear] #65: UTF-32 strings support

armedbear armedbear-devel at common-lisp.net
Mon Mar 11 22:24:37 UTC 2013


#65: UTF-32 strings support
------------------------+---------------------------------------------------
 Reporter:  ehuelsmann  |       Owner:  nobody
     Type:  defect      |      Status:  new   
 Priority:  major       |   Milestone:  1.2.0 
Component:  libraries   |     Version:  1.1.0 
 Keywords:              |  
------------------------+---------------------------------------------------

Comment(by ehuelsmann):

 On #lisp, pjb writes on this subject:

 ... you must be careful that in most CL implementations, characters are
 Unicode characters (not even code points in a number of implementations!),
 and therefore we are talking of real strings of characters (usually 32 bits
 each), not vectors of UTF-8 bytes. (For some things, you may need to
 deal with vectors of bytes instead of strings, and there, Lisp macros and
 reader macros can come in handy to ease manipulation of those vectors of
 bytes that usually represent ASCII or UTF-8 encoded characters).
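
 As a rough illustration of the distinction pjb draws between a string of
 characters and a vector of UTF-8 bytes, here is a minimal Python sketch.
 The sample text is an assumption chosen for illustration and does not come
 from the discussion; Python is used only because its strings are sequences
 of code points.

```python
# Illustrative sketch only: the sample text is an assumption.
text = "na\u00efve"            # 5 code points; U+00EF is a single character
utf8 = text.encode("utf-8")    # the same text as a vector of bytes

print(len(text))               # length counted in code points
print(len(utf8))               # length counted in bytes
print(list(utf8))              # U+00EF occupies two bytes in UTF-8

# Decoding the byte vector gives back the original string.
assert utf8.decode("utf-8") == text
```

 Indexing the byte vector can land in the middle of an encoded character,
 which is why the two representations are not interchangeable.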

 Where I ask:
 pjb: how's that possible? Some far-east "characters" will consist of
 multiple code points, with up to 6 or 7 "modifier" code points; how can
 all that fit into 32 bits, if each code point is 21 bits in itself?

 and pjb answers:
 ehu: that's what I mean, some implementations may choose to represent those
 characters as a pointer to a sequence of code points.
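
 The arithmetic behind the question can be sketched as follows. This is a
 minimal Python illustration, not anything taken from ABCL or another CL
 implementation: the sample cluster (a Latin letter with two combining
 marks) and the helper `count_base_characters` are assumptions, and
 counting code points with combining class 0 is only a crude approximation
 of what a user perceives as one character.

```python
import unicodedata

# The largest Unicode code point is U+10FFFF, which needs 21 bits.
MAX_CODE_POINT = 0x10FFFF
BITS_PER_CODE_POINT = MAX_CODE_POINT.bit_length()

# Hypothetical sample: "a" + COMBINING ACUTE ACCENT + COMBINING DOT BELOW.
# It is displayed as one character but consists of three code points.
cluster = "a\u0301\u0323"

def count_base_characters(s):
    """Count code points that are not combining marks (rough approximation)."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

bits_needed = len(cluster) * BITS_PER_CODE_POINT

print(BITS_PER_CODE_POINT)             # bits for a single code point
print(len(cluster))                    # code points in the cluster
print(count_base_characters(cluster))  # base characters in the cluster
print(bits_needed > 32)                # does the cluster exceed a 32-bit cell?
```

 A single 32-bit cell holds one code point with room to spare, but not a
 base character plus its modifiers, which is why a representation of full
 characters needs some indirection such as the pointer pjb mentions.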

-- 
Ticket URL: <http://trac.common-lisp.net/armedbear/ticket/65#comment:6>
armedbear <http://common-lisp.net/project/armedbear>
armedbear

