[cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding?

Peter Seibel peter at gigamonkeys.com
Tue Feb 26 05:48:25 UTC 2008


Marc Battyani wrote:
> Attila Lendvai wrote:
>>>  But my basic point is that cl-typesetting and/or cl-pdf should know what
>>>  encoding the Lisp is using (i.e. how should one interpret the values
>>>  returned by CHAR-CODE) and should know how to map those to the numeric
>>>     
>> the proper fix would be to refactor cl-pdf to write into binary
>> streams and do the character encoding itself (i'd use babel, but Marc
>> would prefer no external dependency). i've done that once (the branch
>> is still laying around on my harddrive), but after a day of work i
>> gave up. it produced a pdf that almost worked (the toc could display
>> unicode text) but i made a mistake somewhere in the process and it
>> produced corrupt files. as i don't know a bit about the pdf file
>> format, i gave up instead of debugging it.
>>   
> I think Peter is right here. It's a cl-typesetting issue and not a
> cl-pdf one because he wants to substitute another character that will
> result in the same glyph in the current selected font. So it's not an
> encoding problem and in fact other substitutions, such as ligatures for
> instance, would be useful.

Actually, now I'm thinking it *is* a cl-pdf issue. Assume for the 
moment that I'm using a Unicode Lisp (i.e. one whose CHAR-CODE returns 
Unicode code points). I should be able to use cl-pdf directly and have 
it render properly. The fact that PDF, under the covers, encodes 
characters using octets that are really indices into an array that's 
part of the font should not be an issue I have to deal with.

And looking a bit at the cl-pdf code, I see something that looks like 
it's sort of trying to do this: CHAR-EXTERNAL-CODE. But it also seems 
that it isn't always called. (For instance, it is never called on SBCL 
in GET-CHAR-METRICS.) Maybe it should be.

Then, if cl-pdf assumes that all characters and strings it gets are, or 
are made up of, Unicode characters, it seems there are just a few 
places where it can convert the Unicode code point to a code point that 
can be used with the current font: get-char-metrics, show-char, and 
show-text may be all of them, but I haven't done a careful check. It's a 
bit hinky that, under the covers, cl-pdf just converts them to different 
Lisp characters and then counts on using an 8-bit clean character 
encoding when writing the file, but that's just an implementation detail 
at a level below the one I'm talking about.
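
To make the idea concrete, here is a minimal sketch of that kind of
conversion. It is only an illustration under stated assumptions, not
cl-pdf code: it assumes CHAR-CODE returns Unicode code points and that
the current font uses PDF's WinAnsiEncoding, and the names
*WIN-ANSI-OVERRIDES*, UNICODE-TO-FONT-CODE, and STRING-TO-FONT-CODES
are hypothetical. The table covers only a handful of punctuation
characters.

```lisp
;; Hypothetical sketch: none of these names exist in cl-pdf.
;; Assumes CHAR-CODE returns Unicode code points and that the
;; current font is encoded with WinAnsiEncoding.

(defparameter *win-ansi-overrides*
  (let ((table (make-hash-table)))
    ;; A few code points whose WinAnsi octet differs from the
    ;; Unicode code point. This list is deliberately incomplete.
    (loop for (unicode . octet) in '((#x20AC . 128)  ; euro sign
                                     (#x2019 . 146)  ; right single quote
                                     (#x201C . 147)  ; left double quote
                                     (#x201D . 148)  ; right double quote
                                     (#x2013 . 150)  ; en dash
                                     (#x2014 . 151)) ; em dash
          do (setf (gethash unicode table) octet))
    table))

(defun unicode-to-font-code (code-point &optional (fallback 63))
  "Map a Unicode CODE-POINT to an octet usable with a WinAnsi font.
Returns FALLBACK (63, the code for a question mark) when this sketch
has no mapping for the code point."
  (cond ((gethash code-point *win-ansi-overrides*))
        ;; Printable ASCII and the upper Latin-1 range have the same
        ;; codes in WinAnsiEncoding as in Unicode.
        ((or (<= 32 code-point 126) (<= 160 code-point 255)) code-point)
        (t fallback)))

(defun string-to-font-codes (string)
  "Return a list of font octets, one per character of STRING."
  (map 'list (lambda (ch) (unicode-to-font-code (char-code ch))) string))
```

Under these assumptions, get-char-metrics, show-char, and show-text
would each pass their characters through something like
UNICODE-TO-FONT-CODE before looking up metrics or writing the content
stream. A real version would need a table per font encoding rather
than one hard-coded table.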

Obviously, if we made this change to cl-pdf, then cl-typesetting 
wouldn't have to worry about it at all.

Finally, to generalize a bit, for Lisps that don't use Unicode code 
points, cl-pdf should likewise know how to map from whatever character 
encoding they do use to the encoding used within a PDF file.

-Peter

-- 
Peter Seibel                     : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp            : http://www.gigamonkeys.com/book/
Coders at Work                   : http://www.codersatwork.com/
