[cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding?

Tue Feb 26 17:58:03 UTC 2008

Dmitriy Ivanov wrote:
> Peter Seibel wrote on Mon, 25 Feb 2008 21:48:25 -0800 08:48:
> 
> | Actually, now I'm thinking it *is* a cl-pdf issue. Assume for the
> | moment that I'm using a Unicode lisp. (I.e. one whose CHAR-CODE returns
> | Unicode code points.) I should be able to use cl-pdf directly and have
> | it render properly. The fact that PDF, under the covers, encodes
> | characters using octets that are really indices into an array that's
> | part of the font should not be an issue I have to deal with.
> 
> Relying on an encoding alone does not suffice. An additional notion of
> _charset_ is needed for this.

Okay. What's a charset then? Can you perhaps lay out a quick map of the 
concepts and related terms as used in the cl-pdf/cl-typesetting source?

> | And looking a bit at the cl-pdf code I see something that looks like
> | it's sort of trying to do this--CHAR-EXTERNAL-CODE. But it also seems
> | that that isn't always called. (For instance, never on SBCL in
> | GET-CHAR-METRICS). Maybe it should be.
> |
> | Then if cl-pdf assumes that all characters and strings it gets are or
> | are made up of Unicode characters, then it seems there are just a few
> | places where it can convert the Unicode code-point to a code-point that
> | can be used with the current font: get-char-metrics, show-char, and
> | show-text may be it but I haven't done a careful check. It's a bit
> | hinky that under the covers cl-pdf just converts them to different Lisp
> | characters and then counts on using an 8-bit clean character encoding
> | when writing the file but that's just an implementation detail at a
> | level below the one I'm talking about.
> |
> | Obviously if we made this change to cl-pdf then cl-typesetting wouldn't
> | have to worry about it at all.
> 
> In the latest revision, get-char-metrics and write-to-page do call
> char-external-code on SBCL. As far as I can guess, SBCL itself lacks an
> internal machinery for implementing char-external-code.

Okay, so it seems that my immediate problem would be fixed fairly simply 
by applying the attached patch which augments *char-single-byte-codes* 
to include mappings for all the characters that exist in cp-1252 but 
with different numeric values than the corresponding Unicode code points.
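
To make the idea concrete, here is a minimal sketch of such a mapping. It is 
not the attached patch, and it does not assume anything about how 
*char-single-byte-codes* is actually structured in cl-pdf: the names 
*cp1252-overrides* and unicode->cp1252 are hypothetical, chosen only for 
illustration. The table lists the 27 characters that cp-1252 places in the 
octets #x80-#x9F.

```lisp
;; Hypothetical names; an illustration only, not cl-pdf's own code.
(defparameter *cp1252-overrides*
  (let ((table (make-hash-table)))
    (loop for (unicode octet) in
          '((#x20AC #x80) (#x201A #x82) (#x0192 #x83) (#x201E #x84)
            (#x2026 #x85) (#x2020 #x86) (#x2021 #x87) (#x02C6 #x88)
            (#x2030 #x89) (#x0160 #x8A) (#x2039 #x8B) (#x0152 #x8C)
            (#x017D #x8E) (#x2018 #x91) (#x2019 #x92) (#x201C #x93)
            (#x201D #x94) (#x2022 #x95) (#x2013 #x96) (#x2014 #x97)
            (#x02DC #x98) (#x2122 #x99) (#x0161 #x9A) (#x203A #x9B)
            (#x0153 #x9C) (#x017E #x9E) (#x0178 #x9F))
          do (setf (gethash unicode table) octet))
    table)
  "Unicode code point -> cp-1252 octet, where the two values differ.")

(defun unicode->cp1252 (code-point)
  "Return the cp-1252 octet for CODE-POINT, or NIL if there is none."
  (cond ((gethash code-point *cp1252-overrides*))
        ;; ASCII and #xA0-#xFF have the same value in both.
        ;; #x80-#x9F are control codes in Unicode, so they are not
        ;; passed through unchanged.
        ((or (< code-point #x80) (<= #xA0 code-point #xFF)) code-point)
        (t nil)))
```

With this, (unicode->cp1252 #x20AC) gives #x80 for the euro sign, while a 
character with no cp-1252 equivalent gives NIL and the caller has to decide 
what to do about it.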

I'm not sure that that variable, or rather the way it is used, is 
actually 100% right. For instance a single-byte font that uses an 
encoding (or whatever you want to call it) other than cp-1252 probably 
needs a different mapping. Practically speaking, such fonts may simply 
not exist. The Unicode folks provide a set of mappings here:

   ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/

which is where I got the information about Unicode -> CP-1252.
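
For a font with some other single-byte encoding, the same kind of table 
could in principle be built from one of those mapping files instead of being 
typed in by hand. The sketch below assumes the usual layout of those files: 
one entry per line, a 0x-prefixed octet, then a 0x-prefixed Unicode code 
point, then a # comment, with the Unicode column left empty for undefined 
octets. The function names are hypothetical and none of this is part of 
cl-pdf.

```lisp
;; Hypothetical helpers; the file layout described above is an assumption.
(defun parse-mapping-line (line)
  "Return (values unicode octet) for one mapping line, or NIL for
comments, blank lines, and entries with no Unicode column."
  (let ((text (subseq line 0 (position #\# line))) ; drop trailing comment
        (fields '())
        (start 0))
    (loop
      (let ((prefix (search "0x" text :start2 start)))
        (unless prefix (return))
        ;; Parse the hex digits that follow the 0x prefix.
        (multiple-value-bind (value end)
            (parse-integer text :start (+ prefix 2)
                                :radix 16 :junk-allowed t)
          (when value (push value fields))
          (setf start end))))
    (let* ((fields (nreverse fields))
           (octet (first fields))
           (unicode (second fields)))
      (when (and octet unicode)
        (values unicode octet)))))

(defun read-mapping (stream)
  "Collect Unicode -> octet entries from STREAM, keeping only those
where the two values differ."
  (let ((table (make-hash-table)))
    (loop for line = (read-line stream nil)
          while line
          do (multiple-value-bind (unicode octet) (parse-mapping-line line)
               (when (and unicode (/= unicode octet))
                 (setf (gethash unicode table) octet))))
    table))
```

Something like (with-open-file (in "CP1252.TXT") (read-mapping in)) would 
then yield a table for cp-1252, assuming the file has been downloaded under 
that name.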

-Peter

-- 
Peter Seibel                     : peter at gigamonkeys.com
A Billion Monkeys Can't be Wrong : http://www.gigamonkeys.com/blog/
Practical Common Lisp            : http://www.gigamonkeys.com/book/
Coders at Work                   : http://www.codersatwork.com/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch
URL: <https://mailman.common-lisp.net/pipermail/cl-typesetting-devel/attachments/20080226/2e33be3c/attachment.ksh>