[cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding?
Dmitriy Ivanov
divanov at aha.ru
Wed Feb 27 11:30:21 UTC 2008
Peter Seibel wrote on Tue, 26 Feb 2008 09:58:03 -0800 20:58:
|> Relying on an encoding only does not suffice. An additional notion of
|> _charset_ is for this.
|
| Okay. What's a charset then? Can you perhaps lay out a quick map of the
| concepts and related terms as used in the cl-pdf/cl-typesetting source?
I recently provided some explanation in my post "Mapping useful Unicode
characters to single-byte-encoding".
In brief, a charset is either an atom passed to some implementation-dependent
converter a la CHAR-EXTERNAL-CODE, or an alist used to retrieve the
corresponding codes via assoc. The charset "value" is returned by the charset
generic function applied to an encoding object.
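To illustrate the alist flavour, here is a minimal sketch of such a lookup.
The names *example-charset* and example-char-code are hypothetical and are not
part of cl-pdf; the sketch only assumes a Lisp with Unicode characters and an
alist of (character . code) pairs.

```lisp
;; Hypothetical names for illustration only; not taken from the cl-pdf source.
(defparameter *example-charset*
  (list (cons (code-char #x0152) #x8C)   ; LATIN CAPITAL LIGATURE OE
        (cons (code-char #x0153) #x9C))  ; LATIN SMALL LIGATURE OE
  "Alist mapping characters to their single-byte codes.")

(defun example-char-code (char charset)
  "Return the single-byte code of CHAR in the alist CHARSET, or NIL if absent."
  (cdr (assoc char charset :test #'char=)))
```

With these definitions, (example-char-code (code-char #x0153) *example-charset*)
yields #x9C, and a character missing from the alist yields NIL, so the caller
has to decide on a fallback.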
|> In the latest revision, get-char-metrics and write-to-page do call
|> char-external-code on SBCL. As far as I can guess, SBCL itself lacks
|> an internal machinery for implementing char-external-code.
|
| Okay, so it seems that my immediate problem would be fixed fairly
| simply by applying the attached patch which augments
| *char-single-byte-codes* to include mappings for all the characters
| that exist in cp-1252 but with different numeric values than the
| corresponding Unicode code points.
Yes, in the simplest case that should work for you. But see below...
| I'm not sure that that variable, or rather the way it is used, is
| actually 100% right. For instance a single-byte font that uses an
| encoding (or whatever you want to call it) other than cp-1252 probably
| needs a different mapping. Practically speaking, such fonts may simply
| not exist. The Unicode folks provide a set of mappings here:
|
| ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/
|
| which is where I got the information about Unicode -> CP-1252.
A better solution would be to introduce a custom encoding, named, for
example, "Win1252Encoding", and to specify a charset for it as follows.
#+sbcl
(defparameter *sbcl-win-1252-charset*
  (append
   '((#.(code-char #x0152) . #x8C) ; LATIN_CAPITAL_LIGATURE_OE
     (#.(code-char #x0153) . #x9C) ; LATIN_SMALL_LIGATURE_OE
     ...)
   *char-single-byte-codes*))

(defparameter *win-1252-encoding*
  (make-instance 'pdf::custom-encoding
    :name "Win1252Encoding"
    :keyword-name :win-1252-encoding
    :base-encoding :standard-encoding
    :charset #-sbcl :1252 #+sbcl *sbcl-win-1252-charset*
    ...)
Then either (setf *default-encoding* *win-1252-encoding*) or pass the encoding
explicitly in get-font calls and so on.
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru