[cl-typesetting-devel] Re: [cl-pdf-devel] Character encoding?

Dmitriy Ivanov divanov at aha.ru
Wed Feb 27 11:30:21 UTC 2008


Peter Seibel wrote on Tue, 26 Feb 2008 09:58:03 -0800 20:58:

|> Relying on an encoding only does not suffice. An additional notion of
|> _charset_ is for this.
|
| Okay. What's a charset then? Can you perhaps lay out a quick map of the
| concepts and related terms as used in the cl-pdf/cl-typesetting source?

I recently gave some explanation in my post "Mapping useful Unicode
characters to single-byte-encoding".

In brief, a charset is either an atom passed to some implementation-dependent
converter a la CHAR-EXTERNAL-CODE, or an alist of (character . code) pairs
from which the corresponding codes are retrieved via assoc. The charset
"value" is returned by the charset generic function applied to an encoding
object.
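
To make the two forms concrete, here is a minimal sketch of such a lookup.
The name lookup-single-byte-code is hypothetical and not part of cl-pdf, and
the fallback for unmapped characters is an assumption made only for
illustration; the real converter is implementation-dependent.

```lisp
;; Hypothetical helper, not part of cl-pdf: resolve a character to a
;; single-byte code given a charset in either of the two forms.
(defun lookup-single-byte-code (char charset)
  (cond ((consp charset)
         ;; Alist form: ((character . code) ...), searched with ASSOC.
         (let ((entry (assoc char charset)))
           (if entry
               (cdr entry)
               ;; Assumption: an unmapped character keeps its own code
               ;; when that code already fits in one byte.
               (let ((code (char-code char)))
                 (when (< code 256) code)))))
        (t
         ;; Atom form: a real implementation would hand CHARSET to an
         ;; implementation-dependent converter. This stand-in only
         ;; passes through codes that already fit in one byte.
         (let ((code (char-code char)))
           (when (< code 256) code)))))
```

For example, (lookup-single-byte-code (code-char #x0152) (list (cons
(code-char #x0152) #x8C))) returns 140, that is, #x8C.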

|> In the latest revision, get-char-metrics and write-to-page do call
|> char-external-code on SBCL. As far as I can guess, SBCL itself lacks
|> an internal machinery for implementing char-external-code.
|
| Okay, so it seems that my immediate problem would be fixed fairly
| simply by applying the attached patch which augments
| *char-single-byte-codes* to include mappings for all the characters
| that exist in cp-1252 but with different numeric values than the
| corresponding Unicode code points.

Yes, in the simplest case that should work for you. But see below...

| I'm not sure that that variable, or rather the way it is used, is
| actually 100% right. For instance a single-byte font that uses an
| encoding (or whatever you want to call it) other than cp-1252 probably
| needs a different mapping. Practically speaking, such fonts may simply
| not exist. The Unicode folks provide a set of mappings here:
|
|    ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/
|
| which is where I got the information about Unicode -> CP-1252.

A better solution would be to introduce a custom encoding, named, for
example, "Win1252Encoding", and to specify a charset for it as follows.

#+sbcl
(defparameter *sbcl-win-1252-charset*
  (append
   '((#.(code-char #x0152) . #x8C)   ; LATIN_CAPITAL_LIGATURE_OE
     (#.(code-char #x0153) . #x9C)   ; LATIN_SMALL_LIGATURE_OE
     ...)
   *char-single-byte-codes*))

(defparameter *win-1252-encoding*
  (make-instance 'pdf::custom-encoding
   :name "Win1252Encoding"
   :keyword-name :win-1252-encoding
   :base-encoding :standard-encoding
   :charset #-sbcl :1252 #+sbcl *sbcl-win-1252-charset*
  ...))

Then either (setf *default-encoding* *win-1252-encoding*) or pass the
encoding explicitly in get-font calls and so on.
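
The alist entries need not be typed by hand. As a rough sketch, assuming the
usual layout of the Unicode vendor mapping files (a "0x.." byte value, a tab,
a "0x.." code point, then a "#" comment), they could be read from a local
copy of the CP-1252 table. Both function names below are hypothetical and
not part of cl-pdf.

```lisp
;; Hypothetical helper: parse one "0x.." field.
;; Returns the integer, or NIL if the field is blank or malformed.
(defun parse-hex-field (string)
  (let ((s (string-trim '(#\Space #\Tab) string)))
    (when (and (> (length s) 2) (string-equal "0x" s :end2 2))
      (parse-integer s :start 2 :radix 16 :junk-allowed t))))

;; Hypothetical helper: read mapping lines from STREAM and return an
;; alist of (character . byte) for every byte whose Unicode code point
;; differs from the byte value. Comment lines and undefined bytes are
;; skipped.
(defun read-mapping-charset (stream)
  (loop for line = (read-line stream nil)
        while line
        append (let* ((body (subseq line 0 (position #\# line)))
                      (tab (position #\Tab body))
                      (byte (and tab (parse-hex-field (subseq body 0 tab))))
                      (code (and tab (parse-hex-field (subseq body (1+ tab))))))
                 (when (and byte code (/= byte code))
                   (list (cons (code-char code) byte))))))
```

The result, read for instance inside with-open-file, could then be appended
in front of *char-single-byte-codes* in place of the literal list.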
--
Sincerely,
Dmitriy Ivanov
lisp.ystok.ru





