[elephant-devel] new elephant and unicode troubles

Henrik Hjelte henrik at evahjelte.com
Sun Feb 25 09:43:13 UTC 2007


Hi Ties! Nothing beats a Sunday morning bug hunt!
Found the solution for you; see below.

Cheers, 
Henrik Hjelte

Add this test case to testserializer.lisp:
(Sorry, the code indentation looks funny when pasted into an email.)

;; Non-ASCII chars: (code-char 246) is o-umlaut, (code-char 252) is u-umlaut.
(deftest hard-strings
    (are-not-null
     (in-out-equal (format nil "Mot~arhead is a hard rock band."
                           (code-char 246)))
     (in-out-equal (format nil "M~atley cr~ae is a hard string and was a hard rock band."
                           (code-char 246)
                           (code-char 252))))
  t t)

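I had to change serialize-string in unicode2.lisp to look like this.
Apparently the term utf8 in this file has nothing at all to do with
UTF-8; rather, it means a string of ASCII chars. So serialize-to-utf8
returns nil when it finds a char code > 127, and serialization should
then continue with two-byte char strings (serialize-to-utf16le). The
existing CVS version did not do that: it only looked at the first
character, so for "järv" it saw the ASCII #\j, called serialize-to-utf8
alone, and tried no fallback when that returned nil at the #\ä.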

(defun serialize-string (string bstream)
  "Try to write each format type and bail if code is too big"
  (or (serialize-to-utf8 string bstream)
      (serialize-to-utf16le string bstream)
      (serialize-to-utf32le string bstream)))

Old buggy version:
;;(defun serialize-string (string bstream)
;;  "Try to write each format type and bail if code is too big"
;;  (declare (type buffer-stream bstream)
;;           (type string string))
;;  (cond ((and (not (string= "" string)) (< (char-code (char string 0)) #x7F))
;;         (serialize-to-utf8 string bstream))
;;        ;; Accelerate the common case where a character set is not Latin-1
;;        ((and (not (string= "" string)) (< (char-code (char string 0)) #xFFFF))
;;         (serialize-to-utf16le string bstream))
;;        ;; Actually code pages > 0 are rare; so we can pay an extra cost
;;        (t (or (serialize-to-utf8 string bstream)
;;               (serialize-to-utf16le string bstream)
;;               (serialize-to-utf32le string bstream)))))
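To make the failure mode concrete, here is a small standalone sketch. OLD-DISPATCH and NEEDED-FORMAT are hypothetical helpers written only for illustration; they are not Elephant functions, and they return a keyword naming the format a strategy would pick instead of writing anything. The thresholds are assumptions: "utf8" means every code is at most 127, "utf16le" means every code is below #x10000.

```lisp
;; Illustration only: OLD-DISPATCH and NEEDED-FORMAT are hypothetical
;; helpers, not part of Elephant.  Assumed thresholds:
;;   :utf8    -> every char code <= 127
;;   :utf16le -> every char code  < #x10000

(defun old-dispatch (string)
  "Mimic the old COND: decide from the first character only."
  (cond ((and (plusp (length string))
              (< (char-code (char string 0)) #x7F))
         :utf8)
        ((and (plusp (length string))
              (< (char-code (char string 0)) #xFFFF))
         :utf16le)
        (t :try-all)))

(defun needed-format (string)
  "Scan the whole string and return the narrowest format that fits."
  ;; REDUCE applies :KEY to each element, not to the :INITIAL-VALUE.
  (let ((max-code (reduce #'max string :key #'char-code :initial-value 0)))
    (cond ((<= max-code 127) :utf8)
          ((< max-code #x10000) :utf16le)
          (t :utf32le))))

;; "järv" built from char codes so the source file stays ASCII.
(defparameter *jarv* (format nil "j~Crv" (code-char 228)))

;; The first char #\j is ASCII, so the old strategy commits to :UTF8 ...
(print (old-dispatch *jarv*))
;; ... although #\ä (code 228) actually needs a two-byte format.
(print (needed-format *jarv*))
```

The fixed serialize-string gets the same effect without a pre-scan: serialize-to-utf8 bails with nil on the first code above 127, and OR moves on to the next encoder.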




On Sun, 2007-02-25 at 00:50 +0100, Ties Stuij wrote:
> With the CVS Elephant on SBCL on Linux with BDB, with all tests
> passing, the following code:
> 
> (defclass crocodile ()
>   ((belly :accessor belly-of :initform "järv"))
>   (:metaclass persistent-metaclass))
> 
> (defparameter *ben* (make-instance 'crocodile))
> 
> (belly-of *ben*)
> 
> gives:
> 
> deserialize of object tagged with 188 failed
> 
> as an error, which comes from %deserialize (via deserialize) in
> serialize2.lisp. A string with 'safe' characters, though, is properly
> recognized as utf-8. The 188 can also be 132 or another value. The 6.1
> checkout gives the same result, but I must say I liked the error
> message 'deserialize fubar!' better. Ideas?
> 
> Greets,
> Ties
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
> 
> 
