From attila.lendvai at gmail.com Sun Sep 7 18:39:59 2008
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Sun, 7 Sep 2008 20:39:59 +0200
Subject: [babel-devel] Re: enc- -> encoding- rename?
In-Reply-To:
References:
Message-ID:

> there are a million different ways to abbreviate something, so i
> propose to rename the enc-* accessors of character-encoding to
> encoding-*.
>
> the following dependencies have been identified that need to be changed:
> - cffi
> - cxml
> - iolib

hm, we could add a compiler-macro that converts to the new accessors
and issues a deprecation style-warning at compile time.

not that it's so important, but i found this old mail lying around
here and i thought i'd share this idea i had recently.

--
 attila

From attila.lendvai at gmail.com Sun Sep 7 18:40:55 2008
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Sun, 7 Sep 2008 20:40:55 +0200
Subject: [babel-devel] string-to-octets optimization for base-string's
Message-ID:

Luis,

please take a look at the attached patch as time permits. it optimizes
string-to-octets when called with simple-base-strings, but
unfortunately it compiles with warnings, which stops asdf:load-op...

the problem is with calling instantiate-concrete-mappings with
simple-base-string.

do you have any ideas how to handle this? i've had this patch hanging
around locally for quite some time now...

--
 attila
-------------- next part --------------
A non-text attachment was scrubbed...
Name: babel-string-to-octets-optimization.patch
Type: text/x-diff
Size: 18197 bytes
Desc: not available
URL:

From luismbo at gmail.com Mon Sep 8 11:03:57 2008
From: luismbo at gmail.com (Luís Oliveira)
Date: Mon, 8 Sep 2008 12:03:57 +0100
Subject: [babel-devel] Re: enc- -> encoding- rename?
In-Reply-To:
References:
Message-ID: <391f79580809080403u47d8937ay22a0eca7b6988317@mail.gmail.com>

On Sun, Sep 7, 2008 at 7:39 PM, Attila Lendvai wrote:
> hm, we could add a compiler-macro that converts to the new accessors
> and issues a deprecation style-warning at compile time.

Ah, I like that idea. I guess a regular (generic) function would also
need to be added since compiler-macros are not guaranteed to be used.
What's the best way to do this with generic functions?

  (setf (fdefinition 'old-name) #'new-name) ;?
  ;; then use a compiler macro here to issue the style-warning.

--
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/

From luismbo at gmail.com Mon Sep 8 11:56:35 2008
From: luismbo at gmail.com (Luís Oliveira)
Date: Mon, 8 Sep 2008 12:56:35 +0100
Subject: [babel-devel] string-to-octets optimization for base-string's
In-Reply-To:
References:
Message-ID: <391f79580809080456m1d3421edyac8b8c43289f213a@mail.gmail.com>

2008/9/7 Attila Lendvai :
> the problem is with calling instantiate-concrete-mappings with
> simple-base-string.

And they say Lisp has no static typing! :-)

Actually it only complains for UTF-8B. It should probably signal an
error for every other non-ascii (or non latin-1?) encoding, I think.
The problem is that you're defining simple-base-string octet decoders
which will try to fit unicode characters into simple-base-strings.
SBCL caught this for the UTF-8B decoder, but it's probably an issue
for other encodings as well.

More importantly, this is an issue for Allegro, Lispworks, etc,
because they have 16-bit chars and I don't think babel is dealing with
this properly.

Oh, and I really need to make Babel more debuggable... *sigh*

> do you have any ideas how to handle this? i have this patch hanging
> around locally for quite some time now...

I suggest adding a simple option to instantiate-concrete-mappings to
make it not generate the code-point-counters and decoders, which you
don't need.
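A minimal sketch of the deprecation scheme from the enc- -> encoding- rename thread, using only standard Common Lisp. The class demo-encoding and the names demo-encoding-name / demo-enc-name are hypothetical stand-ins for illustration, not babel's actual accessors, and this is only one possible way to answer the generic-function question: the old name is bound to the same function object so that calls which bypass the compiler macro (funcall, apply, notinline) keep working, while the compiler macro rewrites direct calls and signals a style-warning.

```lisp
;; Hypothetical stand-in for a class with a renamed accessor.
(defclass demo-encoding ()
  ((name :initarg :name :reader demo-encoding-name)))

;; Runtime fallback: compiler macros are not guaranteed to be expanded,
;; so the old name stays callable and shares the new function object.
(setf (fdefinition 'demo-enc-name) #'demo-encoding-name)

;; The condition type must exist at compile time, because the compiler
;; macro below signals it while later forms are being compiled.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (define-condition deprecated-accessor (style-warning)
    ((old :initarg :old :reader deprecated-accessor-old)
     (new :initarg :new :reader deprecated-accessor-new))
    (:report (lambda (condition stream)
               (format stream "~S is deprecated, use ~S instead."
                       (deprecated-accessor-old condition)
                       (deprecated-accessor-new condition))))))

;; Rewrite direct calls to the new name and warn at compile time.
(define-compiler-macro demo-enc-name (encoding)
  (warn 'deprecated-accessor :old 'demo-enc-name :new 'demo-encoding-name)
  `(demo-encoding-name ,encoding))
```

Because the condition is a style-warning, compile-file reports it without flagging the compilation as failed, so asdf:load-op should not be stopped by it.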
--
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/

From attila.lendvai at gmail.com Tue Sep 9 20:42:11 2008
From: attila.lendvai at gmail.com (Attila Lendvai)
Date: Tue, 9 Sep 2008 22:42:11 +0200
Subject: [babel-devel] a quick speed test
Message-ID:

thought i'd share it: that's 10000 octets-to-string calls on the 14k
long tests/utf-8.txt, using a recent sbcl on x86-64.

(sb-ext:octets-to-string octets :external-format :utf-8)

Evaluation took:
  11.451 seconds of real time
  11.336708 seconds of total run time (11.192699 user, 0.144009 system)
  [ Run times consist of 0.692 seconds GC time, and 10.645 seconds non-GC time. ]
  99.00% CPU
  20,560,951,683 processor cycles
  1,927,564,096 bytes consed

(trivial-utf-8:utf-8-bytes-to-string octets):

Evaluation took:
  7.381 seconds of real time
  7.300456 seconds of total run time (7.248453 user, 0.052003 system)
  [ Run times consist of 0.564 seconds GC time, and 6.737 seconds non-GC time. ]
  98.90% CPU
  13,252,911,624 processor cycles
  1,524,620,560 bytes consed

(babel:octets-to-string octets :encoding :utf-8 :errorp t)

Evaluation took:
  3.173 seconds of real time
  3.144197 seconds of total run time (3.112195 user, 0.032002 system)
  [ Run times consist of 0.120 seconds GC time, and 3.025 seconds non-GC time. ]
  99.09% CPU
  5,697,700,848 processor cycles
  305,120,336 bytes consed

babel, with the encodings instantiated using (safety 0):

Evaluation took:
  2.405 seconds of real time
  2.380149 seconds of total run time (2.356148 user, 0.024001 system)
  [ Run times consist of 0.208 seconds GC time, and 2.173 seconds non-GC time.
  ]
  98.96% CPU
  4,318,993,638 processor cycles
  305,120,000 bytes consed

(deftest x ()
  (let* ((*default-character-encoding* :utf-8)
         (octets (with-open-file (in (asdf:system-relative-pathname
                                      :babel "tests/utf-8.txt")
                                     :element-type '(unsigned-byte 8))
                   (let* ((data (loop for byte = (read-byte in nil nil)
                                      until (null byte)
                                      collect byte)))
                     (make-array (length data)
                                 :element-type '(unsigned-byte 8)
                                 :initial-contents data)))))
    (cl:time
     (loop repeat 10000 do
       (octets-to-string octets :encoding :utf-8 :errorp t)
       ;;(trivial-utf-8:utf-8-bytes-to-string octets)
       ;;(sb-ext:octets-to-string octets :external-format :utf-8)
       ))))

--
 attila
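The benchmark above could also be factored into two small helpers, so that each decoder is timed by the same code instead of by commenting lines in and out. This is a sketch under assumptions rather than part of the original test: read-file-octets and time-decoder are hypothetical names, and the file is read with one read-sequence call instead of consing a list byte by byte.

```lisp
(defun read-file-octets (pathname)
  "Return the contents of PATHNAME as a vector of (unsigned-byte 8)."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           (end (read-sequence octets in)))
      ;; Trim the vector if fewer elements were read than FILE-LENGTH
      ;; reported.
      (if (= end (length octets))
          octets
          (subseq octets 0 end)))))

(defun time-decoder (decoder octets &key (repeat 10000))
  "Call DECODER on OCTETS REPEAT times under TIME; return the last result."
  (let ((result nil))
    (time (loop repeat repeat
                do (setf result (funcall decoder octets))))
    result))

;; Usage, assuming babel and asdf are loaded (calls taken from the test above):
;; (time-decoder (lambda (octets)
;;                 (babel:octets-to-string octets :encoding :utf-8 :errorp t))
;;               (read-file-octets
;;                (asdf:system-relative-pathname :babel "tests/utf-8.txt")))
```

Passing the decoder as a closure adds one funcall per iteration, which should be small next to decoding a 14k input but is still part of the measured time.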