From vseloved at gmail.com Wed Aug 4 14:07:18 2010
From: vseloved at gmail.com (Vsevolod Dyomkin)
Date: Wed, 4 Aug 2010 17:07:18 +0300
Subject: [babel-devel] question about #\Nul char and Unicode

Hi,

I'm stuck with a problem: I'm using CL-ZMQ, which in turn uses CFFI, which in turn uses BABEL for tasks such as FOREIGN-STRING-TO-LISP conversion. There seems to be a problem with 0 (#\Nul) characters in such strings, as can be seen below:

Illegal :UTF-8 character starting at position 328.
   [Condition of type BABEL-ENCODINGS:INVALID-UTF8-CONTINUATION-BYTE]

Restarts:
 ...

Backtrace:
  0: ((LAMBDA (BABEL-ENCODINGS::SRC BABEL-ENCODINGS::START BABEL-ENCODINGS::END BABEL-ENCODINGS::DEST BABEL-ENCODINGS::D-START)) ..)
  1: (CFFI:FOREIGN-STRING-TO-LISP #.(SB-SYS:INT-SAP #X0808E13C))[:EXTERNAL]
 ...

The translated string in the current example is this:

#(#\5 #\4 #\c #\6 #\7 #\5 #\5 #\b #\- #\9 #\6 #\2 #\8 #\-
  #\4 #\0 #\a #\4 #\- #\9 #\a #\2 #\d #\- #\c #\c #\8 #\2 #\a #\8 #\1 #\6 #\3 #\4 #\5 #\e
  #\  #\1 #\8 #\  #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\  #\2 #\6 #\0 #\Space #\{
  #\" #\P #\A #\T #\H #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\M #\E #\T #\H #\O #\D #\" #\Space #\" #\G #\E #\T #\" #\,
  #\" #\V #\E #\R #\S #\I #\O #\N #\" #\Space #\" #\H #\T #\T #\P #\/ #\1 #\. #\1 #\" #\,
  #\" #\U #\R #\I #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\P #\A #\T #\T #\E #\R #\N #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\A #\c #\c #\e #\p #\t #\" #\Space #\" #\* #\/ #\* #\" #\,
  #\" #\H #\o #\s #\t #\" #\Space #\" #\l #\o #\c #\a #\l #\h #\o #\s #\t #\Space #\6 #\7 #\6 #\7 #\" #\,
  #\" #\U #\s #\e #\r #\- #\A #\g #\e #\n #\t #\" #\Space
  #\" #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\( #\i #\4 #\8 #\6 #\- #\p #\c #\- #\l #\i #\n #\u #\x #\- #\g #\n #\u #\)
  #\  #\l #\i #\b #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0
  #\  #\O #\p #\e #\n #\S #\S #\L #\/ #\0 #\. #\9 #\. #\8 #\n
  #\  #\z #\l #\i #\b #\/ #\1 #\. #\2 #\. #\3 #\. #\4
  #\  #\l #\i #\b #\i #\d #\n #\/ #\1 #\. #\1 #\5
  #\  #\l #\i #\b #\s #\s #\h #\2 #\/ #\1 #\. #\2 #\. #\4 #\" #\}
  #\, #\0 #\Space #\, #\n #\S #\S #\L #\/ #\0 #\. #\Nul #\Nul)

Maybe someone here can explain why these 0-characters are not recognized as proper UTF-8 ones?

Thanks!
Vsevolod

From luismbo at gmail.com Wed Aug 4 19:02:48 2010
From: luismbo at gmail.com (Luís Oliveira)
Date: Wed, 4 Aug 2010 20:02:48 +0100
Subject: [babel-devel] question about #\Nul char and Unicode

On Wed, Aug 4, 2010 at 3:07 PM, Vsevolod Dyomkin wrote:
> Maybe someone here can explain why these 0-characters are not recognized as
> proper UTF-8 ones?

Seems to work for me. Can you come up with a short reproducible example?

CL-USER> (defparameter *array*
           #(#\5 #\4 #\c #\6 #\7 #\5 #\5 #\b #\- #\9 #\6 #\2 #\8 #\-
             #\4 #\0 #\a #\4 #\- #\9 #\a #\2 #\d #\- #\c #\c #\8 #\2 #\a #\8 #\1 #\6 #\3 #\4 #\5 #\e
             #\  #\1 #\8 #\  #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\  #\2 #\6 #\0 #\Space #\{
             #\" #\P #\A #\T #\H #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\M #\E #\T #\H #\O #\D #\" #\Space #\" #\G #\E #\T #\" #\,
             #\" #\V #\E #\R #\S #\I #\O #\N #\" #\Space #\" #\H #\T #\T #\P #\/ #\1 #\. #\1 #\" #\,
             #\" #\U #\R #\I #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\P #\A #\T #\T #\E #\R #\N #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\A #\c #\c #\e #\p #\t #\" #\Space #\" #\* #\/ #\* #\" #\,
             #\" #\H #\o #\s #\t #\" #\Space #\" #\l #\o #\c #\a #\l #\h #\o #\s #\t #\Space #\6 #\7 #\6 #\7 #\" #\,
             #\" #\U #\s #\e #\r #\- #\A #\g #\e #\n #\t #\" #\Space
             #\" #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\( #\i #\4 #\8 #\6 #\- #\p #\c #\- #\l #\i #\n #\u #\x #\- #\g #\n #\u #\)
             #\  #\l #\i #\b #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0
             #\  #\O #\p #\e #\n #\S #\S #\L #\/ #\0 #\. #\9 #\. #\8 #\n
             #\  #\z #\l #\i #\b #\/ #\1 #\. #\2 #\. #\3 #\. #\4
             #\  #\l #\i #\b #\i #\d #\n #\/ #\1 #\. #\1 #\5
             #\  #\l #\i #\b #\s #\s #\h #\2 #\/ #\1 #\. #\2 #\. #\4 #\" #\}
             #\, #\0 #\Space #\, #\n #\S #\S #\L #\/ #\0 #\. #\Nul #\Nul))
*ARRAY*
CL-USER> (cffi:with-foreign-string (fs (coerce *array* 'string) :encoding :utf-8)
           (cffi:foreign-string-to-lisp fs :encoding :utf-8))
"54c6755b-9628-40a4-9a2d-cc82a816345e 18 /handlertest 260 {\"PATH\" \"/handlertest\",\"METHOD\" \"GET\",\"VERSION\" \"HTTP/1.1\",\"URI\" \"/handlertest\",\"PATTERN\" \"/handlertest\",\"Accept\" \"*/*\",\"Host\" \"localhost 6767\",\"User-Agent\" \"curl/7.20.0 (i486-pc-linux-gnu) libcurl/7.20.0 OpenSSL/0.9.8n zlib/1.2.3.4 libidn/1.15 libssh2/1.2.4\"},0 ,nSSL/0."
328

Thanks,

-- 
Luís Oliveira
http://r42.eu/~luis/

From luismbo at gmail.com Wed Aug 4 21:35:02 2010
From: luismbo at gmail.com (Luís Oliveira)
Date: Wed, 4 Aug 2010 22:35:02 +0100
Subject: [babel-devel] question about #\Nul char and Unicode

Hello Vsevolod,

On Wed, Aug 4, 2010 at 9:42 PM, Vsevolod Dyomkin wrote:
> It's also worth mentioning that I'm using babel-0.3.

Perhaps you should try with the development versions of babel and CFFI. Let me know if that helps.

Cheers,

-- 
Luís Oliveira
http://r42.eu/~luis/

From vseloved at gmail.com Wed Aug 4 20:42:03 2010
From: vseloved at gmail.com (Vsevolod Dyomkin)
Date: Wed, 4 Aug 2010 23:42:03 +0300
Subject: [babel-devel] question about #\Nul char and Unicode

Luís,

thanks for the answer!

The issue is connected with my recent experiment with creating CL bindings to the upcoming mongrel2 web-server, and it arises only sometimes. You can see the initial variant at http://github.com/vseloved/cl-mongrel2.
If you are willing to dive in and spend some time, try running the example code in http://github.com/vseloved/cl-mongrel2/blob/master/example.lisp

It will also require you to install and run mongrel2 itself (see http://mongrel2.org/doc/tip/docs/manual/book.wiki for details), which will in turn require setting up a working Python environment (if you don't have one already, obviously). All the other instructions are in example.lisp. If something is unclear, feel free to write to me.

It's also worth mentioning that I'm using babel-0.3.

Looking forward to the results,
Vsevolod

From vseloved at gmail.com Fri Aug 6 11:58:43 2010
From: vseloved at gmail.com (Vsevolod Dyomkin)
Date: Fri, 6 Aug 2010 14:58:43 +0300
Subject: [babel-devel] question about #\Nul char and Unicode

Hello Luís,

after some more examination I've discovered that the error was not really connected with Babel, but rather with an "impedance mismatch" between CFFI and ZMQ: CFFI currently supports only null-terminated strings, while ZMQ operates on blobs, so TRANSLATE-FROM-FOREIGN was fed data it was not ready to handle.

I've prepared a patch to CFFI that can handle this situation (an additional TRANSLATE- method for string blobs of known size) and will send it soon, if no better solution is found.

Best regards,
Vsevolod

From luismbo at gmail.com Sat Aug 7 06:35:41 2010
From: luismbo at gmail.com (Luís Oliveira)
Date: Sat, 7 Aug 2010 07:35:41 +0100
Subject: [babel-devel] question about #\Nul char and Unicode

On Sat, Aug 7, 2010 at 7:23 AM, Vsevolod Dyomkin wrote:
> Actually, the string type doesn't have such a slot now (and that was kind of
> what I thought to add).
> But the simpler solution (not involving the need to patch CFFI) was just
> to use (foreign-string-to-lisp data :count size), which is also used
> internally by translate-from-foreign (but without the count parameter).

Sorry I wasn't clear. What I meant is that you could add such a slot, then use it in translate-from-foreign to feed foreign-string-to-lisp's count parameter.

Cheers,

-- 
Luís Oliveira
http://r42.eu/~luis/

From luismbo at gmail.com Sat Aug 7 06:11:53 2010
From: luismbo at gmail.com (Luís Oliveira)
Date: Sat, 7 Aug 2010 07:11:53 +0100
Subject: [babel-devel] question about #\Nul char and Unicode

On Fri, Aug 6, 2010 at 12:58 PM, Vsevolod Dyomkin wrote:
> I've prepared a patch to CFFI that can handle this situation (an additional
> TRANSLATE- method for string blobs of known size) and will send it soon, if
> no better solution is found.

Cool. I think you can support that through a :size argument for the :string type. Let me know if you need help implementing such a thing.

-- 
Luís Oliveira
http://r42.eu/~luis/

From vseloved at gmail.com Sat Aug 7 06:23:56 2010
From: vseloved at gmail.com (Vsevolod Dyomkin)
Date: Sat, 7 Aug 2010 09:23:56 +0300
Subject: [babel-devel] question about #\Nul char and Unicode

Actually, the string type doesn't have such a slot now (and that was kind of what I thought to add).

But the simpler solution (not involving the need to patch CFFI) was just to use (foreign-string-to-lisp data :count size), which is also used internally by translate-from-foreign (but without the count parameter).

Thanks!
Vsevolod
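
The difference between the two ways of calling FOREIGN-STRING-TO-LISP discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not code from CL-ZMQ or from any patch: it assumes ASDF can locate CFFI, and the function name BLOB-DEMO and the buffer contents are made up. The first three bytes stand for a blob payload of known size; the bytes after them stand for whatever memory happens to follow it.

```lisp
;; Assumption: ASDF is available and can find CFFI on this machine.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cffi))

(defun blob-demo ()
  "Hypothetical helper. Returns two values: the buffer decoded up to the
first NUL byte, and the buffer decoded with an explicit byte count."
  ;; \"abc\" is the payload; \"XYZ\" and the NULs simulate trailing memory.
  (let ((bytes #(97 98 99 88 89 90 0 0)))
    (cffi:with-foreign-object (buf :unsigned-char (length bytes))
      ;; Copy the octets into foreign memory one by one.
      (loop for byte across bytes
            for i from 0
            do (setf (cffi:mem-aref buf :unsigned-char i) byte))
      (values
       ;; No :count, so decoding runs until the first NUL byte
       ;; and picks up the bytes that follow the payload.
       (cffi:foreign-string-to-lisp buf :encoding :utf-8)
       ;; :count 3 decodes exactly the three payload bytes.
       (cffi:foreign-string-to-lisp buf :count 3 :encoding :utf-8)))))

;; (blob-demo) => "abcXYZ", "abc"
```

With a blob whose size is known, passing that size as :count, as in the (foreign-string-to-lisp data :count size) form quoted above, keeps the decoder from reading past the payload.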