From edi at agharta.de Thu Feb 9 21:21:07 2012
From: edi at agharta.de (Edi Weitz)
Date: Thu, 9 Feb 2012 22:21:07 +0100
Subject: [flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error
In-Reply-To:
References:
Message-ID:

Sorry for the delay.  I think this is more or less "on purpose."
(It's been a while since I wrote that stuff...)

The recover-from-encoding-error helper function is used when during
decoding we encounter something which "looks like" a character (so to
say) but isn't one - in which case we can e.g. replace it with the
substitution character.

I think the error you mention happens earlier - when the length is checked.

Of course, one could argue that one could just as well use the same
restart here.  Maybe you can just submit a patch (including
documentation if needed and ideally with new tests) and convince Hans
to make a new release?

Thanks,
Edi.

On Sat, Jan 21, 2012 at 1:06 PM, Dmitriy Ivanov wrote:
> Hello folks,
>
> I have bumped into the following error while playing with Hunchentoot.
> (It originates from url-decoding GET parameters with
> *hunchentoot-default-external-format*.)
>
> (let ((flex:*substitution-char* #\?))
>   (flex:octets-to-string #(#xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
> => "??"
>
> (let ((flex:*substitution-char* #\?))
>   (flex:octets-to-string #(#xC0 #xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
> -> signals: This sequence can't be decoded using UTF-8 as it is too short.
> 1 octet missing at then end.
>
> The reason is rather "simple": the decoder invokes the following chain of calls:
>   compute-number-of-chars -> check-end -> signal-encoding-error
>
> This contrasts with most of the decoder code, which directly calls
>   recover-from-encoding-error
> instead of
>   signal-encoding-error.
> --
> Sincerely,
> Dmitriy Ivanov
> lisp.ystok.ru
>
> _______________________________________________
> flexi-streams-devel mailing list
> flexi-streams-devel at common-lisp.net
> http://lists.common-lisp.net/cgi-bin/mailman/listinfo/flexi-streams-devel
>

From avodonosov at yandex.ru Thu Feb 9 22:19:43 2012
From: avodonosov at yandex.ru (Anton Vodonosov)
Date: Fri, 10 Feb 2012 02:19:43 +0400
Subject: [flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error
In-Reply-To:
References:
Message-ID: <526721328825984@web28.yandex.ru>

To make these two aspects - length calculation and error recovery -
consistent, the following approach may be good:

Length calculation never signals encoding error. Instead, it takes
into account that wrong byte sequences may be replaced by a character,
provided via *substitution-char* or use-value restart. I.e. every
wrong byte sequence is counted as one character.

In the decoding process which follows the length calculation, two cases
are possible:

1. some error is not recovered (no *substitution-char* provided or
   use-value restait doesn't matter what length was calculated
2.

From avodonosov at yandex.ru Thu Feb 9 22:25:17 2012
From: avodonosov at yandex.ru (Anton Vodonosov)
Date: Fri, 10 Feb 2012 02:25:17 +0400
Subject: [flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error
In-Reply-To:
References:
Message-ID: <527191328826317@web28.yandex.ru>

[Sorry, accidentally hit Enter and sent unfinished letter. So, once again: ]

To make these two aspects - length calculation and error recovery -
consistent, the following approach may be good:

Length calculation never signals encoding error.
Instead, it takes into account that wrong byte sequences may be
replaced by a character, provided via *substitution-char* or the
use-value restart. I.e. every wrong byte sequence is counted as one
character.

In the decoding process which follows the length calculation, two cases
are possible:

1. Some error is not recovered (neither *substitution-char* provided
   nor use-value invoked). The decoding fails completely and it doesn't
   matter what length was calculated.
2. All the wrong sequences were substituted. In this case the length
   where all the wrong sequences are counted as one character exactly
   matches the need of the decoding process.

Unfortunately I cannot work on a patch for this now or in the near future.

Best regards,
- Anton
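
The counting rule proposed above (every wrong byte sequence counts as one character, and the length pass never signals) can be sketched roughly as follows. This is only an illustration in plain Common Lisp: the function name lenient-utf-8-length is hypothetical and not part of flexi-streams, and it assumes the length pass looks only at the lead octet of each sequence, which may differ from what compute-number-of-chars really does.

```lisp
;; Hypothetical helper, NOT part of flexi-streams: count the characters
;; in a UTF-8 octet vector without ever signalling an encoding error.
;; Assumption: the sequence length is derived from the lead octet only;
;; continuation octets are not validated here.
(defun lenient-utf-8-length (octets)
  "Count characters in OCTETS, treating every malformed or truncated
sequence as one (to-be-substituted) character."
  (let ((i 0)
        (end (length octets))
        (count 0))
    (loop while (< i end)
          do (let* ((octet (aref octets i))
                    ;; Stray continuation octets and invalid lead
                    ;; octets fall through to the last clause and are
                    ;; counted as one character each.
                    (needed (cond ((< octet #x80) 1)
                                  ((<= #xC0 octet #xDF) 2)
                                  ((<= #xE0 octet #xEF) 3)
                                  ((<= #xF0 octet #xF7) 4)
                                  (t 1))))
               ;; A sequence cut off by the end of the input is still
               ;; counted as one character instead of raising an error.
               (incf i (min needed (- end i)))
               (incf count)))
    count))
```

With the two vectors from the original report, (lenient-utf-8-length #(#xC1 #xC2 #xC3 #xC4)) gives 2, matching the "??" result, and (lenient-utf-8-length #(#xC0 #xC1 #xC2 #xC3 #xC4)) gives 3 instead of stopping on the missing octet. Whether the decoding pass would then produce exactly that many characters depends on it substituting at the same places, which this sketch does not check.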