[flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error
Anton Vodonosov
avodonosov at yandex.ru
Thu Feb 9 22:19:43 UTC 2012
To make these two aspects (length calculation and error recovery) consistent,
the following approach may be good:
Length calculation never signals an encoding error. Instead, it takes into
account that wrong byte sequences may be replaced by a character
provided via *substitution-char* or the use-value restart. I.e. every wrong
byte sequence is counted as one character.
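
A minimal sketch of such a length calculation for UTF-8, in plain Common
Lisp. The names utf-8-sequence-length and lenient-utf-8-length are
hypothetical and not part of flexi-streams, and the rule for how many octets
a wrong sequence consumes is a simplifying assumption: the lead octet decides
the length, and a sequence cut off by the end of the input still counts as
one character.

```lisp
;; Hypothetical helpers for illustration only; not flexi-streams code.
(defun utf-8-sequence-length (octet)
  "Number of octets a UTF-8 sequence starting with OCTET claims to occupy.
Octets that cannot start a sequence are treated as one-octet sequences."
  (cond ((< octet #x80) 1)   ; 0xxxxxxx: ASCII
        ((< octet #xC0) 1)   ; 10xxxxxx: stray continuation octet (assumption)
        ((< octet #xE0) 2)   ; 110xxxxx
        ((< octet #xF0) 3)   ; 1110xxxx
        ((< octet #xF8) 4)   ; 11110xxx
        (t 1)))              ; 11111xxx: never a valid lead octet (assumption)

(defun lenient-utf-8-length (octets)
  "Count the characters in the vector OCTETS without signaling any error.
Every sequence, well-formed or not, contributes exactly one character,
including a final sequence that is shorter than its lead octet claims."
  (let ((end (length octets))
        (pos 0)
        (count 0))
    (loop while (< pos end)
          do (incf pos (utf-8-sequence-length (aref octets pos)))
             (incf count))
    count))
```

With the two octet vectors from Dmitriy's report,
(lenient-utf-8-length #(#xC1 #xC2 #xC3 #xC4)) returns 2 and
(lenient-utf-8-length #(#xC0 #xC1 #xC2 #xC3 #xC4)) returns 3, so the
truncated trailing sequence becomes one more character instead of an error.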
In the decoding process which follows the length calculation, two cases
are possible:
1. some error is not recovered (no *substitution-char* provided
or use-value restart [...]); it doesn't matter what length was calculated
2.
10.02.2012, 01:21, "Edi Weitz" <edi at agharta.de>:
> Sorry for the delay. I think this is more or less "on purpose."
> (It's been a while since I wrote that stuff...)
>
> The recover-from-encoding-error helper function is used when during
> decoding we encounter something which "looks like" a character (so to
> say) but isn't one - in which case we can e.g. replace it with the
> substitution character.
>
> I think the error you mention happens earlier - when the length is checked.
>
> Of course, one could argue that one could just as well use the same
> restart here. Maybe you can just submit a patch (including
> documentation if needed and ideally with new tests) and convince Hans
> to make a new release?
>
> Thanks,
> Edi.
>
> On Sat, Jan 21, 2012 at 1:06 PM, Dmitriy Ivanov <divanov11 at gmail.com> wrote:
>
>> Hello folks,
>>
>> I have bumped into the following error while playing with Hunchentoot.
>> (It originates from URL-decoding GET parameters with
>> *hunchentoot-default-external-format*.)
>>
>> (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
>> => "??"
>>
>> (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC0 #xC1 #xC2 #xC3 #xC4)
>>                          :external-format :utf-8))
>> -> signals: This sequence can't be decoded using UTF-8 as it is too short.
>>    1 octet missing at then end.
>>
>> The reason is rather "simple": the decoder invokes the following chain of calls:
>> compute-number-of-chars -> check-end -> signal-encoding-error
>>
>> This contrasts with most of the decoder code, which calls
>> recover-from-encoding-error
>> directly instead of
>> signal-encoding-error.
>> --
>> Sincerely,
>> Dmitriy Ivanov
>> lisp.ystok.ru
>>
>> _______________________________________________
>> flexi-streams-devel mailing list
>> flexi-streams-devel at common-lisp.net
>> http://lists.common-lisp.net/cgi-bin/mailman/listinfo/flexi-streams-devel