A bug in function parse-content-type.

Wed May 29 05:44:41 UTC 2013

Hi Hans,

All I want is a stable hunchentoot server that is compatible with other web clients,
and a stable drakma client that is compatible with other web servers. Whether those web clients/servers
follow the HTTP protocol well or not should not be the reason hunchentoot/drakma fails outright.
I hope you can understand that if drakma/hunchentoot fails outright, my commercial business will fail too.

I must say that the code from Edi is of very high quality, and I have great respect for Edi and you for that.

For this case, I hope chunga/drakma/hunchentoot could accept a special feature or a special variable
that makes them accept content type headers which do not follow the HTTP protocol well, like cl-http does.

     -----------------------------------------------------------------------------
     (parse-mime-content-type-header "application/x-www-form-urlencoded; text/html; charset=UTF-8")
        ==> (:APPLICATION :X-WWW-FORM-URLENCODED :CHARSET :UTF-8)
     -----------------------------------------------------------------------------
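A minimal sketch of the kind of lax parsing meant here, in portable Common Lisp. It assumes the special variable name *accept-bogus-content-type* suggested further down in this thread; both that variable and parse-content-type-lax are hypothetical and do not exist in chunga, drakma or hunchentoot. The leading type/subtype pair is still required, but when the variable is true, parameter segments without an "=" (such as the stray "text/html") are skipped instead of signalling an error. Quoted-string parameter values containing ";" are not handled.

```lisp
;;; Hypothetical sketch: *ACCEPT-BOGUS-CONTENT-TYPE* and PARSE-CONTENT-TYPE-LAX
;;; are illustrative names only; they are not part of chunga, drakma or
;;; hunchentoot.

(defvar *accept-bogus-content-type* nil
  "When true, parameter segments that are not ATTRIBUTE=VALUE pairs are
skipped instead of signalling an error.")

(defun split-on-char (string char)
  "Return a list of the substrings of STRING separated by CHAR."
  (loop for start = 0 then (1+ end)
        for end = (position char string :start start)
        collect (subseq string start end)
        while end))

(defun trim-whitespace (string)
  "Remove leading and trailing spaces and tabs from STRING."
  (string-trim '(#\Space #\Tab) string))

(defun parse-content-type-lax (header)
  "Parse HEADER into (values TYPE SUBTYPE PARAMETERS), where PARAMETERS
is an alist of (ATTRIBUTE . VALUE) strings."
  (let* ((segments (mapcar #'trim-whitespace (split-on-char header #\;)))
         (media-type (first segments))
         (slash (position #\/ media-type)))
    ;; The leading type/subtype pair is always required.
    (unless slash
      (error "Malformed media type ~S" media-type))
    (values (string-downcase (subseq media-type 0 slash))
            (string-downcase (subseq media-type (1+ slash)))
            (loop for segment in (rest segments)
                  for equals = (position #\= segment)
                  ;; A well-formed ATTRIBUTE=VALUE parameter.
                  when equals
                    collect (cons (string-downcase
                                   (trim-whitespace (subseq segment 0 equals)))
                                  (trim-whitespace (subseq segment (1+ equals))))
                  ;; Anything else (e.g. a stray "text/html") is an error
                  ;; unless lax parsing has been requested.
                  when (and (null equals)
                            (string/= segment "")
                            (not *accept-bogus-content-type*))
                    do (error "Bogus parameter ~S in content type ~S"
                              segment header)))))
```

With the variable bound to true, parsing "application/x-www-form-urlencoded; text/html; charset=UTF-8" returns "application", "x-www-form-urlencoded" and (("charset" . "UTF-8")). Because the variable defaults to NIL, the default behaviour stays strict, in line with Edi's suggestion of an opt-in variable rather than a changed default.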

I think your solution (request-with-bad-content-type) is a little too trivial for me.

If you accept my suggestion, I can provide a patch for these three packages (chunga/drakma/hunchentoot).

With Best Regards,

At Sun, 26 May 2013 08:04:15 +0200,
Hans Hübner wrote:
> 
> Jingtao,
> 
> please refer to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7; it clearly
> describes that a media type consists of exactly one type/subtype indicator followed by
> optional attribute=value pairs.  The content type that you have presented is not valid
> according to these rules.  Neither a lax parser like the one in CL-HTTP nor the fact
> that a large site sends these bogus headers makes them valid.  I do not want to include
> code in Hunchentoot that tries to interpret such bogus data.
> 
> However, if you cannot get your trading partner to fix their client, I can offer this
> solution:
> 
> (defclass request-with-bad-content-type (hunchentoot:request)
>   ())
> 
> (defmethod hunchentoot:header-in :around ((name (eql :content-type))
>                                           (request request-with-bad-content-type))
>   (alexandria:when-let (content-type (call-next-method))
>     (ppcre:regex-replace-all "^([^/]+/[^/]+); *[^/]+/[^/;]+" content-type "\\1")))
> 
> You'll then have to use the :request-class argument to your acceptor instantiation to
> make it use the request-with-bad-content-type class.  You also want to review the regular
> expression carefully and maybe profile your application to see whether you need to cache
> or otherwise improve performance.
> 
> -Hans
> 
> On Sun, May 26, 2013 at 5:07 AM, Jingtao Xu <jingtaozf at gmail.com> wrote:
> 
>     Hi Hans,
>    
>     I don't agree that this content type header is just bogus.
>     Since it is sent by the largest B2B/B2C site in China, there
>     must be a reason for it.
>    
>     And if you try cl-http, you will find that it parses such a
>     content type correctly.
>    
>     -----------------------------------------------------------------------------
>     (parse-mime-content-type-header "application/x-www-form-urlencoded; text/html; charset=UTF-8")
>        ==> (:APPLICATION :X-WWW-FORM-URLENCODED :CHARSET :UTF-8)
>     -----------------------------------------------------------------------------
>    
>     You can find the definition in cl-http/server/headers.lisp
>     -----------------------------------------------------------------------------
>     (define-header-type :content-type-header (:header)
>       :parse-function parse-mime-content-type-header
>       :print-function print-mime-content-type-header)
>     -----------------------------------------------------------------------------
>    
>     Even if this content-type header is bogus (actually I don't think so),
>     hunchentoot/drakma should parse the header without raising an error
>     if a special variable like *accept-bogus-content-type* is true.
> 
>     With Best Regards,
>     jingtao.
>    
>     On Sat, May 25, 2013 at 8:11 PM, Hans Hübner <hans.huebner at gmail.com> wrote:
>     > Jingtao,
>     >
>     > the content-type header "application/x-www-form-urlencoded; text/html;
>     > charset=UTF-8" is just bogus.  I do not want to include code that makes
>     > Hunchentoot work with clearly broken clients.  Better error reporting would
>     > be acceptable, though.
>     >
>     > -Hans
>     >
>     >
>     > On Sat, May 25, 2013 at 12:38 PM, Jingtao Xu <jingtaozf at gmail.com> wrote:
>     >>
>     >> Hi all,
>     >>
>     >> I found the content type header that triggers the bug in the message.log
>     >> generated by hunchentoot.
>     >> It happens when hunchentoot receives the following content type header:
>     >>
>     >> -----------------------------------------------------------------------------------------
>     >> application/x-www-form-urlencoded; text/html; charset=UTF-8
>     >> -----------------------------------------------------------------------------------------
>     >>
>     >>
>     >> I noticed that in drakma's file read.lisp, the function `get-content-type'
>     >> also treats "/" as a token separator.
>     >>
>     >> I hope chunga/drakma/hunchentoot can accept such a content type header
>     >> without raising an exception. As Edi said, a new special variable
>     >> similar to *accept-bogus-eols* or *treat-semicolon-as-continuation*,
>     >> which only treats " ,;" as token separators, may be a good idea and would
>     >> fix my problem.
>     >>
>     >> Anyway, the RFC standard does not fit the real world well.
>     >>
>     >> Thanks very much.
>     >>
>     >> With Best Regards,
>     >> jingtao.
>     >>
>     >>
>     >> On Thu, May 23, 2013 at 2:01 PM, Edi Weitz <edi at agharta.de> wrote:
>     >> > I'm not the maintainer anymore, but my take is that if some Ruby or
>     >> > Java client misinterprets the RFC I wouldn't change Hunchentoot's (or
>     >> > rather Chunga's) default behavior because of that.  I'd rather
>     >> > introduce a new special variable similar to *accept-bogus-eols* or
>     >> > *treat-semicolon-as-continuation*.
>     >> >
>     >> > Just my .02 Euros,
>     >> > Edi.
>     >> >
>     >> >
>     >> >
>     >> > On Thu, May 23, 2013 at 2:52 AM, Jingtao Xu <jingtaozf at gmail.com> wrote:
>     >> >> Hi All,
>     >> >>
>     >> >> 1. The function `read-name-value-pair' is called by
>     >> >>    `parse-content-type' in hunchentoot/util.lisp, not by my code.
>     >> >> 2. The slash is a token constituent in the Java/Ruby implementations, and I
>     >> >>    think some web clients/servers treat it as a token constituent too,
>     >> >>    but I am waiting for the hunchentoot log to give us a live example.
>     >> >>
>     >> >> With Best Regards,
>     >> >> jingtao
>     >> >>
>     >> >>
>     >> >> On Wed, May 22, 2013 at 11:40 PM, Edi Weitz <edi at agharta.de> wrote:
>     >> >>> If I'm not mistaken, the slash is a "separator" and thus not a token
>     >> >>> constituent according to RFC 2616 which means "path=/foo" is not legal
>     >> >>> input for READ-NAME-VALUE-PAIR.
>     >> >>>
>     >> >>> On Wed, May 22, 2013 at 5:27 PM, Ron Garret <ron at flownet.com> wrote:
>     >> >>>> Very likely Jingtao's code is calling READ-NAME-VALUE-PAIR without
>     >> >>>> being wrapped in this macro
>     >> >>>>
>     >> >>>> But there's still a bug in READ-NAME-VALUE-PAIR:
>     >> >>>>
>     >> >>>> ? (WITH-INPUT-FROM-VECTOR (S (MAP '(VECTOR (UNSIGNED-BYTE 8))
>     >> >>>>                                   'CHAR-CODE "path=/foo"))
>     >> >>>>     (chunga:with-character-stream-semantics
>     >> >>>>       (CHUNGA:READ-NAME-VALUE-PAIR S)))
>     >> >>>> ("path" . "")
>     >> >>>>
>     >> >>>> On May 22, 2013, at 8:19 AM, Edi Weitz wrote:
>     >> >>>>
>     >> >>>>> On Wed, May 22, 2013 at 4:18 PM, Ron Garret <ron at flownet.com> wrote:
>     >> >>>>>> I found a bug in CHUNGA:READ-NAME-VALUE-PAIR.
>     >> >>>>>
>     >> >>>>> It's not quite clear to me yet what the bug is supposed to be.
>     >> >>>>>
>     >> >>>>> The documentation clearly says that calls to READ-NAME-VALUE-PAIR
>     >> >>>>> and friends must be wrapped with this macro:
>     >> >>>>>
>     >> >>>>>  http://weitz.de/chunga/#with-character-stream-semantics
>     >> >>>>>
>     >> >>>>> (You might argue that this isn't very user-friendly, but Chunga
>     >> >>>>> wasn't really intended to be used that way.)
>     >> >>>>
>     >> >>
>     >
>     >
> 
>
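The separator rule Edi refers to comes from RFC 2616, section 2.2: a token is one or more US-ASCII characters that are neither control characters nor separators, and "/" is listed as a separator. The small sketch below encodes that rule in portable Common Lisp; token-char-p, token-p and *rfc2616-separators* are illustrative names, not chunga functions. It shows why neither "/foo" nor "text/html" qualifies as a token, which is consistent with the ("path" . "") result shown in the thread.

```lisp
;;; Sketch of the RFC 2616 (section 2.2) token rule:
;;;   token      = 1*<any CHAR except CTLs or separators>
;;;   separators = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\" | <">
;;;              | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
;;; The names below are illustrative only; they are not part of chunga.

(defparameter *rfc2616-separators*
  '(#\( #\) #\< #\> #\@ #\, #\; #\: #\\ #\" #\/
    #\[ #\] #\? #\= #\{ #\} #\Space #\Tab)
  "Separator characters listed in RFC 2616, section 2.2.")

(defun token-char-p (char)
  "True if CHAR is a US-ASCII character that is neither a control
character (codes 0-31 and 127) nor a separator."
  (let ((code (char-code char)))
    (and (<= 32 code 126)
         (not (member char *rfc2616-separators*)))))

(defun token-p (string)
  "True if STRING is a non-empty RFC 2616 token."
  (and (plusp (length string))
       (every #'token-char-p string)))
```

Here (token-p "charset") is true, while (token-p "text/html") and (token-p "/foo") are false. Under this rule "text/html" cannot be a parameter attribute, and "/foo" can only appear as a parameter value if it is written as a quoted-string.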