From avodonosov at yandex.ru  Tue Jun  3 00:02:55 2008
From: avodonosov at yandex.ru (Anton Vodonosov)
Date: Tue, 3 Jun 2008 03:02:55 +0300
Subject: [cl-openid-devel] html parsing for html-based discovery
Message-ID: <1138843367.20080603030255@yandex.ru>

Hello Maciek.

I've checked two Java implementations to see how they parse HTML:

- The joid library by Verisign uses a very simple approach: it reads
  the HTML page line by line. If a line contains the string
  "openid.server", it looks for the value of the href attribute on the
  same line or on one of the following lines, just by scanning for the
  string "href=".
  http://code.google.com/p/joid/source/browse/trunk/src/org/verisign/joid/consumer/Discoverer.java

- openid4java uses a more thorough approach: it defines an interface
  for an HTML parser, provides a mechanism to plug in a parser
  implementation, and ships a default implementation based on an
  external HTML parser library.
  http://code.google.com/p/openid4java/source/browse/trunk/src/org/openid4java/discovery/html/HtmlResolver.java

I personally like the joid approach. Although in theory it may fail on
a valid HTML document, it will work in almost any real-life scenario.
Their simple code is pleasant to read.

If deciding on the HTML parsing problem right now involves difficulty
or uncertainty, we can create the simplest variant of a parser: just
scanning for "openid2.provider", etc. It will be sufficient for our
initial experiments, and I am almost sure it will work for all the
popular providers. We can create a ticket to improve the HTML parsing
and fix it in the future, according to its priority.

What do you think?

Best regards,
-Anton
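
A minimal sketch of the "simplest variant" described above, for illustration only. It is written in Python rather than Common Lisp, and it is not the joid code: instead of reading line by line, it scans the page for <link> tags with a regular expression and picks out the rel and href attributes. The names find_openid_links, LINK_TAG, ATTR and WANTED_RELS, and the example.com URLs, are hypothetical. The rel values beyond "openid.server" and "openid2.provider" are the ones defined by the OpenID 1.1 and 2.0 specifications. Like the joid approach, this can be fooled by valid HTML (for example a <link> inside a comment).

```python
import re

# rel values used by HTML-based discovery (OpenID 2.0 and 1.1).
WANTED_RELS = ("openid2.provider", "openid2.local_id",
               "openid.server", "openid.delegate")

# One <link ...> tag; [^>]* also matches newlines, so a tag may span lines.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)

# name="value", name='value' or name=value
ATTR = re.compile(
    r"""([a-zA-Z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def find_openid_links(html):
    """Return a dict mapping each wanted rel value to the first href found."""
    found = {}
    for tag in LINK_TAG.findall(html):
        attrs = {}
        for m in ATTR.finditer(tag):
            value = m.group(2) or m.group(3) or m.group(4) or ""
            attrs[m.group(1).lower()] = value
        href = attrs.get("href")
        if href is None:
            continue
        # rel may hold several space-separated values.
        for rel in attrs.get("rel", "").lower().split():
            if rel in WANTED_RELS and rel not in found:
                found[rel] = href
    return found


if __name__ == "__main__":
    page = """<html><head>
    <link rel="openid2.provider openid.server"
          href="https://example.com/server">
    </head></html>"""
    print(find_openid_links(page))
```

A real parser could later replace find_openid_links behind the same signature, which is roughly the pluggable design openid4java uses.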