[cl-openid-devel] html parsing for html-based discovery
Anton Vodonosov
avodonosov at yandex.ru
Tue Jun 3 00:02:55 UTC 2008
Hello Maciek.
I've checked how two Java implementations parse HTML:
- the joid library by VeriSign uses a very simple approach:
it reads the HTML page line by line. If a line contains
the string "openid.server", it looks for the value of the href
attribute on the same line or one of the following lines, just
by scanning for the string "href=".
http://code.google.com/p/joid/source/browse/trunk/src/org/verisign/joid/consumer/Discoverer.java
- openid4java uses a more thorough approach: it defines an
interface for the HTML parser, provides a mechanism to plug in
a parser implementation, and ships a default implementation
based on an external HTML parser library
http://code.google.com/p/openid4java/source/browse/trunk/src/org/openid4java/discovery/html/HtmlResolver.java
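To make the joid-style scan concrete, here is a rough sketch in Python for brevity. It is my approximation of the behaviour described above, not joid's actual code: the function names are hypothetical, and it scans every following line for "href=" (I haven't checked whether joid limits how far ahead it looks).

```python
from typing import Optional


def _href_value(line: str) -> Optional[str]:
    """Return the value after the first "href=" on this line, or None."""
    pos = line.find("href=")
    if pos == -1:
        return None
    rest = line[pos + len("href="):]
    if rest[:1] in ('"', "'"):
        # Quoted value: read up to the matching closing quote on the same line.
        end = rest.find(rest[0], 1)
        return rest[1:end] if end != -1 else None
    # Unquoted value: read up to whitespace or the end of the tag.
    value = []
    for ch in rest:
        if ch.isspace() or ch == ">":
            break
        value.append(ch)
    return "".join(value) or None


def find_server_line_scan(page: str, marker: str = "openid.server") -> Optional[str]:
    """Hypothetical joid-style scan: find the first line mentioning `marker`,
    then return the first href= value on that line or any following line."""
    lines = page.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            for candidate in lines[i:]:
                value = _href_value(candidate)
                if value is not None:
                    return value
            return None
    return None


page = """<html><head>
<link rel="openid.server"
      href="https://op.example/server">
</head></html>"""
print(find_server_line_scan(page))  # https://op.example/server
```

The weakness is easy to see: any line that merely mentions "openid.server" (say, inside a comment or a paragraph of text) followed by some unrelated href would give a wrong answer, even though the page is valid HTML.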
I personally like the joid approach. Although in theory
it may fail on a valid HTML document, it will work in almost any
real-life scenario. Their simple code is pleasant to read.
If deciding on the HTML parsing problem right now involves
difficulty or uncertainty, we could create the simplest variant of
the parser: just scanning for "openid2.provider", etc. It will be
sufficient for our initial experiments, and I'm almost sure it will
work for all the popular providers. We can create a ticket to
improve HTML parsing and fix it in the future, according to
its priority.
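As an illustration of how small that simplest variant could be, here is a Python sketch (names hypothetical, not cl-openid code; a Lisp version could follow the same shape). It finds <link> tags with a regular expression and collects the href for the rel values openid2.provider and openid2.local_id (OpenID 2.0) and openid.server and openid.delegate (OpenID 1.x). It assumes attributes are name=value pairs and that no attribute value contains ">".

```python
import html
import re
from typing import Dict

# rel values used by HTML-based discovery: OpenID 2.0 and OpenID 1.x.
OPENID_RELS = ("openid2.provider", "openid2.local_id", "openid.server", "openid.delegate")

# A <link ...> tag; simplification: assumes no ">" inside attribute values.
_LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
# name=value with a double-quoted, single-quoted or unquoted value.
_ATTR = re.compile(r"""([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")


def find_openid_links(page: str) -> Dict[str, str]:
    """Map each OpenID rel value found in <link> tags to its href (first one wins)."""
    found: Dict[str, str] = {}
    for tag in _LINK_TAG.findall(page):
        attrs = {}
        for m in _ATTR.finditer(tag):
            # Exactly one of the three value groups took part in the match.
            value = next(g for g in m.groups()[1:] if g is not None)
            # Decode entities such as &amp; in attribute values.
            attrs[m.group(1).lower()] = html.unescape(value)
        href = attrs.get("href")
        if href is None:
            continue
        # rel may hold several space-separated values.
        for rel in attrs.get("rel", "").lower().split():
            if rel in OPENID_RELS and rel not in found:
                found[rel] = href
    return found
```

Unlike the line-based scan, this handles several rel values in one attribute and entity-encoded URLs, but it is still pattern matching rather than real HTML parsing (for example, it would also pick up a <link> tag inside an HTML comment), which is the kind of thing the follow-up ticket could address.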
What do you think?
Best regards,
-Anton