[cl-pdf-devel] parsing pdf file meta info

Sat Oct 14 00:28:15 UTC 2006

Hi all,

I know that cl-pdf is a PDF generation library, so this is off-topic, but I
thought people on this list might care to comment.

I want to programmatically check whether a PDF file's page size is 8"x11"
(and that it has only one page).

I have a bunch of files for printing and I need to make sure
that all of them meet the above requirements.
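
To make the check concrete, here is a rough sketch in Python using only the
standard library. It is a heuristic, not a real PDF parser: it scans the raw
bytes for "/Type /Page" objects and "/MediaBox" arrays, so it assumes those
entries are stored uncompressed, and it ignores which object a box belongs to
as well as CropBox and Rotate. The function names, the default target size and
the tolerance are all made up for illustration. The one hard fact it relies on
is that the default PDF unit is 1/72 inch, so 8"x11" is 576 x 792 units.

```python
import re

POINTS_PER_INCH = 72.0  # default PDF user space: 1 unit = 1/72 inch

# Matches "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_RE = re.compile(rb"/Type\s*/Page(?![A-Za-z])")

# One PDF number such as 612, 612., 612.0 or .5
NUM = rb"([-+]?(?:\d+\.?\d*|\.\d+))"

# Matches "/MediaBox [llx lly urx ury]" with plain numeric entries.
MEDIABOX_RE = re.compile(
    rb"/MediaBox\s*\[\s*" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s*\]"
)


def naive_page_info(data):
    """Return (page_count, [(width_in, height_in), ...]) from raw PDF bytes.

    Heuristic only: anything inside compressed streams is invisible here.
    """
    page_count = len(PAGE_RE.findall(data))
    sizes = []
    for match in MEDIABOX_RE.finditer(data):
        llx, lly, urx, ury = (float(g) for g in match.groups())
        sizes.append(((urx - llx) / POINTS_PER_INCH,
                      (ury - lly) / POINTS_PER_INCH))
    return page_count, sizes


def meets_requirements(data, width_in=8.0, height_in=11.0, tol=0.01):
    """True if exactly one page is found and every MediaBox is width x height."""
    page_count, sizes = naive_page_info(data)
    if page_count != 1 or not sizes:
        return False
    return all(abs(w - width_in) <= tol and abs(h - height_in) <= tol
               for w, h in sizes)
```

It would be used on the bytes of each file, e.g.
meets_requirements(open(path, "rb").read()), where path is one of the files to
print. A file it rejects is not necessarily wrong; it may simply store its
page objects in a form this scan cannot see, which is where a real PDF parser
would be needed.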

So I went to Adobe's site and downloaded the PDF spec.
At 1200 pages it seems daunting, and the Acrobat SDK is even bigger
(its documentation is 3000+ pages; it's HUGE and complex).

It's not that I don't want to read it, but asking first might save me some
time before I head in the wrong direction.

Can someone knowledgeable about PDF briefly describe whether this task is
trivial or difficult?

Maybe I don't have to do everything in Lisp? If you know of any open source
or third-party library or tool that can help, I'd love to hear about it.

Again, most of the PDF APIs out there are for _generating_ PDF content.

I read an interesting article on parsing PDF with Python.

http://www.python10.org/p10-papers/17/index.htm

The authors give a lot of arguments for why Python is best suited to this
task, but I think Lisp would be even better (with CLOS, macros, etc.).
However, I cannot find the implementation of this Python library. If I could,
I would want to try porting it to CL, provided it's open source and not too
big to digest.

Thanks in advance
fungsin