[cl-typesetting-devel] Serious hyphenation bug

Marc Battyani marc.battyani at fractalconcept.com
Thu Mar 30 19:53:12 UTC 2006


"Peter Heslin" <pj at heslin.eclipse.co.uk> wrote:

> While playing with cl-typesetting, I have had the vague feeling that
> it did not find as many hyphenation points as TeX, which seemed
> strange, since it uses the same patterns.  I tested this, using this
> function:
>
> (defun show-hyphens (string)
>   (concatenate 'string
>                (loop
>                   ;; Compute the hyphenation points once, not per character.
>                   with points = (tt::hyphenate-string string)
>                   for char across string
>                   for i upfrom 0
>                   appending (if (member i points)
>                                 (list #\- char)
>                                 (list char)))))
>
> Here is the output from TeX's \showhyphens command for a piece of text:
>
> but tor-ture with-out end still urges, and a fiery del-uge, fed with
> ever-burning sul-phur un-con-sumed. such place eter-nal jus-tice had
> pre-pared for those re-bel-lious? here their prison or-dained in
> ut-ter dark-ness, and their por-tion set as far re-moved from god and
> light of heaven, as from the cen-tre thrice to the ut-most pole. oh,
> how un-like the place from whence they fell.
>
> Here is cl-typesetting, using the show-hyphens function defined above:
>
> but tor-ture without end still urges, and a fiery deluge, fed with
> ever-burning sulphur un-con-sumed.  such place eter-nal justice had
> pre-pared for those re-bellious? here their pri-son or-dained in
> utter dark-ness, and their por-tion set as far re-moved from god and
> light of heaven, as from the centre thrice to the ut-most pole.  oh,
> how unlike the place from whence they fell.
>
> Note that cl-typesetting finds only about half of the hyphenation
> points that TeX does, despite using the same patterns.
>
> I think I have discovered the cause of this bug.
>
> In the file hyphenation-fp.lisp, the function hyphen-make-trie has
> this comment:
>
> ;; Build a trie out of a sorted list
> ;; of pairs (word, hyph-points)
>
> So it is important that the input list is sorted.  This is done by this
> line in the function read-hyphen-file:
>
>    (setq patterns (sort patterns #'hyphen-cmp-char-lists))
>
> Here is that sort predicate:
>
> (defun hyphen-cmp-char-lists (l1 l2)
>   (let (result done)
>     (loop for c1 = (pop l1)
>           for c2 = (pop l2)
>           while (and (characterp c1) (characterp c2) (not done))
>           do (if (char< c1 c2)
>                  (setq result t done t)
>                  (if (char> c1 c2)
>                      (setq done t)))
>           finally (if done result nil))))
>
> It seems to me that this function will fail to sort the lists of chars
> correctly when one of the lists represents an initial substring of the
> other string, which is not uncommon.
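The prefix case can be checked in isolation. What follows is only a minimal sketch: old-cmp is a hypothetical renamed copy of the quoted predicate, the three sample lists are illustrative rather than taken from the pattern file, and the finally clause is wrapped in an explicit return on the assumption that the value of (if done result nil) is what the function is meant to deliver (as quoted, loop discards the value of a bare finally form).

```lisp
;; Hypothetical copy of the quoted predicate, renamed OLD-CMP so it can be
;; evaluated without loading cl-typesetting.  Assumption: the FINALLY form
;; is wrapped in RETURN so that the intended value is actually returned.
(defun old-cmp (l1 l2)
  (let (result done)
    (loop for c1 = (pop l1)
          for c2 = (pop l2)
          while (and (characterp c1) (characterp c2) (not done))
          do (cond ((char< c1 c2) (setq result t done t))
                   ((char> c1 c2) (setq done t)))
          finally (return (if done result nil)))))

;; Illustrative character lists, not taken from the hyphenation file.
(defparameter *a*  (coerce "a" 'list))
(defparameter *ab* (coerce "ab" 'list))
(defparameter *ac* (coerce "ac" 'list))

;; A prefix and its extension: neither is reported as less than the other.
(old-cmp *a* *ab*)   ; NIL
(old-cmp *ab* *a*)   ; NIL
;; Two extensions of the same prefix are nevertheless ordered.
(old-cmp *ab* *ac*)  ; T
```

Under these assumptions "a" looks equal to both "ab" and "ac", while "ab" still sorts before "ac", so the predicate does not pin down where sort places a prefix relative to its extensions.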
>
> The result of this bug is that the contents of the patterns variable
> are only partially sorted, and so hyphen-make-trie generates a trie
> that excludes many of the patterns.  In fact, if you want to check it,
> you can see that, at least for some initial letters, the trie
> generated only includes patterns that correspond to a line in the
> hyphenation file that begins with a dot or a number.
>
> Here is my revised version of the sort predicate:
>
> (defun nix::hyphen-cmp-char-lists (l1 l2)
>  (loop
>     for c1 = (pop l1)
>     for c2 = (pop l2)
>     do (cond
>          ((and (characterp c1) (not (characterp c2)))
>           (return nil))
>          ((and (not (characterp c1)) (characterp c2))
>           (return t))
>          ((and (not (characterp c1)) (not (characterp c2)))
>           (return))
>          ((char< c1 c2)
>           (return t))
>          ((char> c1 c2)
>           (return nil)))))
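A quick way to exercise the revised predicate outside cl-typesetting is sketched below. The names new-cmp and sort-words are hypothetical (new-cmp is a copy of the revised function without the nix:: package prefix), and the sample words are illustrative, not patterns from the hyphenation file.

```lisp
;; Hypothetical copy of the revised predicate, renamed NEW-CMP.
(defun new-cmp (l1 l2)
  (loop for c1 = (pop l1)
        for c2 = (pop l2)
        do (cond
             ;; l2 ran out first: l1 is longer, so it is not less.
             ((and (characterp c1) (not (characterp c2))) (return nil))
             ;; l1 ran out first: l1 is a proper prefix, so it is less.
             ((and (not (characterp c1)) (characterp c2)) (return t))
             ;; Both ran out together: the lists are equal.
             ((and (not (characterp c1)) (not (characterp c2))) (return nil))
             ((char< c1 c2) (return t))
             ((char> c1 c2) (return nil)))))

;; Hypothetical helper: sort strings by comparing their character lists.
(defun sort-words (words)
  (mapcar (lambda (chars) (coerce chars 'string))
          (sort (mapcar (lambda (w) (coerce w 'list)) words) #'new-cmp)))

(sort-words '("ac" "ab" "a" "abc" "b"))  ; ("a" "ab" "abc" "ac" "b")
```

On this sample each prefix lands immediately before its extensions, which is the ordering the comment in hyphen-make-trie asks for; the check says nothing about the rest of hyphenation-fp.lisp.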
>
> When I use this function, reload the American hyphenation file, and run
> my show-hyphens function again, I get a correct result, like so:
>
> but tor-ture with-out end still urges, and a fiery del-uge, fed with
> ever-burn-ing sul-phur un-con-sumed.  such place eter-nal jus-tice had
> pre-pared for those re-bel-lious? here their prison or-dained in
> ut-ter dark-ness, and their por-tion set as far re-moved from god and
> light of heaven, as from the cen-tre thrice to the ut-most pole.  oh,
> how un-like the place from whence they fell.
>
> Now cl-typesetting has found all of the same hyphenation points that
> TeX did.
>
> So it looks from a very superficial test that this is the correct fix,
> but I should emphasize that I do not understand most of the code in
> hyphenation-fp.lisp, so I'm not entirely sure that my sort function is
> correct.  It would be nice if someone who understands the code fully
> could check this out.

Thanks for this well-commented bug fix. :)

I haven't looked closely at this code either, and I'm not a TeX expert, so I will 
forward this to Fabrice Popineau, who wrote it (and who is a TeX expert).

BTW even though my interest in cl-typesetting/cl-pdf has not faded, don't be 
surprised if I don't reply very fast. I have an incredible amount of work to 
do these days with several deadlines at the end of April. Nevertheless I 
plan to make a new release with all the patches/modifications I've got 
before May.

Marc




