[armedbear-devel] Losing on multiprocessing
Mark Evenson
evenson at panix.com
Thu Sep 26 11:44:27 UTC 2013
On 9/26/13 0728 , Alan Ruttenberg wrote:
> Howdy,
>
> I wonder if those of you have worked with threads might have a quick
> look to see if I am doing something stupid.
>
> https://lsw2.googlecode.com/svn/branches/bona/util/jargrep.lisp
I whacked away at your file, converting it to the attached form to use
the JSS namespace and ABCL-ASDF to resolve the dk.brics.automaton
artifact, but I can't get to seem the matches to occur. Not having
your jar files to test, I just run it across Maven jars as follows:
CL-USER> (jar-map-threads-automaton-find "Manifest" (jss::all-jars-below
"~/.m2"))
12.295 seconds real time
1897572 cons cells
0
0
CL-USER> (length (jss::all-jars-below "~/.m2"))
460
which should result in matches for all jars, because every jar that
Maven uses, has a manifest contains the string "Manifest-Version: 1.0".
But I get no hits, and the execution is so fast, that I suspect that
the matcher is not actually working on anything for some reason. Since
you pass a closure with a reference to the regex as the function to
THEREADS:MAKE-THREAD, trying to TRACE stuff doesn't seem to work so well.
I need to spend more time with the matcher to understand why I am not
generating any hits. Any ideas on your end?
[…]
> The result of running this is about (and their's the rub) 20 key value
> pairs in the hash table (I had read that ABCL hash tables are thread
> safe). The problem is that different runs of this code on the same data
> get different numbers of key value pairs, between 13 and 24!
ABCL hashtables should indeed be thread-safe, with all accesses
protected by an underlying java.util.concurrent.locks.ReentrantLock.
> I'm not sure whether I'm just not doing this the right way, in which
> case it would be very helpful to get an explanation of why not, or
> there's a problem somewhere in the implementation.
For the record, I used
CL-USER> (lisp-implementation-version)
"1.3.0-dev"
"Java_HotSpot(TM)_64-Bit_Server_VM-Oracle_Corporation-1.7.0_40-b43"
"amd64-Linux-2.6.18-348.16.1.el5.centos.plus"
to run my tests, but I have no reason to currently suspect the ABCL
version is at fault here.
More later when I get the time,
Mark
--
"A screaming comes across the sky. It has happened before, but there
is nothing to compare to it now."
-------------- next part --------------
;;; https://lsw2.googlecode.com/svn/branches/bona/util/jargrep.lisp
;; Author: Alan Ruttenberg
;; Date: September 24, 2013
#|
(jar-map-threads-automaton-find
regex
(generate-filename-sequence "/data/jars/15/file#.jar" 2 0 14))
(jar-map-threads-automaton-find
"MANIFEST"
(jss::all-jars-below "~/.m2")
:threads 8)
|#
(require :abcl-contrib)
(require :jss)
(require :abcl-asdf)
(java:add-to-classpath
(abcl-asdf:resolve-dependencies "dk.brics.automaton" "automaton"))
(defun jar-map (jar-or-jars fn)
"given a jar file or a list of jar files, call fn on the string that is the decompressed entry.
TODO: Add filtering by path name, so we can look only in, say, the XML files"
(format t "~&jar-map: ~A~%." jar-or-jars)
(loop for jar in (if (consp jar-or-jars) jar-or-jars (list jar-or-jars))
with buffer-size = 0
with buffer = nil
for jarfile = (jss:new 'jarfile (jss:new 'file (namestring (truename jar))))
for entries = (#"entries" jarfile)
do
(loop while (#"hasMoreElements" entries)
for next-in = (#"nextElement" entries)
for in-stream = (#"getInputStream" jarfile next-in)
for size = (#"getSize" next-in)
do
(when (> size buffer-size)
(setq buffer (jnew-array "byte" size)))
(when (> size 0)
(#"read" in-stream buffer)
(setq @ buffer)
(unwind-protect
(let ((name (#"getName" next-in)))
(funcall fn (jss:new 'java.lang.string buffer size) name))
(#"close" in-stream))))))
;; One global variable to hold our results hash
(defvar *hits*)
;; Create a thread for each jar file. Each thread executes
;; thread-run-function passed the name of a jar file. Call
;; thread-join on each to wait until they are all finished. Use (time
;; .. ) to get timings.
(defun thread-per-jar (thread-run-function jar-filenames
&key (thread-name-prefix "per-jar-")
(nthreads (length jar-filenames)))
(time (loop for thread in
(loop
for i from 0 below nthreads
for f in jar-filenames
collect (threads:make-thread
(lambda()
(funcall thread-run-function f))
:name (format nil "~a~a" thread-name-prefix i)))
do (threads:thread-join thread)))
(print (hash-table-count *hits*)))
;; And a method to add a result. There is no duplication of the entry
;; names across the jar files. I had hoped this was thread safe, but
;; I get different numbers of entries in the hash table in diffreent
;; runs of the job.
(defun add-hit (entry-name jarfile data)
(setf (gethash entry-name *hits*)
(list jarfile data)))
;; This uses the java regex package and is substantially slower than
;; the dk.brics.automaton. Optimizations for regex coding from
;; http://www.fasterj.com/articles/regex2.shtml
(defun jar-map-threads-regex-find (regex jar-filenames &key (threads (length jar-filenames)))
(setq *hits* (make-hash-table :test 'equal)) ;; initialize results
(thread-per-jar
(lambda (jarfile)
(let* ((pat (#"compile" 'java.util.regex.Pattern regex))
(matcher (#"matcher" pat "notused")))
(jss:with-constant-signature ((find "find") (reset "reset" t))
(jar-map
jarfile
(lambda (s name)
; (declare (optimize (speed 3) (safety 0)))
(reset matcher s)
(when (find matcher)
(add-hit name jarfile s)))))))
jar-filenames
:nthreads threads))
;; Prepare the automaton, analogous to compiling the regular expression
(defun compile-regex-automaton (pattern)
(jss:new 'dk.brics.automaton.RunAutomaton
(#"toAutomaton"
(jss:new 'dk.brics.automaton.RegExp pattern
(jss:get-java-field 'dk.brics.automaton.RegExp "ALL")))))
(defun jar-map-threads-automaton-find (regex jar-filenames &key (threads (length jar-filenames)))
(setq *hits* (make-hash-table :test 'equal))
(thread-per-jar
(lambda (jarfile)
(format t "~&Working on ~A~%." jarfile)
(let* ((pat (compile-regex-automaton regex)))
(jss:with-constant-signature ((find "find") (newmatcher "newMatcher" t))
(jar-map
jarfile
(lambda(s name)
; (declare (optimize (speed 3) (safety 1)))
(when (find (newmatcher pat s))
(add-hit name jarfile s)))))))
jar-filenames
:nthreads threads))
(defun generate-filename-sequence (template digits from to)
(let ((format-string (#"replaceFirst" template "#" (format nil "~~~a,'0d" digits))))
(loop for i from from to to collect (format nil format-string i))))
More information about the armedbear-devel
mailing list