File I/O Metrics
Robert Goldman
rpgoldman at sift.net
Fri Oct 21 20:47:04 UTC 2022
I don't know what data you are reading, but is there any chance the
difference is that when you read text in Lisp as ISO-8859-1, Lisp is
actually decoding it into Unicode characters, whereas in Java you are
just slamming raw bytes into memory?
Maybe this is relevant?
https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
I don't use Java myself, so I can't say, and I don't have access to your
data, but it does seem like the Java code is doing something simpler
than the Lisp code.
What happens if you change your Lisp code to open the stream with an
element type of `(unsigned-byte 8)` and `read-sequence` into a byte
vector instead of a `character` one?
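
For anyone who wants to check the same hypothesis from the Java side, here
is a minimal sketch, not taken from the thread. The class and method names
(`ReadModes`, `countBytes`, `countChars`) are hypothetical. It reads one
file twice, first as raw bytes like the quoted Java code, then through an
ISO-8859-1 decoder into a `char[]`, which is roughly the work the Lisp
character stream is assumed to be doing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the original benchmark.
final class ReadModes {

    // Read the file as raw bytes (no decoding); returns the number of bytes read.
    static long countBytes(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Read the file through an ISO-8859-1 decoder; returns the number of chars read.
    static long countChars(Path file, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        long total = 0;
        try (Reader in = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.ISO_8859_1)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);      // path supplied by the caller
        int bufferSize = 16 * 1024 * 1024;   // 16M, as in the quoted code

        long t0 = System.nanoTime();
        long bytes = countBytes(file, bufferSize);
        long t1 = System.nanoTime();
        long chars = countChars(file, bufferSize);
        long t2 = System.nanoTime();

        System.out.printf("bytes: %d in %.3f s%n", bytes, (t1 - t0) / 1e9);
        System.out.printf("chars: %d in %.3f s%n", chars, (t2 - t1) / 1e9);
    }
}
```

The second pass may benefit from the operating system's file cache, so it
is probably worth running the program more than once before comparing the
two timings.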
On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
> I don't want to cause a firestorm here but I was doing some simple
> benchmarks on file i/o between Java, ABCL, and SBCL and I'm a bit
> shocked, honestly.
>
> Reading a 2.5M file in 16M chunks (using iso-8859-1):
> - abcl takes a tad over 1 second
> - sbcl takes 0.04 seconds
>
> Reading a 5.8G file in 16M chunks (using iso-8859-1 for Lisp; for
> Java it's just bytes):
> - abcl takes...too long, I gave up
> - sbcl takes between 20 and 21 seconds
> - Java takes 1.5 seconds
>
> These are all run on the same computer using the same files, etc.
>
> What's up with this? Thoughts? I'd heard that SBCL should be as fast
> as C under at least some circumstances. I'd wager that C is at least
> as fast as Java (probably faster).
>
> Thanks,
> Garrett Dangerfield. (he/him/his)
>
> P.S. Don't get me wrong, I *LOVE* Lisp, I'm trying to get away from
> Java as fast as I can (the syntax is killing me slowly). I've used
> ABCL in projects before (it was wonderful, Java doesn't handle XML
> well).
>
> Lisp code:
> (with-open-file (stream "/media/danger/OS/temp/jars.txt"
>                         :external-format :iso-8859-1) ; great_expectations.iso
>   (let ((size (file-length stream))
>         (buffer-size (* 16 1024 1024))) ; 16M
>     (time
>      (loop with buffer = (make-array buffer-size :element-type 'character)
>            for n-characters = (read-sequence buffer stream)
>            while (< 0 n-characters)))))
>
> Java code:
> private static final int BUFFER_SIZE = 16 * 1024 * 1024;
>
> try (InputStream in =
>         new FileInputStream("/media/danger/OS/temp/great_expectations.iso")) {
>     byte[] buff = new byte[BUFFER_SIZE];
>     int chunkLen = -1;
>     long start = System.currentTimeMillis();
>     while ((chunkLen = in.read(buff)) != -1) {
>         System.out.println("chunkLen = " + chunkLen);
>     }
>     double duration = System.currentTimeMillis() - start;
>     duration /= 1000;
>     System.out.println(String.format("it took %,2f secs", duration));
> } catch (Exception e) {
>     e.printStackTrace(System.out);
> } finally {
>     System.out.println("Done.");
> }
Robert P. Goldman
Research Fellow
Smart Information Flow Technologies (d/b/a SIFT, LLC)
319 N. First Ave., Suite 400
Minneapolis, MN 55401
Voice: (612) 326-3934
Email: rpgoldman at SIFT.net