File I/O Metrics
Robert Goldman
rpgoldman at sift.net
Fri Oct 21 21:42:35 UTC 2022
I took a file of about 450MB of characters. Using SBCL, when I read it
like this:
```
(defun do-test2 ()
  (with-open-file (stream *text-file*)
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size
                                       :element-type 'character)
             for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters))))))
```
It took an average of 1.08125s to read (4 trials).
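For anyone repeating the measurement, a minimal sketch of averaging several trials is below. The helper name `average-seconds` is hypothetical and not part of the original test. It measures wall-clock time with `get-internal-real-time`, so it includes the cost of opening the file, which the `time` form inside `do-test2` does not.

```lisp
;; Hypothetical helper, an assumption for illustration only.
(defun average-seconds (thunk &key (trials 4))
  "Call THUNK TRIALS times and return the mean wall-clock time in seconds."
  (float
   (/ (loop repeat trials
            ;; Sum the elapsed internal time units of each call.
            sum (let ((start (get-internal-real-time)))
                  (funcall thunk)
                  (- (get-internal-real-time) start)))
      (* trials internal-time-units-per-second))))
```

It would be called as `(average-seconds #'do-test2)`. The first trial may be slower than the rest if the file is not yet in the operating system's cache.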
By contrast, this procedure reads the file as raw bytes:
```
(defun do-test3 ()
  (with-open-file (stream *text-file* :element-type
                          '(unsigned-byte 8))
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size
                                       :element-type '(unsigned-byte 8))
             for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters))))))
```
It took an average of 0.07s.
Modifying this to set the `:external-format` to `:iso8859-1` and read
into an array of `:element-type 'character` takes an average of
0.8095s.
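The code for that variant was not included, so the following is only a sketch of what it might look like. The function name `do-test-latin1` and its `path` and `buffer-size` parameters are assumptions, and it returns the number of characters read so the result can be checked. The external-format keyword is implementation-dependent; `:iso-8859-1` is the spelling used in the quoted code below.

```lisp
;; Hypothetical reconstruction, not the code that produced the timing above.
(defun do-test-latin1 (path &key (buffer-size (* 16 1024 1024)))
  "Read PATH as ISO-8859-1 characters in BUFFER-SIZE chunks.
Return the total number of characters read."
  (with-open-file (stream path :external-format :iso-8859-1)
    (let ((buffer (make-array buffer-size :element-type 'character)))
      ;; TIME prints the timing report and passes the LOOP's value through.
      (time
       (loop for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters)
             sum n-characters)))))
```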
So there seems to be *some* overhead to the Unicode handling. Note that
I didn't have a file at hand that was actually encoded in ISO-8859-1, so
I don't know if that would have complicated matters.
This suggests that just moving around bits without worrying about their
interpretation *may* be faster than treating them as characters. So you
could see if that changes your results at all.
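One way to try that is to read raw octets and decode a chunk only when its text is actually needed. This is a sketch, not a measured improvement. Both function names are hypothetical, and the decoder assumes that `code-char` maps codes 0 to 255 onto the ISO-8859-1 characters, which holds in Unicode-based implementations such as SBCL and ABCL.

```lisp
;; Hypothetical helpers, assumptions for illustration only.
(defun octets-to-latin1-string (octets &key (start 0) (end (length octets)))
  "Decode OCTETS from START below END as ISO-8859-1.
Assumes CODE-CHAR follows Unicode for codes 0-255."
  (let ((string (make-string (- end start))))
    (loop for i from start below end
          for j from 0
          do (setf (char string j) (code-char (aref octets i))))
    string))

(defun read-file-octets-in-chunks (path chunk-fn
                                   &key (buffer-size (* 16 1024 1024)))
  "Read PATH as raw octets in BUFFER-SIZE chunks.
Call CHUNK-FN with the buffer and the number of valid octets in it.
Return the total number of octets read."
  (with-open-file (stream path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8))))
      (loop for n-octets = (read-sequence buffer stream)
            while (< 0 n-octets)
            do (funcall chunk-fn buffer n-octets)
            sum n-octets))))
```

The buffer is reused between calls, so `chunk-fn` has to copy or decode whatever it wants to keep before returning.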
I'm not a real expert in CL file I/O, so it's likely that this could be
done better.
On 21 Oct 2022, at 16:18, Garrett Dangerfield wrote:
> I tried changing (make-array buffer-size :element-type 'character)
> to
> (make-array buffer-size :element-type 'byte)
> and I got additional warnings and it took 70 seconds instead of 20.
>
> Thanks,
> Garrett.
>
> On Fri, Oct 21, 2022 at 1:47 PM Robert Goldman <rpgoldman at sift.net>
> wrote:
>
>> I don't know what data you are reading but is there any chance that the
>> difference is that when you read text in lisp as ISO-8859-1 lisp is
>> actually processing the text as unicode, but when you are reading it in
>> Java you are just slamming raw bytes into memory?
>>
>> Maybe this is relevant?
>> https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
>>
>> I don't use Java myself, so I can't say, and I don't have access to your
>> data, but it does seem like the Java code is doing something simpler than
>> the Lisp code.
>>
>> What happens if you change your Lisp code to read-sequence of type byte
>> instead of character?
>>
>> On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
>>
>> I don't want to cause a firestorm here but I was doing some simple
>> benchmarks on file i/o between Java, ABCL, and SBCL and I'm a bit shocked,
>> honestly.
>>
>> Reading a 2.5M file in 16M chunks in (using iso-8859-1):
>> - abcl takes a tad over 1 second
>> - sbcl takes 0.04 seconds
>>
>> Reading a 5.8G file in 16M chunks in (using iso-8859-1 for Lisp, for Java
>> it's just bytes):
>> - abcl takes...too long, I gave up
>> - sbcl takes between 20 and 21 seconds
>> - Java takes 1.5 seconds
>>
>> These are all run on the same computer using the same files, etc.
>>
>> What's up with this? Thoughts? I'd heard that SBCL should be as fast as C
>> under at least some circumstances. I'd wager that C is at least as fast as
>> Java (probably faster).
>>
>> Thanks,
>> Garrett Dangerfield. (he/him/his)
>>
>> P.S. Don't get me wrong, I *LOVE* Lisp, I'm trying to get away from Java as
>> fast as I can (the syntax is killing me slowly). I've used ABCL in
>> projects before (it was wonderful, Java doesn't handle XML well).
>>
>> Lisp code:
>> (with-open-file (stream "/media/danger/OS/temp/jars.txt"
>>                         :external-format :iso-8859-1) ; great_expectations.iso
>>   (let ((size (file-length stream))
>>         (buffer-size (* 16 1024 1024))) ; 16M
>>     (time
>>      (loop with buffer = (make-array buffer-size :element-type 'character)
>>            for n-characters = (read-sequence buffer stream)
>>            while (< 0 n-characters)))))
>>
>> Java code:
>> private static final int BUFFER_SIZE = 16 * 1024 * 1024;
>> try (InputStream in = new
>> FileInputStream("/media/danger/OS/temp/great_expectations.iso"); ) {
>> byte[] buff = new byte[BUFFER_SIZE];
>> int chunkLen = -1;
>> long start = System.currentTimeMillis();
>> while ((chunkLen = in.read(buff)) != -1) {
>> System.out.println("chunkLen = " + chunkLen);
>> }
>> double duration = System.currentTimeMillis() - start;
>> duration /= 1000;
>> System.out.println(String.format("it took %.2f secs", duration));
>> } catch (Exception e) {
>> e.printStackTrace(System.out);
>> } finally {
>> System.out.println("Done.");
>> }
>>
Robert P. Goldman
Research Fellow
Smart Information Flow Technologies (d/b/a SIFT, LLC)
319 N. First Ave., Suite 400
Minneapolis, MN 55401
Voice: (612) 326-3934
Email: rpgoldman at SIFT.net