File I/O Metrics

Robert Goldman rpgoldman at sift.net
Fri Oct 21 21:42:35 UTC 2022


I took a file of about 450MB of characters.  Using SBCL, when I read it 
like this:

```
(defun do-test2 ()
  ;; Character stream with the default external format; assumes *TEXT-FILE*
  ;; names the ~450MB test file.
  (with-open-file (stream *text-file*)
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size :element-type 'character)
             for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters))))))
```

It took an average of 1.08125s to read (4 trials).

This procedure:
```
(defun do-test3 ()
  ;; Binary stream: octets are copied into the buffer with no character decoding.
  (with-open-file (stream *text-file* :element-type '(unsigned-byte 8))
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size :element-type '(unsigned-byte 8))
             for n-bytes = (read-sequence buffer stream)
             while (< 0 n-bytes))))))
```

It took an average of 0.07s.

If I modify this to set the `:external-format` to `:iso8859-1` and read
into an array of `:element-type 'character`, it takes an average of
0.8095s.

So there seems to be *some* overhead to the Unicode handling. Note that
I didn't have a file at hand that actually had non-ASCII ISO8859-1
characters in it, so I don't know if that would have complicated matters.
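On whether real ISO-8859-1 content would complicate matters: ISO-8859-1 maps every octet 0x00-0xFF to the code point with the same value, so decoding is always one byte in, one character out, and no input is invalid. The small Java sketch here (the class `Latin1Mapping` is hypothetical, for illustration only) checks a hand-written decoder against Java's built-in charset for all 256 byte values and contrasts it with UTF-8, where U+00E9 needs two bytes. It says nothing about how SBCL implements `:iso8859-1` internally; it only shows what the mapping itself requires.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Mapping {
    // Hand-written ISO-8859-1 decoder: each octet becomes the char with the
    // same numeric value. No multi-byte sequences, no invalid input.
    static char[] decodeLatin1(byte[] bytes) {
        char[] out = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (char) (bytes[i] & 0xFF); // mask undoes Java's signed byte
        }
        return out;
    }

    public static void main(String[] args) {
        // All 256 possible octet values, including the non-ASCII range 0x80-0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String viaCharset = new String(all, StandardCharsets.ISO_8859_1);
        String viaLoop = new String(decodeLatin1(all));
        System.out.println("same result: " + viaCharset.equals(viaLoop));
        System.out.println("chars per byte: " + (double) viaLoop.length() / all.length);

        // For contrast: UTF-8 encodes U+00E9 (e with acute accent) as two bytes,
        // so a UTF-8 decoder must inspect lead bytes and track sequence length.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes for U+00E9: " + utf8.length);
    }
}
```

Running it should print `same result: true`, `chars per byte: 1.0` and `UTF-8 bytes for U+00E9: 2`.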

These timings suggest that just moving bits around without worrying about
their interpretation *may* be faster than treating them as characters, so
you could see whether that changes your results at all.
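For an apples-to-apples comparison, the Java side could also be timed both ways: raw bytes (like do-test3 and the original Java code) and characters decoded as ISO-8859-1 (like the Lisp character-buffer version). This is a sketch, not the code behind any numbers in this thread; the class `ReadBench` and its helpers `countBytes`/`countChars` are hypothetical names, and it assumes Java 8+ with the file path given as the first command-line argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadBench {
    static final int BUFFER_SIZE = 16 * 1024 * 1024; // 16M, as in the tests above

    // Raw octets, no decoding: analogous to do-test3 and the original Java code.
    static long countBytes(Path file) throws IOException {
        long total = 0;
        byte[] buff = new byte[BUFFER_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buff)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Decoded characters: analogous to reading into a Lisp 'character buffer.
    static long countChars(Path file, Charset cs) throws IOException {
        long total = 0;
        char[] buff = new char[BUFFER_SIZE];
        try (Reader in = Files.newBufferedReader(file, cs)) {
            int n;
            while ((n = in.read(buff)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadBench <file>");
            System.exit(1);
        }
        Path file = Paths.get(args[0]);
        long t0 = System.nanoTime();
        long bytes = countBytes(file);
        long t1 = System.nanoTime();
        long chars = countChars(file, StandardCharsets.ISO_8859_1);
        long t2 = System.nanoTime();
        System.out.printf("bytes: %d in %.3f s%n", bytes, (t1 - t0) / 1e9);
        System.out.printf("ISO-8859-1 chars: %d in %.3f s%n", chars, (t2 - t1) / 1e9);
    }
}
```

For ISO-8859-1 the two counts should be equal. Whichever read runs first may also be warming the operating system's file cache (and the JIT), so it is worth repeating each measurement several times and in both orders before drawing conclusions.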

I'm not a real expert in CL file I/O, so it's likely that this could be 
done better.


On 21 Oct 2022, at 16:18, Garrett Dangerfield wrote:

> I tried changing (make-array buffer-size :element-type 'character)
> to
> (make-array buffer-size :element-type 'byte)
> and I got additional warnings and it took 70 seconds instead of 20.
>
> Thanks,
> Garrett.
>
> On Fri, Oct 21, 2022 at 1:47 PM Robert Goldman <rpgoldman at sift.net> wrote:
>
>> I don't know what data you are reading, but is there any chance that the
>> difference is that when you read text in Lisp as ISO-8859-1, Lisp is
>> actually processing the text as Unicode, but when you are reading it in
>> Java you are just slamming raw bytes into memory?
>>
>> Maybe this is relevant?
>> https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
>>
>> I don't use Java myself, so I can't say, and I don't have access to your
>> data, but it does seem like the Java code is doing something simpler than
>> the Lisp code.
>>
>> What happens if you change your Lisp code to read-sequence of type byte
>> instead of character?
>>
>> On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
>>
>> I don't want to cause a firestorm here, but I was doing some simple
>> benchmarks on file I/O between Java, ABCL, and SBCL and I'm a bit shocked,
>> honestly.
>>
>> Reading a 2.5M file in 16M chunks (using iso-8859-1):
>> - abcl takes a tad over 1 second
>> - sbcl takes 0.04 seconds
>>
>> Reading a 5.8G file in 16M chunks (using iso-8859-1 for Lisp; for Java
>> it's just bytes):
>> - abcl takes...too long, I gave up
>> - sbcl takes between 20 and 21 seconds
>> - Java takes 1.5 seconds
>>
>> These are all run on the same computer using the same files, etc.
>>
>> What's up with this? Thoughts? I'd heard that SBCL should be as fast as C
>> under at least some circumstances. I'd wager that C is at least as fast as
>> Java (probably faster).
>>
>> Thanks,
>> Garrett Dangerfield. (he/him/his)
>>
>> P.S. Don't get me wrong, I *LOVE* Lisp; I'm trying to get away from Java
>> as fast as I can (the syntax is killing me slowly). I've used ABCL in
>> projects before (it was wonderful; Java doesn't handle XML well).
>>
>> Lisp code:
>> (with-open-file (stream "/media/danger/OS/temp/jars.txt"
>>                         :external-format :iso-8859-1) ; great_expectations.iso
>>   (let ((size (file-length stream))
>>         (buffer-size (* 16 1024 1024))) ; 16M
>>     (time
>>      (loop with buffer = (make-array buffer-size :element-type 'character)
>>            for n-characters = (read-sequence buffer stream)
>>            while (< 0 n-characters)))))
>>
>> Java code:
>> private static final int BUFFER_SIZE = 16 * 1024 * 1024;
>> try (InputStream in = new FileInputStream("/media/danger/OS/temp/great_expectations.iso"); ) {
>>     byte[] buff = new byte[BUFFER_SIZE];
>>     int chunkLen = -1;
>>     long start = System.currentTimeMillis();
>>     while ((chunkLen = in.read(buff)) != -1) {
>>         System.out.println("chunkLen = " + chunkLen);
>>     }
>>     double duration = System.currentTimeMillis() - start;
>>     duration /= 1000;
>>     System.out.println(String.format("it took %,2f secs", duration));
>> } catch (Exception e) {
>>     e.printStackTrace(System.out);
>> } finally {
>>     System.out.println("Done.");
>> }
>>
>> Robert P. Goldman
>> Research Fellow
>> Smart Information Flow Technologies (d/b/a SIFT, LLC)
>>
>> 319 N. First Ave., Suite 400
>> Minneapolis, MN 55401
>>
>> Voice: (612) 326-3934
>> Email: rpgoldman at SIFT.net
>>




More information about the armedbear-devel mailing list