[PATCH] Unnecessary copies and pitfalls in read-stream-content-into-byte-vector

Sebastian Melzer semz at semelz.de
Sun Feb 12 10:41:03 UTC 2023


If a byte stream s has exactly n bytes remaining, then
(read-stream-content-into-byte-vector s :initial-size n) will result in
two unnecessary copies and a total allocation of 4n bytes:

- n for the initial buffer;

- 2n when the buffer is resized and copied since r-s-c-i-b-v doesn't yet
  realize that EOF has been reached;

- n when the now oversized buffer is downsized and copied at the end.

Perversely, this means that a size overestimate is more efficient than a
correct estimate. I've attached a possible patch for this below. Sadly
there is no peek-byte, so I had to opt for a rather awkward read-byte
construction; maybe someone knows a better approach.

The patch additionally gets rid of the pitfall where idioms similar to
(r-s-c-i-b-v s :initial-size (file-length s)) fail when (file-length s)
is zero, but I don't know if broadening the type of initial-size would
be considered a breaking change. The internal %length argument also
becomes basically unnecessary, but I'd bet someone out there uses it as
a workaround.


diff --git a/alexandria-1/io.lisp b/alexandria-1/io.lisp
--- a/alexandria-1/io.lisp
+++ b/alexandria-1/io.lisp
@@ -95,29 +95,35 @@ unless it's NIL, which means the system default."
 (defun read-stream-content-into-byte-vector (stream &key ((%length length))
                                                          (initial-size 4096))
   "Return \"content\" of STREAM as freshly allocated (unsigned-byte 8) vector."
-  (check-type length (or null non-negative-integer))
-  (check-type initial-size positive-integer)
-  (do ((buffer (make-array (or length initial-size)
-                           :element-type '(unsigned-byte 8)))
-       (offset 0)
-       (offset-wanted 0))
-      ((or (/= offset-wanted offset)
-           (and length (>= offset length)))
-       (if (= offset (length buffer))
-           buffer
-           (subseq buffer 0 offset)))
-    (unless (zerop offset)
-      (let ((new-buffer (make-array (* 2 (length buffer))
-                                    :element-type '(unsigned-byte 8))))
-        (replace new-buffer buffer)
-        (setf buffer new-buffer)))
-    (setf offset-wanted (length buffer)
-          offset (read-sequence buffer stream :start offset))))
+  (check-type length (or null non-negative-integer)) ; for compatibility
+  (check-type initial-size non-negative-integer)
+  (setf initial-size (or length initial-size))
+  (let ((result (make-array initial-size :element-type '(unsigned-byte 8)))
+        (bytes-read 0))
+    (loop
+      (setf bytes-read (read-sequence result stream :start bytes-read))
+      (when (and length (>= bytes-read length))
+        (return))
+      ;; There is no PEEK-BYTE, so we just try to read a byte.
+      (let ((next-byte (read-byte stream nil nil)))
+        (when (null next-byte)
+          (return))
+        (let ((new-result (make-array (if (zerop (length result))
+                                          4096
+                                          (* 2 (length result)))
+                                      :element-type '(unsigned-byte 8))))
+          (replace new-result result :end1 bytes-read :end2 bytes-read)
+          (setf (aref new-result bytes-read) next-byte
+                result new-result)
+          (incf bytes-read))))
+    (if (= bytes-read (length result))
+        result
+        (subseq result 0 bytes-read))))
 
 (defun read-file-into-byte-vector (pathname)
   "Read PATHNAME into a freshly allocated (unsigned-byte 8) vector."
   (with-input-from-file (stream pathname :element-type '(unsigned-byte 8))
-    (read-stream-content-into-byte-vector stream '%length (file-length stream))))
+    (read-stream-content-into-byte-vector stream :initial-size (file-length stream))))
 
 (defun write-byte-vector-into-file (bytes pathname &key (if-exists :error)
                                                        if-does-not-exist)



More information about the alexandria-devel mailing list