[PATCH] Unnecessary copies and pitfalls in read-stream-content-into-byte-vector
Sebastian Melzer
semz at semelz.de
Sun Feb 12 10:41:03 UTC 2023
If a byte stream s has exactly n bytes remaining, then
(read-stream-content-into-byte-vector s :initial-size n) will result in
two unnecessary copies and a total allocation of 4n bytes:
- n for the initial buffer;
- 2n when the buffer is resized and copied since r-s-c-i-b-v doesn't yet
realize that EOF has been reached;
- n when the now oversized buffer is downsized and copied at the end.
Perversely, this means that a size overestimate is more efficient than a
correct estimate. I've attached a possible patch for this below. Sadly
there is no peek-byte, so I had to opt for a rather awkward read-byte
construction; maybe someone knows a better approach.
The patch additionally gets rid of the pitfall where idioms similar to
(r-s-c-i-b-v s :initial-size (file-length s)) fail when (file-length s)
is zero, but I don't know if broadening the type of initial-size would
be considered a breaking change. The internal %length argument also
becomes basically unnecessary, but I'd bet someone out there uses it as
a workaround.
diff --git a/alexandria-1/io.lisp b/alexandria-1/io.lisp
--- a/alexandria-1/io.lisp
+++ b/alexandria-1/io.lisp
@@ -95,29 +95,35 @@ unless it's NIL, which means the system default."
(defun read-stream-content-into-byte-vector (stream &key ((%length length))
(initial-size 4096))
"Return \"content\" of STREAM as freshly allocated (unsigned-byte 8) vector."
- (check-type length (or null non-negative-integer))
- (check-type initial-size positive-integer)
- (do ((buffer (make-array (or length initial-size)
- :element-type '(unsigned-byte 8)))
- (offset 0)
- (offset-wanted 0))
- ((or (/= offset-wanted offset)
- (and length (>= offset length)))
- (if (= offset (length buffer))
- buffer
- (subseq buffer 0 offset)))
- (unless (zerop offset)
- (let ((new-buffer (make-array (* 2 (length buffer))
- :element-type '(unsigned-byte 8))))
- (replace new-buffer buffer)
- (setf buffer new-buffer)))
- (setf offset-wanted (length buffer)
- offset (read-sequence buffer stream :start offset))))
+ (check-type length (or null non-negative-integer)) ; for compatibility
+ (check-type initial-size non-negative-integer)
+ (setf initial-size (or length initial-size))
+ (let ((result (make-array initial-size :element-type '(unsigned-byte 8)))
+ (bytes-read 0))
+ (loop
+ (setf bytes-read (read-sequence result stream :start bytes-read))
+ (when (and length (>= bytes-read length))
+ (return))
+ ;; There is no PEEK-BYTE, so we just try to read a byte.
+ (let ((next-byte (read-byte stream nil nil)))
+ (when (null next-byte)
+ (return))
+ (let ((new-result (make-array (if (zerop (length result))
+ 4096
+ (* 2 (length result)))
+ :element-type '(unsigned-byte 8))))
+ (replace new-result result :end1 bytes-read :end2 bytes-read)
+ (setf (aref new-result bytes-read) next-byte
+ result new-result)
+ (incf bytes-read))))
+ (if (= bytes-read (length result))
+ result
+ (subseq result 0 bytes-read))))
(defun read-file-into-byte-vector (pathname)
"Read PATHNAME into a freshly allocated (unsigned-byte 8) vector."
(with-input-from-file (stream pathname :element-type '(unsigned-byte 8))
- (read-stream-content-into-byte-vector stream '%length (file-length stream))))
+ (read-stream-content-into-byte-vector stream :initial-size (file-length stream))))
(defun write-byte-vector-into-file (bytes pathname &key (if-exists :error)
if-does-not-exist)
More information about the alexandria-devel
mailing list