Opened 9 years ago

Closed 7 years ago

#1200 closed defect (fixed)

file->byte-blob can only read files in text mode on Windows

Reported by: dthedens Owned by:
Priority: minor Milestone: someday
Component: extensions Version: 4.9.x
Keywords: byte-blob Cc:
Estimated difficulty:

Description

On Windows, file->byte-blob defaults to reading files in text mode and there is no way to override this (as there is with with-input-from-file and related procedures). As a result, files intended to be binary data that contain eof bytes will be incomplete on Windows. Either a default to binary read mode or optional file mode parameters for #:text and #:binary would be helpful.

Change History (8)

comment:1 Changed 9 years ago by Ivan Raikov

Hello, thanks for your report. Can you provide some example code / test case with your desired API? I've never used the file mode parameters of with-input-from-file, and it is not clear from the documentation if these are keyword arguments or just optional arguments.

comment:2 Changed 9 years ago by dthedens

This code creates an 8 byte file with an eof character in the middle and reads it with file->byte-blob. On Windows it returns 4 as the length of the result instead of the expected 8.

(use srfi-4 byte-blob)
(begin 
  (with-output-to-file "testfile.dat"
    (lambda () (write-u8vector (list->u8vector (list 1 2 3 4 26 25 24 23)))))
  (let ((bblob (file->byte-blob "testfile.dat")))
    (byte-blob-length bblob)))

This workaround code uses the #:binary keyword to read in the file correctly and returns 8.

(use srfi-4 byte-blob)
(begin 
  (with-output-to-file "testfile.dat"
    (lambda () (write-u8vector (list->u8vector (list 1 2 3 4 26 25 24 23)))))
  (let ((bblob (with-input-from-file "testfile.dat"
  		 (lambda () (byte-blob-read (current-input-port) 8))
		 #:binary)))
    (byte-blob-length bblob)))

A compatible change to the API would be to allow

(file->byte-blob "testfile.dat" #:binary)

using an optional argument (#:text is the other option with other procedures). I could also see making the behavior in file->byte-blob be #:binary. This would not affect linux where everything is binary, but in theory would change behavior on Windows.

comment:3 Changed 9 years ago by Ivan Raikov

Thanks for the detailed example. I think it is sensible to add an optional mode argument to file->byte-blob. I will add that and make a new release shortly.

comment:4 Changed 9 years ago by Ivan Raikov

I have added the mode argument, but I don't know if it works on Windows. Can you install byte-blob from trunk and run the tests?

comment:5 Changed 9 years ago by dthedens

Tests for my own use cases run fine with the new version. However, the tests revealed some other subtleties related to text versus binary mode on Windows. The default mode for Windows file operations (and maybe all ports?) is text mode. In this mode, line endings are automatically converted to and from Windows-style CR/LF (see https://en.wikipedia.org/wiki/Newline#In_programming_languages, this is considered part of the C standard). I should probably also point out that I built my Windows version of Chicken with mingw64. I don't know if a cygwin or other build would be different.

In the tests, the byte-blob data that is supposed to be written to the file is (10 9 8 9 6 9 4 9 2). However, this is written in text mode, and so the 10 is interpreted as a LF and converted to CR LF, so the resulting (binary) data in the file is actually (13 10 9 8 9 6 9 4 9 2).

When the byte-blob-read test is run, the port is opened in binary mode. The read specifies 9 bytes (since it was expected 9 bytes were written to the file). In this case, the 9 bytes are (13 10 9 8 9 6 9 4 9). This makes the byte-blob-read test fail.

The default (text mode) file->byte-blob test reads all the bytes in the file, (13 10 9 8 9 6 9 4 9 2) and with text mode line-ending conversion reports back (10 9 8 9 6 9 4 9 2). This is the correct result as compared to the data written, so the default text mode test succeeds.

The binary mode file->byte-blob test also reads all the bytes in the file, but does not do the line ending conversion so the data reported back is (13 10 9 8 9 6 9 4 9 2). This test fails since it does not match what was supposed to have been written because it was written in text mode.

Given that the data read for the port and file-based functions is read in binary mode in the test, it would make sense to me to make the temporary file that is created for this in the tests to also be written in #:binary mode. I think this can be done by adding the #:binary keyword to the (open-output-file* fd) call (as further excruciating detail, the open-output-file* call did not work for me on Windows; I had to change it to (open-output-file temp-path). This was still OK since the temp-port value is never used anyway. The failure may be a result of the mingw64 environment, I don't know). Anyway, by writing the temporary test file in binary mode, all the tests pass.

I think the change you made to file->byte-blob to add the optional argument is sufficient. byte-blob-read and byte-blob-write both take ports as arguments, so it's up to the user to make sure it's the correct (text or binary) mode when opening. Now that file->byte-blob has that option, the same is true as well. It is left to the user to assure that reading and writing of files and ports are done with the correct mode (text or binary).

comment:6 Changed 9 years ago by Ivan Raikov

Ok, thanks for checking and the detailed report. I also had the same problem with open-output-file*, but I had forgotten how it is different from open-output-file. I have now modified the test to use open-output-file in binary mode. I will make the release now and let me know if things don't work.

comment:7 Changed 9 years ago by dthedens

(Sorry was traveling for a bit). Yes, it all works fine for me with the new option in the release version. Thanks for updating.

comment:8 Changed 7 years ago by sjamaan

Resolution: fixed
Status: newclosed

Confirmed as fixed by reporter, so this can be closed

Note: See TracTickets for help on using tickets.