Python pickle file mode (binary mode necessity)

Asked 2 years ago, Updated 2 years ago, 26 views

Is it necessary to open the file in binary mode when saving data with Python's pickle? For example:

import pickle
import numpy as np
# random() takes the shape as a single tuple argument
x = np.random.random((100, 100, 100))

# binary mode
pickle.dump(x, open('hoge.pkl', 'wb'))
# not binary mode
pickle.dump(x, open('fuga.pkl', 'w'))

When I do this, the 'wb' and 'w' options seem to produce exactly the same output file (diff reports no difference).

Question: Is there any situation where omitting the 'b' (binary option) causes problems?
Can the saved content change depending on whether 'b' is specified?

By the way, I use Linux (Ubuntu).

python

2022-09-30 21:10

2 Answers

Python 3 has a bytes type that is distinct from str.
The write() method of a file object opened in text mode requires str, while pickle writes bytes, so the code in the question fails.

>>> pickle.dump(x, open('fuga.pkl', 'w'))
Traceback (most recent call last):
  File "<ipython-input-27-c8e3989f5c8e>", line 1, in <module>
    pickle.dump(x, open('fuga.pkl', 'w'))
TypeError: must be str, not bytes

Python 2 does not raise an error, but in text mode any newline character in the data is converted to the OS's standard newline sequence, so the contents may change. I don't know whether the data you create contains newline characters, or whether such a conversion would affect loading it.
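
The Python 3 behavior described above can be sketched as follows. This is only a minimal illustration: the helper try_dump and the small dictionary are hypothetical stand-ins for the question's NumPy array, and the files are written to a temporary directory.

```python
import os
import pickle
import tempfile


def try_dump(obj, path, mode):
    """Pickle obj to path using the given mode; return None, or the TypeError raised."""
    try:
        with open(path, mode) as f:
            pickle.dump(obj, f)
        return None
    except TypeError as exc:
        return exc


# Hypothetical sample data (not the array from the question).
data = {"values": [1, 2, 3], "label": "line1\nline2"}

with tempfile.TemporaryDirectory() as tmp:
    binary_path = os.path.join(tmp, "hoge.pkl")
    text_path = os.path.join(tmp, "fuga.pkl")

    binary_error = try_dump(data, binary_path, "wb")  # expected to succeed
    text_error = try_dump(data, text_path, "w")       # expected to fail on Python 3

    # Read the binary-mode file back to confirm the round trip.
    with open(binary_path, "rb") as f:
        restored = pickle.load(f)

print("binary mode error:", binary_error)
print("text mode error:", type(text_error).__name__)
```

On Python 3 the text-mode call should report a TypeError, while the binary-mode file loads back unchanged.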


2022-09-30 21:10

The rest of this answer is about Python 2.7.10.

Python's open() function is actually implemented by the file_init() function in fileobject.c.

python 2.7.10/Objects/fileobject.c:file_init()

file_init(PyObject *self, PyObject *args, PyObject *kwds)
                  :
    if (open_the_file(foself, name, mode) == NULL)

python 2.7.10/Objects/fileobject.c:open_the_file()

open_the_file(PyFileObject *f, char *name, char *mode)
          :
#ifdef MS_WINDOWS
          :
    f->f_fp = _wfopen(PyUnicode_AS_UNICODE(f->f_name),
                      PyUnicode_AS_UNICODE(wmode));
          :
#endif
    if (NULL == f->f_fp && NULL != name) {
          :
        f->f_fp = fopen(name, newmode);

Windows uses _wfopen(), while other systems (Linux, OS X, FreeBSD, etc.) use fopen(). The Linux man page for fopen(3) says:

fopen(3)

The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)

On POSIX-compliant operating systems, fopen() behaves the same whether or not 'b' is included in the mode string.

On the other hand, what about Windows?

MSDN: fopen, _wfopen

b

Open in binary (untranslated) mode; translations involving carriage-return and line-feed characters are suppressed.
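
The difference between the two platforms can be imitated on any OS. The sketch below rests on one assumption: Python 3's open() does its own newline handling, and its newline argument is used here to mimic a stream that leaves '\n' alone (POSIX-like) and one that translates it to '\r\n' (Windows-like text mode). The helper written_bytes is hypothetical.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text in text mode with the given newline setting; return the raw file bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", newline=newline) as f:
            f.write(text)
        # Read back in binary mode so no translation hides the result.
        with open(path, "rb") as f:
            return f.read()


posix_like = written_bytes("a\nb\n", newline="\n")      # no translation
windows_like = written_bytes("a\nb\n", newline="\r\n")  # '\n' becomes '\r\n'

print(posix_like)
print(windows_like)
```

The second result is two bytes longer, which is the kind of silent change a text-mode stream makes to whatever passes through it.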

Also, if you trace the pickle.dump() function, you eventually reach the write_file() function in cPickle.c, which writes the output to the file.

python 2.7.10/Modules/cPickle.c:write_file()

write_file(Picklerobject *self, const char *s, Py_ssize_t n)
{
    size_t nbyteswritten;
        :
    nbyteswritten = fwrite(s, sizeof(char), n, self->fp);
        :

If the file was opened in text mode on Windows, fwrite() converts each '\n' to '\r\n'.

MSDN: fwrite

The fwrite function writes up to count items, of size length each, from buffer to the output stream. The file pointer associated with stream (if there is one) is incremented by the number of bytes actually written. If stream is opened in text mode, each line feed is replaced with a carriage return-line feed pair.

Given that Python runs on multiple platforms, including Windows, I think it is better to open the file passed to pickle.dump() in binary mode.
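
To see why this matters for pickle specifically, the translation can be applied by hand to pickled bytes. This is a rough simulation under stated assumptions, not what Python itself does: the replace() call stands in for a Windows text-mode stream, protocol 3 and the sample bytes are arbitrary choices, and load_or_error is a hypothetical helper.

```python
import pickle

# Sample payload containing a newline byte (an arbitrary choice for illustration).
original = b"line1\nline2"
payload = pickle.dumps(original, protocol=3)

# Imitate the '\n' -> '\r\n' translation a text-mode stream would apply on write.
translated = payload.replace(b"\n", b"\r\n")


def load_or_error(raw):
    """Return the unpickled object, or the exception raised while loading."""
    try:
        return pickle.loads(raw)
    except Exception as exc:
        return exc


outcome = load_or_error(translated)

print(len(payload), len(translated))
print(repr(outcome))
```

Binary pickle protocols store lengths alongside the data, so inserting bytes shifts everything after them. If such a file is later read without the reverse translation (in binary mode, or on another platform), it may fail to load or load as different data.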


2022-09-30 21:10
