Load any file in binary using Apache PyArrow
I put the binary data in a list and write it out in Parquet format.
I verified this with the source code below, but when the output is written as a Parquet file,
the file size is about seven times larger than the original file.
When I opened the output file as a text file, the data appeared to be stored without any compression.
Is it possible to compress it somehow?
# -*- coding: utf-8 -*-
import pyarrow as pa
import pyarrow.parquet as pq

open_data_path = "test_log_70mb.txt"
file_list = []

# Read the whole file as bytes and store it as a single list element
with open(open_data_path, 'rb') as f:
    data = bytes(f.read())
    file_list.append(data)

pa_data = [
    pa.array(file_list)
]
pa_batch = pa.RecordBatch.from_arrays(pa_data, ["file_list"])
table = pa.Table.from_batches([pa_batch])
pq.write_table(table, "./test_parquet", compression="gzip")
Results
total 71734
-rwxrwxrwx 1 vagrant vagrant      423 Aug 25 02:05 test2.py
-rwxrwxrwx 1 vagrant vagrant 73454817 Aug 25 02:05 test_log_70mb.txt
vagrant@apex01:/vagrant/arrow_test$ python test2.py
vagrant@apex01:/vagrant/arrow_test$ ls -l
total 511880
-rwxrwxrwx 1 vagrant vagrant       423 Aug 25 02:05 test2.py
-rwxrwxrwx 1 vagrant vagrant  73454817 Aug 25 02:05 test_log_70mb.txt
-rwxrwxrwx 1 vagrant vagrant 450709316 Aug 25 02:05 test_parquet
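As a minimal sketch, assuming a reasonably recent PyArrow, the Parquet metadata can be inspected to see which codec was actually used for the written column chunk:

import pyarrow.parquet as pq

meta = pq.ParquetFile("./test_parquet").metadata
col = meta.row_group(0).column(0)
print(col.compression)              # codec actually applied, e.g. GZIP or UNCOMPRESSED
print(col.total_compressed_size)    # bytes on disk for this column chunk
print(col.total_uncompressed_size)  # bytes before compression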
Environment
python 2.7
pyarrow 0.6.0
How about compression='snappy'?
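A minimal sketch of the same write using Snappy instead of GZIP; the output name test_parquet_snappy is only for illustration, and a reasonably recent PyArrow is assumed:

import pyarrow as pa
import pyarrow.parquet as pq

with open("test_log_70mb.txt", "rb") as f:
    data = f.read()

# One row whose single binary value is the whole file
table = pa.Table.from_arrays([pa.array([data], type=pa.binary())],
                             names=["file_list"])

# compression accepts values such as "snappy", "gzip", or "none"
pq.write_table(table, "./test_parquet_snappy", compression="snappy")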