For example, suppose a file looks like this:
1 2
2 3 4
3
When I tried to load a file like this with pandas read_table, it saw 1 2 in the first row, automatically decided there were two columns, and then the second row (2 3 4) raised a tokenizing error.
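A minimal repro of what I see (the three-line sample above saved as sample.txt; the file name is just a placeholder):

    import pandas as pd

    # Load the space-delimited sample; pandas infers 2 columns from row 1.
    df = pd.read_table("sample.txt", sep=" ", header=None)
    # pandas.errors.ParserError: Error tokenizing data. C error:
    # Expected 2 fields in line 2, saw 3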
There are 24 text files to load, and each one is over 1 GB, so even opening a single file takes a long time; scanning each file by hand to find the row with the most columns is not practical.
I'm hesitant to use the on_bad_lines='skip' option (error_bad_lines=False in older pandas) because it silently drops the offending rows, and I need all the data.
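To be concrete about the data loss (again using the sample file above):

    import pandas as pd

    # Skipping bad lines parses without error, but the 2 3 4 row is
    # silently dropped; only the 1 2 and 3 rows survive.
    df = pd.read_table("sample.txt", sep=" ", header=None, on_bad_lines="skip")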
When I tried reading a whole file with open().read().splitlines() into a list and putting that into a DataFrame, the script died from insufficient memory. (And yes, I'm on the 64-bit version.)
I wish I could set the number of columns in advance, but I can't think of a way. If anyone knows, please give me some advice.
python pandas
I looked it up... one way is to read each row in whole and split it yourself as you go through the rows: https://stackoverflow.com/a/50914351
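A minimal sketch of that idea, assuming whitespace-delimited data and a placeholder file name data.txt:

    import pandas as pd

    # Split each line yourself; the per-row lists may have different lengths.
    rows = []
    with open("data.txt") as f:
        for line in f:
            rows.append(line.split())

    # pandas pads the shorter rows with None/NaN automatically.
    df = pd.DataFrame(rows)

Note that this still builds the whole list in memory, so for the >1 GB files it has the same problem as splitlines().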
But here is what I would do, even if it's a bit tedious: if the delimiter of the raw data is a space, read the raw file once, find the row with the most delimiters, and add 1 to that count; that is the maximum number of columns. With that number you can set the columns in advance and load the file.
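A rough two-pass sketch of that (data.txt is a placeholder path):

    import pandas as pd

    # Pass 1: stream the file line by line (no memory blow-up) and count
    # the maximum number of whitespace-separated fields.
    max_cols = 0
    with open("data.txt") as f:
        for line in f:
            max_cols = max(max_cols, len(line.split()))

    # Pass 2: fix the column count up front; short rows are padded with
    # NaN instead of raising a tokenizing error.
    df = pd.read_csv("data.txt", sep=r"\s+", header=None, names=range(max_cols))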
Good luck.