Is there a way to load a file with a different number of columns in each row using pandas read_table?

Asked 2 years ago, Updated 2 years ago, 41 views

For example,

1 2
2 3 4
3

When I try to load a file like this with pandas read_table, it sees "1 2" in the first row, automatically infers that there are two columns, and then raises a tokenizing error on the second row ("2 3 4").

There are 24 text files I need to read, and each one is over 1 GB, so it takes a long time just to open one of them. Finding the row with the maximum number of columns by hand, file by file, isn't practical.

I'm hesitant to use the error_bad_lines=False option because it drops the problem rows, and I need all of the data.

When I tried reading the whole file into a list with open().read().splitlines() and putting that into a DataFrame, the process was killed for running out of memory. (I'm on the 64-bit version, of course.)

I wish I could set the number of columns in advance, but I can't think of a way. If anyone knows, please give me some advice.

python pandas

2022-09-20 16:37

1 Answer

I looked it up... I think one way is to read every line in as a single column first and then split it into fields yourself as you go over the rows: https://stackoverflow.com/a/50914351
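For example, something like this (just a rough sketch of that idea, assuming the data is space-delimited and sits in a hypothetical file named data.txt):

```python
import pandas as pd

# Read each physical line as a single string field; the default tab
# delimiter of read_table never occurs in space-delimited data, so
# every line comes in as one column without a tokenizing error.
raw = pd.read_table("data.txt", header=None, names=["line"], dtype=str)

# Split each line on whitespace; shorter rows are padded with NaN.
df = raw["line"].str.split(expand=True)
print(df)
```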

But here is what I would do, even if it's a bit tedious:

If the delimiter in the raw data is a space, read through the raw file once, find the line with the most delimiters, count them, and that count plus 1 is the maximum number of columns. Then you can just hand that column count to pandas and it should work.
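In code it might look roughly like this (a sketch under those assumptions, with a hypothetical file name data.txt and a single-space delimiter):

```python
import pandas as pd

path = "data.txt"   # hypothetical file name
sep = " "           # assuming a single space between values

# First pass: count delimiters per line to find the maximum column
# count without holding the whole file in memory.
max_cols = 0
with open(path) as f:
    for line in f:
        max_cols = max(max_cols, line.rstrip("\n").count(sep) + 1)

# Second pass: give pandas the column names up front so rows with
# fewer fields are simply padded with NaN instead of raising an error.
df = pd.read_table(path, sep=sep, header=None, names=list(range(max_cols)))
print(df)
```

Passing names with the maximum column count stops the parser from inferring the width from the first row, so the wider rows no longer trigger the tokenizing error and the shorter rows just get NaN in the missing columns.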

Good luck.


2022-09-20 16:37
