How do I process the contents of a csv file by column?

Asked 2 years ago, Updated 2 years ago, 21 views

I'm a beginner, so my explanation may be incomplete, but I appreciate your help.

The CSV file (file name "data.csv") contains the following, starting from the first row:

data.csv

0.1 0.2
0.3 0.5
0.2 0.7
0.9 1.3
1.5 0.8
0.8 1.2
1.1 0.9
(rest omitted)

I'm trying to write a program that goes down each column from the first row, finds the first number of 1.0 or more, deletes the data before it, and saves the result as a CSV file. I couldn't work out how to write it, so I'm asking here.

The following error occurred:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Current code:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline

data = 'data.csv'
data = pd.read_csv(data)
data = np.array(data)

from itertools import dropwhile

for x in dropwhile(lambda y: y < 1.0, data):
    print(x)

Ultimately, I would like the output to look like this:

1.5 1.3
0.8 0.8
1.1 1.2
    0.9
(rest omitted)
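Why the error appears: a minimal sketch, assuming the CSV has already been turned into a 2-D NumPy array (the values below are the sample rows shown above, typed in as literals). Iterating over a 2-D array yields whole rows, so inside dropwhile y is something like array([0.1, 0.2]), and y < 1.0 is a Boolean array whose truth value NumPy refuses to guess. Slicing out one column with data[:, i] gives scalars instead.

```python
from itertools import dropwhile

import numpy as np

# Sample rows from the question's data.csv as a 2-D array (two columns).
data = np.array([
    [0.1, 0.2],
    [0.3, 0.5],
    [0.2, 0.7],
    [0.9, 1.3],
    [1.5, 0.8],
    [0.8, 1.2],
    [1.1, 0.9],
])

# Iterating over a 2-D array yields rows, so the predicate returns an array
# and dropwhile's truth test on it raises ValueError.
try:
    list(dropwhile(lambda y: y < 1.0, data))
except ValueError as err:
    print("ValueError:", err)

# One column at a time yields scalars, so the predicate is a plain bool.
first_col = [float(x) for x in dropwhile(lambda y: y < 1.0, data[:, 0])]
print(first_col)  # [1.5, 0.8, 1.1]
```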

python

2022-09-29 22:19

2 Answers

If you apply the answer to your previous question and cut out each column with a slice, you can do the same thing. For column slicing, see:
Retrieve (extract) elements, rows, and columns of the NumPy array ndarray, substitute
How to access the i-th column of a NumPy multidimensional array?

Because the trimmed columns can end up with different lengths, combine them with itertools.zip_longest(*iterables, fillvalue=None), which pads the shorter ones.
Add zip_longest to the itertools import, and after that line write the following:

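from itertools import dropwhile, zip_longest

# Drop the leading values below 1.0 from each column separately
d0 = [x for x in dropwhile(lambda y: y < 1.0, data[:, 0])]
d1 = [x for x in dropwhile(lambda y: y < 1.0, data[:, 1])]

# Pair the columns row by row, padding the shorter one with ''
NewData = np.array([[a, b] for a, b in zip_longest(d0, d1, fillvalue='')])

# The '' padding makes NewData a string array, hence fmt='%s'
np.savetxt('NewData.csv', NewData, fmt='%s', delimiter=',')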
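For reference, a small sketch of how zip_longest pads the shorter column. The two lists are simply the trimmed columns from the expected output in the question, typed in as literals, and '' is the fill value used in the code above.

```python
from itertools import zip_longest

# Trimmed columns from the question's expected output (lengths differ).
col0 = [1.5, 0.8, 1.1]
col1 = [1.3, 0.8, 1.2, 0.9]

# zip_longest continues until the longest input is exhausted,
# filling the missing positions of shorter inputs with fillvalue.
rows = [list(pair) for pair in zip_longest(col0, col1, fillvalue='')]
for row in rows:
    print(row)
# [1.5, 1.3]
# [0.8, 0.8]
# [1.1, 1.2]
# ['', 0.9]
```

Plain zip would stop after three rows and silently drop the 0.9, which is why zip_longest is needed here.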

Like @metropolis, I thought the code should also handle a change in the number of columns, so this version loops over the number of columns taken from the NumPy array's shape (data.shape[1]).

For the CSV output I use pandas in the same way, but I first build a DataFrame from the per-column lists and then shape it with a transpose and fillna(). See:
Replace (transpose) rows and columns in pandas.DataFrame
Exclude (delete)/replace (fill in)/extract NaN in pandas

This version does not use zip_longest.

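from itertools import dropwhile
import pandas as pd

# One list per column, each starting at that column's first value >= 1.0
work = []
for i in range(data.shape[1]):
    work.append([x for x in dropwhile(lambda y: y < 1.0, data[:, i])])

# Each list becomes a row padded with NaN; transpose back to columns, blank the NaN
df = pd.DataFrame(work).T.fillna('')

df.to_csv('NewData.csv', header=False, index=False)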
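To see what pd.DataFrame(work).T.fillna('') does step by step, here is a small sketch that uses the two trimmed columns from the expected output as literal lists, so it runs without data.csv.

```python
import pandas as pd

# One list per original CSV column; the lengths differ.
work = [
    [1.5, 0.8, 1.1],
    [1.3, 0.8, 1.2, 0.9],
]

# Each inner list becomes a *row*; the shorter row is padded with NaN.
df = pd.DataFrame(work)
print(df.shape)  # (2, 4)

# Transpose so each original column is a column again,
# then replace the NaN padding with empty strings for the CSV.
out = df.T.fillna('')
print(out.to_csv(header=False, index=False))
```

Writing to a string (no path argument) is only for inspection; passing a file name as in the answer writes the same text to disk.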


2022-09-29 22:19

As in my answer to your previous question, you can join the columns with pandas.concat.

Is there a way to process each column even if the data has three columns, like this?

Yes. The list comprehension iterates over range(data.shape[1]), where data.shape[1] is the number of columns in the DataFrame, so it works for any number of columns.

import pandas as pd

data = 'data.csv'
data = pd.read_csv(data, header=None)
output = pd.concat([
    # Slice column i from its first value >= 1.0, then renumber the rows from 0
    data.loc[data[i].ge(1.0).idxmax():][i].reset_index(drop=True)
    for i in range(data.shape[1])
], axis=1)

output.to_csv('data_filtered.csv', header=False, index=False)
$ cat data_filtered.csv
1.5,1.3
0.8,0.8
1.1,1.2
,0.9
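A sketch to check the multi-column case, assuming a hypothetical three-column file whose third column (invented here) never reaches 1.0. In that situation ge(1.0) is all False and idxmax() returns the first label, so the expression above would keep the whole column. The helper trim_column is hypothetical; it returns an empty column instead, which may or may not be what you want.

```python
import io

import pandas as pd

# Hypothetical three-column CSV: columns 0 and 1 are the question's sample
# rows; column 2 is invented and never reaches 1.0.
csv_text = """0.1,0.2,0.4
0.3,0.5,0.6
0.2,0.7,0.3
0.9,1.3,0.2
1.5,0.8,0.1
0.8,1.2,0.5
1.1,0.9,0.7
"""
data = pd.read_csv(io.StringIO(csv_text), header=None)


def trim_column(col: pd.Series, threshold: float = 1.0) -> pd.Series:
    """Hypothetical helper: keep col from its first value >= threshold."""
    mask = col.ge(threshold)
    if not mask.any():
        # An all-False mask would make idxmax() return the first label
        # and keep everything, so return an empty column instead.
        return col.iloc[0:0].reset_index(drop=True)
    return col.loc[mask.idxmax():].reset_index(drop=True)


# Pitfall: no value in column 2 is >= 1.0, yet idxmax() still returns label 0.
print(data[2].ge(1.0).idxmax())  # 0

# data.shape[1] is 3 here, so all three columns are processed.
output = pd.concat([trim_column(data[i]) for i in range(data.shape[1])], axis=1)
print(output.to_csv(header=False, index=False))
```

If every column is guaranteed to contain a value of 1.0 or more, the one-line expression in the answer is enough and the guard is unnecessary.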


2022-09-29 22:19



© 2024 OneMinuteCode. All rights reserved.