I want to read multiple csv files in python, extract columns according to conditions, and output them as csv files.


I want to read many csv files (1000 files) in Python, extract the data that meets a condition, and write the result to a new csv file.
file1:
id,time,value
1,3.5,6
2,2.0,4
3,2.6,8
...
30,15.5,50
With a single file, the following script does what I want, but how should I change it so that it works on 1000 files?

import pandas as pd

df = pd.read_csv("list1.csv")
# keep only the rows whose "time" value is below 0.5
df = df[df["time"] < 0.5]
df.to_csv("list1_0.5h.csv")

I apologize for the basic question, but I would appreciate it if you could show me how.
Thank you in advance.

python

2022-09-30 18:08

3 Answers

You can use pd.concat to combine data frames that share the same columns.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

Assuming the file names follow a consistent pattern such as list1.csv, list2.csv, ..., list100.csv, you can build each file name string from its index using a list comprehension.

import pandas as pd

# read list1.csv ... list100.csv and stack them into a single data frame
df = pd.concat(
    [pd.read_csv("list{}.csv".format(i + 1)) for i in range(100)])
# keep only the rows whose "time" value is below 0.5
df = df[df["time"] < 0.5]
df.to_csv("list1to100_0.5h.csv")


2022-09-30 18:08

As others have pointed out, if you want to create one output file per input file (as in your example), why not simply use glob?

Suppose you have 1000 csv files like "list1.csv" in the same directory.
Then I think you can do the following.

import glob

import pandas as pd

# collect every file name matching the pattern into a list
fns = glob.glob("list*.csv")

for fn in fns:
    df = pd.read_csv(fn)
    # keep only the rows whose "time" value is below 0.5
    df = df[df["time"] < 0.5]
    # build the output name from the input name, e.g. "list1.csv" -> "list1_0.5h.csv"
    name, ext = fn.split(".")
    df.to_csv(name + "_0.5h." + ext)

glob collects every file name that matches the given pattern into a list.
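
As a side note, if a path might contain extra dots (for example in a directory name), os.path.splitext may be a safer way to build the output file name than splitting on ".". A minimal sketch:

import os

# os.path.splitext("list1.csv") returns ("list1", ".csv")
name, ext = os.path.splitext("list1.csv")
out_fn = name + "_0.5h" + ext  # -> "list1_0.5h.csv"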


2022-09-30 18:08


