I want to read multiple csv files in python, extract columns according to conditions, and output them as csv files.


I want to read many csv files (1000 files) in Python, extract the data that meets a condition, and write the result to a new csv file.
file1:
id,time,value
1,3.5,6
2,2.0,4
3,2.6,8
...
30,15.5,50
With a single file, the following script does what I want, but how should I change it so that it works on 1000 files?

import pandas as pd

df = pd.read_csv("list1.csv")
# keep only the rows whose "time" value is below 0.5
df = df[df["time"] < 0.5]
df.to_csv("list1_0.5h.csv")

I apologize for the basic question, but I would appreciate it if you could show me how.
Thank you in advance.

python

2022-09-30 18:08

3 Answers

You can use pd.concat to combine data frames that share the same columns.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

Assuming the file names follow a consistent pattern such as list1.csv, list2.csv, ..., list100.csv, you can build each file name string from its index using a list comprehension.

import pandas as pd

# read list1.csv ... list100.csv and stack them into a single data frame
df = pd.concat(
    [pd.read_csv("list{}.csv".format(i + 1)) for i in range(100)])
# keep only the rows whose "time" value is below 0.5
df = df[df["time"] < 0.5]
df.to_csv("list1to100_0.5h.csv")


2022-09-30 18:08

As others have pointed out, if you want to create one output file per input file (as in your example), why not simply use glob?

Suppose you have 1000 csv files like "list1.csv" in the same directory.
Then I think you can do the following.

import glob

import pandas as pd

# collect every file name matching the pattern into a list
fns = glob.glob("list*.csv")

for fn in fns:
    df = pd.read_csv(fn)
    # keep only the rows whose "time" value is below 0.5
    df = df[df["time"] < 0.5]
    # build the output name from the input name, e.g. "list1.csv" -> "list1_0.5h.csv"
    name, ext = fn.split(".")
    df.to_csv(name + "_0.5h." + ext)

glob collects every file name that matches the given pattern into a list.
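
As a side note, if a path might contain extra dots (for example in a directory name), os.path.splitext may be a safer way to build the output file name than splitting on ".". A minimal sketch:

import os

# os.path.splitext("list1.csv") returns ("list1", ".csv")
name, ext = os.path.splitext("list1.csv")
out_fn = name + "_0.5h" + ext  # -> "list1_0.5h.csv"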


2022-09-30 18:08


