I want to remove rows from a CSV file based on conditions in pandas

Asked 2 years ago, Updated 2 years ago, 39 views

I would like to read a CSV file with pandas in Python, delete rows that match certain conditions, and write the result to a new file.

For example, with a CSV file (list1.csv) like the one below, I think the following script deletes three of the rows whose time2 is less than 1.0.

list1.csv

time1,time2
0.27,0.45
0.28,0.53
0.3309,0.65987
0.36938,0.8952
0.4396,1.087
...

The time2 values are already sorted, as shown above.

script

import pandas as pd
df = pd.read_csv("list1.csv")
df_a=(df[df['time2']<1.0])
print(df_a)
df_b = df_a.drop([0,1,2])
print(df_b)

What I would like to ask is this. Loading the CSV file and focusing on time2 stays the same, but I want to:

  • Remove the rows whose time2 is 0 or greater and less than 0.5, and write the remaining rows to a separate file (list1_0.5h.csv)
  • Similarly, remove the rows whose time2 is 0.5 or greater and less than 1.0, and write the remaining rows to a separate file (list1_1.0h.csv)

How do I repeat this for each range, up to the one where time2 is 5.5 or greater and less than 6.0?
time2 has no values less than 0, but it does have values greater than or equal to 6.0.

Also, I want to limit the number of rows deleted per range to 5. Once 5 rows have been deleted, how do I stop deleting further rows and write the file under a new name as above?
The new filename is the original filename plus the upper bound of the deleted time range.

I would like to do the same thing for multiple files (list1.csv through list1000.csv).

I apologize for the basic question, but I would appreciate your guidance.
Thank you in advance.

python pandas

2022-09-29 22:00

2 Answers

For example.

import pandas as pd
import numpy as np

df = pd.read_csv('list1.csv')

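start, end, tick = 0.0, 6.0, 0.5  # time2 ranges: [0.0, 0.5), [0.5, 1.0), ..., [5.5, 6.0)
remove_max = 5  # delete at most 5 rows per range
for s in np.arange(start, end, tick):
  # Drop the first remove_max rows with s <= time2 < s+tick and write the rest
  df.drop(df[(df.time2>=s)&(df.time2<(s+tick))].index[:remove_max])\
    .to_csv('list1_'+str(s+tick)+'h.csv', index=False)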
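
To see what the loop does for a single range, here is a small in-memory sketch. The helper name drop_range_limited is hypothetical and not part of the answer; the sample rows are the five shown in the question.

```python
import pandas as pd


def drop_range_limited(df, low, high, remove_max):
    """Return df without the first remove_max rows whose time2 is in [low, high)."""
    in_range = (df["time2"] >= low) & (df["time2"] < high)
    # Index labels of the matching rows, capped at remove_max (hypothetical helper).
    to_drop = df[in_range].index[:remove_max]
    return df.drop(to_drop)


# Sample rows taken from the question's list1.csv excerpt.
sample = pd.DataFrame({
    "time1": [0.27, 0.28, 0.3309, 0.36938, 0.4396],
    "time2": [0.45, 0.53, 0.65987, 0.8952, 1.087],
})

# Three rows fall in [0.5, 1.0); with a cap of 5, all three are removed.
uncapped = drop_range_limited(sample, 0.5, 1.0, remove_max=5)
print(uncapped["time2"].tolist())  # [0.45, 1.087]

# With a cap of 2, only the first two matching rows are removed.
capped = drop_range_limited(sample, 0.5, 1.0, remove_max=2)
print(capped["time2"].tolist())  # [0.45, 0.8952, 1.087]
```

Because the cap is applied with index[:remove_max], the rows removed are the first matching ones in file order, which for sorted data means the smallest time2 values in the range.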


2022-09-29 22:00

Answer portion
I borrowed @metropolis's processing and added a loop to extend it to multiple files.
For more flexibility, you could incorporate the command-line parameter processing described in the comments on the previous question, so that folders and filenames can be specified from outside the script.

Incidentally, the fourth row of the example CSV in the question, 0.36938,0.8952, comes out as 0.369380000000004,0.8952, which is probably a floating-point error introduced when pandas converts the value. Take additional measures if necessary.
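
If the extra digits matter, one possible measure is sketched below. It is offered as an assumption rather than a verified fix for the asker's data: it either reads the file with pandas' float_precision="round_trip" option or rounds on output with float_format.

```python
import io

import pandas as pd

# One row from the question's example, held in memory for this sketch.
csv_text = "time1,time2\n0.36938,0.8952\n"

# float_precision="round_trip" selects pandas' round-trip float converter,
# so values written back with to_csv keep their original short form.
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")
round_tripped = df.to_csv(index=False)
print(round_tripped)

# Alternatively, round on output with float_format
# (6 significant digits is an arbitrary choice for this sketch).
rounded = df.to_csv(index=False, float_format="%.6g")
print(rounded)
```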

import sys
import os
import pandas as pd
import numpy as np

# Folder specification (input and output can be set separately; both are the current folder here)
infolder='./'
outfolder='./'

# How to assemble the target file names (fixed string + number)
fprefix='list'        # string at the beginning of the filename
fsuffixFirst=1        # number of the first file
fsuffixMaxPlus1=1001  # number of the last file plus 1
fsuffixStep=1         # increment between file numbers

# The following two lines are copied from @metropolis's answer.
start, end, tick = 0.0, 6.0, 0.5
remove_max = 5

# Loop over the 1000 files
for fsuffix in range(fsuffixFirst, fsuffixMaxPlus1, fsuffixStep):
    basefname=fprefix+str(fsuffix)          # assemble the file name only
    inputfile=infolder+basefname+'.csv'     # build the path
    if os.path.exists(inputfile):           # check that the file exists before processing
        # The lines below are copied from @metropolis's answer with minor changes.
        df = pd.read_csv(inputfile)
        for s in np.arange(start, end, tick):
            df.drop(df[(df.time2>=s)&(df.time2<(s+tick))].index[:remove_max])\
              .to_csv(outfolder+basefname+'_'+str(s+tick)+'h.csv', index=False)
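
The command-line parameter processing mentioned in this answer is not shown in the script. A minimal sketch with the standard argparse module might look like the following; every option name and default here is hypothetical and chosen only to mirror the variables in the script.

```python
import argparse


def build_parser():
    # All option names and defaults below are hypothetical.
    parser = argparse.ArgumentParser(
        description="Drop time2 ranges from numbered CSV files")
    parser.add_argument("--infolder", default="./",
                        help="folder containing the input CSV files")
    parser.add_argument("--outfolder", default="./",
                        help="folder for the output CSV files")
    parser.add_argument("--prefix", default="list",
                        help="string at the beginning of each filename")
    parser.add_argument("--first", type=int, default=1,
                        help="number of the first file")
    parser.add_argument("--last", type=int, default=1000,
                        help="number of the last file")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--infolder", "data/", "--last", "10"])
print(args.infolder, args.outfolder, args.prefix, args.first, args.last)
```

In a real script, parse_args() would be called without arguments so that it reads sys.argv, and the parsed values would replace infolder, outfolder, fprefix, and the file number bounds.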

Pre-confirmation
This is not an answer, but there are several things to confirm, and they would not fit neatly in comments.
Please check the following and add the details to the question.

  • The description says time2, but the source uses time1. Which one is it?
  • Do you really want to create data with the time range deleted? Extracting the range would seem to be more common.
  • Does the original data contain values below 0, or values of 6.0 or higher?
  • Am I correct in understanding that the new file name is the original file name plus the value at the upper end of the deleted range?
  • I don't think simply extracting the range into df_a will sort it in descending order. Have you sorted the original data?
  • Is this related to your previous question about reading and writing multiple CSV files in Python? If so, should the two be combined?
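
Two of the points above, whether the data is sorted and the difference between deleting and extracting a range, can be checked with a short sketch. The sample frame reuses the five rows from the question, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Sample rows taken from the question's list1.csv excerpt.
df = pd.DataFrame({
    "time1": [0.27, 0.28, 0.3309, 0.36938, 0.4396],
    "time2": [0.45, 0.53, 0.65987, 0.8952, 1.087],
})

# Check whether time2 is already in ascending order.
print(df["time2"].is_monotonic_increasing)  # True

low, high = 0.5, 1.0
in_range = (df["time2"] >= low) & (df["time2"] < high)

extracted = df[in_range]    # keep only the rows inside [low, high)
remaining = df[~in_range]   # keep everything outside [low, high)
print(len(extracted), len(remaining))  # 3 2
```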


2022-09-29 22:00



© 2024 OneMinuteCode. All rights reserved.