I'm a beginner in programming.
I would like to trace the read CSV file every 3 months based on the date of purchase, and output the accumulated data as separate CSV files.
The image is
File 1: January 1, 2020 - April 1, 2020
Second file: January 1, 2020 - July 1, 2020
File 3: January 1, 2020 - October 1, 2020
and so on
There is no error message, but all the output files are of the same time period.
The cumulative format is not the same as above.
Could you please let me know the revised proposal?
# Load the combined csv file
df = pd.read_csv('/content/drive/MyDrive/rfm/*.csv')
# Convert purchase date calculated format
df['Purchase Date'] = pd.to_datetime(df['Purchase Date'])
# Divide and output data every 3 months
dt_st=min(df['Purchase Date'].to_pydatetime()
dt_ed=max(df['Purchase Date'].to_pydatetime()
dt_cur=dt_st
while_cur<dt_ed:
dt_cur=dt_cur+relatedelta(months=3)
mask=(df['Purchase Date']>=dt_st)&(df['Purchase Date']<=dt_cur)
grp=df.groupby(mask)
forg, df in grp:
g = dt_cur.strftime('%Y%m%d')
df.to_csv(f'/content/drive/MyDrive/rfm/rfm_until_{g}.csv', index=False)
You may want to use pandas.Grouper —pandas1.3.5 documentation.
import pandas as pd
importio
csv_data='"
date of purchase, data
January 1, 2020 a
February 13, 2020, b
March 26, 2020, c
2020/4/3,d
May 17, 2020, e
2020/6/20,f
2020/7/8, g
August 15, 2020, h
September 22, 2020, i
'''
df=pd.read_csv(io.StringIO(csv_data), parse_dates=['Purchase Date'])
(
df.sort_values('Purchase Date')
.groupby(pd.Grouper(key='Purchase Date', freq='3MS'))
.apply(lambdax:
x.to_csv(f'rfm_from_{x["Purchase Date"].min().strftime("%Y%m%d")}_'+
f'to_{x["Purchase Date"].max().strftime("%Y%m%d")} .csv',
index=False))
)
After execution
$ls-1*.csv
rfm_from_20200101_to_20200326.csv
rfm_from_20200403_to_20200620.csv
rfm_from_20200708_to_20200922.csv
$ for fin*.csv; do echo "$f"; cat$f; echo; done
rfm_from_20200101_to_20200326.csv
date of purchase, data
2020-01-01,a
2020-02-13,b
2020-03-26,c
rfm_from_20200403_to_20200620.csv
date of purchase, data
2020-04-03,d
2020-05-17,e
2020-06-20,f
rfm_from_20200708_to_20200922.csv
date of purchase, data
2020-07-08,g
2020-08-15,h
2020-09-22,i
© 2024 OneMinuteCode. All rights reserved.