I would like to trace the read CSV file every 3 months from the date of purchase and output the accumulated data as separate CSV files.

Asked 2 years ago, Updated 2 years ago, 105 views

I'm a beginner in programming.
I would like to trace the read CSV file every 3 months based on the date of purchase, and output the accumulated data as separate CSV files.
The image is
 File 1: January 1, 2020 - April 1, 2020
 Second file: January 1, 2020 - July 1, 2020
 File 3: January 1, 2020 - October 1, 2020
 and so on

There is no error message, but all the output files are of the same time period.
The cumulative format is not the same as above.
Could you please let me know the revised proposal?

# Load the combined csv file
df = pd.read_csv('/content/drive/MyDrive/rfm/*.csv')

# Convert purchase date calculated format
df['Purchase Date'] = pd.to_datetime(df['Purchase Date'])

# Divide and output data every 3 months
dt_st=min(df['Purchase Date'].to_pydatetime()
dt_ed=max(df['Purchase Date'].to_pydatetime()

dt_cur=dt_st
while_cur<dt_ed:
    dt_cur=dt_cur+relatedelta(months=3)
    mask=(df['Purchase Date']>=dt_st)&(df['Purchase Date']<=dt_cur)
    grp=df.groupby(mask) 
    forg, df in grp:
        g = dt_cur.strftime('%Y%m%d')
        df.to_csv(f'/content/drive/MyDrive/rfm/rfm_until_{g}.csv', index=False)

python pandas google-colaboratory

2022-09-30 18:00

1 Answers

You may want to use pandas.Grouper —pandas1.3.5 documentation.

import pandas as pd
importio

csv_data='"
date of purchase, data
January 1, 2020 a
February 13, 2020, b
March 26, 2020, c
2020/4/3,d
May 17, 2020, e
2020/6/20,f
2020/7/8, g
August 15, 2020, h
September 22, 2020, i
'''

df=pd.read_csv(io.StringIO(csv_data), parse_dates=['Purchase Date'])
(
  df.sort_values('Purchase Date')
    .groupby(pd.Grouper(key='Purchase Date', freq='3MS'))
    .apply(lambdax:
      x.to_csv(f'rfm_from_{x["Purchase Date"].min().strftime("%Y%m%d")}_'+
               f'to_{x["Purchase Date"].max().strftime("%Y%m%d")} .csv',
               index=False))
)

After execution

$ls-1*.csv
rfm_from_20200101_to_20200326.csv
rfm_from_20200403_to_20200620.csv
rfm_from_20200708_to_20200922.csv

$ for fin*.csv; do echo "$f"; cat$f; echo; done
rfm_from_20200101_to_20200326.csv
date of purchase, data
2020-01-01,a
2020-02-13,b
2020-03-26,c

rfm_from_20200403_to_20200620.csv
date of purchase, data
2020-04-03,d
2020-05-17,e
2020-06-20,f

rfm_from_20200708_to_20200922.csv
date of purchase, data
2020-07-08,g
2020-08-15,h
2020-09-22,i


2022-09-30 18:00

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.