We would like to create a new data frame by reading CSV data from each company in a for statement and extracting only the necessary columns.
After loading the file, how can I extract sales of A and B as shown in the attached image and create a new data frame?(In rare cases, sales data may be missing.)
Thank you very much for letting me know.
# file specification
outfolder='...r/'
# I want to load data in loop (Ids (10 and 20) are kept in a data frame called data).
for index, id in enumerate (data['ID'].unique()):
# Read in folder + filename (list10_0.5h, list20_0.5h)
df=(pd.read_csv(outfolder+str('list')+str(id)+'_0.5h.csv', usecols=[3])
Company A's sales data
Company B sales data
I want to extract sales data from A and B and create a new data frame
python csv
"If ""rarely there is a loss in sales data"" means, for example, one line of 2017/02 may be missing, you can first create DataFrame
for the month and merge them one by one."
# file specification
outfolder='...r/'
df_base=DataFrame({
Moon: ['2017/01', '2017/02', '2017/03', '2017/04', '2017/05', '2017/06', '2017/07', '2017/08', '2017/09', '2017/10']
})
# I want to load data in loop (Ids (10 and 20) are kept in a data frame called data).
for index, id in enumerate (data['ID'].unique()):
# Read in folder + filename (list10_0.5h, list20_0.5h)
df=(pd.read_csv(outfolder+str('list')+str(id)+'_0.5h.csv', usecols=[3])
tmp_df = df [['month', 'sales']]
df_base=pd.merge(df_base, tmp_df, on='month', how='left')
df_base=df_base.rename(columns={'Sales':df['name'].unique()[0]})
質問 Looking at the code in the question, it seems that no processing has been implemented at all."Then, I would have asked someone else to implement it, so I think it would be better if you asked me a question like ""I implemented it like this, but it doesn't work well in this part."""
Load all CSV files, store them in a single data frame, and then divide them by company using pandas.DataFrame.groupby()
.Then, the split data frame is keyed to 月month 」 and outer join (pandas.DataFrame.merge()
).Therefore, the missing value is NaN
.
import pandas as pd
from functools import reduction
data=pd.DataFrame({'ID':[10,20,10,20,10,20]})
outfolder='...r/'
dfs=reduce(
lambdal, r:pd.merge(l,r,how = 'outer', on = 'month',
pd.concat([
pd.read_csv(f'{outfolder} list {id}_0.5h.csv',
parse_dates=['Month', thousands=',')
for id in data.ID.unique()])
.groupby('name')
.pipe(lambda grp:[
g[['month', 'sales']].rename(columns={'sales':k})
fork, ingrp
])
)
pd.set_option('display.unicode.east_asian_width', True)
print(dfs)
=>
# Company B does not have sales data for June 1, 2017 (deficit)
month AB
0 2017-01-01 9786.351 6459.0
1 2017-02-01 8464.354 5586.0
2 2017-03-01 20291.400 13392.0
3 2017-04-01 13129.610 8666.0
4 2017-05-01 33974.490 22423.0
5 2017-06-0136048.030 NaN
6 2017-07-01 48785.010 32198.0
7 2017-08-01 12199.740 8052.0
8 2017-09-01 6442.446 4252.0
9 2017-10-01 47193.700 31148.0
© 2024 OneMinuteCode. All rights reserved.