I want to read data frames in CSV format and extract specific columns to create new data frames.

Asked 2 years ago, Updated 2 years ago, 93 views

We would like to create a new data frame by reading CSV data from each company in a for statement and extracting only the necessary columns.
After loading the file, how can I extract sales of A and B as shown in the attached image and create a new data frame?(In rare cases, sales data may be missing.)

Thank you very much for letting me know.

# file specification
outfolder='...r/'
# I want to load data in loop (Ids (10 and 20) are kept in a data frame called data).
for index, id in enumerate (data['ID'].unique()):
    # Read in folder + filename (list10_0.5h, list20_0.5h)
    df=(pd.read_csv(outfolder+str('list')+str(id)+'_0.5h.csv', usecols=[3])

Company A's sales data

Company A's sales data

Company B sales data

B company sales data

I want to extract sales data from A and B and create a new data frame

I want to extract sales data from A and B to create a new data frame

python csv

2022-09-30 17:33

2 Answers

"If ""rarely there is a loss in sales data"" means, for example, one line of 2017/02 may be missing, you can first create DataFrame for the month and merge them one by one."

# file specification
outfolder='...r/'
df_base=DataFrame({
  Moon: ['2017/01', '2017/02', '2017/03', '2017/04', '2017/05', '2017/06', '2017/07', '2017/08', '2017/09', '2017/10']
})
# I want to load data in loop (Ids (10 and 20) are kept in a data frame called data).
for index, id in enumerate (data['ID'].unique()):
    # Read in folder + filename (list10_0.5h, list20_0.5h)
    df=(pd.read_csv(outfolder+str('list')+str(id)+'_0.5h.csv', usecols=[3])
    tmp_df = df [['month', 'sales']]
    df_base=pd.merge(df_base, tmp_df, on='month', how='left')
    df_base=df_base.rename(columns={'Sales':df['name'].unique()[0]})

質問 Looking at the code in the question, it seems that no processing has been implemented at all."Then, I would have asked someone else to implement it, so I think it would be better if you asked me a question like ""I implemented it like this, but it doesn't work well in this part."""


2022-09-30 17:33

Load all CSV files, store them in a single data frame, and then divide them by company using pandas.DataFrame.groupby().Then, the split data frame is keyed to 月month 」 and outer join (pandas.DataFrame.merge()).Therefore, the missing value is NaN.

import pandas as pd
from functools import reduction

data=pd.DataFrame({'ID':[10,20,10,20,10,20]})
outfolder='...r/'

dfs=reduce(
  lambdal, r:pd.merge(l,r,how = 'outer', on = 'month',
  pd.concat([
    pd.read_csv(f'{outfolder} list {id}_0.5h.csv',
      parse_dates=['Month', thousands=',')
    for id in data.ID.unique()])
  .groupby('name')
  .pipe(lambda grp:[
    g[['month', 'sales']].rename(columns={'sales':k})
    fork, ingrp
  ])
)

pd.set_option('display.unicode.east_asian_width', True)
print(dfs)

=>
# Company B does not have sales data for June 1, 2017 (deficit)

          month AB
0 2017-01-01   9786.351   6459.0
1 2017-02-01   8464.354   5586.0
2 2017-03-01  20291.400  13392.0
3 2017-04-01  13129.610   8666.0
4 2017-05-01  33974.490  22423.0
5 2017-06-0136048.030 NaN
6 2017-07-01  48785.010  32198.0
7 2017-08-01  12199.740   8052.0
8 2017-09-01   6442.446   4252.0
9 2017-10-01  47193.700  31148.0


2022-09-30 17:33

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.