The result I want:
I want the output to contain only the product ID and the number, with the numbers added up for each product ID.
22 6
24 10
31 3
Raw CSV data:
clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
If it were me, I would use AWS.
Upload the CSV file to S3.
Set up Aurora (a MySQL-compatible DBMS) on RDS.
Create an AWS Lambda function in Python or Node.js that reads the CSV file stored in S3 and loads it into the Aurora DB.
Once the data is in an RDBMS, you can query it with SQL however you like.
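As a rough sketch of the Lambda step, assuming the function is triggered by an S3 upload event and that the pymysql package is bundled with it (boto3 ships with the Lambda Python runtime). The table name orders, its column names, and the DB_* environment variables are hypothetical placeholders, not anything from the question.

```python
import csv
import io
import os
import urllib.parse


def parse_rows(csv_text):
    """Turn CSV text into (client_id, product_id, info, number) tuples, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader)  # header row
    return [(int(c), int(p), info, int(n)) for c, p, info, n in reader]


def handler(event, context):
    # Third-party imports stay inside the handler so parse_rows can be tried locally.
    import boto3
    import pymysql

    # Bucket and key come from the S3 event; the key is URL-encoded.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    rows = parse_rows(body.decode("utf-8"))

    # Connection settings are read from hypothetical environment variables.
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO orders (client_id, product_id, info, number) "
                "VALUES (%s, %s, %s, %s)",
                rows,
            )
        conn.commit()
    finally:
        conn.close()
    return {"inserted": len(rows)}
```

With the rows loaded, a query along the lines of SELECT product_id, SUM(number) FROM orders GROUP BY product_id should give the totals per product ID.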
If you don't use AWS, I would use the SQLite engine built into Python.
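A minimal sketch of the SQLite route, using only the standard library. The CSV rows are inlined so it runs without a file, and the table and column names (orders, product_id, and so on) are made up for illustration.

```python
import csv
import io
import sqlite3

# Same rows as the question's CSV, inlined so the sketch runs without a file.
CSV_TEXT = """clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
"""


def sum_by_product(csv_text):
    """Load the CSV into an in-memory SQLite table and total 'number' per product ID."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout matching the four CSV columns.
    conn.execute(
        "CREATE TABLE orders "
        "(client_id INTEGER, product_id INTEGER, info TEXT, number INTEGER)"
    )
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", reader)
    rows = conn.execute(
        "SELECT product_id, SUM(number) FROM orders "
        "GROUP BY product_id ORDER BY product_id"
    ).fetchall()
    conn.close()
    return rows


for product_id, total in sum_by_product(CSV_TEXT):
    print(product_id, total)
```

To read from a real file instead, the text could come from open("data.csv", newline="").read(), and sqlite3.connect could point at a file path if the table should persist.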
If it's a one-time job, pandas is also convenient.
See the example below.
import pandas as pd

# data.csv
'''
clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
'''

# skipinitialspace=True tolerates blanks after the commas
df = pd.read_csv("data.csv", skipinitialspace=True)
# Sum "number" for each "product ID"
print(df.groupby("product ID")["number"].sum())

'''
product ID
22     6
24    10
31     3
Name: number, dtype: int64
'''