Is it possible to sum values by ID in a large CSV file?

Environment: Python 3.7, Anaconda, PyCharm. Our CSV file has 1,000,000 rows, and its content looks roughly like the sample below. Is it possible to add up the numbers that share the same product ID?

Desired result:

I'd like the output to contain only the product ID and the number, with the numbers added up for each product ID.

22 6

24 10

31 3

    raw csv data
    clientID  product ID  product information  number
    324       24          Clothes              4
    531       22          Refrigerator         3
    432       24          Clothes              3
    433       24          Clothes              3
    434       31          Refrigerator         3
    435       22          Refrigerator         3

bigdata

2022-09-22 10:17

2 Answers

If it were me, I would use AWS.

Upload the CSV file to S3.

Set up Aurora (a MySQL-compatible DBMS) using RDS.

Create an AWS Lambda function in Python or Node.js that reads the CSV file stored in S3 and loads it into the Aurora DB.

Once the data is in an RDBMS, you can query it with SQL however you want.

If you don't use AWS, I would use the SQLite engine built into Python (the sqlite3 module).
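As a rough sketch of the SQLite route, the snippet below loads the sample rows from the question into a table and sums the numbers with GROUP BY. The table name `orders`, its column names, and the helper `sum_by_product` are made up for illustration, and the sample data is read from a string so the snippet runs on its own; for a real file you would pass an opened CSV file and probably a database path on disk.

```python
import csv
import io
import sqlite3

# Sample rows copied from the question; a real run would open the CSV file instead.
SAMPLE_CSV = """clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
"""


def sum_by_product(csv_file, db_path=":memory:"):
    """Load CSV rows into SQLite and return (product_id, total) pairs.

    The table and column names here are hypothetical, chosen for this example.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE orders (client_id INTEGER, product_id INTEGER, "
            "product_info TEXT, number INTEGER)"
        )
        reader = csv.reader(csv_file)
        next(reader)  # skip the header row
        # executemany pulls rows from the reader one at a time,
        # so the CSV is streamed instead of being read into a list first.
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", reader)
        conn.commit()
        return conn.execute(
            "SELECT product_id, SUM(number) FROM orders "
            "GROUP BY product_id ORDER BY product_id"
        ).fetchall()
    finally:
        conn.close()


totals = sum_by_product(io.StringIO(SAMPLE_CSV))
for product_id, total in totals:
    print(product_id, total)
```

With a file on disk, the call would look something like `sum_by_product(open("data.csv", newline=""), "orders.db")`, where both file names are placeholders.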

If it's a one-time job, pandas is also convenient.

2022-09-22 10:17

See the example below.

import pandas as pd

# Contents of data.csv:
'''
clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
'''
df = pd.read_csv("data.csv")
# Sum the "number" column for each distinct "product ID".
totals = df.groupby("product ID")["number"].sum()
print(totals)
# Output:
'''
product ID
22     6
24    10
31     3
Name: number, dtype: int64
'''
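If the file is too large to load in one go, the same grouping can be done piece by piece. The sketch below is one possible approach, not part of the original answer: it reads the CSV in chunks with the `chunksize` parameter of `pd.read_csv`, sums each chunk, and then combines the partial sums. The helper name `sum_in_chunks` and the chunk size are assumptions, and the sample data is read from a string only so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real run would pass the CSV path instead.
SAMPLE_CSV = """clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
"""


def sum_in_chunks(source, chunksize=100_000):
    """Sum "number" per "product ID" while holding only one chunk in memory."""
    partial_sums = []
    # usecols skips the columns that are not needed for the result.
    for chunk in pd.read_csv(
        source, usecols=["product ID", "number"], chunksize=chunksize
    ):
        partial_sums.append(chunk.groupby("product ID")["number"].sum())
    # The same product ID can appear in several chunks,
    # so the partial sums are grouped and summed once more.
    return pd.concat(partial_sums).groupby(level=0).sum()


# A tiny chunk size is used here only to exercise the multi-chunk path.
chunked_totals = sum_in_chunks(io.StringIO(SAMPLE_CSV), chunksize=2)
print(chunked_totals)
```

For a real file, the call would be something like `sum_in_chunks("data.csv")`, with the chunk size tuned to the available memory.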

2022-09-22 10:17
