Is it possible to combine big data?

Asked 1 year ago, Updated 1 year ago, 111 views

I'm using Python 3.7 with Anaconda and PyCharm. Our CSV file has 1,000,000 rows, and the content looks roughly like the raw data below. Is it possible to add up the numbers for rows that share the same product ID?

The result I want:

I'd like the output to contain only the product ID and the number, where the number is the sum for that product ID.

22 6

24 10

31 3

    raw csv data
    clientID  product ID  product information  number
    324       24          Clothes              4
    531       22          Refrigerator         3
    432       24          Clothes              3
    433       24          Clothes              3
    434       31          Refrigerator         3
    435       22          Refrigerator         3

bigdata

2022-09-22 10:17

2 Answers

If it were me...

I'd use AWS.

Upload the CSV file to S3.

Set up Aurora (a MySQL-compatible DBMS) using RDS.

Create an AWS Lambda function in Python or Node.js that reads the CSV file stored in S3 and loads it into the Aurora database.

Once the data is in an RDBMS, you can use SQL to query it however you want.

If you don't use AWS, I'd use the SQLite engine built into Python instead.
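As a rough illustration of the SQLite route, here is a minimal sketch using only the standard library. The table name `orders`, the column names, the helper `sum_by_product`, and the temporary `data.csv` are all assumptions made for this example; the sample rows are the ones from the question.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Sample rows taken from the question; the real file would be much larger.
SAMPLE = """clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
"""


def sum_by_product(csv_path):
    """Load the CSV into an in-memory SQLite table and sum `number` per product ID.

    The table and column names here are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (client_id INTEGER, product_id INTEGER, "
            "product_info TEXT, number INTEGER)"
        )
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            # The reader is consumed lazily, so the file is streamed row by row.
            conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", reader)
        return conn.execute(
            "SELECT product_id, SUM(number) FROM orders "
            "GROUP BY product_id ORDER BY product_id"
        ).fetchall()
    finally:
        conn.close()


# Write the sample to a temporary file so the sketch is runnable as-is.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"
    path.write_text(SAMPLE, encoding="utf-8")
    totals = sum_by_product(path)

print(totals)  # [(22, 6), (24, 10), (31, 3)]
```

For repeated queries, connecting to a file such as `sqlite3.connect("orders.db")` instead of `":memory:"` would keep the loaded table between runs.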

If it's a one-time job, pandas is also convenient.


2022-09-22 10:17

See the example below.

import pandas as pd

# data.csv
'''
clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
'''
# Read the file; the column names must match the CSV header exactly
df = pd.read_csv("data.csv")
# Sum the "number" column for each product ID
print(df.groupby("product ID")["number"].sum())
'''
product ID
22     6
24    10
31     3
Name: number, dtype: int64
'''
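If the file is too large to load at once, the same groupby can be applied chunk by chunk and the partial sums combined. This is only a sketch under assumptions: the helper `sum_by_product_chunked` is hypothetical, the header names are the ones used in this example, and the tiny `chunksize` is chosen just to exercise the merging on six sample rows.

```python
import io

import pandas as pd

# Sample rows taken from the question, held in memory for this sketch.
SAMPLE = """clientID,product ID,product information,number
324,24,Clothes,4
531,22,Refrigerator,3
432,24,Clothes,3
433,24,Clothes,3
434,31,Refrigerator,3
435,22,Refrigerator,3
"""


def sum_by_product_chunked(source, chunksize=2):
    """Sum `number` per `product ID`, reading the CSV in chunks."""
    total = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        part = chunk.groupby("product ID")["number"].sum()
        # fill_value=0 keeps product IDs that appear in only one of the two sides
        total = part if total is None else total.add(part, fill_value=0)
    # Aligning mismatched indexes can produce floats, so cast back to integers.
    return total.astype("int64").sort_index()


totals = sum_by_product_chunked(io.StringIO(SAMPLE))
print(totals.to_dict())  # {22: 6, 24: 10, 31: 3}
```

With a real file, pass the path (for example `"data.csv"`) instead of the `StringIO` object and use a much larger `chunksize`, such as 100000.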


2022-09-22 10:17



© 2024 OneMinuteCode. All rights reserved.