Unable to retrieve data under certain conditions in Pandas.

Windows 10 Python 3.X
pandas

Data Description

I would like to process the following df.
It consists of the columns id, number, and classification.
Each id appears in several rows, and every id belongs to exactly one of three categories (type1, type2, type3).
The category is written in the classification column, but only in one of the rows for that id (which row differs from id to id).
The number column holds an integer only in the last row of each id.

What I want to do

I would like to obtain the numbers for each classification and eventually find the mean, maximum, and minimum of the numbers in each of the three categories.

To do that, I want to determine which classification aaa belongs to, which classification bbb belongs to, and so on, and put their numbers into lists for type1, type2, and type3. However, I am not used to pandas, so I don't know exactly how to write the code.

Which pandas function should I use to get these numbers?

python pandas

2022-12-27 20:24

2 Answers

I added rows for testing.

import pandas as pd
import io

csv_data = '''
id,number,classification
aaa,,
aaa,,
aaa,111,type2
bbb,,
bbb,,type1
bbb,222,
ccc,,type3
ccc,,
ccc,333,
ddd,,
ddd,,
ddd,1234,type2
'''
df = pd.read_csv(io.StringIO(csv_data))

# Per id: take the last non-null number and the first non-null classification,
# then collect the numbers into one list per classification.
dic = df.groupby('id').agg({'number': 'last', 'classification': 'first'})\
        .groupby('classification')['number'].agg(list).to_dict()

print(dic)

# {'type1': [222.0], 'type2': [111.0, 1234.0], 'type3': [333.0]}
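The question's end goal is the mean, maximum, and minimum per classification. As a rough sketch, assuming the same sample data and the column names id, number, and classification, the per-id result can be grouped once more and aggregated directly instead of going through lists. The names per_id and stats are only illustrative.

```python
import io
import pandas as pd

# Sample data (assumed layout): the classification appears in one row per id,
# the number in the last row of each id.
csv_data = """id,number,classification
aaa,,
aaa,,
aaa,111,type2
bbb,,
bbb,,type1
bbb,222,
ccc,,type3
ccc,,
ccc,333,
ddd,,
ddd,,
ddd,1234,type2
"""
df = pd.read_csv(io.StringIO(csv_data))

# One row per id: last non-null number, first non-null classification.
per_id = df.groupby('id').agg({'number': 'last', 'classification': 'first'})

# Mean, maximum, and minimum of the numbers within each classification.
stats = per_id.groupby('classification')['number'].agg(['mean', 'max', 'min'])
print(stats)
```

Here type2 contains two ids (aaa and ddd), so its mean is taken over 111 and 1234.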

pandas.core.groupby.GroupBy.first - pandas 1.5.2 documentation

final GroupBy.first(numeric_only=False, min_count=-1)

Compute the first non-null entry of each column.
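To illustrate why first and last fit this data, here is a small sketch on a made-up single-id frame: GroupBy.first() skips missing values column by column, whereas GroupBy.head(1) simply returns the first row as it is. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# One id whose values are scattered over different rows.
df = pd.DataFrame({
    'id': ['bbb', 'bbb', 'bbb'],
    'number': [np.nan, np.nan, 222],
    'classification': [np.nan, 'type1', np.nan],
})
g = df.groupby('id')

# first() picks the first non-null entry of each column independently.
first_non_null = g.first()
print(first_non_null)

# head(1) keeps the first row unchanged, including its missing values.
first_row = g.head(1)
print(first_row)
```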

2022-12-27 22:15

I'm not used to Pandas, so I don't know exactly how to write the code.

I'm not sure whether this is the easiest way to follow, but I use a separate DataFrame for each step.

Since lists are needed in the end, I collect them into a dictionary.

Update: changed the agg specification to last and first.

df = pd.read_csv(tsvf)  #, keep_default_na=False)
# Group by id: last non-null number, first non-null classification
df2 = df.groupby('id', as_index=False).agg({'id': 'first', 'number': 'last', 'classification': 'first'})
# Pivot so that each classification becomes a column (ids as rows)
df3 = df2.pivot(index='classification', columns='id', values='number').T

# Build a list of numbers per classification (dictionary)
dct = {t: [int(n) for n in df3[t].to_list() if not pd.isnull(n)]
       for t in ('type1', 'type2', 'type3')}
dct
# {'type1': [222], 'type2': [111], 'type3': [333]}

If there is also an item such as ['dddd', 444, 'type3'],
it should be added to the 'type3' list.
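As a quick check of that case, the sketch below assumes inline sample data that includes a fourth id, dddd, with 444 and type3. It pivots with id as the index directly, which should be equivalent to pivoting the other way and transposing. The names per_id and wide are only illustrative.

```python
import io
import pandas as pd

# Sample data (assumed), with an extra id 'dddd' that also belongs to type3.
csv_data = """id,number,classification
aaa,,
aaa,,
aaa,111,type2
bbb,,
bbb,,type1
bbb,222,
ccc,,type3
ccc,,
ccc,333,
dddd,,
dddd,444,type3
"""
df = pd.read_csv(io.StringIO(csv_data))

# One row per id: last non-null number, first non-null classification.
per_id = df.groupby('id', as_index=False).agg({'number': 'last', 'classification': 'first'})

# Rows are ids, columns are classifications; cells without a value are NaN.
wide = per_id.pivot(index='id', columns='classification', values='number')

# Drop the NaN cells and collect the remaining numbers per classification.
dct = {t: wide[t].dropna().astype(int).tolist() for t in ('type1', 'type2', 'type3')}
print(dct)
# {'type1': [222], 'type2': [111], 'type3': [333, 444]}
```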

2022-12-28 03:51
