I would like to know if I can retrieve data using pandas grouping.

Asked 1 year ago, Updated 1 year ago, 421 views

Execution Environment
Windows 10 Python 3.X
pandas

This is a follow-up to the question linked below.
Pandas cannot retrieve data under certain conditions.

Linked Questions
In the linked question, I asked how to use pandas to find which "classification" each id belongs to, and I was able to use groupby to get the classification and number for each id.

Question details
I would like to transform the df below into the desired output shown after it.
There are two differences from the data in the previous question.
The first is that the classification column now includes type4.
The second is that id and classification are no longer one-to-one: a single id can have multiple classifications.
In the df below, id ccc has two classifications.

This data contains four classifications, but I would like to retrieve only
three of them: type1, type2, and type3.

Also, if an id has more than one classification, I would like to get all of them (except type4, which I do not need).

In the previous question there was exactly one classification per id, so I could retrieve the data with groupby. With more than one classification per id, I think it will be difficult to retrieve the data with groupby alone.

I considered extracting the data manually, step by step, but I wondered
whether grouping could still do the job, so I am asking here.
Is it possible to obtain this data by grouping?
Or do I have to extract it manually?

df (shown in CSV format so the columns are easy to separate)

id,numeric,classification
aaa,,type2
aaa,,
aaa,111,type4
bbb,,
bbb,,type1
bbb,222,
ccc,,type3
ccc,,
ccc,,type1
ccc,333,
ddd,,
ddd,,
ddd,1234,type2
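
For anyone who wants to reproduce the problem, the sample above can be loaded roughly as follows. This is only a sketch: it assumes empty cells are left blank between the commas and that the classification values are written without a space (type1 to type4), which the listing above does not state unambiguously.

```python
import io
import pandas as pd

# Sample data from the question. Assumption: blank cells are empty fields,
# and classifications are spelled "type1" ... "type4" with no space.
csv_text = """id,numeric,classification
aaa,,type2
aaa,,
aaa,111,type4
bbb,,
bbb,,type1
bbb,222,
ccc,,type3
ccc,,
ccc,,type1
ccc,333,
ddd,,
ddd,,
ddd,1234,type2
"""

# Empty fields are read as NaN, so 'numeric' becomes a float column.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Each id has its number on one row only, and its classifications on other rows, which is why a plain groupby aggregation is not enough.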

Desired output

id,numeric,classification
aaa,111,type2
bbb,222,type1
ccc,333,type3
ccc,333,type1
ddd,1234,type2

python pandas

2023-01-05 19:01

1 Answer

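# Classifications to keep (type4 is excluded)
types = ('type1', 'type2', 'type3')
# Within each id, copy the number from the last row to every row, then drop rows with no classification
dfx = df.groupby('id', group_keys=False)\
        .apply(lambda x: x.assign(numeric=x.iloc[-1]['numeric'])).dropna()\
        .astype({'numeric': int}).query('classification in @types').reset_index(drop=True)
print(dfx)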
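
As a possible alternative, the same idea can be sketched without `apply`, using `transform` to spread each id's number over its rows and `isin` to filter. This is not the answerer's code: the helper name `fill_and_filter` and the small `demo` frame are hypothetical, and the sketch takes the last non-null number per id rather than the value on the last row.

```python
import pandas as pd

def fill_and_filter(df, keep=("type1", "type2", "type3")):
    """Spread each id's number over its rows, then keep only the wanted classifications."""
    out = df.copy()
    # 'last' picks the last non-null value in each group; transform broadcasts it to every row.
    out["numeric"] = out.groupby("id")["numeric"].transform("last")
    # Rows with a missing classification fail isin() and are dropped along with type4.
    out = out[out["classification"].isin(keep)]
    return out.astype({"numeric": int}).reset_index(drop=True)

nan = float("nan")
# Hypothetical demo frame: a small subset shaped like the question's data.
demo = pd.DataFrame({
    "id": ["aaa", "aaa", "ccc", "ccc", "ccc"],
    "numeric": [nan, 111, nan, nan, 333],
    "classification": ["type2", "type4", "type3", "type1", None],
})
result = fill_and_filter(demo)
print(result)
```

Note that an id with no number on any row would keep NaN after the transform, and the final conversion to int would then raise an error, so such ids would need to be dropped or handled first.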


2023-01-05 22:39


