Execution environment
Windows 10
Python 3.X
pandas
This is a follow-up to the question linked here.
Pandas cannot retrieve data under certain conditions.
Linked question
In the linked question, I used pandas to determine which classification each id belongs to, and used groupby to get the classification and numerical value for each id.
Question
From the question above, I obtained the following dfx:
dfx (in CSV format so the columns are easy to separate)
id,numerical,classification
aaa,3141,type2
bbb,5926,type1
ccc,5358,type3
ddd,9793,type1
eee,2384,type3
fff,6264,type2
ggg,3383,type2
hhh,2795,type1
iii,288,type3
jjj,4197,type1
kkk,1693,type3
lll,9937,type2
mmm,5105,type2
nnn,8209,type1
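For anyone who wants to reproduce this, the table above can be loaded roughly as follows. This is only a sketch: it assumes the column names id, numerical and classification (numerical is the name the code in this thread uses) and classification labels written without spaces (type1, type2, type3).

```python
import io

import pandas as pd

# The data from the question, as CSV text.
# Assumption: column names and label spelling are normalized as described above.
csv_text = """id,numerical,classification
aaa,3141,type2
bbb,5926,type1
ccc,5358,type3
ddd,9793,type1
eee,2384,type3
fff,6264,type2
ggg,3383,type2
hhh,2795,type1
iii,288,type3
jjj,4197,type1
kkk,1693,type3
lll,9937,type2
mmm,5105,type2
nnn,8209,type1
"""

dfx = pd.read_csv(io.StringIO(csv_text))
print(dfx.shape)  # (14, 3)
```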
I ran the following code to get the maximum and minimum values for each of the three classifications in the classification column.
dfx_max_min = dfx.groupby('classification').agg(['max', 'min'])
print(dfx_max_min)
When I checked the printed result, I got the following data.
It contains the maximum and minimum for each classification, for both id and numerical.
                 id      numerical
                max  min       max   min
classification
type1           nnn  bbb      9793  2795
type2           mmm  aaa      9937  3141
type3           kkk  ccc      5358   288
What I want instead is a DataFrame in which the id columns hold the ids of the rows that have the maximum and minimum numerical values.
For example, the max id for type1 should be ddd, because ddd is the id of the row that holds the type1 maximum, 9793.
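To illustrate why the two results differ, here is a small sketch using only the type1 rows from the data above. agg(['max', 'min']) reduces each column on its own, so the id it reports is just the alphabetically largest or smallest id. idxmax and idxmin instead return the row label of the extreme numerical value, which can then be used to look up the matching id. The variable names here are illustrative only.

```python
import pandas as pd

# Only the type1 rows from the question's data.
t1 = pd.DataFrame({
    'id': ['bbb', 'ddd', 'hhh', 'jjj', 'nnn'],
    'numerical': [5926, 9793, 2795, 4197, 8209],
})

# Column-wise reduction: each column is reduced independently,
# so the two values do not come from the same row.
print(t1['id'].max(), t1['numerical'].max())  # nnn 9793

# Row-wise lookup: find the row holding the extreme value, then read its id.
row_max = t1['numerical'].idxmax()
row_min = t1['numerical'].idxmin()
print(t1.loc[row_max, 'id'], t1.loc[row_min, 'id'])  # ddd hhh
```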
Desired output
                 id      numerical
                max  min       max   min
classification
type1           ddd  hhh      9793  2795
type2           lll  aaa      9937  3141
type3           ccc  iii      5358   288
How can I write this kind of reshaping in pandas?
python pandas
This is a bit rough, and personally I find it inelegant, so I will rewrite it if I come up with something better.
import pandas as pd

def fn(sdf):
    # Look up the first row whose value equals the group's max (and min), keeping id and numerical
    smax = dfx.loc[dfx['numerical'] == sdf['numerical'].max(), ['id', 'numerical']].iloc[0]
    smin = dfx.loc[dfx['numerical'] == sdf['numerical'].min(), ['id', 'numerical']].iloc[0]
    # Join both rows into one Series and label the entries (max/min, column)
    df = pd.concat([smax, smin])
    df.index = pd.MultiIndex.from_tuples([('max', 'id'), ('max', 'numerical'), ('min', 'id'), ('min', 'numerical')])
    return df

dfx.groupby('classification').apply(fn)
#                 max             min
#                  id numerical    id numerical
# classification
# type1           ddd      9793   hhh      2795
# type2           lll      9937   aaa      3141
# type3           ccc      5358   iii       288
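# Column labels for one output row: (max, id), (max, numerical), (min, id), (min, numerical)
midx = pd.MultiIndex.from_product([['max', 'min'], dfx.columns[:2]])
# idxmax/idxmin give the row labels of the extreme values in each classification;
# each pair of rows is then looked up in dfx and flattened into one row
dfx_max_min = dfx.groupby('classification')['numerical'].agg(['idxmax', 'idxmin'])\
    .apply(lambda x: dfx.loc[x, dfx.columns[:2]].stack().set_axis(midx), axis=1)\
    .swaplevel(axis=1).sort_index(axis=1, level=0)
print(dfx_max_min)
#                  id      numerical
#                 max  min       max   min
# classification
# type1           ddd  hhh      9793  2795
# type2           lll  aaa      9937  3141
# type3           ccc  iii      5358   288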
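As a possible variation on the idxmax/idxmin idea above, the same table can probably be built without a row-wise apply by selecting the max rows and the min rows separately and joining them with pd.concat. This is only a sketch, not the answerer's method: the names grouped, at_max, at_min and result are made up for illustration, and the data is retyped from the question with labels normalized to type1, type2, type3.

```python
import pandas as pd

# Data from the question (assumption: column names as used in the code above).
dfx = pd.DataFrame({
    'id': ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg',
           'hhh', 'iii', 'jjj', 'kkk', 'lll', 'mmm', 'nnn'],
    'numerical': [3141, 5926, 5358, 9793, 2384, 6264, 3383,
                  2795, 288, 4197, 1693, 9937, 5105, 8209],
    'classification': ['type2', 'type1', 'type3', 'type1', 'type3', 'type2', 'type2',
                       'type1', 'type3', 'type1', 'type3', 'type2', 'type2', 'type1'],
})

grouped = dfx.groupby('classification')['numerical']
cols = ['id', 'numerical']

# Rows holding the largest / smallest numerical value of each classification.
at_max = dfx.loc[grouped.idxmax()].set_index('classification')[cols]
at_min = dfx.loc[grouped.idxmin()].set_index('classification')[cols]

# Put the two tables side by side under 'max' and 'min', then move the
# original column names to the top level and sort the columns.
result = (
    pd.concat({'max': at_max, 'min': at_min}, axis=1)
    .swaplevel(axis=1)
    .sort_index(axis=1)
)
print(result)
```

If several rows in a classification share the same extreme value, idxmax and idxmin return the first such row, so only one id is reported per classification.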