How to get only mode() results in a for statement in Pandas

Asked 2 years ago, Updated 2 years ago, 44 views

As a result of executing the following code, I was able to obtain the most frequent value in series type, but what should I do to obtain only actual data such as "Daiji Nakahara" and "Daiji Tetsuwa"?

▼ Code

null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
    mode=df.loc[(~null_stationArea)&(df["Nearby Station:Name"]==null_stationArea), "District Name"].mode()
    print(mode)

▼Excerpts of execution results

0 Oaza Nakahara
dtype:object
Series ( [ ], dtype: object )
zero-size iron ring
dtype:object
0 Oaza Minami Tateishi (literally, southern standing stone)
1. Ishigaki East
dtype:object
0 Honmachi (the main town)
dtype:object
0 Tokugura Corporation
one bush clover
dtype:object
0 Mitojima Honmachi
dtype:object

▼ What I tried
Since it is a Series type, we specified the index number of the data we want, 0."I was able to get the first data, ""Daiji Nakahara"", but I couldn't get the other ""Daiji Tetsuwa"" because an error was displayed, and I couldn't get the data."
See screenshots for errors.

null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
    mode=df.loc[(~null_stationArea)&(df["Nearby Station:Name"]==null_stationArea), "District Name"].mode()[0]
    print(mode)

Enter a description of the image here

Enter a description of the image here

python pandas

2022-09-30 16:37

1 Answers

Use values.

import pandas as pd
importio

data_csv='"
District name, nearest station: name
, A
,B
, C
, C
a large letter, A
a large letter, A
Oaza Nakahara, B
the capital letters of Nakahara, C
a large iron ring
a large iron ring
a large iron ring
a large iron ring
a large iron ring
Daiji Minami Tateishi, A
Daiji Minami Tateishi, B
Daiji Minami Tateishi, C
Daiji Minami Tateishi, C
Daiji Minami Tateishi, D
Daiji Minami Tateishi, D
'''.strip()

df = pd.read_csv(io.StringIO(data_csv))

null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
    mode = df.loc [
               (~null_stationArea)&(df["Nearby Station: Name"]==null_stationArea", "District Name"
           ] .mode().values
    print(mode)

#
["Daiji Nakahara"]
["Tetsuwa"]
["Daiji Nan Tateishi" and "Daiji Iron Ring"]

You can also write as follows (the result is a data frame):

stations=df.loc [df['District Name'].isnull(), "Nearby Station: Name"].unique()
dfx=(
  df.dropna()
    .loc [df["Nearby Station: Name"].isin(stations)]
    .groupby ("Nearby Station: Name")
    .apply(lambdax:x["District Name"].mode().values)
    .to_frame("District name (most frequent value)"))

print(dfx)
#
                      District name (most frequent value)
nearest station:name                        
A [Daiji Nakahara]
[Tetsuwa with big letters]
C [Daiji Minami Tateishi, Daiji Tetsuwa]


2022-09-30 16:37

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.