As a result of executing the following code, I was able to obtain the most frequent value in series type, but what should I do to obtain only actual data such as "Daiji Nakahara" and "Daiji Tetsuwa"?
▼ Code
null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
mode=df.loc[(~null_stationArea)&(df["Nearby Station:Name"]==null_stationArea), "District Name"].mode()
print(mode)
▼Excerpts of execution results
0 Oaza Nakahara
dtype:object
Series ( [ ], dtype: object )
zero-size iron ring
dtype:object
0 Oaza Minami Tateishi (literally, southern standing stone)
1. Ishigaki East
dtype:object
0 Honmachi (the main town)
dtype:object
0 Tokugura Corporation
one bush clover
dtype:object
0 Mitojima Honmachi
dtype:object
▼ What I tried
Since it is a Series type, we specified the index number of the data we want, 0."I was able to get the first data, ""Daiji Nakahara"", but I couldn't get the other ""Daiji Tetsuwa"" because an error was displayed, and I couldn't get the data."
See screenshots for errors.
null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
mode=df.loc[(~null_stationArea)&(df["Nearby Station:Name"]==null_stationArea), "District Name"].mode()[0]
print(mode)
Use values
.
import pandas as pd
importio
data_csv='"
District name, nearest station: name
, A
,B
, C
, C
a large letter, A
a large letter, A
Oaza Nakahara, B
the capital letters of Nakahara, C
a large iron ring
a large iron ring
a large iron ring
a large iron ring
a large iron ring
Daiji Minami Tateishi, A
Daiji Minami Tateishi, B
Daiji Minami Tateishi, C
Daiji Minami Tateishi, C
Daiji Minami Tateishi, D
Daiji Minami Tateishi, D
'''.strip()
df = pd.read_csv(io.StringIO(data_csv))
null_stationAreas=df["District Name".isnull()
for null_stationArea in list(df.loc[null_stationArea, "Nearby Station: Name"].unique()):
mode = df.loc [
(~null_stationArea)&(df["Nearby Station: Name"]==null_stationArea", "District Name"
] .mode().values
print(mode)
#
["Daiji Nakahara"]
["Tetsuwa"]
["Daiji Nan Tateishi" and "Daiji Iron Ring"]
You can also write as follows (the result is a data frame):
stations=df.loc [df['District Name'].isnull(), "Nearby Station: Name"].unique()
dfx=(
df.dropna()
.loc [df["Nearby Station: Name"].isin(stations)]
.groupby ("Nearby Station: Name")
.apply(lambdax:x["District Name"].mode().values)
.to_frame("District name (most frequent value)"))
print(dfx)
#
District name (most frequent value)
nearest station:name
A [Daiji Nakahara]
[Tetsuwa with big letters]
C [Daiji Minami Tateishi, Daiji Tetsuwa]
© 2024 OneMinuteCode. All rights reserved.