I want to create a function that flags rows containing specific strings.

Asked 2 years ago, Updated 2 years ago, 44 views

For the following DataFrame:

import pandas as pd

df = pd.DataFrame({'pref': ['Tokyo', 'Kanagawa', 'Hokkaido', 'Saitama', 'Chiba', 'Shizuoka', 'Nagano'],
                   'city': ['Shibuya Ward', 'Yokohama City', 'Hakodate City', 'Urawa City', 'Urayasu City', 'Hamamatsu City', 'Okatani City']})

I would like to define a function that adds a column called pref_flg containing either "1 metropolis and 3 prefectures" (Tokyo, Kanagawa, Saitama, Chiba) or "Other".

The expected result is:

df2 = pd.DataFrame({'pref': ['Tokyo', 'Kanagawa', 'Hokkaido', 'Saitama', 'Chiba', 'Shizuoka', 'Nagano'],
                    'city': ['Shibuya Ward', 'Yokohama City', 'Hakodate City', 'Urawa City', 'Urayasu City', 'Hamamatsu City', 'Okatani City'],
                    'pref_flg': ['1 metropolis and 3 prefectures', '1 metropolis and 3 prefectures', 'Other',
                                 '1 metropolis and 3 prefectures', '1 metropolis and 3 prefectures', 'Other', 'Other']})

I thought about using the 'string' in x pattern as below. It works when checking a single string, but I don't know how to write it when there are several strings to check.

def get_pref(x):
    if 'Tokyo' in x:  # I don't know how to specify multiple strings here
        return '1 metropolis and 3 prefectures'
    else:
        return 'Other'

df['pref_flg'] = df['pref'].apply(get_pref)
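A minimal sketch of a multi-string version of 'Tokyo' in x, assuming the goal is substring matching (so a value such as 'Kanagawa Prefecture' would also count): the built-in any() tries each candidate string in turn. The name TARGETS and the sample data are illustrative only.

```python
import pandas as pd

# Hypothetical list of the strings to look for
TARGETS = ['Tokyo', 'Kanagawa', 'Saitama', 'Chiba']

def get_pref(x):
    # any() returns True as soon as one target string occurs inside x
    if any(t in x for t in TARGETS):
        return '1 metropolis and 3 prefectures'
    return 'Other'

# Illustrative data: one value carries a ' Prefecture' suffix
df = pd.DataFrame({'pref': ['Tokyo', 'Kanagawa Prefecture', 'Hokkaido', 'Chiba']})
df['pref_flg'] = df['pref'].apply(get_pref)
print(df)
```

Substring matching is looser than an exact-match test such as x in prefs: it also flags any value that merely contains one of the names.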

I would appreciate your advice.

++++++++ Update
I tried the following and got the expected result.

prefs = ['Tokyo', 'Kanagawa', 'Chiba', 'Saitama']

def get_pref(x):
    # prefs is only read, never reassigned, so no global statement is needed
    if x in prefs:  # exact match against the list
        return '1 metropolis and 3 prefectures'
    else:
        return 'Other'

df['pref_flg'] = df['pref'].apply(get_pref)
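If substring matching is wanted without a per-row Python function, one vectorized alternative (a swapped-in technique, not the approach above) joins the list into a single regular expression and uses pandas Series.str.contains; re.escape guards against names that contain regex metacharacters. This is a sketch, not the only way to do it.

```python
import re

import numpy as np
import pandas as pd

prefs = ['Tokyo', 'Kanagawa', 'Chiba', 'Saitama']
# Build one alternation pattern: 'Tokyo|Kanagawa|Chiba|Saitama'
pattern = '|'.join(map(re.escape, prefs))

df = pd.DataFrame({'pref': ['Tokyo', 'Kanagawa', 'Hokkaido', 'Saitama',
                            'Chiba', 'Shizuoka', 'Nagano']})
# str.contains returns a boolean Series; np.where maps True/False to labels
df['pref_flg'] = np.where(df['pref'].str.contains(pattern),
                          '1 metropolis and 3 prefectures', 'Other')
print(df)
```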

python pandas

2022-09-30 16:44

1 Answer

You can also write this with pandas.Series.isin and numpy.where:

import pandas as pd
import numpy as np

pd.set_option('display.unicode.east_asian_width', True)

df = pd.DataFrame({
  'pref': ['Tokyo', 'Kanagawa', 'Hokkaido', 'Saitama', 'Chiba', 'Shizuoka', 'Nagano'],
  'city': ['Shibuya Ward', 'Yokohama City', 'Hakodate City', 'Urawa City', 'Urayasu City', 'Hamamatsu City', 'Okatani City'],
})

df = df.assign(
  # isin marks rows whose pref exactly matches one of the four names;
  # np.where then picks the label for True / False
  pref_flg=np.where(
    df.pref.isin(('Tokyo', 'Kanagawa', 'Saitama', 'Chiba')),
    '1 metropolis and 3 prefectures', 'Other')
)

print(df)

       pref            city                        pref_flg
0     Tokyo    Shibuya Ward  1 metropolis and 3 prefectures
1  Kanagawa   Yokohama City  1 metropolis and 3 prefectures
2  Hokkaido   Hakodate City                           Other
3   Saitama      Urawa City  1 metropolis and 3 prefectures
4     Chiba    Urayasu City  1 metropolis and 3 prefectures
5  Shizuoka  Hamamatsu City                           Other
6    Nagano    Okatani City                           Other
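isin compares whole values, so if the pref column mixed forms such as 'Kanagawa' and 'Kanagawa Prefecture', the suffixed rows would fall into 'Other'. A sketch, assuming the only variation is an optional trailing ' Prefecture', strips that suffix before the lookup (the sample data is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'pref': ['Tokyo', 'Kanagawa Prefecture', 'Hokkaido', 'Chiba Prefecture']})

# Assumption: the only variation is an optional trailing ' Prefecture'
base = df['pref'].str.replace(r'\s*Prefecture$', '', regex=True)

# Exact-match lookup on the normalized names, then label with np.where
df['pref_flg'] = np.where(base.isin(['Tokyo', 'Kanagawa', 'Saitama', 'Chiba']),
                          '1 metropolis and 3 prefectures', 'Other')
print(df)
```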


2022-09-30 16:44



© 2024 OneMinuteCode. All rights reserved.