I would like to know how to extract only lines that contain specific strings.


I am reading a file with multiple columns and want to retrieve, row by row, only the rows in which one column contains a specific string. Specifically, I would like to know how to extract only the lines containing "Mr" from the following input file.

name age
1 Mr. A 30
2 Miss. B 20
3 Mr. C 25

Thank you for your cooperation.

python python3

2022-09-30 19:49

2 Answers

If you open the file with the open() function and iterate over it with a for statement, you can process it one line at a time. In Python, A in B tests whether string A is contained in string B. For example, the following prints only the lines containing "Mr.":

with open('input.txt') as f:
    for line in f:
        if "Mr." in line:
            # each line keeps its trailing newline, so end='' avoids blank lines
            print(line, end='')
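The loop above matches "Mr." anywhere on a line. Since the question asks about one particular column, here is a minimal sketch that checks only that column using the standard csv module. It assumes the file is comma-separated with a header row (the question only shows space-separated sample data), and the helper name rows_with is hypothetical.

```python
import csv
import io


def rows_with(f, column, keyword):
    """Yield rows (as dicts) of the CSV file object f whose `column` contains `keyword`.

    Hypothetical helper; assumes the first line of the file is a header row.
    """
    for row in csv.DictReader(f):
        # Plain substring test, same as the loop above; no regex involved
        if keyword in row[column]:
            yield row


# Usage with the question's sample data, held in memory instead of a file.
# For a real file: with open('input.csv', newline='') as f: ...
sample = io.StringIO("name,age\nMr. A,30\nMiss. B,20\nMr. C,25\n")
for row in rows_with(sample, "name", "Mr."):
    print(row["name"], row["age"])
```

Passing a file object rather than a path keeps the function easy to test and lets the caller decide how the file is opened.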


2022-09-30 19:49

If the data is Kaggle's passenger list, what you probably want is not every row whose name happens to contain the characters 'Mr.', but the rows whose honorific is 'Mr.'. The honorific is usually preceded by a space, so I think it is best to check both of the following (a name that starts with 'Mr.', and ' Mr.' with a leading space anywhere in the name) and concat the results if both return rows.

df[df['name'].str.startswith('Mr.')]
df[df['name'].str.contains(' Mr.', regex=False)]
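As an alternative to concatenating the two results, the two boolean masks can be combined with |, which keeps each matching row once and in its original order. A minimal sketch, assuming a DataFrame with a name column in the "Surname, Title. Given" style; the sample names below are invented for illustration. regex=False is passed because str.contains treats its pattern as a regular expression by default.

```python
import pandas as pd

# Invented sample names in the "Surname, Title. Given" style
df = pd.DataFrame({
    "name": ["Smith, Mr. John", "Mr. A", "Jones, Mrs. Mary", "Brown, Miss. Ann"],
    "age": [30, 40, 35, 20],
})

names = df["name"]
# 'Mr.' at the very start, or ' Mr.' (with a leading space) anywhere in the name
mask = names.str.startswith("Mr.") | names.str.contains(" Mr.", regex=False)
print(df[mask])
```

Because a row is selected once by the combined mask, there is no risk of duplicates that a plain concat of two overlapping results could introduce.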

With the sample data in the question, the simpler version below works fine. However, because English words are separated by spaces, checking for the leading space first often helps avoid picking up rows by mistake.

df[df['name'].str.contains('Mr.', regex=False)]
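One caveat worth checking: str.contains interprets its pattern as a regular expression unless regex=False is given, and in a regular expression the unescaped "." matches any character, so the pattern Mr. also matches "Mrs". The same pattern semantics can be demonstrated with Python's re module; this stdlib sketch uses invented names.

```python
import re

names = ["Mr. A", "Mrs. B", "Miss. C"]

# Unescaped: '.' is a wildcard, so the 'Mrs' in 'Mrs. B' also matches
loose = [n for n in names if re.search("Mr.", n)]

# Escaped dot plus a word boundary before the title: only a literal 'Mr.' matches
strict = [n for n in names if re.search(r"\bMr\.", n)]

print(loose)   # ['Mr. A', 'Mrs. B']
print(strict)  # ['Mr. A']
```

In pandas, the escaped pattern can be passed directly, for example df['name'].str.contains(r'\bMr\.'), if you prefer a regular expression over regex=False.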

Also, if you want to use the honorific as a feature in a predictive model, why not use split to break the name into parts? (Mr. generally has a space before and after it, so plain split() on whitespace, without a regular expression, may be enough.)

df['name'].str.split(r'[\s\.]', expand=True)
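Following the parenthetical above, plain split() on whitespace is often enough to pull out the honorific. A minimal sketch for a single name string, assuming the honorific is the first whitespace-separated token ending in a period; honorific is a hypothetical helper name, and the example names are only illustrative.

```python
def honorific(name):
    """Return the first whitespace-separated token ending with '.', or None."""
    for token in name.split():
        if token.endswith("."):
            return token
    return None


for n in ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Mr. A"]:
    print(n, "->", honorific(n))
```

Applied to a DataFrame, df['name'].map(honorific) would give one honorific per row, which can then be used as a categorical feature.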


2022-09-30 19:49



© 2024 OneMinuteCode. All rights reserved.