This question is about reading a file with multiple columns and extracting the rows in which one column contains a specific string. Specifically, I would like to know how to extract only the lines containing Mr from the following input file.
  name     age
1 Mr. A    30
2 Miss. B  20
3 Mr. C    25
Thank you for your cooperation.
python python3
By reading the file with the open() function in a for statement, you can process it line by line. In Python, the expression A in B tests whether string A is contained in string B. For example, the following prints only the lines containing Mr.
with open('input.txt') as f:
    for line in f:
        # each line keeps its trailing newline, so print() must not add another
        if "Mr." in line:
            print(line, end='')
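As a variation on the loop above, here is a minimal sketch that matches Mr. only as a whole whitespace-separated word, so a line is not picked up just because the characters appear inside a longer word. The helper name lines_with_token and the in-memory sample are hypothetical, and the sample assumes the columns are separated by whitespace.

```python
import io


def lines_with_token(lines, token):
    """Yield the lines in which `token` appears as a whole whitespace-separated word."""
    for line in lines:
        if token in line.split():
            yield line


# Hypothetical stand-in for input.txt so the sketch runs without a file on disk.
sample = io.StringIO("name age\n1 Mr. A 30\n2 Miss. B 20\n3 Mr. C 25\n")

matched = list(lines_with_token(sample, "Mr."))
print("".join(matched), end="")
# prints:
# 1 Mr. A 30
# 3 Mr. C 25
```

To read a real file, pass the file object from open('input.txt') in place of sample.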
If the data is Kaggle's passenger list, the task is to extract the rows whose name column contains the honorific 'Mr.', rather than the lines that contain 'Mr.' anywhere. 'Mr.' is usually preceded by a space, so I think it would be best to run both of the following checks and concat the results if both return rows.
df[df['name'].str.startswith('Mr.')]
df[df['name'].str.contains(' Mr.', regex=False)]
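The two checks above can be combined roughly as follows. This is only a sketch: the sample rows are made up for illustration (the real Kaggle column names and values may differ), and regex=False is used here so that the '.' is matched literally rather than as a regular-expression wildcard.

```python
import pandas as pd

# Hypothetical sample data, not taken from the actual Kaggle file.
df = pd.DataFrame({
    "name": ["Mr. A", "Miss. B", "Smith, Mr. John", "Jones, Mrs. Mary"],
    "age": [30, 20, 22, 38],
})

# Rows whose name starts with the honorific.
starts = df[df["name"].str.startswith("Mr.")]
# Rows where the honorific follows a space somewhere inside the name.
inside = df[df["name"].str.contains(" Mr.", regex=False)]

# Concatenate both results; drop rows that matched both checks, then restore row order.
result = pd.concat([starts, inside])
result = result[~result.index.duplicated()].sort_index()

print(result["name"].tolist())
# ['Mr. A', 'Smith, Mr. John']
```

An equivalent alternative is to combine the two boolean masks with | before indexing, which avoids the concat and the duplicate handling.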
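In this case the following alone causes no problem, but because English words are separated by spaces, it is often effective to check for the space first to avoid false matches. (str.contains treats its pattern as a regular expression by default, so regex=False is passed here to make the '.' a literal dot.)
df[df['name'].str.contains('Mr.', regex=False)]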
Also, if you want to use the honorific as a feature in a predictive model, why not break the names apart with split? (Mr. generally has spaces before and after it, so a plain split() on whitespace may be enough, without using a regular expression.)
df['name'].str.split(r'[\s.]', expand=True)
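To turn the split into a usable feature column, one possible sketch is shown below. It assumes the honorific is the first whitespace-separated token that ends with a period, which would misfire on names where an initial such as "J." comes before the honorific. The sample names are hypothetical.

```python
import pandas as pd

# Hypothetical names for illustration only.
names = pd.Series(["Smith, Mr. John", "Jones, Mrs. Mary", "Mr. A"])

# Plain whitespace split: each row becomes a list of tokens.
tokens = names.str.split()

# Assumption: the honorific is the first token ending in '.'; None if there is no such token.
titles = tokens.map(lambda ts: next((t for t in ts if t.endswith(".")), None))

print(titles.tolist())
# ['Mr.', 'Mrs.', 'Mr.']
```

The resulting Series can be assigned to a new column and, for example, counted with value_counts() before deciding how to encode it.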
© 2024 OneMinuteCode. All rights reserved.