String Search Using Regular Expressions on a pandas Series in Python 3.8.5

Asked 2 years ago, Updated 2 years ago, 112 views

If I check the type, the data is <class 'pandas.core.series.Series'>.
I would like to do a string search on this data.

 match = Series_data[Series_data == "F:\\"]

and I got a result (so far it looks right).

In practice, the drive letter is not fixed: sometimes it is F, sometimes another letter.
So I'm wondering whether I can put a regular expression inside the [].

 match = Series_data[re.search(':\\', Series_data)]

This does not seem to work, and I get the following message:

 raise error("bad escape (end of pattern)",
re.error: bad escape (end of pattern) at position 1
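
The error can be reproduced without pandas. In an ordinary (non-raw) string literal, ':\\' is only two characters, a colon and one backslash, so the regex engine receives a pattern that ends in an unfinished escape. The following is a minimal sketch of that behavior; the variable names are illustrative only.

```python
import re

pattern = ':\\'          # non-raw literal: the two characters ':' and '\'
print(len(pattern))      # 2

try:
    re.compile(pattern)  # the lone trailing backslash has nothing to escape
    message = None
except re.error as exc:
    message = str(exc)

print(message)           # the "bad escape (end of pattern)" message

# Writing the backslash twice at the regex level matches one literal backslash.
fixed = re.compile(r':\\')
print(bool(fixed.search('F:\\')))   # True
```

Note that re.search also expects a single string as its second argument, so passing a whole Series would fail even with a valid pattern.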

Is the format of the regular expression wrong? I can't tell whether regular expressions can be used here at all.

Thank you for your cooperation.

===========================================================================

Thank you to everyone who responded. I think I understand now. I tried

 match = Series_data[Series_data.str.contains(r':\\')]

and found that it worked.
It feels a little odd that r':\\' still needs two backslashes even with the r prefix, but it works for now.
If I write r':\' instead, I get this error:

 SyntaxError: EOL while scanning string literal
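
The two layers of escaping behind this can be checked directly. The r prefix only switches off Python's string-literal escapes; the regex engine still treats a backslash as its own escape character, so a literal backslash has to be written as \\ inside the pattern. A raw string literal also cannot end in an odd number of backslashes, which is why r':\' is a syntax error. Below is a small sketch using only the standard library; the sample path and the drive-letter pattern are assumptions for illustration.

```python
import re

raw_pattern = r':\\'      # three characters: ':', '\', '\'
plain_pattern = ':\\\\'   # the same three characters without the r prefix
print(raw_pattern == plain_pattern)   # True

path = 'F:\\Documents'    # hypothetical path: one backslash after the colon
print(bool(re.search(raw_pattern, path)))   # True

# re.escape builds a pattern from literal text, so no manual doubling is needed.
escaped = re.escape(':\\')
print(bool(re.search(escaped, path)))       # True

# A character class covers "any drive letter" at the start of the string.
drive_pattern = re.compile(r'^[A-Za-z]:\\')
print(bool(drive_pattern.search(path)))                     # True
print(bool(drive_pattern.search('\\Windows\\Python.exe')))  # False
```

The same r'^[A-Za-z]:\\' pattern can be passed to Series.str.contains if only paths that begin with a drive letter should match.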

python python3 pandas regular-expression

2022-09-30 19:51

3 Answers

If the file paths can be written with forward slashes, it might be easier to use / instead of \, since / needs no escaping.

 match = Series_data[Series_data.str.contains(':/')]
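
If the data itself still contains backslashes, one way to apply this idea is to normalize the separators first. This is only a sketch under that assumption; the sample paths and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mixing both separator styles.
paths = pd.Series(['F:\\Documents\\a.pdf', 'C:/Projects/b.sln', 'data_path'])

# regex=False treats both arguments as plain text, so no regex escaping applies.
normalized = paths.str.replace('\\', '/', regex=False)

# Select the original rows whose normalized form contains ":/".
match = paths[normalized.str.contains(':/', regex=False)]
print(match)
```

The original values are kept in the result because the normalized Series is used only to build the boolean mask.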


2022-09-30 19:51

pandas has its own Series string methods for substring ("in") checks and for regular expressions.
They behave a little differently from the ones in plain Python.

import pandas as pd

s1 = pd.Series(['aaa', 'bbb', 'abb', 'ccc'])
s1[s1.str.match('.bb')]  # regex, anchored at the start of each string
# 1    bbb
# 2    abb
# dtype: object

s1[s1.str.contains('bb')]  # regex=False and other options are available
# 1    bbb
# 2    abb
# dtype: object

Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
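
To make the difference between these methods concrete, here is a small sketch. The extra element 'xabb' is an addition for illustration and is not part of the example above.

```python
import pandas as pd

s1 = pd.Series(['aaa', 'bbb', 'abb', 'ccc', 'xabb'])

# str.match applies the regex at the start of each string only.
starts = s1.str.match('.bb')

# str.contains searches anywhere in the string, so 'xabb' also qualifies.
anywhere = s1.str.contains('.bb')

# With regex=False the pattern is plain text: a literal '.' followed by 'bb'.
literal = s1.str.contains('.bb', regex=False)

print(s1[starts].tolist())    # ['bbb', 'abb']
print(s1[anywhere].tolist())  # ['bbb', 'abb', 'xabb']
print(s1[literal].tolist())   # []
```

For the drive-letter question, regex=False also means a pattern such as ':\\' can be used as plain text without any regex-level escaping.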


2022-09-30 19:51

If you want to use re.search, you can apply it to each element with pandas.Series.apply.

import pandas as pd
import re

Series_data = pd.Series([
  'F:\\Documents\\Newsletters\\Summer 2018.pdf',
  'C:\\Projects\\apilibrary\\apilibrary.sln',
  '\\Windows\\Python.exe',
  'data_path'
])

>>> Series_data[Series_data.apply(lambda x: bool(re.search(r':\\', x)))]
0    F:\Documents\Newsletters\Summer 2018.pdf
1       C:\Projects\apilibrary\apilibrary.sln
dtype: object

You can also use os.path.splitdrive instead of re.search (use ntpath.splitdrive on non-Windows operating systems, where os.path does not parse Windows drive letters).

import pandas as pd
import ntpath

>>> Series_data[Series_data.apply(lambda x: bool(ntpath.splitdrive(x)[0]))]
0    F:\Documents\Newsletters\Summer 2018.pdf
1       C:\Projects\apilibrary\apilibrary.sln
dtype: object
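
To see what the filter above relies on, the return value of ntpath.splitdrive can be inspected on its own. This is a small standard-library sketch; the UNC path is a hypothetical addition to show one way this approach differs from the ':\\' regex.

```python
import ntpath

samples = [
    'F:\\Documents\\Newsletters\\Summer 2018.pdf',
    '\\Windows\\Python.exe',
    'data_path',
]

# splitdrive returns a (drive, rest) tuple; the drive is '' when there is none.
drives = [ntpath.splitdrive(p)[0] for p in samples]
print(drives)   # ['F:', '', '']

# A UNC share also counts as a "drive", although it contains no ':\'.
unc_drive = ntpath.splitdrive('\\\\server\\share\\file.txt')[0]
print(unc_drive)
```

If only lettered drives should match, the regular-expression version is the stricter of the two.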


2022-09-30 19:51



© 2024 OneMinuteCode. All rights reserved.