Splitting and extracting string data during Python data preprocessing

I have a question about data preprocessing in Python.

I am analyzing job offer data and am at the preprocessing stage with the DataFrame shown on the first attached sheet. When I checked the string data in the "position" column (second attached sheet), I found that many values have the form "job offer name (details and place of work)", and I would like to handle them all in one go.

I want to extract the job offer name and the contents of ( ) into separate columns. I tried dividing the string with split, but I have not been able to get the result I want.

I am still looking into it, but I would appreciate any pointers on how to make this work.

Split Code

s = 'Server Side Engineer (Candidate Development Team Leader)'

sep = '('
t = s.split(sep)  # Split the string at the opening parenthesis

r = t[0]  # Index 0 holds the job offer name (with a trailing space)
print(r)  # Text before the separator (the separator and everything after it are dropped)

r = t[1]  # Index 1 holds the details
print(r)  # Text after the separator (the closing ')' is still attached)

Output:

Server Side Engineer
Candidate Development Team Leader)
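The output above still has a trailing space on the job offer name and a closing parenthesis on the details. A minimal sketch of one way to tidy both for a single string, using str.partition and strip; the helper name split_position is hypothetical and not part of the original question.

```python
def split_position(s):
    """Split 'name (details)' into (name, details).

    Returns an empty string for details when there is no '(' in s.
    split_position is a hypothetical helper used for illustration only.
    """
    # partition splits at the first '(' and returns (before, separator, after)
    name, _sep, rest = s.partition('(')
    rest = rest.strip()
    # Drop a single trailing ')' if one is present
    if rest.endswith(')'):
        rest = rest[:-1]
    return name.strip(), rest.strip()


name, details = split_position(
    'Server Side Engineer (Candidate Development Team Leader)'
)
print(name)
print(details)
```

For a whole column, the same function could be applied with Series.apply, although a vectorized string method is usually the more idiomatic choice in pandas.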

DataFrame:

Desired result:

python pandas

2022-09-30 17:49

1 Answer

Here's how to use pandas.Series.str.extract.

import pandas as pd

train_data1 = pd.read_csv('train_data1.csv')

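# Group 0: job offer name, group 1: the whole "(...)" part, group 2: the text inside the parentheses
split = train_data1['position'].str.extract(r'^\s*(.+?)\s*(\((.*?)\))?\s*$').fillna('')
train_data1['position'] = split[0]
train_data1.insert(2, 'position2', split[2])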

train_data1

The contents of the parentheses are inserted as position2, to the right of the position column. If a value has no (...) part, position2 is an empty string ('').
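To try the idea without the CSV file, here is a small self-contained sketch. The sample rows and the id and salary columns are made up for illustration, and the pattern is a variation that uses a non-capturing group, so the text inside the parentheses comes back as column 1 of the extract result. It also looks up the position column's index instead of assuming it.

```python
import pandas as pd

# Hypothetical sample data standing in for train_data1.csv
df = pd.DataFrame({
    'id': [1, 2],
    'position': [
        'Server Side Engineer (Candidate Development Team Leader)',
        'Data Analyst',
    ],
    'salary': [500, 400],
})

# Group 0: job offer name (surrounding whitespace excluded)
# Group 1: text inside the parentheses; (?:...) keeps the outer group uncaptured
pattern = r'^\s*(.+?)\s*(?:\((.*?)\))?\s*$'
split = df['position'].str.extract(pattern).fillna('')

df['position'] = split[0]
# Insert position2 immediately to the right of position, wherever that column is
df.insert(df.columns.get_loc('position') + 1, 'position2', split[1])

print(df)
```

This sketch only handles half-width parentheses; if the real data also contains full-width ones, the pattern would need to be extended.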

2022-09-30 17:49
