I have a question about data preprocessing in Python.
I am analyzing job offer data and am at the preprocessing stage, working with the DataFrame shown in the first attached sheet.
When I checked the string data in the "position" column (second attached sheet), I found that many values have the form "job offer name (details or place of work)", so I would like to sort this out quickly.
I want to extract the job offer name and the contents of the ( ) into separate columns.
I tried to divide the strings using split, but I have not been able to get it to work.
I am still looking into it, but I would appreciate any pointers on what is going wrong.
Split code:
s = 'Server Side Engineer (Candidate Development Team Leader)'
sep = '('
t = s.split(sep)  # split the string at the opening parenthesis
r = t[0]  # index 0 holds the job offer name (with a trailing space)
print(r)  # text before the separator (everything from the separator onward is removed)
r = t[1]  # index 1 holds the details
print(r)  # text after the separator; the closing ')' is still attached
Output:
Server Side Engineer
Candidate Development Team Leader)
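The ")" left at the end of the second output is there because split only cuts at "(". As a minimal sketch for a single string, str.partition can separate the two parts and the closing parenthesis can be dropped afterwards. The helper name split_position is hypothetical and not part of the original code, and the sketch assumes at most one "(...)" part at the end of the string.

```python
def split_position(s):
    """Split 'name (details)' into (name, details).

    details is '' when the string has no parentheses.
    """
    # partition cuts at the first '(' and returns ('', '', ...) parts safely
    # even when the separator is missing
    name, _, rest = s.partition('(')
    rest = rest.strip()
    # Drop a single closing parenthesis, if present
    if rest.endswith(')'):
        rest = rest[:-1]
    return name.strip(), rest.strip()


print(split_position('Server Side Engineer (Candidate Development Team Leader)'))
print(split_position('Data Analyst'))
```

Unlike indexing into the result of split, this does not raise an IndexError when a value contains no "(".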
data frames:
python pandas
Here's how to use pandas.Series.str.extract.
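import pandas as pd

train_data1 = pd.read_csv('train_data1.csv')
# Group 0: job offer name, group 1: the whole "(...)" part, group 2: the text inside the parentheses
split = train_data1['position'].str.extract(r'^\s*(.+?)(\((.*?)\))?\s*$').fillna('')
train_data1['position'] = split[0]
train_data1.insert(2, 'position2', split[2])  # insert the new column at column index 2
train_data1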
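The extracted details are inserted as position2, to the right of the position column (insert(2, ...) places the new column at index 2, so this assumes position is the second column). If a value has no (...) part, position2 will be blank ("").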
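To try the approach without the CSV file, here is a small self-contained sketch. The sample rows and the id column are made up for illustration, and the pattern is a variation that uses a non-capturing group and also trims whitespace around the job offer name; it is not necessarily the exact pattern from the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for train_data1.csv
df = pd.DataFrame({
    'id': [1, 2, 3],
    'position': [
        'Server Side Engineer (Candidate Development Team Leader)',
        'Data Analyst',
        '  Sales (Tokyo)  ',
    ],
})

# Group 0: job offer name; group 1: text inside the parentheses.
# (?: ... )? makes the whole "(...)" part optional without adding a capture group.
pattern = r'^\s*(.+?)\s*(?:\((.*?)\))?\s*$'
parts = df['position'].str.extract(pattern).fillna('')

df['position'] = parts[0]
# Place position2 immediately to the right of position, wherever position sits
df.insert(df.columns.get_loc('position') + 1, 'position2', parts[1])
print(df)
```

With this sample, position2 should contain "Candidate Development Team Leader", an empty string, and "Tokyo". Looking up the column index with get_loc avoids hard-coding the position of the new column.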