What I want to do
I would like to extract the year from the title column of my DataFrame and store it in a separate year column.

Problem
However, when I run the code below, a warning appears and every value in the year column is NaN.
import pandas as pd

def rm_dates_from_title(df: pd.DataFrame):
    # Pull out "(YYYY)", then keep only the four digits
    df['year'] = df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)
    df['year'] = df.year.str.extract(r'(\d\d\d\d)', expand=False)
    # Remove "(YYYY)" from the title
    df['title'] = df.title.str.replace(r'(\(\d\d\d\d\))', '', regex=True)

rm_dates_from_title(recommended_movies)
recommended_movies.head(20)
With the code posted in the question, the warning in question (SettingWithCopyWarning) does not appear in my environment (pandas 1.5.2). As I mentioned in the comment, if a title does not contain a 4-digit number in parentheses (the movie's release year), the year value for that row will be NaN.
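To see which rows will end up as NaN, you can check which titles actually contain a parenthesized 4-digit year. A minimal sketch, using a small hypothetical DataFrame in place of recommended_movies; `na=False` makes missing titles count as "no year" instead of propagating NaN into the mask.

```python
import pandas as pd

# Hypothetical sample standing in for recommended_movies
movies = pd.DataFrame({
    'title': [
        'Usual Suspects, The',
        'Braveheart',
        'Star Wars: Episode IV-New Hope (1977)',
    ]
})

# True where the title contains "(YYYY)"; missing titles count as False
has_year = movies['title'].str.contains(r'\(\d{4}\)', regex=True, na=False)

print(f'{(~has_year).sum()} of {len(movies)} titles have no "(YYYY)"')
print(movies.loc[~has_year, 'title'].tolist())
```

If every title is reported as lacking a year, the NaN values come from the data itself rather than from the extraction code.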
import io
import pandas as pd

csv_data = '''
movieid,title,genres
50,"Usual Suspects, The",Crime|Mystery|Thriller
110,Braveheart,Action|Drama|War
260,Star Wars: Episode IV-New Hope (1977),Action|Adventure|Sci-Fi
'''
df = pd.read_csv(io.StringIO(csv_data))

# A 4-digit year in parentheses, e.g. "(1977)"; only the digits are captured
rx = r'\((\d{4})\)'
df['year'] = df['title'].str.extract(rx, expand=False)
# Remove "(YYYY)" from the title and trim the leftover trailing space
df['title'] = df['title'].str.replace(rx, '', regex=True).str.strip()
print(df)
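The question's function modifies its argument in place. If recommended_movies was itself produced by filtering another DataFrame (the question doesn't show how it was built, so this is an assumption), that in-place column assignment is a common trigger for SettingWithCopyWarning. A sketch of one alternative, using a hypothetical helper `split_year` that works on a copy, returns the new DataFrame, and stores the year as pandas' nullable `Int64` so titles without a year become `<NA>`:

```python
import io
import pandas as pd

csv_data = '''movieid,title,genres
50,"Usual Suspects, The",Crime|Mystery|Thriller
110,Braveheart,Action|Drama|War
260,Star Wars: Episode IV-New Hope (1977),Action|Adventure|Sci-Fi
'''

def split_year(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with "(YYYY)" moved from title into year."""
    out = df.copy()  # independent frame, so the caller's (possibly sliced) frame is untouched
    rx = r'\((\d{4})\)'
    # extract() yields NaN where there is no "(YYYY)"; Int64 turns those into <NA>
    out['year'] = pd.to_numeric(out['title'].str.extract(rx, expand=False)).astype('Int64')
    out['title'] = out['title'].str.replace(rx, '', regex=True).str.strip()
    return out

movies = pd.read_csv(io.StringIO(csv_data))
# Hypothetical filter standing in for however recommended_movies was produced
recommended = movies[movies['movieid'] > 100]
result = split_year(recommended)
print(result)
```

Returning a new frame (`recommended_movies = split_year(recommended_movies)`) keeps the original data unchanged; calling `.copy()` when recommended_movies is first created would work just as well with the in-place version.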