One of Dataframe's Columns Becomes NaN

What do you want to do　　I would like to set up a separate year column by extracting year from the column of the title of the data frame as follows:Problems However, if you try the code below, a warning will appear and all the values in the year column will be NaN.

defrm_dates_from_title(df:pd.DataFrame):
  df['year'] = df.title.str.extract('(\(\d\d\d\d\))', expand=False)

  df['year'] = df.year.str.extract('(\d\d\d)', expand=False)

  df['title'] = df.loc.title.str.replace('(\(\d\d\d\d\))',',',regex=True)

  rm_dates_from_title (recommended_movies)
recommended_movies.head(20)

python dataframe

2023-01-06 17:29

1 Answers

My environment is Pandas 1.5.2, but the problem warning (SettingWithCopyWarning) is not displayed (although the code posted in the question does not). As I mentioned in the comment, if the title does not contain a 4-digit number in parentheses (the year the movie was released), it will be NaN.

import pandas as pd
importio

csv_data='"
movieid, title, genres
50, "Usual Suspects, The", Crime | Mystery | Thriller
110, Braveheart, Action | Drama | War
260, Star Wars: Episode IV-New Hope (1977), Action | Adventure | Sci-Fi
'''
df = pd.read_csv(io.StringIO(csv_data))

#
rx=r'\(\d{4})\)'
df['year'] = df['title'].str.extract(rx,expand=False)
df['title'] = df['title'].str.replace(rx,',regex=True)

print(df)

2023-01-06 23:23

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656