One of Dataframe's Columns Becomes NaN

Asked 1 years ago, Updated 1 years ago, 305 views

What do you want to do  I would like to set up a separate year column by extracting year from the column of the title of the data frame as follows:Enter a description of the image hereProblems However, if you try the code below, a warning will appear and all the values in the year column will be NaN.

defrm_dates_from_title(df:pd.DataFrame):
  df['year'] = df.title.str.extract('(\(\d\d\d\d\))', expand=False)

  df['year'] = df.year.str.extract('(\d\d\d)', expand=False)

  df['title'] = df.loc.title.str.replace('(\(\d\d\d\d\))',',',regex=True)

  rm_dates_from_title (recommended_movies)
recommended_movies.head(20)

Enter a description of the image here

python dataframe

2023-01-06 17:29

1 Answers

My environment is Pandas 1.5.2, but the problem warning (SettingWithCopyWarning) is not displayed (although the code posted in the question does not). As I mentioned in the comment, if the title does not contain a 4-digit number in parentheses (the year the movie was released), it will be NaN.

import pandas as pd
importio

csv_data='"
movieid, title, genres
50, "Usual Suspects, The", Crime | Mystery | Thriller
110, Braveheart, Action | Drama | War
260, Star Wars: Episode IV-New Hope (1977), Action | Adventure | Sci-Fi
'''
df = pd.read_csv(io.StringIO(csv_data))

#
rx=r'\(\d{4})\)'
df['year'] = df['title'].str.extract(rx,expand=False)
df['title'] = df['title'].str.replace(rx,',regex=True)

print(df)


2023-01-06 23:23

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.