Python dataframe re.sub

Asked 2 years ago, Updated 2 years ago, 58 views

I am preprocessing data retrieved from an API using a data frame. To remove unneeded text, I used the re.sub function as shown below.

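import re
import pandas as pd

# Intended to remove "(...)" groups and anything after " - "
regex = r"\(.*\)|\s-\s.*"
df = pd.DataFrame(recipe)
for i in range(len(df)):
    # Replace every match in the material column with an empty string
    df.loc[i, 'material'] = re.sub(regex, '', df.loc[i, 'material'])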

With this code, I tried to remove the parentheses and the text inside them from the material column, but all the text after the parentheses disappeared as well.

The first picture is the df before the code is applied.

The second picture is the df after applying the code above.

Take the sweet potato porridge row: sweet potato, sugar, glutinous rice powder... I don't want to delete the text that comes after the parentheses; I only want to delete the parentheses and the characters inside them. I think the problem is in the regex, so could you take a look?

python dataframe re.sub

2022-09-20 14:29

1 Answer

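The current regular expression r"\(.*\)" means: an opening parenthesis, followed by a run of any characters, followed by a closing parenthesis.

The catch is that "any character" also includes ) itself. Because .* is greedy, the match runs from the first ( to the last ) in the string, so everything between them is removed too. I think that is why you are getting the result in your question.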
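The greedy behavior can be seen with plain re on a single string. This is only a small sketch: the sample text is made up for illustration, since the real data is visible only in the pictures.

```python
import re

# Hypothetical sample value, not taken from the asker's data.
text = "sweet potato (peeled), sugar (2 tbsp), glutinous rice powder"

greedy = r"\(.*\)"

# .* also matches ")" so the match extends to the LAST ")" in the string.
match = re.search(greedy, text)
print(match.group())

# Everything between the first "(" and the last ")" is removed in one go.
result = re.sub(greedy, "", text)
print(result)
```

With this sample, "sugar" disappears along with both parenthesized parts, which matches the symptom described in the question.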

To resolve it, match a run of any characters except ) instead of a run of any characters. As a regular expression, this is written r"\([^\)]*\)". In other words, . has been changed to [^\)].
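As a rough sketch of how the corrected pattern could be applied to the whole column, the example below uses pandas' Series.str.replace with regex=True instead of a row-by-row loop. The DataFrame contents are hypothetical stand-ins for the asker's recipe data, and the second alternative of the pattern is kept from the question unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the question's `recipe` data.
df = pd.DataFrame({"material": [
    "sweet potato (peeled), sugar (2 tbsp), glutinous rice powder",
    "rice (soaked) - main ingredients",
]})

# \([^)]*\) : "(" + a run of characters other than ")" + ")"
# \s-\s.*   : kept from the question; drops " - " and everything after it
pattern = r"\([^)]*\)|\s-\s.*"

# Vectorized replacement over the whole column; no explicit loop needed.
df["material"] = df["material"].str.replace(pattern, "", regex=True)

cleaned = df["material"].tolist()
print(cleaned)
```

Each parenthesized group is now removed separately, so the text between groups survives. The space that stood before each group is left behind; if that matters, putting \s* in front of \( is one possible tweak.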


2022-09-20 14:29



© 2024 OneMinuteCode. All rights reserved.