[Pandas] Create a new DataFrame by extracting only the text inside <> from the source DataFrame

Asked 1 year ago, Updated 1 year ago, 293 views

Given the following sentences in a column of a DataFrame:

   df_sentence
 <Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>
 <NC Soft:OG>is established in  <Match, 1993:DT>

I would like to extract the following results (as a data frame).

 PS              LC         DT             OG
 Donald Trump    America    20-Jul
                 Japan      Match, 1993    NC Soft

python pandas

2022-12-22 19:14

1 Answer

import re

# Define the regular expression pattern for entity-label pairs
pattern = r'<(.+?):(.+?)>'

# Define the input text
text = """<Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>
 <NC Soft:OG>is established in  <March, 1993:DT>"""

# Find all the matches in the text
matches = re.finditer(pattern, text)

# Iterate over the matches and print the entity-label pairs
for match in matches:
    entity = match.group(1)
    label = match.group(2)
    print((entity, label))
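
The loop above only prints the (entity, label) pairs. To get a data frame with one column per label, as the question asks, the same pattern could be applied to each row. The following is a minimal sketch, not a definitive solution. It assumes the sentences sit in a column named df_sentence, that the column order PS, LC, DT, OG is wanted, and that several entities sharing a label in one sentence are joined with "; ". The helper name entities_by_label is made up for illustration, and the sample sentences are the ones used in the answer's text.

```python
import re
from collections import defaultdict

import pandas as pd

# Same pattern as in the answer: "<entity:label>"
PATTERN = re.compile(r"<(.+?):(.+?)>")


def entities_by_label(sentence):
    """Group the entities found in one sentence by their label (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity, label in PATTERN.findall(sentence):
        grouped[label].append(entity)
    # Assumption: entities that share a label are joined with "; "
    return {label: "; ".join(entities) for label, entities in grouped.items()}


# Sample data, assuming the sentences are stored in a column named "df_sentence"
df = pd.DataFrame({
    "df_sentence": [
        "<Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>",
        "<NC Soft:OG>is established in  <March, 1993:DT>",
    ]
})

# One dict per sentence, then one DataFrame column per label
rows = df["df_sentence"].map(entities_by_label).tolist()
result = (
    pd.DataFrame(rows, index=df.index)
    .reindex(columns=["PS", "LC", "DT", "OG"])  # assumed column order
    .fillna("")                                 # labels missing from a sentence become ""
)
print(result)
```

Each entity lands in the column named by its own tag, so Japan (tagged OG) ends up in the OG column of the first row. Dates are kept as the original strings; converting "July 20" to a form like 20-Jul would need a separate formatting step.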


2022-12-22 20:26


