Given the following sentence data in a data frame:
df_sentence
<Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>
<NC Soft:OG>is established in <March, 1993:DT>
I would like to extract the following results (as a data frame).
PS           | LC      | DT          | OG
Donald Trump | America | July 20     | Japan
             |         | March, 1993 | NC Soft
import re

# Define the regular expression pattern for <entity:label> pairs
pattern = r'<(.+?):(.+?)>'

# Define the input text
text = """<Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>
<NC Soft:OG>is established in <March, 1993:DT>"""

# Find all the matches in the text
matches = re.finditer(pattern, text)

# Iterate over the matches and print each (entity, label) pair
for match in matches:
    entity = match.group(1)
    label = match.group(2)
    print((entity, label))
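The loop above only prints the pairs. To get closer to the table the question asks for, one possible sketch groups the pairs by sentence, with one dictionary per row keyed by label. The names `PATTERN` and `extract_rows` are hypothetical, and joining repeated labels with "; " is an assumption, since the question does not say how duplicates should be handled.

```python
import re

# Same idea as the pattern above, but with character classes so a match
# cannot run across a '<', '>' or ':' boundary.
PATTERN = re.compile(r'<([^<>:]+):([^<>:]+)>')

def extract_rows(sentences):
    """Return one dict per sentence, mapping label -> entity text."""
    rows = []
    for sentence in sentences:
        row = {}
        for entity, label in PATTERN.findall(sentence):
            # Assumption: repeated labels in one sentence are joined with '; '.
            row[label] = f"{row[label]}; {entity}" if label in row else entity
        rows.append(row)
    return rows

sentences = [
    "<Donald Trump:PS> is <America:LC> President. He came to <Japan:OG> in <July 20:DT>",
    "<NC Soft:OG>is established in <March, 1993:DT>",
]
rows = extract_rows(sentences)
for row in rows:
    print(row)
```

If pandas is available, passing the result to `pd.DataFrame(rows, columns=["PS", "LC", "DT", "OG"])` should give a data frame with one column per label, where labels missing from a sentence show up as NaN.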