I want to use a Python regular expression to extract specific parts of a string.

I would like to extract only the necessary information from the body of an Outlook email.
The part I want to extract is a mix of numbers (an integer part of up to 5 digits plus 1 decimal place) and "-", and at the moment I can only extract the numbers.
I would appreciate it if you could show me how to write a regular expression that handles both cases.

import pandas as pd
import re

data = """Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"""

# ([\d.]+) only matches digits and dots, so the suzuki record ("-") is skipped.
ptn = r"Name:\r\n(.*?):\r\nLast Score\s:\s*([\d.]+)\s*points\r\nThis Score\s:\s*([\d.]+)\s*(\w+)"
output_data = pd.DataFrame(re.findall(ptn, data, re.M | re.DOTALL))

output_data = output_data.rename(columns={0: "Name", 1: "Last Score", 2: "This Score", 3: "Unit"})

python regular-expression

2022-09-30 17:40

2 Answers

import pandas as pd
import re

pd.set_option('display.unicode.east_asian_width', True)

data = """Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"""

# Each named group becomes a DataFrame column via groupdict().
# .+? accepts any value, so "-" is captured as well as numbers.
output_data = pd.DataFrame([
    m.groupdict() for m in re.finditer(
        r'Name:\r\n(?P<name>.+?):\r\n'
        r'Last Score\s:\s*(?P<last>.+?)\s*points\r\n'
        r'This Score\s:\s*(?P<current>.+?)\s*'
        r'(?P<unit>\S+)(?=\r\n)',
        data)
])

print(output_data)

# Output:
     name     last  current    unit
0   satou  14686.5   8992.5  points
1  tanaka    778.5     82.5  points
2  suzuki        -      9.5  points
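
The answer above builds its rows from `m.groupdict()`. As a minimal sketch of what that call returns before pandas is involved, the snippet below runs a named-group pattern over a single record whose last score is "-". The sample text, the group names (`name`, `last`, `current`, `unit`) and the English labels are assumptions made for illustration, not taken from the real email.

```python
import re

# Hypothetical one-record sample in the same layout as the email body.
text = "Name:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n"

# Group names are illustrative; each one becomes a dictionary key.
pattern = re.compile(
    r"Name:\r\n(?P<name>.+?):\r\n"
    r"Last Score\s:\s*(?P<last>.+?)\s*points\r\n"
    r"This Score\s:\s*(?P<current>.+?)\s*(?P<unit>\S+)(?=\r\n)"
)

# finditer yields one match object per record; groupdict maps group name -> text.
records = [m.groupdict() for m in pattern.finditer(text)]
print(records)
```

Every captured value is a string, including "-", so any numeric conversion has to happen after extraction.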

2022-09-30 17:40

If the problem is that suzuki's "Last Score : - points" cannot be extracted by the regular expression, rewrite the ([\d.]+) part so that it matches the minus sign as well as numbers.

Before the change
- Score\s:\s*([\d.]+)\s*
Change example 1
- Score\s:\s*([\d.]+|-)\s*
  Matches a decimal number or "-". However, it also matches strings such as "1.2.3".
Change example 2
- Score\s:\s*(\d{1,5}\.\d?|-)\s*points
  Matches a 1- to 5-digit integer part followed by a decimal point and at most one decimal digit, or "-". Integer-only values do not match because the decimal point is required.
Change example 3
- Score\s:\s*(\d{1,5}(?:\.\d)?|-)\s*points
  Matches a 1- to 5-digit integer part with an optional single decimal digit, or "-". Integer-only values also match.
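
To make the differences between the change examples concrete, here is a small sketch that checks each capture-group alternative against a few values and then drops change example 3 into the full pattern. The sample values, the shortened `data` string and the helper name `accepts` are assumptions made for illustration.

```python
import re

# Capture-group alternatives from the change examples above.
value_patterns = {
    "before": r"[\d.]+",
    "example 1": r"[\d.]+|-",
    "example 2": r"\d{1,5}\.\d?|-",
    "example 3": r"\d{1,5}(?:\.\d)?|-",
}

# Hypothetical sample values, chosen to show where the alternatives differ.
samples = ["14686.5", "-", "1.2.3", "82"]


def accepts(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None


# For each alternative, list the sample values it accepts.
results = {
    name: [s for s in samples if accepts(pat, s)]
    for name, pat in value_patterns.items()
}
for name, matched in results.items():
    print(f"{name}: {matched}")

# Change example 3 used inside the full pattern from the question.
data = (
    "Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\n"
    "Name:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"
)
value = r"(\d{1,5}(?:\.\d)?|-)"
ptn = (
    r"Name:\r\n(.*?):\r\n"
    r"Last Score\s:\s*" + value + r"\s*points\r\n"
    r"This Score\s:\s*" + value + r"\s*(\w+)"
)
rows = re.findall(ptn, data)
print(rows)
```

Building the value sub-pattern once and reusing it for both score lines keeps the two captures consistent if the accepted number format changes later.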

2022-09-30 17:40
