I want to use a Python regular expression to extract specific scores from a string.

Asked 2 years ago, Updated 2 years ago, 63 views

I want to extract only the necessary information from the body of an Outlook email.
The values I want to extract are a mix of numbers (an integer part of up to 5 digits with 1 decimal place) and "-", but currently I can only extract the numbers.
I would appreciate it if you could show me how to write a regular expression that matches either form.

import pandas as pd
import re

data = """Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"""

ptn = r"Name:\r\n(.*?):\r\nLast Score\s:\s*([\d.]+)\s*points\r\nThis Score\s:\s*([\d.]+)\s*(\w+)"
output_data = pd.DataFrame(re.findall(ptn, data, re.M | re.DOTALL))

output_data = output_data.rename(columns={0: "Name", 1: "Last Score", 2: "This Score", 3: "Unit"})
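A minimal check of the problem, assuming the sample text from the question: the pattern returns only satou's and tanaka's records, because [\d.]+ cannot match the "-" in suzuki's Last Score line, so that whole record is skipped.

```python
import re

# Sample text from the question (CRLF line endings, as in the Outlook body)
data = "Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"

# The question's pattern: both score groups accept only digits and dots
ptn = r"Name:\r\n(.*?):\r\nLast Score\s:\s*([\d.]+)\s*points\r\nThis Score\s:\s*([\d.]+)\s*(\w+)"

matches = re.findall(ptn, data, re.M | re.DOTALL)
for m in matches:
    print(m)
# ('satou', '14686.5', '8992.5', 'points')
# ('tanaka', '778.5', '82.5', 'points')
```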

python regular-expression

2022-09-30 17:40

2 Answers

import pandas as pd
import re

pd.set_option('display.unicode.east_asian_width', True)

data = """Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"""


output_data = (
    pd.DataFrame([
        m.groupdict() for m in re.finditer(
            r'Name:?\r\n(?P<Name>.+?):\r\n'
            r'Last Score\s*:\s*(?P<Last>.+?)\s*points\r\n'
            r'This Score\s*:\s*(?P<Current>.+?)\s*'
            r'(?P<Unit>\w+)(?=\r\n)',
            data)]))

print(output_data)

#      Name     Last Current    Unit
# 0   satou  14686.5  8992.5  points
# 1  tanaka    778.5    82.5  points
# 2  suzuki        -     9.5  points
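Because the named groups use .+?, the extracted Last and Current values are strings, and suzuki's Last value is the literal "-". If numeric columns are needed afterwards, one possible follow-up (an addition, not part of the original answer) is pd.to_numeric with errors="coerce", which turns "-" into NaN. The sketch redefines the sample text and a named-group pattern so it runs on its own; the column names Last and Current are illustrative choices.

```python
import re
import pandas as pd

# Sample text from the question (CRLF line endings)
data = "Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"

# Named groups become the DataFrame columns via groupdict()
pattern = (
    r"Name:?\r\n(?P<Name>.+?):\r\n"
    r"Last Score\s*:\s*(?P<Last>.+?)\s*points\r\n"
    r"This Score\s*:\s*(?P<Current>.+?)\s*(?P<Unit>\w+)(?=\r\n)"
)

df = pd.DataFrame([m.groupdict() for m in re.finditer(pattern, data)])

# "-" is not a number, so errors="coerce" stores NaN instead of raising
for col in ["Last", "Current"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df)
```

Keeping "-" as NaN rather than 0 avoids treating a missing score as a real score of zero.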


2022-09-30 17:40

If the problem is that suzuki's "Last Score : - points" cannot be extracted by the regular expression, rewrite the ([\d.]+) group so that it matches the minus sign as well as numbers.

  • Before change
    • Score\s:\s*([\d.]+)\s*
  • Change Example 1
    • Score\s:\s*([\d.]+|-)\s*
      Matches a decimal number or -. However, it also matches "1.2.3".
  • Change Example 2
    • Score\s:\s*(\d{1,5}\.\d?|-)\s*points
      Matches a 1- to 5-digit integer part followed by a decimal point and at most one decimal digit, or -. A plain integer without a decimal point does not match.
  • Change Example 3
    • Score\s:\s*(\d{1,5}(?:\.\d)?|-)\s*points
      Matches a 1- to 5-digit integer with an optional single decimal digit, or -. A plain integer also matches.
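To compare the variants, here is a small sketch that plugs each capture group into "Score : <value> points" and reports which sample values it accepts. The sample values are illustrative, and the same \s*points suffix is assumed for every variant. It then applies Change Example 3 to both score lines of the question's pattern.

```python
import re

# Sample text from the question (CRLF line endings)
data = "Name:\r\nsatou:\r\nLast Score : 14686.5 points\r\nThis Score : 8992.5 points\r\n\r\nName:\r\ntanaka:\r\nLast Score : 778.5 points\r\nThis Score : 82.5 points\r\n\r\nName:\r\nsuzuki:\r\nLast Score : - points\r\nThis Score : 9.5 points\r\n\r\n"

# Capture-group bodies from the list of changes
variants = {
    "Before change": r"[\d.]+",
    "Change Example 1": r"[\d.]+|-",
    "Change Example 2": r"\d{1,5}\.\d?|-",
    "Change Example 3": r"\d{1,5}(?:\.\d)?|-",
}

# Illustrative values (assumed for this comparison)
samples = ["14686.5", "82.5", "-", "123", "1.2.3", "123456.5"]


def score_matches(group: str, value: str) -> bool:
    """True if 'Last Score : <value> points' matches with the given group."""
    pattern = r"Score\s:\s*(" + group + r")\s*points"
    return re.search(pattern, f"Last Score : {value} points") is not None


for name, group in variants.items():
    print(f"{name}: {[v for v in samples if score_matches(group, v)]}")

# Change Example 3 applied to both score lines of the question's pattern
ptn = (
    r"Name:\r\n(.*?):\r\n"
    r"Last Score\s:\s*(\d{1,5}(?:\.\d)?|-)\s*points\r\n"
    r"This Score\s:\s*(\d{1,5}(?:\.\d)?|-)\s*(\w+)"
)
rows = re.findall(ptn, data, re.M | re.DOTALL)
for row in rows:
    print(row)
```

With this pattern all three records are found, and suzuki's row comes back as ('suzuki', '-', '9.5', 'points').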


2022-09-30 17:40



© 2024 OneMinuteCode. All rights reserved.