Regular expression error: how do I make a lookbehind fixed-width?

Asked 1 year ago, updated 1 year ago, 61 views

I am scraping a page and would like to extract "Shinjuku" from the data below.

<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a></td>,>355,778>/td>41>>>60>>

I tested the regular expression I wrote in an online regular expression checker.


After confirming that the checker matched "Shinjuku", I ran the following code:

import re

# Sample of the scraped data
data = '<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a></td>,>355,778>/td>41>>>60>>'

# Raises re.error: the lookbehind contains ".*", which has no fixed width
r = re.findall('(?<=(<td class="stationName"><a href=".*">))(.*?)(?=</a>)', data)

findall raises the following error:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-24-a625cd10ed2c> in <module>()
      1 # station_name_list=[]
----> 2 r=re.findall('(?<=(<td class="stationName"><a href=".*">))(.*?)(?=</a>)', data2[0])
      3 
      4 for num in r:
      5     station_name_list.append(num[1])

4 frames
/usr/lib/python3.7/sre_compile.py in _compile(code, pattern, flags)
    180                 lo, hi = av[1].getwidth()
    181                 if lo != hi:
--> 182                     raise error("look-behind requires fixed-width pattern")
    183                 emit(lo) # look behind
    184             _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern

I'm not used to regular expressions. I gather the lookbehind needs a fixed width, but I don't know how to achieve that.

If anyone knows, please let me know. Thank you in advance.
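A minimal sketch of what "fixed-width" means for Python's re module: everything inside a lookbehind must always match the same number of characters, so literal text is accepted while ".*" (zero or more characters) is rejected. If a lookbehind is wanted anyway, one option is to keep only the constant part inside it and match the variable-length href attribute outside it. The shortened data string and the variable names are assumptions for illustration.

```python
import re

# Shortened sample of the scraped data (assumed for illustration)
data = ('<td>1</td>,<td class="stationName">'
        '<a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">'
        'Shinjuku</a></td>')

# The question's pattern: ".*" inside the lookbehind has variable width
question_pattern = '(?<=(<td class="stationName"><a href=".*">))(.*?)(?=</a>)'
try:
    re.compile(question_pattern)
    error_message = None
except re.error as e:
    error_message = str(e)  # compilation fails before any matching happens
print(error_message)

# Fixed width: the lookbehind holds only literal text, which always has the same
# length; the variable-length href is matched (consumed) outside the lookbehind
pattern = re.compile(r'(?<=<td class="stationName">)<a href="[^"]*">([^<]*)</a>')
print(pattern.findall(data))  # ['Shinjuku']
```

In this case the lookbehind adds nothing that matching the literal text directly would not, because the capturing group already limits what findall returns.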

python regular-expression

2022-09-30 11:44

1 Answer

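I didn't need a lookbehind at all. When the pattern contains a single capturing group, re.findall returns only the captured text, so the surrounding tags can simply be matched normally. I changed the code to the following and it worked.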

import re

data = '<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a></td>,>355,778>/td>41>>>60>>'

# No lookbehind: match the literal prefix and capture only the station name
r = re.findall('<td class="stationName"><a href=".*?">(.*?)</a>', data)

print(r)  # Displays ['Shinjuku']
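The loop in the question's traceback reads num[1] because that pattern has two groups. A sketch of how the return value of re.findall depends on the number of groups, using a hypothetical two-row snippet (the second station and the simplified hrefs are made up for illustration):

```python
import re

# Hypothetical two-row snippet on one line (hrefs simplified for illustration)
html = ('<td class="stationName"><a href="a.html">Shinjuku</a></td>'
        '<td class="stationName"><a href="b.html">Ikebukuro</a></td>')

# Two groups: findall returns one (href, name) tuple per match
pairs = re.findall(r'<td class="stationName"><a href="(.*?)">(.*?)</a>', html)
print(pairs)  # [('a.html', 'Shinjuku'), ('b.html', 'Ikebukuro')]

# One group: findall returns the captured strings directly
station_name_list = re.findall(r'<td class="stationName"><a href=".*?">(.*?)</a>', html)
print(station_name_list)  # ['Shinjuku', 'Ikebukuro']

# Greedy ".*" in the href part runs past the first row to the last '">' on the line
greedy = re.findall(r'<td class="stationName"><a href=".*">(.*?)</a>', html)
print(greedy)  # ['Ikebukuro']
```

The lazy ".*?" in the href part matters once several rows share one line: with the greedy ".*" the first match swallows every row up to the last station, so only that station is returned.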


2022-09-30 11:44



© 2024 OneMinuteCode. All rights reserved.