I am studying web crawling on the same website (dooinauction.com), which has two listing categories, "Auction" and "Short Sale" (public sale). In the code below, line 3 sets the auction URL and line 4 sets the public-sale URL; use only one of them at a time. If I run it with the auction URL (line 3), I get a value for tots, but with the public-sale URL (line 4), tots comes back empty.
Comparing the two pages' HTML, 'div.pagn' is the only notable difference.
Ultimately, I want to extract the total-count value from the HTML (14032 for the auction, 2153 for the short sale). I find it hard to pull a number out of the middle of a string. I'm not sure if this is an appropriate question, but I've been at it for hours and I'm stuck.
import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.dooinauction.com/auction/ca_list.php'  # auction listing
url = 'http://www.dooinauction.com/pubauct/list.php'     # public-sale listing (use only one of the two)

req = urllib.request.Request(url)
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
tots = soup.select('div.pagn')
print('Test end')
[auction page HTML screenshot]
[public-sale page HTML screenshot]
The reason the public-sale page returns nothing is that it loads its data dynamically on the client side.
import re
import requests
from bs4 import BeautifulSoup
url = 'http://www.dooinauction.com/auction/ca_list.php'  # auction listing
html = requests.get(url).content
soup = BeautifulSoup(html, 'html.parser')
tots = soup.select('div.pagn a')
results = [re.findall(r'total_record=([0-9]+)', link['href'])[0] for link in tots]
print(results)
['14568',
'14568',
'14568',
'14568',
'14568',
'14568',
'14568',
'14568',
'14568',
'14568',
'14568']
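Since every pagination link carries the same total_record value, it is enough to take the first match and convert it to an integer. A minimal sketch of that extraction, assuming hypothetical href strings in the same query format as the site's pagination links:

```python
import re

# hypothetical hrefs mirroring the site's pagination query format
hrefs = [
    "ca_list.php?start=0&total_record=14568",
    "ca_list.php?start=20&total_record=14568",
]

# re.search finds the number even in the middle of the string
match = re.search(r"total_record=(\d+)", hrefs[0])
total = int(match.group(1)) if match else 0
print(total)  # 14568
```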
The public-sale page is the problem. For that one, use the link below instead: it returns XML, which you can parse directly.
import re
import requests
from bs4 import BeautifulSoup
url = 'http://www.dooinauction.com/xml/pubauct_list.php?pdNo=&pdStatus=1&sdate=&edate=&g_sprice=0&g_eprice=0&ctgr1=0&ctgr2=0&l_sprice=0&l_eprice=0&sido=0&gugun=0&dong=0&ref_page=&ref_sido=&ref_gugun=&ref_dong=&decrease=0&order_type=0&list_scale=20&page_scale=10&start=0&total_record=0'  # public-sale listing (XML)
html = requests.get(url).content
soup = BeautifulSoup(html, 'lxml-xml')
print(soup.find('total_record').text)
2153
Page 1 uses start=0.
Page 2 uses start=20.
Page 3 uses start=40.
I'm sure you see the pattern.
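Putting it together: with list_scale=20, the start parameter is just (page - 1) × 20, and total_record tells you how many pages exist. A quick sketch of that arithmetic, using the 2153 value returned by the XML endpoint above:

```python
import math

LIST_SCALE = 20  # items per page, matching the list_scale query parameter

def start_for_page(page):
    """Return the 'start' query value for a 1-indexed page number."""
    return (page - 1) * LIST_SCALE

total_record = 2153  # value returned by the XML endpoint
num_pages = math.ceil(total_record / LIST_SCALE)

print(start_for_page(1))  # 0
print(start_for_page(2))  # 20
print(start_for_page(3))  # 40
print(num_pages)          # 108
```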