http://dart.fss.or.kr/dsac001/mainAll.do
Among the disclosures listed at the link above, how can I get the company subject to disclosure, the report name, and the report link for about 5 disclosures, starting with the latest one?
I know the data is inside <tbody>, but I don't know how to extract it.
The previous answer already introduced the relevant module, but I'm posting a simple crawling script just in case.
The code below crawls the name of the company subject to disclosure, the report name, and the report address.
from bs4 import BeautifulSoup
import urllib.request
import re

# Fetch and parse the HTML
with urllib.request.urlopen("http://dart.fss.or.kr/dsac001/mainAll.do") as response:
    html = response.read()
soup = BeautifulSoup(html, 'html.parser')

trs = soup.find_all('tr')[1:5+1]  # Top 5 rows, skipping the first (header) row
for tr in trs:
    td = tr.find_all('td')
    company = re.sub(r'[\t\n\r ]', '', td[1].find('a').text)  # Name of the company subject to disclosure
    report = re.sub(r'[\t\n\r ]', '', td[2].find('a').text)  # Report name
    report_link = 'http://dart.fss.or.kr' + td[2].find('a').attrs['href']  # Report address (absolute URL)
    print('Company subject to disclosure:', company, '\tReport name:', report, '\tReport address:', report_link)  # Output
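
Since the question mentions <tbody>, here is a rough variant that selects the rows through it directly. It is only a sketch. It assumes the same column order as the script above (company link in the second cell, report link in the third). SAMPLE_HTML and parse_disclosures are made-up names for illustration, and the sample markup is not the real page. Unlike the script above, it collapses runs of whitespace instead of deleting every space, and it builds the absolute link with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://dart.fss.or.kr/dsac001/mainAll.do"

# Hypothetical sample markup (NOT the real page): column 0 = time,
# column 1 = company link, column 2 = report link.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Time</th><th>Company</th><th>Report</th></tr></thead>
  <tbody>
    <tr><td>17:30</td><td><a href="#">Company A</a></td>
        <td><a href="/sample/report1">Report  One</a></td></tr>
    <tr><td>17:29</td><td><a href="#">Company B</a></td>
        <td><a href="/sample/report2">Report Two</a></td></tr>
  </tbody>
</table>
"""


def parse_disclosures(html, limit=5, base_url=BASE_URL):
    """Return up to `limit` (company, report, link) tuples from the table body."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # "tbody tr" matches only body rows, so the header row needs no [1:] slice.
    for tr in soup.select("tbody tr")[:limit]:
        tds = tr.find_all("td")
        if len(tds) < 3:
            continue  # skip rows that do not have the assumed layout
        company_a = tds[1].find("a")
        report_a = tds[2].find("a")
        if company_a is None or report_a is None:
            continue
        # Collapse runs of whitespace but keep single spaces between words.
        company = " ".join(company_a.get_text().split())
        report = " ".join(report_a.get_text().split())
        # urljoin turns the relative href into an absolute URL.
        link = urljoin(base_url, report_a.get("href", ""))
        results.append((company, report, link))
    return results


for company, report, link in parse_disclosures(SAMPLE_HTML):
    print(company, report, link, sep="\t")
```

To run it against the live page, pass the html read with urllib.request.urlopen in the script above to parse_disclosures instead of SAMPLE_HTML. If the real table has a different column order, the indexes 1 and 2 need to be adjusted.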