Crawling corporate disclosure text and links from dart.fss.or.kr

Asked 2 years ago, Updated 2 years ago, 56 views

http://dart.fss.or.kr/dsac001/mainAll.do

How can I get the company name, report name, and report link for roughly the 5 most recent disclosures listed on the page above?

I know the data is inside <tbody>, but I don't know how to extract it.

python crawling

2022-09-21 12:37

2 Answers

https://pypi.org/project/dart-fss/

There's a package like this. I haven't tried it myself.


2022-09-21 12:37

The respondent above pointed to a relevant module, but I'm posting a simple crawling script just in case.

This code crawls the name of the company subject to disclosure, the report name, and the report address.

from bs4 import BeautifulSoup
import urllib.request

# Download the page and parse the HTML
with urllib.request.urlopen("http://dart.fss.or.kr/dsac001/mainAll.do") as response:
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')

trs = soup.find_all('tr')[1:5+1]  # Skip the header row, keep the top 5 entries
for tr in trs:
    td = tr.find_all('td')
    # Collapse tabs, newlines, and repeated spaces into single spaces
    company = ' '.join(td[1].find('a').text.split())  # Name of the company subject to disclosure
    report = ' '.join(td[2].find('a').text.split())  # Report name
    # The href is site-relative, so prepend the scheme and host to get a usable URL
    report_link = 'http://dart.fss.or.kr' + td[2].find('a').attrs['href']  # Report address

    print('company name subject to disclosure:', company, '\t report name:', report, '\t report address:', report_link)  # output
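
Since the question mentions <tbody>, here is a rough sketch of the same extraction written as a reusable function that selects rows with a CSS selector and skips rows that don't have the expected cells. It runs against a small inline HTML sample rather than the live site; the sample markup, the rcpNo values, and the name parse_disclosures are assumptions made for illustration, and the real page layout may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://dart.fss.or.kr/dsac001/mainAll.do"

# Hypothetical markup shaped like the table the script above assumes:
# column 0 = time, column 1 = company link, column 2 = report link.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Time</th><th>Company</th><th>Report</th></tr></thead>
  <tbody>
    <tr>
      <td>12:30</td>
      <td><a href="#">  Example Corp </a></td>
      <td><a href="/dsaf001/main.do?rcpNo=0001"> Quarterly
          Report </a></td>
    </tr>
    <tr>
      <td>12:25</td>
      <td><a href="#">Sample Holdings</a></td>
      <td><a href="/dsaf001/main.do?rcpNo=0002">Major Shareholder Notice</a></td>
    </tr>
    <tr><td colspan="3">row without links</td></tr>
  </tbody>
</table>
"""


def parse_disclosures(html, base_url=BASE_URL, limit=5):
    """Return up to `limit` dicts with company, report, and absolute link."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # "tbody tr" only matches data rows, so the header row needs no slicing
    for tr in soup.select("tbody tr"):
        if len(results) >= limit:
            break
        cells = tr.find_all("td")
        if len(cells) < 3:
            continue  # not a data row in the assumed layout
        company_a = cells[1].find("a")
        report_a = cells[2].find("a")
        if company_a is None or report_a is None:
            continue  # skip rows missing either link
        results.append({
            # split/join collapses tabs, newlines, and repeated spaces
            "company": " ".join(company_a.get_text().split()),
            "report": " ".join(report_a.get_text().split()),
            # urljoin resolves the site-relative href against the page URL
            "link": urljoin(base_url, report_a.get("href", "")),
        })
    return results


rows = parse_disclosures(SAMPLE_HTML)
for row in rows:
    print(row["company"], row["report"], row["link"], sep="\t")
```

To try it on the live page, pass the html bytes read with urllib.request in the script above to parse_disclosures instead of SAMPLE_HTML.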


2022-09-21 12:37



© 2024 OneMinuteCode. All rights reserved.