While writing a Python script to automatically download past exam papers from Ebsi, I fetched the web page and parsed it with Beautiful Soup, but the HTML I got back was shorter than the HTML shown on the site.
Not all of the HTML is downloaded when crawling the web with Beautiful Soup.
import requests
from bs4 import BeautifulSoup as bs

# Placeholder credentials
login_info = {
    'userid': '00000000',
    'passwd': '00000000'
}

with requests.Session() as s:
    # Log in so the session keeps the authentication cookies
    login_req = s.post('https://www.ebsi.co.kr/ebs/pot/potl/SSOLoginSubmit.ebs', data=login_info)
    print(login_req.status_code)

    # Fetch the past-paper list page and parse the returned HTML
    page_req = s.get('http://www.ebsi.co.kr/ebs/xip/xipc/previousPaperList.ebs')
    html = page_req.text
    soup = bs(html, 'html.parser')
When I inspect the soup value this code produces, the part that should contain more markup comes out like this:
(omitted)
</select>
</span>
</em>
</h4>
<div id="div_contentList"></div>
</div>
</div>
</form>
</div>
(Omitted)
That is all that comes out.
(The full result is about 5,000 lines, so I uploaded a screenshot instead of the whole thing.)
How can I fix this truncated output?
web-crawling beautifulsoup python
The missing content may be rendered by a client-side script after the page loads. requests does not execute JavaScript, so that content never appears in the HTML it receives.
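One way to check this is to see whether the container exists in the fetched HTML but has nothing inside it. The following is a minimal sketch, not the site's actual markup: `static_html` is a made-up stand-in modeled on the snippet in the question, and `container_is_empty` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML that requests returns (assumption: modeled on the
# snippet in the question, not the real page). The container is present
# but empty, because the script that would fill it never ran.
static_html = """
<form>
  <h4>Past papers</h4>
  <div id="div_contentList"></div>
</form>
"""


def container_is_empty(html: str, element_id: str) -> bool:
    """Return True if the element exists but holds no tags and no visible text."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id=element_id)
    if node is None:
        raise ValueError(f"no element with id={element_id!r}")
    # find(True) returns the first descendant tag, or None if there is none.
    return node.find(True) is None and not node.get_text(strip=True)


print(container_is_empty(static_html, "div_contentList"))
```

If the same check on the real `page_req.text` also reports an empty container while the browser shows a filled list, the content is most likely being added by JavaScript.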
In that case, I think you should use Selenium to render the page in a real browser and crawl the result, rather than crawling the raw HTML alone.
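A rough sketch of that approach follows. It assumes Selenium 4 and a local Chrome installation. It also assumes that the list is injected into `div_contentList`, which is inferred from the question's output rather than confirmed. `fetch_rendered_html` and `CONTENT_SELECTOR` are hypothetical names, and the login step from the question is not reproduced here; it would have to be performed in the same browser session first.

```python
# Assumption: the script fills this container with child elements.
CONTENT_SELECTOR = "#div_contentList > *"


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load url in a browser, wait for the container to be filled, return the rendered HTML."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is installed locally
    try:
        driver.get(url)
        # Wait until at least one child element appears; raises
        # TimeoutException if nothing shows up within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, CONTENT_SELECTOR)
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can be passed to `bs(html, 'html.parser')` exactly as in the question, so the parsing code that follows would not need to change.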