Hello! I'm a beginner at coding. I've been trying to scrape stock data from the Naver Finance site for the past few days, but I keep running into errors. I'd appreciate it if you could help me.
First, the URL I'm crawling is https://finance.naver.com/item/sise_day.nhn?code=068270&page=1. I open it with urlopen and parse it with BeautifulSoup. When I print the result at that point, I get a "page not found" page containing error_content markup, and no stock information is printed. I suspected an encoding problem, but I couldn't find the answer. Please help!
Code:
url = 'https://finance.naver.com/item/sise_day.nhn?code=068270&page=1'
with urlopen(url) as doc:
    html = BeautifulSoup(doc, 'lxml')
    print(html)
    pgrr = html.find('td', class_='pgRR')
    s = str(pgrr.a['href']).split('=')
    last_page = s[-1]
Output:
AttributeError: 'NoneType' object has no attribute 'a'
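Next time, please post the full code, including the modules you import.
In this case, you need to use Selenium or find another way to fetch the page.
When html is printed, the output is the error page below, so find('td', class_='pgRR') returns None and pgrr.a raises the AttributeError.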
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>Naver:: All knowledge in the world, Naver</title>
<style type="text/css">
.error_content * {margin:0;padding:0;}
.error_content img{border:none;}
.error_content em {font-style:normal;}
.error_content {width:410px; margin:80px auto 0; padding:57px 0 0; font-size:12px; font-family:"Nanum Gothic", Dotum, AppleGothic, sans-serif; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_thumb.gif) no-repeat center top; white-space:nowrap;}
.error_content p{margin:0;}
.error_content .error_desc {margin-bottom:21px; overflow:hidden; text-align:center;}
.error_content .error_desc2 {margin-bottom:11px; padding-bottom:7px; color:#888; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_desc3 {clear:both; color:#888;}
.error_content .error_desc3 a {color:#004790; text-decoration:underline;}
.error_content .error_list_type {clear:both; float:left; width:410px; _width:428px; margin:0 0 18px 0; *margin:0 0 7px 0; padding-bottom:13px; font-size:13px; color:#000; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_list_type dt {float:left; width:60px; _width /**/:70px; padding-left:10px; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_dot.gif) no-repeat 2px 8px;}
.error_content .error_list_type dd {float:left; width:336px; _width /**/:340px; padding:0 0 0 4px;}
.error_content .error_list_type dd span {color:#339900; letter-spacing:0;}
.error_content .error_list_type dd a{color:#339900;}
.error_content p.btn{margin:29px 0 100px; text-align:center;}
</style>
</head>
<!-- ERROR -->
<body>
<div class="error_content">
<p class="error_desc"><img alt="page not found" height="30" src="https://ssl.pstatic.net/static/common/error/090610/txt_desc5.gif" width="319"/></p>
<p class="error_desc2">The address of the page you are trying to visit is entered incorrectly, or <br/>
The page you requested could not be found because the address of the page has been changed or deleted.<br/>
Please check again if the address you entered is correct.
</p>
<p class="error_desc3">For inquiries, please contact the <a href="https://help.naver.com/" target="_blank">Customer Center</a> and we will kindly guide you. Thank you.</p>
<p class="btn">
<a href="javascript:history.back()"><img alt="previous page" height="35" src="https://ssl.pstatic.net/static/common/error/090610/btn_prevpage.gif" width="115"/></a>
<a href="https://finance.naver.com"><img alt="to financial home" height="35" src="https://ssl.pstatic.net/static/nfinance/btn_home.gif" width="115"/></a>
</p>
</div>
</body>
</html>
I don't know how to crawl this page with urllib.
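If you would rather stay with urlopen, here is a minimal sketch. It assumes the error page is returned because the site rejects urllib's default User-Agent; the thread does not confirm that cause, so treat it as something to try rather than a guaranteed fix. The helper names build_request and fetch_html and the 'Mozilla/5.0' value are hypothetical placeholders.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent='Mozilla/5.0'):
    """Wrap the URL in a Request that carries a browser-like User-Agent.

    'Mozilla/5.0' is a placeholder (an assumption); substitute the string
    your own browser sends.
    """
    return Request(url, headers={'User-Agent': user_agent})


def fetch_html(url):
    """Download the page body as bytes using the request built above."""
    with urlopen(build_request(url)) as resp:
        return resp.read()
```

The bytes returned by fetch_html(url) can be passed to BeautifulSoup in place of doc, for example BeautifulSoup(fetch_html(url), 'lxml').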
Using the requests module, you can:
import requests
from bs4 import BeautifulSoup

url = 'https://finance.naver.com/item/sise_day.nhn?code=068270&page=1'
# Placeholder value (an assumption); substitute your own browser's User-Agent string
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# href of the "last page" link in the pager cell
href = soup.select_one('td.pgRR > a')['href']
sp = href.split('=')
print(sp)
# ['/item/sise_day.nhn?code', '068270&page', '386']
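Splitting the href on '=' works for this link, but it depends on page being the final parameter. As a rough alternative, the sketch below reads the page parameter by name with the standard library's urllib.parse. The function name page_from_href is hypothetical, and the sample href is rebuilt from the split output shown above.

```python
from urllib.parse import urlparse, parse_qs


def page_from_href(href):
    """Return the 'page' query parameter of href as an int, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get('page')
    if not values:
        return None
    return int(values[0])


print(page_from_href('/item/sise_day.nhn?code=068270&page=386'))
# 386
```

Returning None when the parameter is missing lets the caller check the result before using it, instead of failing with an exception partway through the crawl.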
© 2024 OneMinuteCode. All rights reserved.