I'm a beginner at web crawling with Python. My goal is to extract the Naver News in-link address for each article using Beautiful Soup.
In other words, in the attached image, I'd like to scrape "https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=382&aid=0000896566", not "http://sports.donga.com/".
I wrote the code below, but only http://sports.donga.com/ is printed.
Please help.
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://search.naver.com/search.naver?&where=news&query=%22``%5B%EB%8B%A8%EB%8F%85%5D%22&sm=tab_pge&sort=1&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:dd,p:all,a:all&mynews=0&refresh_start=0&start=1')
bsObject = BeautifulSoup(html, "html.parser")

news_urls = []
for cover in bsObject.find_all('li', {'class':'bx'}):
    # Takes only the first a.info link inside each result item
    link = cover.select('a.info')[0].get('href')
    news_urls.append(link)
print(news_urls)
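It seems that you want the Naver News link for each article returned by a Naver news search.
You can modify the code as follows: collect every a.info link instead of only the first one in each result, then keep the links that point to news.naver.com.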
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://search.naver.com/search.naver?&where=news&query=%22``%5B%EB%8B%A8%EB%8F%85%5D%22&sm=tab_pge&sort=1&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:dd,p:all,a:all&mynews=0&refresh_start=0&start=1')
bsObject = BeautifulSoup(html, "html.parser")

news_urls = []
# Look at every a.info link on the page, not just the first one per result
c = bsObject.find_all('a', {'class':'info'})
for cover in c:
    link = cover['href']
    # Keep only the links that point to Naver News
    if 'https://news.naver.com' in link:
        news_urls.append(link)
print(news_urls)
>> ['https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=100&oid=081&aid=0003171086',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=018&aid=0004876243',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=105&oid=092&aid=0002216140',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=015&aid=0004513548']
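To see why the original code returned only the press site, here is a minimal offline sketch. It assumes that each result item holds the press link as its first a.info element and the Naver News link as a later one; that is inferred from the behavior described above, not confirmed against the live page. SAMPLE_HTML, its URLs, and both function names are hypothetical and exist only for illustration. The filter uses startswith, which is slightly stricter than the substring check in the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The real Naver search page is assumed to
# look roughly like this; its actual structure may differ or change over time.
SAMPLE_HTML = """
<ul>
  <li class="bx">
    <a class="info" href="http://sports.donga.com/">Press site</a>
    <a class="info" href="https://news.naver.com/main/read.nhn?aid=0000000001">Naver News</a>
  </li>
  <li class="bx">
    <a class="info" href="http://example.com/">Press site without a Naver News link</a>
  </li>
</ul>
"""

NAVER_NEWS_PREFIX = "https://news.naver.com"


def first_info_links(html):
    """Mimic the question's approach: only the first a.info in each li.bx."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all("li", {"class": "bx"}):
        anchors = item.select("a.info")
        if anchors:  # skip items that have no a.info at all
            links.append(anchors[0].get("href"))
    return links


def naver_news_links(html):
    """Mimic the answer's approach: every a.info, filtered by URL prefix."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a", {"class": "info"})
        if a.get("href", "").startswith(NAVER_NEWS_PREFIX)
    ]


if __name__ == "__main__":
    print(first_info_links(SAMPLE_HTML))
    print(naver_news_links(SAMPLE_HTML))
```

With this sample, the first function returns only press-site addresses, while the second returns only the news.naver.com address. Results that have no Naver News link are simply skipped.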
© 2024 OneMinuteCode. All rights reserved.