Python web crawling: how to select a specific anchor among multiple anchors in HTML

Asked 2 years ago, Updated 2 years ago, 21 views

1 Answer

It looks like you want to collect the links to the news articles that a Naver search returns for your query.

You can modify your code as follows.

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Naver news search results for the query "[단독]"
html = urlopen('https://search.naver.com/search.naver?&where=news&query=%22%5B%EB%8B%A8%EB%8F%85%5D%22&sm=tab_pge&sort=1&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:dd,p:all,a:all&mynews=0&refresh_start=0&start=1')
bsObject = BeautifulSoup(html, "html.parser")

# Keep only the <a class="info"> anchors whose href points to Naver News
news_urls = []
for cover in bsObject.find_all('a', {'class': 'info'}):
    link = cover.get('href', '')  # skip anchors without an href instead of raising KeyError
    if 'https://news.naver.com' in link:
        news_urls.append(link)
print(news_urls)

>> ['https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=100&oid=081&aid=0003171086',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=018&aid=0004876243',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=105&oid=092&aid=0002216140',
'https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=106&oid=015&aid=0004513548']
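
If you want to try the filtering logic without hitting Naver, here is a minimal offline sketch. It is not the code from the answer: it swaps BeautifulSoup for the standard library's html.parser so it runs with no extra packages. The class name NewsLinkParser, the SAMPLE_HTML markup, and the URLs inside it are made up for illustration. It also uses a prefix check (startswith), which is slightly stricter than a substring match.

```python
from html.parser import HTMLParser


class NewsLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that have class "info" and start with a prefix.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # An attribute without a value is reported as None, hence the "or".
        classes = (attr.get("class") or "").split()
        href = attr.get("href") or ""
        if "info" in classes and href.startswith(self.prefix):
            self.links.append(href)


# Made-up sample markup: several anchors, only one of which should match.
SAMPLE_HTML = """
<div>
  <a class="info press" href="https://www.example.com/press">Press</a>
  <a class="info" href="https://news.naver.com/main/read.nhn?aid=0003171086">Naver News</a>
  <a class="title" href="https://news.naver.com/main/read.nhn?aid=0004876243">Title</a>
  <a class="info">No href</a>
</div>
"""

parser = NewsLinkParser("https://news.naver.com")
parser.feed(SAMPLE_HTML)
news_urls = parser.links
print(news_urls)
```

Only the second anchor passes both conditions: the first has the right class but the wrong host, the third has the right host but the wrong class, and the fourth has no href at all.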


2022-09-20 17:39



© 2024 OneMinuteCode. All rights reserved.