Python Naver search result crawling problem

Asked 2 years ago, Updated 2 years ago, 55 views

The results are printed as follows:

('', Gyeongnam 'COVID-19' has one more confirmed case, and it has increased to 23 (one step), '')
('', [1st step] 60 new COVID-19 patients...A total of 893 confirmed cases in Korea, '')

The result I want is shown below, but I can't get it to work. The code probably has problems overall, but in particular I don't understand why the href isn't being read.

One more confirmed case of "COVID-19" in Gyeongsangnam-do has increased to 23 (1 step)

http://www.newsis.com/view/?id=NISX20200225_0000931012&cID=10812&pID=10800

[1st step] 60 new COVID-19 patients...A total of 893 confirmed cases in Korea

http://www.seoulwire.com/news/articleView.html?idxno=400533

Please help me.

import requests
import urllib.request
from bs4 import BeautifulSoup
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

old_newsflashs = []

def extract_newsflashs(old_newsflashs=[]):
    url = 'https://m.search.naver.com/search.naver?where=m_news&query=1step&sm=mtb_tnw&sort=1'
    req = requests.get(url)
    html = req.text
    soup = BeautifulSoup(html, 'html.parser')

    search_result = soup.select_one('#news_result_list')
    result_list = search_result.select('.bx > .news_wrap > a')

    news_list = []
    for title_list in result_list:
        title = (title_list.get_text())
        news_link = title_list['href']


        if 'Corona' in title:
            news_list.append([title, news_link])

    newsflashs = []
    for news_list in result_list[:10]:
        newsflash = news_list
        newsflashs.append(newsflash)

    new_newsflashs=[]
    for newsflash in newsflashs:
        if newsflash not in old_newsflashs:
            new_newsflashs.append(newsflash)

    return new_newsflashs

def send_newsflashs():
    global old_newsflashs
    new_newsflashs = extract_newsflashs(old_newsflashs)
    if new_newsflashs:
        for newsflash in new_newsflashs:
            print(tuple(newsflash))
    else:
        pass
    old_newsflashs += new_newsflashs.copy()
    old_newsflashs = list(map(list, set(map(tuple, old_newsflashs))))

send_newsflashs()

sched.add_job(send_newsflashs, 'interval', seconds=60)

sched.start()

python crawling naver

2022-09-21 12:16

1 Answer

    news_list = []
    for title_list in result_list:
        title = (title_list.get_text())
        news_link = title_list['href']

        if 'Corona' in title:
            news_list.append([title, news_link])

As you said, I think the whole code needs to be gone through again.

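The href is not printed because news_list, which holds each title together with its href, is built but never used afterwards. The next loop iterates over result_list (the raw tags) and even reuses the name news_list as its loop variable, so what gets printed is the tag contents rather than the [title, news_link] pairs.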
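A minimal sketch of that fix, assuming the rest of the scraper stays as posted. The idea is to loop over the filtered title/link pairs instead of result_list, then print each title followed by its link. The helper names pick_new_items and format_item and the sample data are hypothetical; in the real script the pairs would come from title_list.get_text() and title_list['href'].

```python
def pick_new_items(pairs, keyword, old_items, limit=10):
    """Return up to `limit` keyword-matching (title, link) pairs not seen before.

    `pairs` stands in for what the scraper collects from each <a> tag:
    (title_list.get_text(), title_list['href']).
    """
    # Keep only the entries whose title mentions the keyword.
    news_list = [(title.strip(), link) for title, link in pairs if keyword in title]

    # Loop over the filtered news_list, not the raw result_list,
    # so every item still carries its link.
    return [item for item in news_list[:limit] if item not in old_items]


def format_item(item):
    """Render one (title, link) pair as the title followed by its link."""
    title, link = item
    return f"{title}\n{link}"


# Hypothetical sample data standing in for the parsed search results.
sample_pairs = [
    ("Corona update: 60 new patients", "http://example.com/a"),
    ("Weather forecast for tomorrow", "http://example.com/b"),
    ("Corona cases rise to 23", "http://example.com/c"),
]
already_sent = [("Corona cases rise to 23", "http://example.com/c")]

new_items = pick_new_items(sample_pairs, "Corona", already_sent)
for item in new_items:
    print(format_item(item))
```

Because the items are plain tuples, they can also be stored in a set between scheduler runs, which would replace the list(map(list, set(map(tuple, ...)))) round trip in send_newsflashs.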


2022-09-21 12:16

© 2024 OneMinuteCode. All rights reserved.