Python Web Crawling Error Problem

Asked 1 year ago, Updated 1 year ago, 98 views

import requests

from bs4 import BeautifulSoup

import operator

def start(url):
    word_list=[]
    source_code=requests.get(url).text
    soup=BeautifulSoup(source_code,'lxml')
    p_tag=soup.findAll('p',{'class':'tt-post-title'})

    for title_text in p_tag:
        content = title_text.text
        words=content.lower().split()
        for each_word in words:
            word_list.append(each_word)
    clean_up_list(word_list)

def clean_up_list(word_list):
    clean_up_list=[]
    for word in word_list:
        symbols="`~!@#$%^&*()-=_+[]\{}|;',./:\"<>?"
        for i in  range(0,len(symbols)):
            word = word.replace(symbols[i],"")
        if len(word)>0:
        #    #    print(word)
            clean_word_list.append(word)

    create_dictionary(clean_word_list)

def create_dictionary(clean_word_list):
    word_count={}
    for word in clean_word_list:
        if word in word_count:
            word_count[word] +=1
        else:
            word_count[word]=1
    for key, value in sorted(word_count.items(),key=operator.itemgetter(1)):
        print(key,value)

start('https://creativeworks.tistory.com')

I'm a student learning Python by following a Tistory blog. I ran into an error I don't understand while coding, so I'm posting a question. The code is above. I'm working on a Mac with the Atom editor, and my Python version is 3.7.1. Running the code above produces:

Traceback (most recent call last):
  File "/Users/stronghu/Documents/test36.py", line 44, in <module>
    start('https://creativeworks.tistory.com')
  File "/Users/stronghu/Documents/test36.py", line 19, in start
    clean_up_list(word_list)
  File "/Users/stronghu/Documents/test36.py", line 31, in clean_up_list
    create_dictionary(clean_word_list)
NameError: name 'clean_word_list' is not defined

That is the error I get. The create_dictionary(clean_word_list) call was originally supposed to run outside the for loop, so I wrote it as in the code above. However, if I move the create_dictionary(clean_word_list) call inside the for loop, the script runs without a problem.

web-crawling crawling web-crawler

2022-09-21 23:23

2 Answers

I think it would be best to type in the example code exactly as it is written first, and then, if a problem remains, ask the person who wrote the example.


2022-09-21 23:23

def clean_up_list(word_list):
    clean_up_list=[]
    for word in word_list:
        symbols="`~!@#$%^&*()-=_+[]\{}|;',./:\"<>?"
        for i in  range(0,len(symbols)):
            word = word.replace(symbols[i],"")
        if len(word)>0:
        #    #    print(word)
            clean_word_list.append(word)

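In the function above, clean_word_list is a name that is never defined. The list created on the first line of the function is called clean_up_list, so any reference to clean_word_list raises a NameError.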

Didn't you mean to append to clean_up_list?
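
A minimal sketch of the fix, assuming the intent was a single list named clean_word_list (which also avoids reusing the function's own name for a local variable). Returning the list instead of calling create_dictionary is a change made here only to keep the example self-contained; it is not part of the original code.

```python
def clean_up_list(word_list):
    """Strip punctuation from each word and return the non-empty results."""
    # Use one name consistently. The original created clean_up_list
    # but appended to (and passed on) clean_word_list.
    clean_word_list = []
    symbols = "`~!@#$%^&*()-=_+[]\\{}|;',./:\"<>?"
    for word in word_list:
        for symbol in symbols:
            word = word.replace(symbol, "")
        if len(word) > 0:
            clean_word_list.append(word)
    # Assumption: returned here for illustration; the original passes
    # this list to create_dictionary(clean_word_list) at this point.
    return clean_word_list
```

For example, clean_up_list(["hello,", "..."]) returns ["hello"]. Note that the traceback points at the create_dictionary call rather than at the append line, which suggests word_list was empty on that run, so the loop body never executed. That would also explain why moving the call inside the loop seemed to make the error go away: the call was simply never reached.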


2022-09-21 23:23
