Getting an AttributeError while practicing Python web crawling

Asked 2 years ago, Updated 2 years ago, 18 views

This code is from a book called 100 for practitioners. It's frustrating that some of the code runs fine and some of it doesn't work ㅠ

I ran it on repl.it. Data crawling is so hard ㅠ I can follow the code and understand the algorithm, but I don't know what to do about errors like this.

I get this error:

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    target_img_src = target_img.get('src')
AttributeError: 'NoneType' object has no attribute 'get'

The code is as follows

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Seoul_Metropolitan_Subway"
resp = requests.get(url)
html_src = resp.text

soup = BeautifulSoup(html_src, 'html.parser')

target_img = soup.find(name='img', attrs={'alt':'Seoul-Metro-2004-20070722.jpg'})
print('HTML Element: ', target_img)
print("\n")

target_img_src = target_img.get('src')
print('image file path: ', target_img_src)
print("\n")

target_img_resp = requests.get('http:' + target_img_src)
out_file_path = "./output/download_image.jpg"

with open(out_file_path, 'wb') as out_file:
    out_file.write(target_img_resp.content)
    print("Saved as an image file.")

python

2022-09-20 14:36

2 Answers

Put simply, the error says that an object of type None has no get attribute, so .get cannot be called on it. The 14th line, which raises the error, is as follows.

target_img_src = target_img.get('src')
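To see the message in isolation, here is a minimal sketch that does not touch the network. The dictionary and the variable names are stand-ins invented for illustration. Only the error text is the same as in the question.

```python
# A dict has a .get method, so this lookup works.
# (Hypothetical stand-in for a tag that was found; the URL is made up.)
found = {"src": "//example.invalid/picture.jpg"}
print(found.get("src"))

# None has no .get method, so the same call raises AttributeError.
missing = None
try:
    missing.get("src")
except AttributeError as err:
    message = str(err)
    print(message)
```

In the question, target_img plays the role of missing on line 14.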

Where does src in target_img.get('src') come from? Is there some setting for src that you haven't shown us?

Below is everything before the 14th line, that is, lines 1 through 13.

# Import the external libraries
import requests
from bs4 import BeautifulSoup

# Fetch the page to crawl and parse the result into the soup variable.
url = "https://en.wikipedia.org/wiki/Seoul_Metropolitan_Subway"
resp = requests.get(url)
html_src = resp.text
soup = BeautifulSoup(html_src, 'html.parser')

# Find the img tag with the given alt attribute and store it in the variable target_img.
# (This part is my guess, so it could be wrong.)
target_img = soup.find(name='img', attrs={'alt':'Seoul-Metro-2004-20070722.jpg'})
print('HTML Element: ', target_img)
print("\n")

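As you can see, there is no statement of the form src = ~~~~~~~ anywhere in that code, and none is needed: in target_img.get('src'), 'src' is a quoted string naming the HTML attribute to read, not a variable.

The error message is about target_img itself. soup.find returns None when no element matches, and None has no get method.

So my conclusion is that no img tag with alt='Seoul-Metro-2004-20070722.jpg' was found on the page, possibly because the page's HTML has changed since the book was written. Look at what print('HTML Element: ', target_img) shows and compare the alt value with the current page.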
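Another thing worth checking is whether soup.find matched anything at all, since it returns None when no element matches. The sketch below is only an illustration under assumptions: it parses a small made-up HTML string (SAMPLE_HTML, with invented alt texts and file names) instead of downloading the Wikipedia page, and the helper name find_img_src is hypothetical, not something from the book.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the downloaded page
# (an assumption for illustration, not the real Wikipedia markup).
SAMPLE_HTML = """
<html><body>
  <img alt="Map of the network" src="//example.invalid/map.png">
  <img alt="Station platform" src="//example.invalid/platform.jpg">
</body></html>
"""


def find_img_src(soup, alt_text):
    """Return the src of the img whose alt equals alt_text, or None if there is no such img."""
    target_img = soup.find(name='img', attrs={'alt': alt_text})
    if target_img is None:
        # Nothing matched, so there is no tag to call .get on.
        return None
    return target_img.get('src')


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# List the alt values that are really in the document, to see what can be matched.
available_alts = [img.get('alt') for img in soup.find_all('img')]
print('alt values found:', available_alts)

found_src = find_img_src(soup, 'Station platform')
missing_src = find_img_src(soup, 'Seoul-Metro-2004-20070722.jpg')
print('existing alt ->', found_src)
print('missing alt  ->', missing_src)
```

Run against the real downloaded HTML, the available_alts list would show which alt text, if any, still matches the one used in the book.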

P.S. When you ask a question, please use at least this much commenting and markdown for readability.

Especially when you're studying Python, line breaks and indentation are essential.

The code doesn't run if it is copied and pasted as posted, so I re-aligned it myself, and my conclusion may be wrong as a result.


2022-09-20 14:36

Yes, thank you. It's my first time here, so I didn't even know how to ask a question properly. The indentation was all in place, but I didn't realize it got so badly mangled when I copied and pasted the code. I couldn't tell why it was wrong because the code was copied straight from the book, "The Python 100 System for practitioners." I also asked the author, but there is no answer yet. When the error came up, I checked and found that the contents of the HTML had been modified, so I corrected the code a little. I finally understand what you said. Thank you so much ㅜㅜ I don't know whether this will solve it, but I'll go over your answer, revise the code again, and post another question. Thank you.


2022-09-20 14:36


