from urllib.request import urlopen
from bs4 import BeautifulSoup
response = urlopen("https://music.naver.com/listen/top100.nhn?domain=TOTAL&duration=1d")
b_html = response.read()
s_html = b_html.decode()
bs = BeautifulSoup(s_html,"html.parser")
mugic_dictionary={}
for x in bs.find('tbody').findAll('tr'):
    ranking = x.find(attrs = {'class' : 'ranking'})
    name = x.find(attrs = {'class' : 'name'}).find(attrs = {'class' : 'ellipsis'})
    if name is not None and ranking is not None:
        mugic_dictionary[ranking.text] = name.text
print(mugic_dictionary)
Starting from this code, which part should I modify so that it instead shows the countries and their populations listed at https://ko.wikipedia.org/wiki/%EC%9D%B8%EA%B5%AC%EC%88%9C_%EB%82%98%EB%9D%BC_%EB%AA%A9%EB%A1%9D ?
beautifulsoup python scraping web dictionary
First, analyze the target page with your browser's developer tools.
The page has two tbody elements, and the target tbody is the first one, so find('tbody') returns the right table directly.
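The same inspection can also be done from Python as a quick cross-check of what the developer tools show. The sketch below runs on a small hypothetical HTML snippet rather than the real page, and the helper name summarize_tbodies is made up for illustration. It lists every tbody with its row count and the text of its first row.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup will differ.
SAMPLE_HTML = """
<table><tbody>
<tr><th>Country</th><th>Population</th></tr>
<tr><td>A</td><td>1,000</td></tr>
</tbody></table>
<table><tbody>
<tr><td>footer note</td></tr>
</tbody></table>
"""

def summarize_tbodies(html):
    """Return (index, row_count, first_row_text) for every tbody in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    summary = []
    for index, tbody in enumerate(soup.find_all("tbody")):
        rows = tbody.find_all("tr")
        # Join the cell texts of the first row with spaces for a compact preview.
        first_row = rows[0].get_text(" ", strip=True) if rows else ""
        summary.append((index, len(rows), first_row))
    return summary

for entry in summarize_tbodies(SAMPLE_HTML):
    print(entry)
```

On the real page, you would pass s_html instead of SAMPLE_HTML and look for the tbody whose first row matches the table you want.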
Note that this tbody lookup is rather loose: if the site structure changes and another table appears before the target table, find('tbody') will return the wrong one. The code should be tightened so that it still picks the right table after such a change.
For more information, see the Beautiful Soup (bs4) documentation.
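One way to tighten the lookup is to select the table by a class attribute instead of by position. The sketch below assumes the target table carries the class wikitable, which is common for MediaWiki tables but should be confirmed in the developer tools for this page. The sample HTML and the helper find_target_table are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an unrelated table sits in front of the target table.
SAMPLE_HTML = """
<table class="infobox"><tbody><tr><td>unrelated</td></tr></tbody></table>
<table class="wikitable sortable"><tbody>
<tr><th>Country</th><th>Population</th></tr>
<tr><td>A</td><td>1,000</td></tr>
</tbody></table>
"""

def find_target_table(soup, css_class="wikitable"):
    """Return the first table with the given class, or raise if none is found."""
    # class_ matches when css_class is any one of the element's classes.
    table = soup.find("table", class_=css_class)
    if table is None:
        raise ValueError(f"no <table class='{css_class}'> found")
    return table

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_target_table(soup)
rows = table.find_all("tr")
print(len(rows))
```

Raising an error when nothing matches makes a structure change on the site fail loudly instead of silently scraping the wrong table.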
Here is the example code adapted with as few changes as possible. I would also suggest renaming mugic_dictionary to something more descriptive.
from urllib.request import urlopen
from bs4 import BeautifulSoup
response = urlopen("https://ko.wikipedia.org/wiki/%EC%9D%B8%EA%B5%AC%EC%88%9C_%EB%82%98%EB%9D%BC_%EB%AA%A9%EB%A1%9D")
b_html = response.read()
s_html = b_html.decode()
bs = BeautifulSoup(s_html,"html.parser")
mugic_dictionary={}
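for x in bs.find('tbody').findAll('tr'):
    line_values = x.findAll('td')
    # Skip rows that have no td cells (for example, header rows).
    if line_values:
        name = line_values[0].get_text()
        # Remove thousands separators so the text can be converted to int.
        population = line_values[1].get_text().replace(',', '')
        if name and population:
            mugic_dictionary[name] = int(population)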
print(mugic_dictionary)
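One fragile spot in the code above is int(population), which raises ValueError if a cell contains anything besides digits, commas, and surrounding whitespace. Whether the real page has such cells (for example, a footnote marker like [1]) is an assumption. A minimal sketch of a more tolerant conversion, using a hypothetical helper parse_population:

```python
import re

def parse_population(text):
    """Convert cell text such as '1,411,750,000[1]' to an int, or None if no number is found."""
    # Drop bracketed footnote markers like [1] or [a].
    cleaned = re.sub(r"\[[^\]]*\]", "", text)
    # Take the first run of digits and commas.
    match = re.search(r"\d[\d,]*", cleaned)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

print(parse_population("1,411,750,000[1]\n"))  # 1411750000
print(parse_population("N/A"))                 # None
```

In the loop, mugic_dictionary[name] = int(population) could then become a call to parse_population(line_values[1].get_text()), storing the result only when it is not None. This sketch does not handle other number formats, such as periods used as thousands separators.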