Web scraping returns different values from what the browser shows

Asked 2 years ago, Updated 2 years ago, 13 views

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

_url = "https://band.us/discover/band-search/"
keyword = "Travel"
url = _url+keyword

data = requests.get(url,headers=headers)
soup = BeautifulSoup(data.text, 'html.parser')

band_info = soup.select('#content > main > div > div._globalSearchContentRegion > div > section > h2')

print(url)
print(band_info)

Hello. I'd like to scrape the Naver Band information that Chrome shows at this address, but the values I actually get are different. band_info ends up as an empty list.

Is there a way to retrieve it properly? Even select('#content') on its own returned nothing, so I'm posting the question here.

python

2022-09-20 21:40

2 Answers

If the content is rendered dynamically with JavaScript, it is not part of the HTML that requests downloads, so you cannot retrieve it even if you know the correct HTML selector.
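One way to check whether this is the cause is to look at which element ids the raw HTML actually contains. The sketch below uses only the standard library; `IdCollector`, `ids_in_html`, and the sample page are hypothetical names and data made up for illustration, not the real response from band.us.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute present in the raw (un-rendered) HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_html(html):
    """Return the set of element ids found in an HTML string."""
    collector = IdCollector()
    collector.feed(html)
    return collector.ids


# Hypothetical response body: an empty shell that a script fills in later.
sample_html = """
<html>
  <body>
    <div id="wrap"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
"""

found = ids_in_html(sample_html)
print("content" in found)  # '#content' is absent from the static HTML
print(sorted(found))
```

For the real page, passing `data.text` from the question's code to `ids_in_html` would show whether `content` is present before any JavaScript runs. If it is missing, no selector applied to that HTML can match it.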

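If you can't get it with the selector, I think XPath would be worth trying. Since you are using Python, if BeautifulSoup alone cannot retrieve it, you can use Selenium, which drives a real browser and can locate elements by CSS selector or XPath after the page has rendered. You can try the same selectors beforehand in the Chrome developer console with the $ and $$ helpers.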

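The difference between $ and $$ is that, in the Chrome developer console, $ returns the first element matching a selector while $$ returns every matching element :)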
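A minimal sketch of the Selenium approach, assuming Selenium 4 and a local Chrome installation. `build_search_url` and `fetch_headings` are hypothetical helper names, percent-encoding the keyword is an addition not present in the question, and whether the question's selector matches the rendered page is untested.

```python
from urllib.parse import quote

BASE_URL = "https://band.us/discover/band-search/"
# Selector taken from the question; that it matches the rendered page is an assumption.
HEADING_SELECTOR = (
    "#content > main > div > div._globalSearchContentRegion > div > section > h2"
)


def build_search_url(keyword):
    """Append the percent-encoded keyword to the search URL."""
    return BASE_URL + quote(keyword)


def fetch_headings(keyword, timeout=10):
    """Open the page in Chrome, wait for the headings, and return their text."""
    # Imported here so the URL helper still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(build_search_url(keyword))
        # Block until at least one matching element exists in the rendered DOM.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, HEADING_SELECTOR))
        )
        return [element.text for element in elements]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_headings("Travel")` would launch a browser and raise a `TimeoutException` if nothing matches within the timeout. To locate by XPath instead, `By.CSS_SELECTOR` can be swapped for `By.XPATH` with an XPath string in place of the selector.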


2022-09-20 21:40

You can check the XPath in the Chrome developer tools (right-click the element in the Elements panel and choose Copy > Copy XPath). I think you can retrieve the element with that XPath.


2022-09-20 21:40



© 2024 OneMinuteCode. All rights reserved.