I want to scrape the menu data from the web page above and print it, but it isn't working. Here is the code I wrote:
# encoding: utf-8
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://mensaar.de/#/menu/sb/"
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
meal_list = soup.find_all("div")
print(meal_list)
As a test, I tried printing all the div tags, but most of the page content is missing from the result, as shown below.
[<div class="navbar-header">
<button class="navbar-toggle" data-target="#mensaar-navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>, <div class="container-fluid">
<div class="collapse navbar-collapse" id="mensaar-navbar-collapse">
<ul class="nav navbar-nav">
<li data-match-route="^/$"><a class="mensaar-brand" data-ng-click="collapseNavbar()" href="#/">MenSaar.de</a></li>
<li data-match-route="^/menu(/\w+)?$" $" data-ng-cloak=""><a data-ng-click="collapseNavbar()" href="#/menu">Speiseplan</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a data-match-route="^/privacy$" data-ng-click="collapseNavbar()" href="#/privacy">Datenschutz</a></li>
<li><a href="http://www.studentenwerk-saarland.de/de/Impressum-(2)/Impressum">Impressum</a></li>
</ul>
</div>
</div>, <div class="collapse navbar-collapse" id="mensaar-navbar-collapse">
<ul class="nav navbar-nav">
<li data-match-route="^/$"><a class="mensaar-brand" data-ng-click="collapseNavbar()" href="#/">MenSaar.de</a></li>
<li data-match-route="^/menu(/\w+)?$" $" data-ng-cloak=""><a data-ng-click="collapseNavbar()" href="#/menu">Speiseplan</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a data-match-route="^/privacy$" data-ng-click="collapseNavbar()" href="#/privacy">Datenschutz</a></li>
<li><a href="http://www.studentenwerk-saarland.de/de/Impressum-(2)/Impressum">Impressum</a></li>
</ul>
</div>, <div data-ng-view="" id="view"></div>]
Process finished with exit code 0
From searching, I found that pages that build their content with JavaScript need special handling to get the full data. If anyone knows a solution, please answer!
python crawling beautifulsoup
The menu data is loaded via AJAX after the page itself has finished loading, so that part has to be handled separately.
In other words, you request a single URL, but that page in turn calls other URLs.
To reproduce the full page, you would have to call each of those URLs yourself and combine the values they return.
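As a rough sketch of that "call the inner URL yourself" approach: the endpoint URL and the JSON layout below are made up for illustration, not taken from mensaar.de. The real request can be found in the browser's developer tools (Network tab) while the menu page loads, and the field names would have to be adjusted to whatever it actually returns.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint, NOT the site's real API. Look up the actual request
# in the browser's developer tools (Network tab) while the menu page loads.
MENU_API_URL = "https://example.invalid/api/menu/sb"


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def meal_names(payload):
    """Collect meal names from a payload with the assumed days/counters/meals layout."""
    names = []
    for day in payload.get("days", []):
        for counter in day.get("counters", []):
            for meal in counter.get("meals", []):
                names.append(meal["name"])
    return names


# Made-up sample standing in for a real response (assumed structure).
sample = {"days": [{"counters": [{"meals": [{"name": "Meal A"}, {"name": "Meal B"}]}]}]}
print(meal_names(sample))

# Against a real endpoint this would become:
# print(meal_names(fetch_json(MENU_API_URL)))
```

If the data really does arrive as JSON, this route needs no browser at all, but it depends on finding the right request and understanding its format.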
That is cumbersome, however, so an alternative is to let a real web browser load the URL and use the HTML it ends up with.
A typical way to do this is Selenium WebDriver.
The principle is simple: it automatically launches a web browser, opens the URL, and renders the page, after which you can read the rendered HTML.
You can then parse that HTML with BeautifulSoup.
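from selenium import webdriver
from bs4 import BeautifulSoup
# PhantomJS only works with older Selenium releases (support was deprecated and later removed);
# the Internet Explorer, Firefox, and Chrome drivers can be used the same way.
driver = webdriver.PhantomJS()
driver.get('https://mensaar.de/#/menu/sb')
# page_source is the HTML after the browser has run the page's JavaScript
bs = BeautifulSoup(driver.page_source, 'html5lib')  # requires the html5lib package
driver.quit()  # close the browser once the HTML has been captured
print(bs.find_all("div"))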
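For reference, here is a sketch of the same idea with headless Chrome and an explicit wait, since the AJAX data may not have arrived yet at the moment page_source is read. The function name render_page and the 15-second timeout are my own choices, and waiting on the element with id "view" is an assumption based on the empty <div data-ng-view="" id="view"> in the question's output. It needs Selenium 4 and a local Chrome installation.

```python
def render_page(url, wait_seconds=15):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so this file can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the menu is rendered inside <div id="view">, the container
        # that was empty in the static HTML. Wait until it contains some text.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.find_element(By.ID, "view").text.strip()
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling render_page('https://mensaar.de/#/menu/sb') should return the rendered HTML as a string, which can then be passed to BeautifulSoup in the same way as driver.page_source.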