https://fifa.com/worldcup/players/browser/
I want to crawl the ID code of every player listed here, but I can't find the ID codes in the HTML. Can you tell me where they are in the HTML? If they aren't there, is there another way to get each player's ID?
e.g. Aaron Mooy -> 312252
python crawling html
Each player's link carries the ID in its data-player-id attribute, so you can look it up as shown below.
In [6]: soup.find('a', {'data-player-id' : '312252'})
Out[6]:
<a class="fi-p--link " data-player-id="312252" href="/worldcup/players/player/312252/">
<div class="fi-p">
<div class="fi-p__picture">
<svg class="fi-clip-svg" id="" viewBox="0 0 200 200">
<image class="image-r image-responsive" height="100%" width="100%" xlink:href="https://api.fifa.com/api/v1/picture/players/2018fwc/312252_sq-300_jpg?allowDefaultPicture=true"></image>
</svg>
<div class="fi-p__flag" data-countrycode="aus">
<div class="fi-p__flag__wrapper">
<img alt="AUS" class="fi-AUS fi-flag--4" src="http://api.fifa.com/api/v1/picture/flags-fwc2018-4/aus" title="AUS"/>
</div>
</div>
<div class="fi-p__jerseyNum ">
<span class="fi-p__num">13</span>
</div>
</div>
<div class="fi-p__wrapper-text">
<div class="fi-p__name">
Aaron MOOY
</div>
<div class="fi-p__country">
Australia
</div>
<div class="fi-p__role">
Midfielder </div>
</div>
</div>
</a>
In [7]: soup.find('a', {'data-player-id' : '312252'}).attrs['data-player-id']
Out[7]: '312252'
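The output above also shows the same ID inside the link's href (/worldcup/players/player/312252/), so it could serve as a fallback if the data-player-id attribute were ever missing. A minimal sketch, assuming every player link follows that href layout; the PLAYER_HREF pattern and the player_id_from_href helper are hypothetical names used only for illustration:

```python
import re

# Sample href copied from the output above.
href = "/worldcup/players/player/312252/"

# Hypothetical pattern: assumes the ID is the run of digits after "/players/player/".
PLAYER_HREF = re.compile(r"/players/player/(\d+)/?$")


def player_id_from_href(href):
    """Return the player ID embedded in a player link, or None if there is none."""
    match = PLAYER_HREF.search(href)
    return match.group(1) if match else None


print(player_id_from_href(href))  # prints 312252
```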
The player information is not in the HTML served at the URL from the question; the page loads it separately.
The full player list comes from the URL below.
https://www.fifa.com/worldcup/players/_libraries/byposition/all/_players-list
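Code to extract the player IDs:
import requests
import bs4
# Fetch the HTML fragment that contains the full player list.
url = 'https://www.fifa.com/worldcup/players/_libraries/byposition/all/_players-list'
contents = requests.get(url).content
# The 'html5lib' parser is a separate package and must be installed.
soup = bs4.BeautifulSoup(contents, 'html5lib')
# Collect the data-player-id attribute from every tag that has one.
players_id = [tag['data-player-id'] for tag in soup.find_all(attrs={'data-player-id': True})]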
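Since the question asks for a name-to-ID pairing (Aaron Mooy -> 312252), the IDs can be collected together with the player names. The following is only a sketch: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, it parses a trimmed copy of the markup shown earlier instead of fetching the live page, and it assumes each name sits in a div with class fi-p__name inside the player link. PlayerLinkParser is a hypothetical helper name.

```python
from html.parser import HTMLParser


class PlayerLinkParser(HTMLParser):
    """Collect {player name: player ID} from <a data-player-id="..."> tags."""

    def __init__(self):
        super().__init__()
        self.players = {}
        self._current_id = None  # ID of the <a> tag we are inside, if any
        self._in_name = False    # True while inside <div class="fi-p__name">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "data-player-id" in attrs:
            self._current_id = attrs["data-player-id"]
        elif tag == "div" and self._current_id:
            classes = (attrs.get("class") or "").split()
            if "fi-p__name" in classes:
                self._in_name = True

    def handle_data(self, data):
        # Record the text of the name div under the ID of the enclosing link.
        if self._in_name and data.strip():
            self.players[data.strip()] = self._current_id

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_name = False
        elif tag == "a":
            self._current_id = None


# Trimmed sample of the markup from the output above (not fetched from the site).
sample = """
<a class="fi-p--link " data-player-id="312252" href="/worldcup/players/player/312252/">
<div class="fi-p">
<div class="fi-p__wrapper-text">
<div class="fi-p__name">
Aaron MOOY
</div>
<div class="fi-p__country">
Australia
</div>
</div>
</div>
</a>
"""

parser = PlayerLinkParser()
parser.feed(sample)
parser.close()
print(parser.players)  # prints {'Aaron MOOY': '312252'}
```

To run this against the real list, the text of the response from the players-list URL could be passed to parser.feed() in place of sample, provided the live markup still matches the structure shown above.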