Python HTML crawling

Asked 2 years ago, Updated 2 years ago, 95 views

https://fifa.com/worldcup/players/browser/

I want to crawl the ID code of every player listed on this page, but I can't find the ID codes in the HTML. Can you tell me where they are in the HTML? If they aren't there, is there another way to get each player's ID?

Example: Aaron Mooy -> 312252

python crawling html

2022-09-22 20:31

2 Answers

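Each player's <a> tag carries the ID in its data-player-id attribute, so you can look it up as below (soup is the parsed player list).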

In [6]: soup.find('a', {'data-player-id' : '312252'})
Out[6]: 
<a class="fi-p--link " data-player-id="312252" href="/worldcup/players/player/312252/">
    <div class="fi-p">


      <div class="fi-p__picture">
        <svg class="fi-clip-svg" id="" viewBox="0 0 200 200">
              <image class="image-r image-responsive" height="100%" width="100%" xlink:href="https://api.fifa.com/api/v1/picture/players/2018fwc/312252_sq-300_jpg?allowDefaultPicture=true"></image>

        </svg>

          <div class="fi-p__flag" data-countrycode="aus">
            <div class="fi-p__flag__wrapper">


  <img alt="AUS" class="fi-AUS fi-flag--4" src="http://api.fifa.com/api/v1/picture/flags-fwc2018-4/aus" title="AUS"/>
            </div>
          </div>
            <div class="fi-p__jerseyNum ">
      <span class="fi-p__num">13</span>
    </div>
      </div>


      <div class="fi-p__wrapper-text">
        <div class="fi-p__name">
          Aaron MOOY
        </div>
          <div class="fi-p__country">
            Australia
          </div>
                    <div class="fi-p__role">
Midfielder            </div>



      </div>
    </div>
  </a>

In [7]: soup.find('a', {'data-player-id' : '312252'}).attrs['data-player-id']
Out[7]: '312252'
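
The session above starts from a known ID and does not show how soup was built. As a rough sketch of the reverse lookup asked about in the question (name to ID), the code below parses a trimmed copy of the markup shown above and walks from the name up to the enclosing link. The helper name find_player_id, the SAMPLE_HTML constant and the choice of the built-in html.parser are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup shown above (illustrative only).
SAMPLE_HTML = """
<a class="fi-p--link " data-player-id="312252" href="/worldcup/players/player/312252/">
  <div class="fi-p">
    <div class="fi-p__wrapper-text">
      <div class="fi-p__name">
        Aaron MOOY
      </div>
      <div class="fi-p__country">
        Australia
      </div>
    </div>
  </div>
</a>
"""


def find_player_id(soup, name):
    """Return the data-player-id for the player called `name`, or None.

    Hypothetical helper: compares names case-insensitively, then climbs
    to the nearest enclosing <a> that has a data-player-id attribute.
    """
    wanted = name.casefold()
    for name_div in soup.find_all("div", class_="fi-p__name"):
        if name_div.get_text(strip=True).casefold() == wanted:
            link = name_div.find_parent("a", attrs={"data-player-id": True})
            if link is not None:
                return link["data-player-id"]
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_player_id(soup, "Aaron Mooy"))  # prints 312252
```

The comparison is case-insensitive because the page writes surnames in capitals (Aaron MOOY) while the question uses Aaron Mooy.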


2022-09-22 20:31

The player information is not served from the address in the question.

The player list is loaded from the address below.

https://www.fifa.com/worldcup/players/_libraries/byposition/all/_players-list

Code to extract the player IDs:

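import requests
import bs4

# Fetch the HTML fragment that contains the full player list.
url = 'https://www.fifa.com/worldcup/players/_libraries/byposition/all/_players-list'
contents = requests.get(url).content
soup = bs4.BeautifulSoup(contents, 'html5lib')  # needs the html5lib package installed

# Collect the data-player-id value from every tag that has that attribute.
players_id = [tag.attrs['data-player-id'] for tag in soup.find_all(attrs={'data-player-id': True})]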
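
The list above holds only the IDs. If a name-to-ID mapping is wanted, as in the question's example, one possible sketch is below. It assumes each player link contains a div with class fi-p__name, as in the markup quoted in the other answer. The function name player_ids_by_name is hypothetical, and the second sample entry is invented purely to show more than one row.

```python
from bs4 import BeautifulSoup


def player_ids_by_name(html):
    """Map each player's displayed name to their data-player-id.

    Hypothetical helper; if two players share a name, the later one wins.
    """
    soup = BeautifulSoup(html, "html.parser")
    ids = {}
    for link in soup.select("a[data-player-id]"):
        name_div = link.select_one(".fi-p__name")
        if name_div is None:
            continue  # skip links that carry an ID but no name block
        ids[name_div.get_text(strip=True)] = link["data-player-id"]
    return ids


# Two-entry sample; the second entry is made up for illustration.
sample = """
<a data-player-id="312252" href="/worldcup/players/player/312252/">
  <div class="fi-p__name"> Aaron MOOY </div>
</a>
<a data-player-id="000001" href="/worldcup/players/player/000001/">
  <div class="fi-p__name"> Example PLAYER </div>
</a>
"""
print(player_ids_by_name(sample))
```

With the real page, the contents downloaded in the code above could be passed to this function in place of sample.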


2022-09-22 20:31



© 2024 OneMinuteCode. All rights reserved.