I get an IndexError when parsing an XML page with Python 3 and Beautiful Soup.


from bs4 import BeautifulSoup
import requests

BASE_URL = "http://openapi.gbis.go.kr/ws/rest/buslocationservice?serviceKey=%2BFdkupBYoTx3q0Sd%2B6GFPa6NZ0Quorkb0guP7oMfTj8I75dQKX8vhMXO4QoY6KLZwx%2Bja8eT7irD11Gxv31t1g%3D%3D&routeId=200000085"

responses = requests.get(BASE_URL)
print(responses.status_code)
dom = BeautifulSoup(responses.content, "html.parser")
elements = dom.select("span.text")
element = elements[0]

Running this code raises the following IndexError:

element = elements[0]

IndexError: list index out of range
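The error can be reproduced offline with a minimal sketch. It assumes a hypothetical, trimmed-down stand-in for the API response (the element names busLocationList and stationId come from the answer below; the value is made up) and shows what select() and find() return when XML is parsed with "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the API response (value is made up)
SAMPLE_XML = "<response><msgBody><busLocationList><stationId>100000001</stationId></busLocationList></msgBody></response>"

soup = BeautifulSoup(SAMPLE_XML, "html.parser")

# The XML has no <span class="text"> elements, so select() returns an empty list
elements = soup.select("span.text")
print(elements)  # []

# html.parser also lowercases tag names, so the camelCase name does not match
print(soup.find("busLocationList"))  # None

# Check for an empty result before indexing instead of assuming a match exists
element = elements[0] if elements else None
```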

I want to extract routeId, stationId, stationSeq, etc. from that page, but no matter how I change the code, I keep getting the IndexError. Is it because I passed the wrong tag name to select()?

xml beautifulsoup parsing python3

2022-09-22 11:38

1 Answer

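The URL returns XML, not HTML, so the response should be parsed with an XML parser instead of "html.parser". The XML contains no <span class="text"> elements, so dom.select("span.text") returns an empty list and elements[0] raises the IndexError. On top of that, "html.parser" lowercases tag names, so camelCase names such as busLocationList would not match either. According to the Beautiful Soup documentation, XML parsing is provided by lxml (the "lxml-xml" feature, also available as "xml"). Install lxml first (pip install lxml), then run the code below.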

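from bs4 import BeautifulSoup
import requests

BASE_URL = "http://openapi.gbis.go.kr/ws/rest/buslocationservice?serviceKey=%2BFdkupBYoTx3q0Sd%2B6GFPa6NZ0Quorkb0guP7oMfTj8I75dQKX8vhMXO4QoY6KLZwx%2Bja8eT7irD11Gxv31t1g%3D%3D&routeId=200000085"

response = requests.get(BASE_URL)
response.raise_for_status()  # raise on an HTTP error status instead of parsing an error page

# The API returns XML, so use lxml's XML parser, which keeps camelCase tag names intact
soup = BeautifulSoup(response.content, 'lxml-xml')

# Each <busLocationList> element describes one bus on the route
for busLocation in soup.find_all('busLocationList'):
    stationId = busLocation.find('stationId')
    plateNo = busLocation.find('plateNo')
    if stationId is None or plateNo is None:
        continue  # skip entries that lack either field
    print("StationId: " + stationId.get_text() + " PlateNo: " + plateNo.get_text())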
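To collect the routeId, stationId and stationSeq values the question asks for, the same approach can be wrapped in a small function. This is a sketch that parses a hypothetical, simplified response offline: the element names follow the question and answer, while the values and the helper name parse_bus_locations are made up, and the real response may contain more fields. Like the code above, it needs lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified response; element names follow the question and
# answer, values are made up for illustration
SAMPLE_XML = """<response>
  <msgBody>
    <busLocationList>
      <plateNo>PLATE-0001</plateNo>
      <routeId>200000085</routeId>
      <stationId>100000001</stationId>
      <stationSeq>3</stationSeq>
    </busLocationList>
    <busLocationList>
      <plateNo>PLATE-0002</plateNo>
      <routeId>200000085</routeId>
      <stationId>100000002</stationId>
      <stationSeq>7</stationSeq>
    </busLocationList>
  </msgBody>
</response>"""

FIELDS = ("routeId", "stationId", "stationSeq", "plateNo")


def parse_bus_locations(xml_text):
    """Return one dict per <busLocationList> element.

    Values are returned as strings; a field missing from an element maps to None.
    """
    soup = BeautifulSoup(xml_text, "lxml-xml")  # XML parser preserves tag case
    rows = []
    for item in soup.find_all("busLocationList"):
        row = {}
        for name in FIELDS:
            tag = item.find(name)
            row[name] = tag.get_text(strip=True) if tag is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_bus_locations(SAMPLE_XML):
        print(row["routeId"], row["stationId"], row["stationSeq"])
```

With live data, pass response.content from the request instead of SAMPLE_XML. Returning None for missing fields keeps one incomplete entry from stopping the whole loop with an exception.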

2022-09-22 11:38
