I get an IndexError when parsing an XML page with Python 3 and Beautiful Soup.

Asked 1 year ago, updated 1 year ago, 118 views


import requests
from bs4 import BeautifulSoup

BASE_URL = "http://openapi.gbis.go.kr/ws/rest/buslocationservice?serviceKey=%2BFdkupBYoTx3q0Sd%2B6GFPa6NZ0Quorkb0guP7oMfTj8I75dQKX8vhMXO4QoY6KLZwx%2Bja8eT7irD11Gxv31t1g%3D%3D&routeId=200000085"

responses = requests.get(BASE_URL)
print(responses.status_code)
dom = BeautifulSoup(responses.content, "html.parser")
elements = dom.select("span.text")
element = elements[0]

Running this code raises the following IndexError:

element = elements[0]

IndexError: list index out of range

I want to get routeId, stationId, stationSeq, etc. from that page. No matter how I change the code, I still get the IndexError. Is it because I used the wrong tag name in select()?
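One way to test that guess is to list the element names the response actually contains. The sketch below uses the standard library's `xml.etree.ElementTree` rather than Beautiful Soup; `SAMPLE_XML` is a hypothetical, trimmed stand-in for the API response (the real body may contain more fields), and `tag_names` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the API response body (not real data).
SAMPLE_XML = """<response>
  <msgHeader><resultCode>0</resultCode></msgHeader>
  <msgBody>
    <busLocationList>
      <plateNo>PLATE-1</plateNo>
      <routeId>200000085</routeId>
      <stationId>100</stationId>
      <stationSeq>5</stationSeq>
    </busLocationList>
  </msgBody>
</response>"""


def tag_names(xml_text):
    """Return the set of element names that appear anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {element.tag for element in root.iter()}


names = tag_names(SAMPLE_XML)
print(sorted(names))
# There is no <span> element at all, so a "span.text" selector cannot match.
print("span" in names)  # False
```

If the real response looks like this, the selector is indeed the problem: there is nothing named `span` to select, and the useful names are `busLocationList`, `routeId`, `stationId` and `stationSeq`.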

xml beautifulsoup parsing python3

2022-09-22 11:38

1 Answer

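The URL returns XML data, not HTML, so you should parse it with an XML parser rather than "html.parser". The response has no <span class="text"> elements at all, so dom.select("span.text") returns an empty list and elements[0] raises IndexError; the data you want is in XML tags such as busLocationList, stationId and plateNo. The Beautiful Soup documentation lists lxml ("lxml-xml", also spelled "xml") as its XML parser, so install lxml first (for example, pip install lxml) and then run the code below.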

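from bs4 import BeautifulSoup
import requests

BASE_URL = "http://openapi.gbis.go.kr/ws/rest/buslocationservice?serviceKey=%2BFdkupBYoTx3q0Sd%2B6GFPa6NZ0Quorkb0guP7oMfTj8I75dQKX8vhMXO4QoY6KLZwx%2Bja8eT7irD11Gxv31t1g%3D%3D&routeId=200000085"

response = requests.get(BASE_URL)
response.raise_for_status()  # stop early on an HTTP error status
# The API returns XML, so use lxml's XML parser ("lxml-xml") instead of "html.parser".
soup = BeautifulSoup(response.content, "lxml-xml")

# Each <busLocationList> element holds the data for one bus on the route.
for bus_location in soup.find_all("busLocationList"):
    station_id = bus_location.find("stationId")
    plate_no = bus_location.find("plateNo")
    # find() returns None when a tag is missing, so check before reading .string
    if station_id is not None and plate_no is not None:
        print(f"StationId: {station_id.string} PlateNo: {plate_no.string}")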
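The code above prints stationId and plateNo, while the question also asks for routeId and stationSeq. If you would rather not install lxml, here is a minimal sketch with the standard library's `xml.etree.ElementTree` that collects those fields into one dictionary per bus. It assumes, as the loop above does, that each bus is a `<busLocationList>` element with the fields as direct children; `parse_bus_locations`, `FIELDS` and the sample XML are hypothetical and made up for illustration.

```python
import xml.etree.ElementTree as ET

# Fields the question asks for, plus plateNo from the answer's example.
FIELDS = ("routeId", "stationId", "stationSeq", "plateNo")


def parse_bus_locations(xml_text):
    """Return one dict per <busLocationList> element.

    A field missing from an element maps to None instead of raising an error.
    """
    root = ET.fromstring(xml_text)
    buses = []
    # ".//busLocationList" searches the whole tree, not only direct children.
    for bus in root.iterfind(".//busLocationList"):
        buses.append({field: bus.findtext(field) for field in FIELDS})
    return buses


# Hypothetical, trimmed response used only for illustration.
SAMPLE_XML = """<response>
  <msgBody>
    <busLocationList>
      <plateNo>PLATE-1</plateNo><routeId>200000085</routeId>
      <stationId>100</stationId><stationSeq>5</stationSeq>
    </busLocationList>
    <busLocationList>
      <plateNo>PLATE-2</plateNo><routeId>200000085</routeId>
      <stationId>200</stationId><stationSeq>9</stationSeq>
    </busLocationList>
  </msgBody>
</response>"""

for bus in parse_bus_locations(SAMPLE_XML):
    print(bus["routeId"], bus["stationId"], bus["stationSeq"])
```

With the live API you would pass `response.content` instead of the sample string, since `ET.fromstring` accepts bytes as well as str. All values come back as strings, so convert with `int()` where you need numbers such as stationSeq.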


2022-09-22 11:38



© 2024 OneMinuteCode. All rights reserved.