How can I get the im:id attribute with feedparser? (I'm parsing the XML of Apple's iOS ranking data.)

Asked 2 years ago, Updated 2 years ago, 46 views

I am writing a Python script to get App Store ranking data.

The data will be retrieved from the following XML feed:
https://itunes.apple.com/jp/rss/topfreeapplications/limit=100/xml

I plan to use the feedparser library.

I tried parsing the feed with feedparser:

feedparser.parse(RSS_URL)
I looked through the contents of the result, but I could not find the following im:id attribute anywhere in it.

<id im:id='443904275' im:bundleId='jp.naver.line'>...</id>

I only want to get the value 443904275 from the tag above.

After a lot of searching, I found the article below, which seems to do something close to what I want. I could not get it to work in my own script, though, so I am asking here.
https://www.ianlewis.org/jp/feedparser_and_media

If feedparser cannot do this, I am open to using another library.

Thank you for your cooperation.

python

2022-09-30 21:17

1 Answer

http://pythonhosted.org/feedparser/

I have reviewed the feedparser documentation above, but it seems that feedparser does not handle the attribute "im:id='443904275'" in the first place.

I don't recommend the approach in the article you referenced, because it forcibly changes feedparser's behavior with a monkey patch.

Here's a quick example using different libraries: requests and BeautifulSoup (imported from bs4).

Once both libraries are installed, you can extract "im:id" with the following code.

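import requests
from bs4 import BeautifulSoup

URL = "https://itunes.apple.com/jp/rss/topfreeapplications/limit=100/xml"

# Download the feed and parse it; html.parser treats "im:id" as a plain attribute name
soup = BeautifulSoup(requests.get(URL).content, 'html.parser')

# Look at every <id> tag in the parsed feed
for tag_id in soup.find_all('id'):
    im_id = tag_id.get('im:id')
    if im_id:  # skip <id> tags that have no im:id attribute
        print(im_id)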
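
If you would rather avoid third-party packages, the same extraction can probably be done with the standard library's xml.etree.ElementTree, which is namespace-aware. The following is only a minimal sketch: the two namespace URIs and the hand-written SAMPLE string are my assumptions about what the feed looks like, and extract_im_ids is a hypothetical helper name, so please check the xmlns declarations on the real feed's root element before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against the xmlns attributes of the real feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "im": "http://itunes.apple.com/rss",
}

# Hand-written sample shaped like one feed entry (not actual feed output).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:im="http://itunes.apple.com/rss">
  <entry>
    <id im:id="443904275" im:bundleId="jp.naver.line">sample-entry</id>
  </entry>
</feed>"""


def extract_im_ids(xml_text):
    """Return the im:id attribute of every entry-level <id> element."""
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced attribute names as "{uri}local".
    key = "{%s}id" % NS["im"]
    ids = []
    for element in root.findall("atom:entry/atom:id", NS):
        value = element.get(key)
        if value:
            ids.append(value)
    return ids


print(extract_im_ids(SAMPLE))
```

To run it against the live feed, you could pass requests.get(URL).content to extract_im_ids instead of SAMPLE.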


2022-09-30 21:17


