I want to load the JSON I got from an API into a DataFrame in Python
This continues an earlier question.
With the code below, the same page is retrieved every time, even when I loop over the page number.
Is there any way to avoid this? Or is paging not supported?
hb_count='http://api.b.st-hatena.com/entrylist/json?sort=count&page='+str(i)
Also, the order of the fields in the JSON (count, link, etc.) is not constant. (It isn't causing problems when building the DataFrame, but I'm curious about it.)
I look forward to your kind cooperation.
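On the field-order point: JSON objects are unordered by specification, so a client should read fields by name rather than by position. A minimal sketch of that idea follows; the sample records and the FIELDS tuple are made up for illustration and are not taken from the Hatena API.

```python
import json

# Hypothetical sample: the same entry serialized with two different key orders.
a = json.loads('{"count": "10", "link": "https://example.com/a", "title": "A"}')
b = json.loads('{"title": "A", "link": "https://example.com/a", "count": "10"}')

# Dict equality ignores key order, so both parse to the same record.
same = (a == b)

# Reading fields by name yields a fixed order regardless of the input order.
FIELDS = ("title", "link", "count")  # assumed field names, for illustration
row_a = tuple(a[f] for f in FIELDS)
row_b = tuple(b[f] for f in FIELDS)

print(same, row_a == row_b)
```

Because lookup is by key, the varying order in the response should not affect the resulting table as long as the columns are named explicitly.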
python json pandas api
It seems that the number of entries that can be retrieved through the Hatena Bookmark API is fixed, so I tried web scraping instead. Install the BeautifulSoup package (and requests) beforehand.
get_entry.py
import json

import requests
from bs4 import BeautifulSoup

# Entry list page on Hatena Bookmark
hb_count = 'https://b.hatena.ne.jp/entrylist'


def get_entry(url, num_page):
    entry = []
    for p in range(num_page):
        # Page numbers in the query string start at 1
        r = requests.get(
            hb_count, params={'url': url, 'sort': 'count', 'page': p + 1})
        soup = BeautifulSoup(r.content, 'html.parser')
        for s in soup.select('div.entrylist-contents-main'):
            # The title link carries both the entry title and its URL
            atr = s.select('h3.entrylist-contents-title a')[0]
            cnt = s.select('span.entrylist-contents-users a span')[0].text
            entry.append({
                'title': atr['title'], 'url': atr['href'], 'count': cnt})
    return json.dumps(entry)


if __name__ == '__main__':
    url = 'https://newspicks.com/'
    num_page = 3
    json_data = get_entry(url, num_page)
    print(json_data)
Run Results
$ python3 get_entry.py | jq '.'
[
  {
    "title": "What is Google's list of questions to ask at the job interview?",
    "url": "https://newspicks.com/news/951070/body",
    "count": "1035"
  },
  {
    "title": "Ask former employees, \"Why Evernote got into a serious situation\" (first part)",
    "url": "https://newspicks.com/news/1237596/body/",
    "count": "854"
  },
  {
    "title": "[New] The most advanced country, is there tomorrow for Japanese office workers who don't study?",
    "url": "https://newspicks.com/news/2647674/",
    "count": "581"
  },
:
:
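Since the original goal was a DataFrame, the JSON string returned by get_entry() can be passed on to pandas. The following is only a sketch: the sample data below is made up to match the shape of the output above, and it assumes pandas is installed.

```python
import json

import pandas as pd

# Hypothetical sample in the same shape as the output of get_entry().
json_data = json.dumps([
    {"title": "Entry A", "url": "https://example.com/a", "count": "1035"},
    {"title": "Entry B", "url": "https://example.com/b", "count": "854"},
])

# Name the columns explicitly so their order does not depend on the JSON,
# and convert the bookmark count from a string to an integer.
df = pd.DataFrame(json.loads(json_data), columns=["title", "url", "count"])
df["count"] = df["count"].astype(int)

print(df.sort_values("count", ascending=False))
```

Converting count to an integer matters because the scraper returns it as text, and sorting text would order "854" after "1035".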