Retrieving data across multiple pages from an API in Python

I want to load the JSON I got from an API into a pandas DataFrame in Python.
(This is a continuation of my previous question.)

With the code below, the same page is returned every time, even though I loop over the page number.
Is there any way to avoid this, or is paging simply not supported?

hb_count='http://api.b.st-hatena.com/entrylist/json?sort=count&page='+str(i)
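One way to loop over pages is to build each URL from parameters rather than by string concatenation, and to stop as soon as a page comes back empty or identical to the previous one, which is how an endpoint that ignores `page` would show up. This is a minimal sketch under assumptions: `fetch` is a hypothetical callable you supply (for example a thin wrapper around `requests.get(url).json()`), and whether the `api.b.st-hatena.com` endpoint actually honours `page` is not confirmed here.

```python
from typing import Callable, List
from urllib.parse import urlencode

# Base URL taken from the question; whether it honours `page` is the open issue.
BASE_URL = "http://api.b.st-hatena.com/entrylist/json"


def page_url(page: int, sort: str = "count") -> str:
    """Build the URL for one page from parameters instead of concatenating strings."""
    return f"{BASE_URL}?{urlencode({'sort': sort, 'page': page})}"


def fetch_all_pages(fetch: Callable[[str], list], max_pages: int) -> List[dict]:
    """Collect entries from pages 1..max_pages.

    `fetch` is a hypothetical helper: URL in, decoded JSON list out.
    Stops early if a page is empty or identical to the previous page,
    the symptom of an API that ignores the page parameter.
    """
    entries: List[dict] = []
    previous = None
    for page in range(1, max_pages + 1):
        items = fetch(page_url(page))
        if not items or items == previous:
            break
        entries.extend(items)
        previous = items
    return entries
```

If `fetch_all_pages` returns only one page's worth of entries no matter how large `max_pages` is, the endpoint is most likely returning the same page regardless of `page`.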

Also, the order of the keys in the JSON (count, link, etc.) is not consistent. It isn't causing any problems when building the DataFrame, but I'm curious why.
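JSON objects are unordered collections, so the key order can vary between records and shouldn't be relied on; only the key names matter. The sketch below uses made-up sample data and a hypothetical `COLUMNS` order to read each record by key into a fixed column order. pandas performs the same alignment by key name when you pass a list of dicts, and `pd.DataFrame(records, columns=["link", "count"])` fixes the column order there.

```python
import json

# Made-up sample: the same keys arrive in a different order in each record.
raw = ('[{"count": 3, "link": "https://a.example/"},'
       ' {"link": "https://b.example/", "count": 5}]')

COLUMNS = ("link", "count")  # hypothetical fixed column order


def normalize(records, columns=COLUMNS):
    """Return each record as a tuple in a fixed column order, ignoring key order."""
    return [tuple(rec.get(col) for col in columns) for rec in records]


rows = normalize(json.loads(raw))
print(rows)  # [('https://a.example/', 3), ('https://b.example/', 5)]
```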

I look forward to your kind cooperation.

python json pandas api

2022-09-30 10:44

1 Answer

It seems the Hatena Bookmark API only returns a fixed number of entries, so I tried web scraping instead. Install the BeautifulSoup package (pip install beautifulsoup4) and requests beforehand.

get_entry.py

import json

import requests
from bs4 import BeautifulSoup

# Hatena Bookmark's HTML entry list (scraped instead of the API)
hb_count = 'https://b.hatena.ne.jp/entrylist'


def get_entry(url, num_page):
    entry = []
    for p in range(num_page):
        # range() starts at 0, so p + 1 requests pages 1..num_page
        r = requests.get(
            hb_count, params={'url': url, 'sort': 'count', 'page': p + 1})
        r.raise_for_status()
        soup = BeautifulSoup(r.content, 'html.parser')
        for s in soup.select('div.entrylist-contents-main'):
            # Title link: the <a> carries both the title and the article URL
            atr = s.select('h3.entrylist-contents-title a')[0]
            # Bookmark count: the <span> inside the users link
            cnt = s.select('span.entrylist-contents-users a span')[0].text
            entry.append({
                'title': atr['title'], 'url': atr['href'], 'count': cnt})

    return json.dumps(entry)


if __name__ == '__main__':
    url = 'https://newspicks.com/'
    num_page = 3

    json_data = get_entry(url, num_page)
    print(json_data)
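If you want to check the extraction logic without installing BeautifulSoup or hitting the site, the same idea can be sketched with the standard library's `html.parser`. This swaps in a different technique from the answer above and assumes the markup shape its selectors imply: a title `<a>` inside `h3.entrylist-contents-title`, and the count in a `<span>` nested inside `span.entrylist-contents-users`. The sample HTML is invented for illustration, not copied from the live site.

```python
from html.parser import HTMLParser


class EntryParser(HTMLParser):
    """Collect title/url/count using only the stdlib.

    Assumed markup (mirrors the BeautifulSoup selectors, not verified live):
      <h3 class="entrylist-contents-title"><a href=... title=...>
      <span class="entrylist-contents-users"><a><span>COUNT</span></a></span>
    """

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_title = False  # inside h3.entrylist-contents-title
        self._in_users = False  # inside span.entrylist-contents-users
        self._in_count = False  # inside the inner <span> holding the number

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h3" and "entrylist-contents-title" in classes:
            self._in_title = True
        elif tag == "a" and self._in_title:
            # Each entry starts at its title link
            self.entries.append(
                {"title": attrs.get("title"), "url": attrs.get("href"), "count": None})
        elif tag == "span" and "entrylist-contents-users" in classes:
            self._in_users = True
        elif tag == "span" and self._in_users:
            self._in_count = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False
        elif tag == "span" and self._in_count:
            self._in_count = False
        elif tag == "span" and self._in_users:
            self._in_users = False

    def handle_data(self, data):
        if self._in_count and self.entries and data.strip():
            self.entries[-1]["count"] = data.strip()


def parse_entries(html):
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries


# Invented sample markup for an offline check
SAMPLE = """
<div class="entrylist-contents-main">
  <h3 class="entrylist-contents-title"><a href="https://example.com/1" title="First">First</a></h3>
  <span class="entrylist-contents-users"><a href="/entry/1"><span>1035</span> users</a></span>
</div>
"""

print(parse_entries(SAMPLE))
# [{'title': 'First', 'url': 'https://example.com/1', 'count': '1035'}]
```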

Run Results

$ python3 get_entry.py | jq '.'

[
  {
    "title": "What is Google's list of questions to ask at the job interview?",
    "url": "https://newspicks.com/news/951070/body",
    "count": "1035"
  },
  {
    "title": "Ask former employees, \"Why Evernote got into a serious situation\" (first part)",
    "url": "https://newspicks.com/news/1237596/body/",
    "count": "854"
  },
  {
    "title": "[New] The most advanced country, is there tomorrow for Japanese office workers who don't study?",
    "url": "https://newspicks.com/news/2647674/",
    "count": "581"
  },
                            :
                            :
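The scraper returns `count` as a string (note the quotes in the output above), so numeric sorting or aggregation needs a conversion first. A minimal sketch using only `json`; the sample string is made up and stands in for the output of `get_entry()`. The resulting list of dicts can then be passed straight to `pd.DataFrame(records)` to get title/url/count columns.

```python
import json


def to_records(json_text):
    """Parse get_entry()'s JSON output and convert the count strings to ints.

    Returns the records sorted by count, highest first.
    """
    records = json.loads(json_text)
    for rec in records:
        rec["count"] = int(rec["count"])
    return sorted(records, key=lambda r: r["count"], reverse=True)


# Made-up sample in the same shape as the output above
sample = ('[{"title": "B", "url": "https://b.example/", "count": "854"},'
          ' {"title": "A", "url": "https://a.example/", "count": "1035"}]')
records = to_records(sample)
print([(r["title"], r["count"]) for r in records])  # [('A', 1035), ('B', 854)]
```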

2022-09-30 10:44
