Scrape text information after the "View More" button on a web page

Asked 2 years ago, Updated 2 years ago, 15 views

I am trying to get text information from a page by scraping.
There is a "View More" button partway down the page, and I would like to get all the information that appears after it.
The problem is that I only get the information that comes before "View More".

[Additional comments]
I learned from another question that the number of displayed items can sometimes be adjusted with query parameters. Can I use query parameters on this page as well? I don't know much about the web, so I apologize if I'm not thinking about this correctly.

import bs4
import requests

url = 'https://prtimes.jp/main/action.php?run=html&page=searchkey&search_word=%E3%81%8A%E3%81%AB%E3%81%8E%E3%82%8A&search_pattern=1'

res = requests.get(url)
soup = bs4.BeautifulSoup(res.text, features='lxml')
# Collect the title links of the releases present in the initial HTML
rvws = soup.find_all(class_="link-title-item link-title-item-ordinary")

reviews_text = []
for rvw in rvws:
    reviews_text.append(rvw.text)
reviews_text

python

2022-09-30 16:54

1 Answer

Can I use the query parameters on this page as well?

Immediately after clicking "View More", I looked at the Firefox web console and found the following:

[Screenshot: Web Console]

The query parameters are page and limit. I'm not sure whether the random parameter is literally a random number, but if you set it to a suitable value, you get data back in JSON format. However, the response is wrapped as addReleaseList(JSON data), so extract the JSON as follows (example run from the command line):

## page=1, limit=20: first 20 pieces of data
$ curl -s 'https://prtimes.jp/api/search_release.php?callback=addReleaseList&type=search&page=1&v=%E3%81%8A%E3%81%AB%E3%81%8E%E3%82%8A&limit=20&random=1574862397013' |
  grep -Po '\AaddReleaseList\(\K.+(?=\)\Z)' | jq -r . | head -n 10

{
  "status": {
    "code": 200
  },
  "type": "search",
  "articles": [
    {
      "id": 174,
      "title": "SALON GINZA SABOU \"Ultimate Onigiri\" made from the best rice in Japan will be released",
      "url": "/main/html/rd/p/000000174.000006099.html",
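
The same extraction can be done from Python instead of curl and grep. The following is only a minimal sketch under some assumptions: the endpoint and parameter names are copied from the curl example above, the response is assumed to be UTF-8 text of the form addReleaseList(...), and the random parameter is left out on the guess that it is only a cache-buster (add it back if the request fails). The names strip_jsonp and fetch_articles are hypothetical helpers made up for this example, and the standard library urllib is used so the sketch has no dependencies; requests.get(url, params=...) should work just as well.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
API_URL = "https://prtimes.jp/api/search_release.php"


def strip_jsonp(text, callback="addReleaseList"):
    """Parse the JSON payload inside a callback(...) wrapper.

    Hypothetical helper: assumes the whole response is a single
    call to `callback`, optionally followed by a semicolon.
    """
    pattern = rf"\s*{re.escape(callback)}\((.*)\)\s*;?\s*"
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


def fetch_articles(word, page=1, limit=20):
    """Fetch one page of search results (hypothetical helper).

    Parameter names are copied from the curl example; `random`
    is omitted on the assumption that it is optional.
    """
    query = urlencode({
        "callback": "addReleaseList",
        "type": "search",
        "page": page,
        "v": word,  # urlencode percent-encodes the search word as UTF-8
        "limit": limit,
    })
    with urlopen(f"{API_URL}?{query}", timeout=10) as res:
        text = res.read().decode("utf-8")  # assumed encoding
    # "articles" is the list seen in the JSON output above.
    return strip_jsonp(text).get("articles", [])
```

Calling fetch_articles("おにぎり", page=2) would then request the second batch of 20, and increasing page until the returned list is empty should collect everything that "View More" would have loaded, assuming the API signals the end that way.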


2022-09-30 16:54



© 2024 OneMinuteCode. All rights reserved.