I am trying to get text from a page by scraping.
There is a "See More" button in the middle of the page, and I would like to get all the information that follows it.
The problem is that I only get the information that appears before "See More".
Additional comments
I found out from another question that the number of displayed items can sometimes be adjusted with query parameters. Can I use query parameters on this page as well? I don't know much about the web, so I'm sorry if I have misunderstood something.
import bs4
import requests

url = 'https://prtimes.jp/main/action.php?run=html&page=searchkey&search_word=%E3%81%8A%E3%81%AB%E3%81%8E%E3%82%8A&search_pattern=1'
res = requests.get(url)
soup = bs4.BeautifulSoup(res.text, features='lxml')
# Search the parsed document (soup); the original called an undefined name, group.
rvws = soup.find_all(class_="link-title-item link-title-item-ordinary")
# Collect the text of every matched element.
reviews_text = [rvw.text for rvw in rvws]
reviews_text
Can I use the query parameters on this page as well?
Immediately after clicking "See More", I looked at the Firefox web console and found the API request the page sends (the URL used in the command below). Its query parameters include page and limit. I'm not sure whether the random parameter is literally a random number, but if you set it to a suitable value, you get JSON data back. However, the response is wrapped as addReleaseList(JSON data), so extract the JSON as follows (example run from the command line):
# page=1, limit=20: the first 20 items
$ curl -s 'https://prtimes.jp/api/search_release.php?callback=addReleaseList&type=search&page=1&v=%E3%81%8A%E3%81%AB%E3%81%8E%E3%82%8A&limit=20&random=1574862397013' |
    grep -Po '\AaddReleaseList\(\K.+(?=\)\Z)' | jq -r . | head -n 10
{
  "status": {
    "code": 200
  },
  "type": "search",
  "articles": [
    {
      "id": 174,
      "title": "SALON GINZA SABOU \"Ultimate Onigiri\" made from the best rice in Japan will be released",
      "url": "/main/html/rd/p/000000174.000006099.html",
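The same idea can be sketched in Python, which is what the question uses. This is only a rough sketch under some assumptions: the endpoint and parameter names are copied from the curl command above, the "articles" and "title" keys are taken from the sample output, and the random parameter is guessed to be a millisecond timestamp. The helper names (build_url, unwrap_jsonp, collect_titles) are made up for illustration, and treating an empty "articles" list as the end of the results is also an assumption.

```python
import json
import re
import time
from urllib.parse import urlencode

# Endpoint copied from the curl example; not verified beyond that.
API_URL = "https://prtimes.jp/api/search_release.php"


def build_url(keyword, page, limit=20, callback="addReleaseList", random_value=None):
    """Build the search URL using the page and limit query parameters."""
    if random_value is None:
        # Assumption: "random" looks like a millisecond timestamp.
        random_value = int(time.time() * 1000)
    params = {
        "callback": callback,
        "type": "search",
        "page": page,
        "v": keyword,
        "limit": limit,
        "random": random_value,
    }
    return f"{API_URL}?{urlencode(params)}"


def unwrap_jsonp(text, callback="addReleaseList"):
    """Strip the callback(...) wrapper and parse the JSON inside it."""
    pattern = rf"\s*{re.escape(callback)}\((.*)\)\s*;?\s*"
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


def collect_titles(keyword, fetch, max_pages=5, limit=20):
    """Request pages 1..max_pages and gather the article titles.

    fetch is any callable that takes a URL and returns the response body
    as text. The loop stops early when a page has no articles (assumed to
    mean there are no more results).
    """
    titles = []
    for page in range(1, max_pages + 1):
        data = unwrap_jsonp(fetch(build_url(keyword, page, limit)))
        articles = data.get("articles", [])
        if not articles:
            break
        titles.extend(article.get("title") for article in articles)
    return titles
```

With requests installed, a call such as collect_titles("おにぎり", lambda url: requests.get(url).text) should return the titles from the first few pages. Passing fetch in as an argument keeps the parsing logic testable without network access.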