Scraping deeply nested elements in Python does not work well

Asked 2 years ago, Updated 2 years ago, 89 views

https://www.nikkei.com/nkd/company/?scode=3911

On this page, I would like to get four values: the current value plus the start, high, and low values.

I can easily get the current value on its own, but I can't get the 2nd to 4th values.
Please tell me a good way to do this. Thank you in advance for your help.

<ul class="m-stockInfo_detail_list">
    <li>
        <span class="m-stockInfo_detail_title">Start (9:03)</span>
        <span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
    <li>
        <span class="m-stockInfo_detail_title">High (9:03)</span>
        <span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
    <li>
        <span class="m-stockInfo_detail_title">Low (14:05)</span>
        <span class="m-stockInfo_detail_value">399<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
</ul>

Modules in use

import urllib2
from bs4 import BeautifulSoup

python python3 web-scraping

2022-09-30 21:33

2 Answers

New Answer (Verified)

Instead of using nth-child, you can select the neighboring elements together as a list and then process each item.

import requests
from bs4 import BeautifulSoup

url = 'https://www.nikkei.com/nkd/company/?scode=3911'
html = requests.get(url)
soup = BeautifulSoup(html.text, "html.parser")

# Select every value <span> in the list at once instead of one li:nth-child at a time.
# ul>li>span.m-stockInfo_detail_value matches the list markup shown in the question.
detail_values = soup.select("#JSID_stockInfo>div.m-stockInfo_top>div.m-stockInfo_top_left>div.m-stockInfo_detail.m-stockInfo_detail_01>div.m-stockInfo_detail_left>ul>li>span.m-stockInfo_detail_value")
for dv in detail_values:
    print(dv.text)
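
As a rough sketch of what "process each item" could look like, the snippet below parses the HTML fragment from the question offline (no request to the site) and pairs each title with its numeric value. The function name parse_details and the shortened selector ul.m-stockInfo_detail_list > li are assumptions made for illustration, not part of the answer above.

```python
from bs4 import BeautifulSoup

# HTML fragment from the question, used here as static test input.
HTML = """
<ul class="m-stockInfo_detail_list">
    <li>
        <span class="m-stockInfo_detail_title">Start (9:03)</span>
        <span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
    <li>
        <span class="m-stockInfo_detail_title">High (9:03)</span>
        <span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
    <li>
        <span class="m-stockInfo_detail_title">Low (14:05)</span>
        <span class="m-stockInfo_detail_value">399<span class="m-stockInfo_detail_unit">yen</span></span>
    </li>
</ul>
"""


def parse_details(html):
    """Map each title in the list to its value as an int (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("ul.m-stockInfo_detail_list > li"):
        title = li.select_one("span.m-stockInfo_detail_title").get_text(strip=True)
        value_span = li.select_one("span.m-stockInfo_detail_value")
        # The value span holds the number first and the unit in a nested span,
        # so the first stripped string is the bare number.
        number = next(value_span.stripped_strings)
        details[title] = int(number)
    return details


details = parse_details(HTML)
for title, price in details.items():
    print(title, price)
```

Iterating over the li elements keeps each title and value together, so the result does not depend on knowing in advance which position holds which value.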

About the Previous Answer

I had confirmed in the browser that the selector worked, so I assumed it was fine. In fact, BeautifulSoup did not implement the nth-child pseudo-class, and li:nth-child(1) raised the following error:

NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

Please accept my sincere apologies.
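
The error message itself names nth-of-type as a supported pseudo-class, which suggests one possible workaround. The sketch below runs against a reduced copy of the list from the question (values only) and uses a shortened selector; both are assumptions for illustration. Because the ul contains only li children, li:nth-of-type(3) should pick the same element that li:nth-child(3) would.

```python
from bs4 import BeautifulSoup

# Reduced copy of the list from the question: values only, same order.
HTML = """
<ul class="m-stockInfo_detail_list">
    <li><span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span></li>
    <li><span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span></li>
    <li><span class="m-stockInfo_detail_value">399<span class="m-stockInfo_detail_unit">yen</span></span></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# nth-of-type counts only sibling elements with the same tag name (here: li).
matches = soup.select(
    "ul.m-stockInfo_detail_list > li:nth-of-type(3) > span.m-stockInfo_detail_value"
)

# The first stripped string of the value span is the number without the unit.
low_value = next(matches[0].stripped_strings)
print(low_value)
```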


2022-09-30 21:33

Old Answer

The following CSS selector was able to retrieve the value of each element. With a relatively recent BeautifulSoup, you should be able to use the CSS selector as is, so you should be able to get the value with the following:

#JSID_stockInfo>div.m-stockInfo_top>div.m-stockInfo_top_left>div.m-stockInfo_detail.m-stockInfo_detail_01>div.m-stockInfo_detail_left>ul>li:nth-child(1)>span.m-stockInfo_detail_value

Changing the index in li:nth-child lets you get the start, high, and low values in turn.
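
Changing the index could be sketched as a small helper that builds the selector for a given position. This assumes BeautifulSoup 4.7 or later, where select() is handled by the Soup Sieve package and nth-child is available. The helper name value_at, the shortened selector, and the reduced HTML are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

# Reduced copy of the list from the question: values only, same order.
HTML = """
<ul class="m-stockInfo_detail_list">
    <li><span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span></li>
    <li><span class="m-stockInfo_detail_value">422<span class="m-stockInfo_detail_unit">yen</span></span></li>
    <li><span class="m-stockInfo_detail_value">399<span class="m-stockInfo_detail_unit">yen</span></span></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")


def value_at(soup, index):
    """Return the number in the index-th li (1-based), or None if there is none."""
    selector = (
        f"ul.m-stockInfo_detail_list > li:nth-child({index}) "
        "> span.m-stockInfo_detail_value"
    )
    node = soup.select_one(selector)
    if node is None:
        return None
    # Skip the nested unit span by taking only the first stripped string.
    return next(node.stripped_strings)


start, high, low = (value_at(soup, i) for i in (1, 2, 3))
print(start, high, low)
```

Returning None for a missing position avoids an exception if the page ever lists fewer items than expected.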


2022-09-30 21:33



© 2024 OneMinuteCode. All rights reserved.