A question about finding hidden data when scraping into a Python DataFrame.

Asked 2 years ago, Updated 2 years ago, 48 views

import pandas as pd
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# Code Settings
code = '005930'

# Calculating risk-free interest rates
ke_url = "https://www.kisrating.com/ratingsStatistics/statics_spread.do"
ke_res = requests.get(ke_url)
ke_df = pd.read_html(ke_res.text)
ke = float(ke_df[0].iloc[10, 8])


# Import Data Area: Import only snapshots and financial statements
snapshot_url = "http://comp.fnguide.com/SVO2/ASP/SVD_Main.asp?pGB=1&gicode=A{}&cID=&MenuYn=Y&ReportGB=&NewMenuID=101&stkGb=701".format(code)
snapshot_res = requests.get(snapshot_url)
snapshot_df = pd.read_html(snapshot_res.text)

fs_rpt_url = "http://comp.fnguide.com/SVO2/ASP/SVD_Finance.asp?pGB=1&gicode=A{}&cID=&MenuYn=Y&ReportGB=&NewMenuID=103&stkGb=701".format(code)
fs_rpt_res = requests.get(fs_rpt_url)
fs_rpt_df = pd.read_html(fs_rpt_res.text)

# Replace NaN values with 0 in every table
for i in range(len(snapshot_df)):
    # snapshot_df[i] = snapshot_df[i].apply(pd.to_numeric, errors='coerce')  # Important: applying this turns every non-numeric (text) value into NaN.
    snapshot_df[i].fillna(0, inplace=True)

for i in range(len(fs_rpt_df)):
    # fs_rpt_df[i] = fs_rpt_df[i].apply(pd.to_numeric, errors='coerce')  # Important: applying this turns every non-numeric (text) value into NaN.
    fs_rpt_df[i].fillna(0, inplace=True)

# fs_rpt_df[0] : Consolidated income statement (annual)
# fs_rpt_df[1]: Consolidated income statement (quarter)
# fs_rpt_df[2]: Consolidated income statement
print(fs_rpt_df[0].iloc[3, 0])

Hello, I'm a Python beginner and I have a question.

I'm posting because some of the data I'm trying to scrape is hidden.

If you run the code, the item "Open accounts that participated in calculating sales and administrative expenses" is printed.

On the actual web page, it is shown as "Sales and administrative expenses +".

I need the data that appears when I press the + button. How can I retrieve it?

I'd appreciate it if you could give me an answer.

And thank you to all the people who always reply ^^

dataframe python

2022-09-20 18:05

1 Answer

You seem to be in luck. If you fetch the page source itself, it looks like this:

<tr id="p_grid1_4" class=" rwf acd_dep_start_close ">
    <th scope="row" class="l clf">
        <div class=""><span class="txt_acd">Sales and administrative expenses</span><a id="grid1_4" href="javascript:foldOpen('grid1_4');">Open accounts that participated in the calculation</a></div>
    </th>
    <td class="r">566,397</td>
    <td class="r">524,903</td>
    <td class="r">553,928</td>
    <td class="r">416,252</td>
    <td class="r">412,229</td>
    <td class="r cle">1.0</td>
</tr>
<tr  class="c_grid1_4 rwf acd_dep2_sub" style="display:none;">
    <th scope="row" class="l clf">&nbsp;&nbsp;Working expenses</th>
    <td class="r">67,972</td>
    <td class="r">64,514</td>
    <td class="r">64,226</td>
    <td class="r">51,701</td>
    <td class="r">49,271</td>
    <td class="r cle">4.9</td>
</tr>

The row in question exists in the DOM; the inline CSS (style="display:none;") shows that it is simply hidden. It is fairly clear what javascript:foldOpen('grid1_4') does: it presumably just reveals the rows with class c_grid1_4.
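One detail that may explain why the row never reaches the DataFrame: pandas.read_html has a displayed_only parameter that defaults to True, and with that setting it skips elements styled display:none. Below is a minimal sketch, assuming the hidden rows sit in the HTML exactly as in the snippet above. The table is a trimmed stand-in with two value columns, and the "Period 1" / "Period 2" headers are placeholders, not the real page's headers.

```python
from io import StringIO

import pandas as pd

# Trimmed stand-in for the page source; header labels are placeholders.
html = """
<table>
  <thead>
    <tr><th>Item</th><th>Period 1</th><th>Period 2</th></tr>
  </thead>
  <tbody>
    <tr class="rwf acd_dep_start_close">
      <th>Sales and administrative expenses</th>
      <td>566,397</td><td>524,903</td>
    </tr>
    <tr class="c_grid1_4 rwf acd_dep2_sub" style="display:none;">
      <th>Working expenses</th>
      <td>67,972</td><td>64,514</td>
    </tr>
  </tbody>
</table>
"""

# Default behaviour: rows styled display:none are dropped.
visible_only = pd.read_html(StringIO(html))[0]

# displayed_only=False keeps the hidden rows as well.
with_hidden = pd.read_html(StringIO(html), displayed_only=False)[0]

print(visible_only)
print(with_hidden)
```

In the question's code this would correspond to adding displayed_only=False to the pd.read_html(fs_rpt_res.text) call, though that has not been checked against the live page here.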

I think you can simply find every tr that has the rwf class and parse the rows as you iterate over them.
You can branch the processing on the text content of each row's first th.
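The iteration described above can be sketched with BeautifulSoup, which the question already imports. This is only an illustration under assumptions: the HTML is a shortened copy of the snippet above (two value columns instead of six), and the dictionary keys label, hidden and values are names made up for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the snippet from the page source (assumed structure).
html = """
<table>
  <tr id="p_grid1_4" class=" rwf acd_dep_start_close ">
    <th scope="row" class="l clf">
      <div><span class="txt_acd">Sales and administrative expenses</span>
      <a id="grid1_4" href="javascript:foldOpen('grid1_4');">Open accounts that participated in the calculation</a></div>
    </th>
    <td class="r">566,397</td>
    <td class="r">524,903</td>
  </tr>
  <tr class="c_grid1_4 rwf acd_dep2_sub" style="display:none;">
    <th scope="row" class="l clf">&nbsp;&nbsp;Working expenses</th>
    <td class="r">67,972</td>
    <td class="r">64,514</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("tr.rwf"):  # every <tr> carrying the rwf class, hidden or not
    th = tr.find("th")
    if th is None:
        continue
    # Prefer the label span so the text of the +/- link is left out.
    label_tag = th.find("span", class_="txt_acd") or th
    label = label_tag.get_text(strip=True)
    # Remove thousands separators before converting to numbers.
    values = [float(td.get_text(strip=True).replace(",", ""))
              for td in tr.find_all("td")]
    # Rows folded behind the + button carry an inline display:none style.
    hidden = "display:none" in tr.get("style", "").replace(" ", "")
    rows.append({"label": label, "hidden": hidden, "values": values})

for row in rows:
    print(row)
```

Against the real page you would build the soup from fs_rpt_res.text instead, and empty cells would need handling before the float conversion.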


2022-09-20 18:05



© 2024 OneMinuteCode. All rights reserved.