Writing all results to CSV when scraping a table that spans multiple pages

Asked 2 years ago, Updated 2 years ago, 20 views

I can scrape a table that spans multiple pages, but I cannot get all of the results written to a CSV file.
With the code I tried, only the table from the second URL ends up in the CSV.
How can I write the tables from all of the URLs to the CSV?

import csv
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd

urls = ['http://honya.univ.coop/ranking_lst.php?rankingcd=001',
        'http://honya.univ.coop/ranking_lst.php?rankingcd=002',
]

for url in urls:
  html = urllib.request.urlopen(url)
  bsObj = BeautifulSoup(html, "html.parser")
  table = bsObj.findAll("table", {"class": "rankingTable"})[0]
  tables = table.findAll("tr")
  print(tables)

  with open("newbooks.csv", "w", encoding='utf-8') as file:
    writer = csv.writer(file)
    for row in tables:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)

pd.read_csv("newbooks.csv")

python

2022-09-30 14:18

2 Answers

The current code opens the file in overwrite mode ("w") inside the loop, so the file is truncated every time it is opened and only the result from the last URL remains.

Try opening the file in append mode ("a") instead:

open("newbooks.csv", "a", encoding='utf-8')
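One thing to watch with append mode is that running the script a second time adds the same rows to the existing file again. An alternative is to keep "w" but open the file once, before the loop over pages. Below is a minimal sketch of that idea, not the asker's exact code: write_pages_to_csv, the stand-in pages data, and the file name newbooks_sketch.csv are made up for illustration, and the scraping step is replaced by hard-coded rows so the example runs without network access.

```python
import csv


def write_pages_to_csv(pages, path):
    """Write the rows of every page to a single CSV file.

    `pages` is an iterable of pages; each page is a list of rows,
    and each row is a list of cell strings.
    """
    # Open the file once, before looping, so "w" truncates it only once.
    # newline="" is what the csv module documentation recommends.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for rows in pages:
            writer.writerows(rows)


# Stand-in data (hypothetical). In the question, each page would be built
# from the <tr> elements of the table scraped from one URL.
pages = [
    [["Ranking", "Book Name"], ["1", "Book A"]],
    [["Ranking", "Book Name"], ["1", "Book B"]],
]
write_pages_to_csv(pages, "newbooks_sketch.csv")
```

As with append mode, the header row of each page is written again; skip the first row of every page after the first if you want a single header.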


2022-09-30 14:18

pandas has a function called pandas.read_html that reads HTML tables directly into DataFrames, so you can use that instead.

import pandas as pd

urls = [
  'http://honya.univ.coop/ranking_lst.php?rankingcd=001',
  'http://honya.univ.coop/ranking_lst.php?rankingcd=002',
]

df = pd.concat(
  [pd.concat(pd.read_html(url, header=0, attrs={'class': 'rankingTable'}), axis=0)
   for url in urls],
  axis=0)
df.to_csv('newbooks.csv', encoding='utf-8', index=False)

newbooks.csv

Ranking, Book Name, Author, Publisher, Unit Price, Release Month, ISBN
1, Public Rebellion, Jose Ortega Lee Gasset, Iwami Bookstore, 1,070 yen, April 2020, 9784003423110
2, Trapedium, Kazumi Takayama, KADOKAWA, 680 yen, April 2020, 9784041026441
3, Philosophy of study, Masaya Chiba, Bungei Spring Autumn, 700 yen, March 2020, 9784167914639
                       :

28, Wabasuke no Haku, Yasuhide Saeki, Bungei Spring Autumn, 730 yen, May 2020, 9784167914943
29, Thinking about the history of the dialogue book, Ryotaro Shima, Shunaki Bungei, 670 yen, May 2020, 9784167914998
30, Marugoto no Tsuzui-san, Tsuzui, Bungei Spring Autumn, 900 yen, May 2020, 9784167915001
1, What has education evaluated?, Yuki Honda, Iwami Bookstore, 840 yen, March 2020, 9784004318293
2, There are no real infectious diseases, Kentaro Iwata, Shueisha International, 980 yen, April 2020, 9784797680522
3, 5G, Hiroyuki Morikawa, Iwami Bookstore, 860 yen, April 2020, 9784004318316
                       :
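In the combined output the ranking restarts at 1 for each URL, and nothing marks which ranking a row came from. If that matters, one option is to tag each DataFrame before concatenating. This is only a sketch under assumptions: the rankingcd column name and the small stand-in frames are made up here, and in real use each frame would come from pd.read_html as in the answer.

```python
import pandas as pd

# Stand-in frames (hypothetical data), keyed by the rankingcd value
# taken from each URL's query string.
frames = {
    "001": pd.DataFrame({"Ranking": [1, 2], "Book Name": ["Book A", "Book B"]}),
    "002": pd.DataFrame({"Ranking": [1, 2], "Book Name": ["Book C", "Book D"]}),
}

# Add a column naming the source ranking, then stack the frames vertically.
# ignore_index=True renumbers the row index from 0 across the combined frame.
df = pd.concat(
    [frame.assign(rankingcd=code) for code, frame in frames.items()],
    axis=0,
    ignore_index=True,
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name such as 'newbooks.csv' to to_csv, as in the answer, writes the same content to disk.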


2022-09-30 14:18



© 2024 OneMinuteCode. All rights reserved.