I'd like to get elements from multiple table tags that are not waved class,id by GAS (using jQuery).

Asked 1 years ago, Updated 1 years ago, 42 views

I would like to obtain (scrap) the total number of people infected with corona in each city (Yokohama City, Yamakita Town, etc.) from this corona information.
https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html

[Environment]
GoogleAppsScript
[Usage library]
Cheerio

I would like to use Cheerio library in GAS, specify elements in jQuery, and do scraping to retrieve data.
The table tag on the site does not have a class name, id, etc., so we obtain it by specifying the number of elements.

The current code is as follows:

// URL to start scraping
  consturl="https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html";
 
  // Retrieved from UrlFetchApp (specifies UTF-8)
  consttml=UrlFetchApp.fetch(url).getContentText('UTF-8');

  // feed Cheerio the return price
  let$ = Cheerio.load(html);

  // Add id:area01 to the table you want to specify
  $('table').eq(3).attr('id', 'area01')

  const hiratsuka_data=$('#area01tr:nth-child(6)td:nth-child(5)').text();

Give an id to the table you want to specify by table number, and based on it,
It's a very aggressive way to get it with the nth-child number.

I can't write any more code because I don't know much about jQuery. Is there a smarter way to specify elements?

If anyone understands, please let me know.
Thank you for your cooperation.

javascript google-apps-script

2022-09-30 19:51

1 Answers

Looking at the table of the specified URLs, it seemed a little complicated to parse HTML data itself because there were merged columns and the number of columns in each row was different.Therefore, I would like to suggest how to parse HTML data without using Cheerio and obtain data.In this case, parse the HTML table using the Sheets API.I think the HTML table perspective of the Sheets API is very powerful, so I suggested this.

The script flows as follows:

This flow is reflected in the script as follows:

Sample Script

This script uses the Sheets API to parse HTML tables, so enable the Sheets API in Advanced Google services. Ref

function myFunction(){
  consturl="https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html";

  // 1. Obtaining HTML data
  consttml=UrlFetchApp.fetch(url).getContentText();

  // 2. Obtain each table from HTML data
  const tables=[...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];

  // 3. Create a spreadsheet as a temporary file
  consts = SpreadsheetApp.create("temp");
  const sheet=ss.getSheets()[0];
  const spreadsheetId=ss.getId();
  const sheetId = sheet.getSheetId();

  // 4. Parse HTML tables using the Sheets API and insert them into each cell of the spreadsheet
  Sheets.Spreadsheets.batchUpdate({
    requests: [
      { pasteData: {data:tables[3][0], html:true, coordinate:{sheetId:sheetId}},
      { deleteDimension: {range:{sheetId:sheetId, startIndex:0, endIndex:4, dimension:"ROWS"}},
      { deleteDimension: {range:{sheetId:sheetId, startIndex:0, endIndex:1, dimension:"COLUMNS"}}}
    ]
  }, spreadsheetId);

  // 5. Retrieve data from spreadsheet and convert to JSON object
  const values=sheet.getDataRange().getValues();
  constres=values[0].map(_,c)=>values.map(r=>r[c])) .reduce(o,r)=>{
    while(r.length>0){
      const[a,b] = r.splice(0,2)
      if(a)o[a]=b||0;
    }
    return;
  }, {});

  // 6. Delete Temporary Files
  DriveApp.getFileById(spreadsheetId).setTrashed(true);

  // 7. View Results
  console.log(res)
}

Results

This script allows you to check the Log for the following results:

{
   Yokohama City: 903,
   Fujisawa City: 540,
   Atsugi City: 4003,
   Ayase City: 1517,
   "Oi-cho" : 202,
   Yugawara-cho: 206,
   Kawasaki City: 41,
   Odawara City: 2177,
   "Yamato City" : 4381.
   Hayama-cho: 360,
   "Matsuda-cho" : 80,
   "Aikawa-cho" : 778,
   Sagamihara City: 418,
   Chigasaki City: 81,
   Isehara City: 1082,
   "Kanagawa-cho" : 65,
   "Sanbuk-cho" : 57,
   Cheongcheon Village: 10,
   Yokosuka City: 74,
   Zushi City: 686,
   Ebina City: 1983,
   "Oiso-cho" : 288,
   Kaisei Town: 220.
   Others: 177,
   Hiratsuka City: 3126,
   Miura City: 424,
   "Zama City" : 1983,
   Ninomiya Town: 216,
   "Hakone-cho"
   Kamakura City: 2051,
   Hagino City: 1588,
   Minami Ashigara City: 475,
   Nakai Town: 137,
   "Manazuru-cho" : 47
}

In this case, if you want to see the Yokohama City value, you can get 903 through console.log(res["Yokohama City").

Reference


2022-09-30 19:51

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.