I would like to scrape the total number of people infected with coronavirus in each municipality (Yokohama City, Yamakita Town, etc.) from this coronavirus information page:
https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html
[Environment]
Google Apps Script
[Library used]
Cheerio
I would like to load the Cheerio library in GAS, specify elements with jQuery-style selectors, and scrape the data.
The table tags on the site have no class name, id, or the like, so I select the table I want by its index.
The current code is as follows:
// URL to start scraping from
const url = "https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html";
// Fetch with UrlFetchApp (specifying UTF-8)
const html = UrlFetchApp.fetch(url).getContentText('UTF-8');
// Feed the returned HTML to Cheerio
const $ = Cheerio.load(html);
// Add id="area01" to the table I want to target
$('table').eq(3).attr('id', 'area01');
// Pick out a single cell by its position
const hiratsuka_data = $('#area01 tr:nth-child(6) td:nth-child(5)').text();
Giving the table an id based on its index and then pulling a cell out by its nth-child position is a very brittle approach.
I don't know much about jQuery, so I can't get any further with the code. Is there a smarter way to specify the elements?
If anyone knows, please let me know.
Thank you in advance.
Looking at the table at the URL you specified, parsing the HTML directly seemed somewhat complicated, because some columns are merged and the number of columns differs from row to row. So instead, I would like to suggest a way to parse the HTML table and retrieve the data without using Cheerio: parse it with the Sheets API. I'm suggesting this because the Sheets API's HTML-table parsing is very powerful.
The flow of the script is as follows:
1. Obtain the HTML data.
2. Extract each table from the HTML data.
3. Create a spreadsheet as a temporary file.
4. Parse the HTML table with the Sheets API and put the values into the spreadsheet's cells.
5. Retrieve the values from the spreadsheet and convert them to a JSON object.
6. Delete the temporary file.
7. Show the result.
This flow is reflected in the script below. Because the script parses the HTML table with the Sheets API, please enable the Sheets API under Advanced Google services.
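For reference, enabling that advanced service from the script editor writes an entry like the following into the project's appsscript.json manifest (shown here only as a sketch; ticking "Sheets API" in the Advanced Google services dialog does this for you):

{
  "dependencies": {
    "enabledAdvancedServices": [
      { "userSymbol": "Sheets", "serviceId": "sheets", "version": "v4" }
    ]
  }
}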
function myFunction() {
  const url = "https://www.pref.kanagawa.jp/docs/ga4/covid19/occurrence.html";

  // 1. Obtain the HTML data.
  const html = UrlFetchApp.fetch(url).getContentText();

  // 2. Extract each table from the HTML data.
  const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];

  // 3. Create a spreadsheet as a temporary file.
  const ss = SpreadsheetApp.create("temp");
  const sheet = ss.getSheets()[0];
  const spreadsheetId = ss.getId();
  const sheetId = sheet.getSheetId();

  // 4. Parse the 4th HTML table with the Sheets API, paste the values into the
  //    cells, then delete the unneeded header rows and leading column.
  Sheets.Spreadsheets.batchUpdate({
    requests: [
      { pasteData: { data: tables[3][0], html: true, coordinate: { sheetId: sheetId } } },
      { deleteDimension: { range: { sheetId: sheetId, startIndex: 0, endIndex: 4, dimension: "ROWS" } } },
      { deleteDimension: { range: { sheetId: sheetId, startIndex: 0, endIndex: 1, dimension: "COLUMNS" } } }
    ]
  }, spreadsheetId);

  // 5. Retrieve the values and convert them to a JSON object: transpose the rows
  //    into columns, then walk down each column consuming (name, count) pairs.
  const values = sheet.getDataRange().getValues();
  const res = values[0].map((_, c) => values.map(r => r[c])).reduce((o, col) => {
    while (col.length > 0) {
      const [a, b] = col.splice(0, 2);
      if (a) o[a] = b || 0;
    }
    return o;
  }, {});

  // 6. Delete the temporary file.
  DriveApp.getFileById(spreadsheetId).setTrashed(true);

  // 7. Show the result.
  console.log(res);
}
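To make step 5 easier to follow, here is a minimal standalone sketch of the transpose-and-pair logic with made-up toy values (the real sheet has more rows and columns, but the shape is the same: municipality names and counts alternate row by row within each column):

function demoStep5() {
  // Hypothetical toy data in the same shape as the pasted table.
  const values = [
    ["Yokohama City", "Kawasaki City"],
    [903, 41],
    ["Fujisawa City", "Odawara City"],
    [540, 2177],
  ];
  const res = values[0]
    .map((_, c) => values.map(r => r[c]))        // transpose: one array per column
    .reduce((o, col) => {
      while (col.length > 0) {
        const [name, count] = col.splice(0, 2);  // consume (name, count) pairs
        if (name) o[name] = count || 0;
      }
      return o;
    }, {});
  console.log(res);
  // { 'Yokohama City': 903, 'Fujisawa City': 540, 'Kawasaki City': 41, 'Odawara City': 2177 }
}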
When you run this script, you can see the following result in the log:
{
  "Yokohama City": 903,
  "Fujisawa City": 540,
  "Atsugi City": 4003,
  "Ayase City": 1517,
  "Oi Town": 202,
  "Yugawara Town": 206,
  "Kawasaki City": 41,
  "Odawara City": 2177,
  "Yamato City": 4381,
  "Hayama Town": 360,
  "Matsuda Town": 80,
  "Aikawa Town": 778,
  "Sagamihara City": 418,
  "Chigasaki City": 81,
  "Isehara City": 1082,
  "Samukawa Town": 65,
  "Yamakita Town": 57,
  "Kiyokawa Village": 10,
  "Yokosuka City": 74,
  "Zushi City": 686,
  "Ebina City": 1983,
  "Oiso Town": 288,
  "Kaisei Town": 220,
  "Others": 177,
  "Hiratsuka City": 3126,
  "Miura City": 424,
  "Zama City": 1983,
  "Ninomiya Town": 216,
  "Hakone Town":
  "Kamakura City": 2051,
  "Hadano City": 1588,
  "Minamiashigara City": 475,
  "Nakai Town": 137,
  "Manazuru Town": 47
}
With this result, if you want the value for Yokohama City, for example, you can get 903 with console.log(res["Yokohama City"]).
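And if you want to list every municipality at once, plain JavaScript iteration over the resulting object works as-is in GAS (assuming the V8 runtime):

// Print every municipality and its count from the res object above.
Object.entries(res).forEach(([city, count]) => console.log(`${city}: ${count}`));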