I'm stuck because I don't know how to deal with this error. Please let me know how to fix it.
[Error]
  0%|          | 0/2 [00:00<?, ?it/s]
  0%|          | 0/2 [00:06<?, ?it/s]
---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sumy/nlp/tokenizers.py in _get_sentence_tokenizer(self, language)
    126                 path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
--> 127                 return nltk.data.load(path)
    128             except (LookupError, zipfile.BadZipfile) as e:

5 frames

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************

During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sumy/nlp/tokenizers.py in _get_sentence_tokenizer(self, language)
    130                 "NLTK tokenizers are missing or the language is not supported.\n"
    131                 """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
--> 132                 "Original error was:\n" + str(e)
    133             )
    134

LookupError: NLTK tokenizers are missing or the language is not supported.
Download them by following command: python -c "import nltk; nltk.download('punkt')"
Original error was:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************
[Code]
#@title
import requests
import json
import csv
import pytz
import datetime
import tqdm
import numpy as np
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from time import sleep
from google.colab import files

##############
### DEFINE ###
##############
#KEYWORD = "Welfare"
# SLEEP_TIME = 0.5

url = 'https://api.jgrants-portal.go.jp/exp/v1/public/subsidies?keyword=' + keyword + '&sort=created_date&order=DESC&acceptance=1'
req = requests.get(url)
reqJSON = json.loads(req.text)

# loop over the result data
csvList = []
ID = []
for i in tqdm.tqdm(range(len(reqJSON["result"]))):
    # tqdm
    np.pi * np.pi
    if i % 1e6 == 0:
        print(i)
    # Add one row to the csv write list
    resultData = reqJSON["result"][i]
    csvRow = []
    csvRow.append(resultData["title"])
    csvRow.append(resultData["id"])
    csvRow.append(resultData["acceptance_start_datetime"])
    csvRow.append(resultData["acceptance_end_datetime"])
    csvRow.append(resultData["subsidy_max_limit"])
    csvRow.append(resultData["target_area_search"])
    csvRow.append(resultData["target_number_of_employees"])
    # selenium: set up the Chrome driver
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    wd = webdriver.Chrome('chromedriver', options=options)
    # Open the page for the ID stored in the result and scrape it
    tURL = "https://www.jgrants-portal.go.jp/subsidy/" + resultData["id"]
    wd.get(tURL)
    sleep(sleep_time)
    detail = wd.find_element(by=By.TAG_NAME, value="table").text
    # Load packages
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    # For strings
    parser = PlaintextParser.from_string(detail, Tokenizer("english"))
    from sumy.summarizers.text_rank import TextRankSummarizer
    # Summarize using sumy TextRank
    summarizer = TextRankSummarizer()
    summary = summarizer_4(parser.document, 2)
    text_summary = ""
    for sentence in summary:
        text_summary += str(sentence)
    print(text_summary)
    # Add details to the csv write list
    csvRow.append(detail)
    csvRow.append(text_summary)
    # Add tURL
    csvRow.append(tURL)
    # Save one row as one element of the list
    csvList.append(csvRow)
    wd.close()

# Create a new CSV file
csv_date = datetime.datetime.now(pytz.timezone('Asia/Tokyo')).strftime("%Y%m%d")
csv_file_name = keyword + "jGrants" + csv_date + ".csv"
f = open(csv_file_name, "w", encoding="shift_jis", errors="ignore")  # For Windows: encoding="shift_jis"
# Write to the csv file
writer = csv.writer(f, lineterminator="\n")
csv_header = ["Title", "ID", "Start Date", "End Date", "Amount Limit", "Region Coverage", "Number of Employees Coverage", "Details", "Summary", "URL"]
writer.writerow(csv_header)
for csvData in csvList:
    writer.writerow(csvData)
f.close()
# Download the csv file
files.download(csv_file_name)
As @payaneco commented, after running python -c "import nltk; nltk.download()" as described in "Create NLTK environment", the error no longer occurs.
Verified in a Windows 10 environment.
By the way, since this is Windows 10 rather than Colab, I commented out from google.colab import files and files.download(csv_file_name).
Also, the following two commented-out lines had to be uncommented, and the variable names lowercased:
#KEYWORD = "Welfare"
# SLEEP_TIME = 0.5
I changed them to:
keyword = "welfare"
sleep_time = 0.5
In addition, summarizer_4 in the lines below (quoted from the code above) appears to be a transcription error; since no such name is defined anywhere, the code worked once it was changed to summarizer:
summarizer = TextRankSummarizer()
summary = summarizer_4(parser.document, 2)
Also, the package sumy imported in the middle of the program does not seem to work after Python 3.8, because of an import it performs internally (from collections import Sequence). (At least it did not work on Python 3.10; before that it may have been just a warning.)
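On Python 3.10+ that internal import fails because the abstract base classes were removed from the top-level collections module and now live only in collections.abc. One common workaround (a sketch, not an official sumy fix) is to alias them back before importing the library:

```python
import collections
import collections.abc

# Workaround sketch: re-expose the ABCs that older libraries still import
# as `from collections import Sequence` etc. On Python < 3.10 these names
# already exist on the collections module, so the loop does nothing.
for _name in ("Sequence", "Mapping", "MutableMapping", "Iterable"):
    if not hasattr(collections, _name):
        setattr(collections, _name, getattr(collections.abc, _name))

# After this, an internal `from collections import Sequence` succeeds
# even on Python 3.10+.
from collections import Sequence  # no ImportError
```

A cleaner long-term fix is simply to upgrade to a sumy release that imports from collections.abc, if one is available for your environment.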
© 2025 OneMinuteCode. All rights reserved.