I automated Chrome with Ubuntu + Python + Selenium and can click the Save button to save a file locally. However, I cannot control the file name myself: the site assigns a name based on the search term.
What would be a smart way to get, from Python, the name of the file that was just saved?
My idea was to search the directory, list the files whose names match a regular expression, and take the newest one. I assume there are existing examples of this, so please let me know if there is a better way.
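For reference, the approach described above (list the directory, filter by regular expression, take the newest) could be sketched roughly as follows. The function name newest_matching_file and the example pattern are hypothetical, and "newest" is assumed to mean the latest modification time.

```python
import os
import re
from typing import Optional


def newest_matching_file(directory: str, pattern: str) -> Optional[str]:
    """Return the path of the most recently modified file in `directory`
    whose name fully matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    candidates = [
        entry.path
        for entry in os.scandir(directory)
        if entry.is_file() and regex.fullmatch(entry.name)
    ]
    # Pick the candidate with the latest modification time;
    # return None when nothing matched.
    return max(candidates, key=os.path.getmtime, default=None)
```

A call such as newest_matching_file('.', r'foo_search_.*\.txt') would then return the latest matching file in the current directory, or None if there is none. Note that this can pick up an older file if the download has not finished yet.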
Ubuntu + Python + Selenium
This departs from the premise of your question, but if you use Puppeteer instead of Selenium, you can specify the download path and file name directly.
→ Saving files with an arbitrary path and name when downloading in Puppeteer | AquaWare tweet blog
The article above is a Node.js + Puppeteer example; with pyppeteer, the Python equivalent looks like this:
import asyncio
import os

# Pin the Chromium revision so that Fetch.enable is available
os.environ['PYPPETEER_CHROMIUM_REVISION'] = '884014'
import pyppeteer


async def main(file_name, headless=True, wait_time=5.0):
    b = await pyppeteer.launch({'headless': headless})
    p = await b.newPage()
    await p.goto('https://github.com/pyppeteer/pyppeteer')
    e = await p.querySelector('get-repo')
    await e.click()
    client = await p.target.createCDPSession()
    await client.send('Page.setDownloadBehavior', {'behavior': 'allow', 'downloadPath': os.getcwd()})
    # Pause every request at the response stage so the headers can be rewritten
    await client.send('Fetch.enable', {'patterns': [{'urlPattern': '*', 'requestStage': 'Response'}]})

    async def onRequestPaused(requestEvent):
        # Drop any existing content-disposition header
        responseHeaders = [v for v in requestEvent['responseHeaders'] if v['name'] != 'content-disposition']
        requestId = requestEvent['requestId']
        if requestEvent['responseStatusCode'] == 200:
            # Force the download to be saved under file_name
            responseHeaders.append({'name': 'content-disposition', 'value': f'attachment; filename="{file_name}"'})
            response = await client.send('Fetch.getResponseBody', {'requestId': requestId})
            await client.send('Fetch.fulfillRequest', {'requestId': requestId, 'responseCode': 200, 'responseHeaders': responseHeaders, 'body': response['body']})
        else:
            # Let other responses through unchanged
            await client.send('Fetch.continueRequest', {'requestId': requestId})

    client.on('Fetch.requestPaused', lambda e: asyncio.ensure_future(onRequestPaused(e), loop=event_loop))
    # Click the "Download ZIP" button on GitHub
    e = await p.querySelector('a[href$=".zip"]')
    await e.click()
    await asyncio.sleep(wait_time)
    await client.send('Fetch.disable')
    await b.close()


event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(main(file_name='specified_name.zip', headless=False))
Verified on Windows and on Ubuntu under WSL.
Because pyppeteer is a port of Puppeteer, which is a Node.js library, its API is fundamentally asynchronous, so it may be awkward to work with if you are used to Selenium's synchronous API.
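As a rough illustration of bridging the two styles, the sketch below wraps a coroutine in an ordinary function using asyncio.run, so the caller can stay synchronous. The names download_file and download_file_sync are hypothetical, and asyncio.sleep merely stands in for the real pyppeteer calls.

```python
import asyncio


async def download_file(file_name: str, wait_time: float = 0.0) -> str:
    """Hypothetical stand-in for an asynchronous pyppeteer routine."""
    await asyncio.sleep(wait_time)  # placeholder for the browser work
    return file_name


def download_file_sync(file_name: str, wait_time: float = 0.0) -> str:
    """Synchronous facade: start an event loop, run the coroutine
    to completion, and return its result."""
    return asyncio.run(download_file(file_name, wait_time))
```

With a wrapper like this, the rest of a script can call download_file_sync('specified_name.zip') the same way it would call a blocking Selenium helper, as long as no event loop is already running in that thread.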
For your reference.
I resolved the issue by referring to the following article, which you mentioned in the comments.
Python Selenium Dynamic Download Completion Waiting
In fact, apart from the main question, I also had the problem that I could not detect when a download had finished, and I am happy that this was solved at the same time.
I'll post my source code.
Due to confidentiality obligations, the site and file names below are placeholders.
The key points of the program are as follows:
import glob
import time
from selenium import webdriver
from selenium.webdriver.common.by import By  # for By.ID
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # assumed setup; the original excerpt omits driver creation
wait = WebDriverWait(driver, 10)  # maximum number of seconds for wait.until
driver.get('https://foo.example.com/')  # search site


def search_and_save(search_term):
    # Wait for the search box to appear
    element = wait.until(expected_conditions.visibility_of_element_located((By.ID, "search_term_id")))
    element.send_keys(Keys.DELETE)
    element.send_keys(search_term)  # enter the search term
    new_file = 'ERROR' + search_term  # name of the downloaded file; starts as an error name
    for i in range(30):  # wait at most 30 seconds
        download_fileName = glob.glob('foo_search_*.txt.crdownload')  # look for a .crdownload file
        if download_fileName:  # a .crdownload file exists, so the download is in progress
            new_file = download_fileName[0].replace('.crdownload', '')  # strip .crdownload and keep the name
            time.sleep(1)  # wait 1 second
        else:  # no .crdownload: the download has finished or has not started at all
            break
    # Holds the file name if the download succeeded, otherwise the ERROR name
    print('new_file:' + new_file)
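A variation that does not depend on knowing the file name pattern is to compare the directory listing before and after the click, and to wait until no new .crdownload file remains. The following is only a minimal sketch of that idea; list_files and wait_for_new_file are hypothetical helper names, and it assumes Chrome marks in-progress downloads with the .crdownload suffix as in the code above.

```python
import os
import time
from typing import Optional, Set


def list_files(directory: str) -> Set[str]:
    """Names of the regular files currently in `directory`."""
    return {e.name for e in os.scandir(directory) if e.is_file()}


def wait_for_new_file(directory: str, before: Set[str],
                      timeout: float = 30.0, poll: float = 0.5) -> Optional[str]:
    """Return the name of a file that appeared after the `before` snapshot,
    once no new .crdownload file remains; return None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        added = list_files(directory) - before
        in_progress = [n for n in added if n.endswith('.crdownload')]
        finished = [n for n in added if not n.endswith('.crdownload')]
        if finished and not in_progress:
            # If several files appeared, prefer the most recently modified one
            return max(finished,
                       key=lambda n: os.path.getmtime(os.path.join(directory, n)))
        time.sleep(poll)
    return None
```

The snapshot would be taken with before = list_files(download_dir) just before clicking Save, and wait_for_new_file(download_dir, before) would be called right after the click.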