I'm a beginner who has been programming in Python for about three days.
I am trying to do web scraping with Python and Selenium.
I apologize if anything in my question is unclear.
next_page_url_all is a list of the URLs reached by clicking "Next Page".
I'd like to visit each URL in next_page_url_all in turn and collect the links to the detail pages listed on each page.
That's the goal.
Visit the 1st URL → 2nd URL → 3rd URL...
I append the URL obtained by detail_url.get_attribute("href") to detail_url_all.
After execution, detail_url_all contains only the URL obtained from the last page.
While going through page 1 → page 2 → page 3,
it seems to be overwritten each time.
Is there a way to avoid this?
I also tried extend, but the result was the same:
the list contained only the results from the last page.
The only difference with extend was that the URLs were split into single characters.
append results:
['http://example.jp', 'http://example.jp']
extend results:
['h', 't', 't', 'p', ':', '/', '/', 'e', 'x', 'a', 'm', 'p', 'l', 'e', '.', 'j', 'p', 'h', 't', 't', 'p', ':', '/', '/', 'e', 'x', 'a', 'm', 'p', 'l', 'e', '.', 'j', 'p']
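For reference, the character-by-character output is how list.extend behaves when it is given a string: it iterates over its argument and adds each item, and the items of a string are its characters. The short sketch below illustrates the difference from append; the variable names urls_appended and urls_extended and the /a and /b URLs are made up for this example and are not part of the original code.

```python
urls_appended = []
urls_extended = []

url = "http://example.jp"

urls_appended.append(url)   # adds the whole string as a single item
urls_extended.extend(url)   # iterates over the string, adding one character at a time

print(urls_appended)        # ['http://example.jp']
print(urls_extended[:4])    # ['h', 't', 't', 'p']

# extend is intended for adding every item of another iterable,
# for example a list of URLs collected from one page.
urls_from_page = ["http://example.jp/a", "http://example.jp/b"]  # hypothetical URLs
urls_merged = []
urls_merged.extend(urls_from_page)
print(urls_merged)          # ['http://example.jp/a', 'http://example.jp/b']
```

So switching between append and extend changes only how each value is stored; neither one adds more links per page.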
detail_url_all = []
for url in next_page_url_all:
    driver.get(url)
    detail_url = driver.find_element_by_class_name('class1').find_element_by_class_name('class2').find_element_by_tag_name('a')
    detail_url_all.append(detail_url.get_attribute("href"))
I can't say for sure because there is no target URL or execution result, but I think the find_element_by_class_name
method used in YuuG's code returns only one element per page, which is why the list appears to be overwritten.
Instead, you could use the find_elements_by_class_name
or find_elements_by_xpath
methods, which return every matching element. See the site here.
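As a rough sketch of that suggestion, the loop could gather every matching link on each page instead of only the first one. This is an assumption-laden example, not a tested fix for the actual site: the function name collect_detail_urls is hypothetical, and the CSS selector ".class1 .class2 a" is a guess at what the chained class1 / class2 / a lookup in the question is meant to match. It uses the generic find_elements(by, value) call, which takes the same locator strings as selenium's By constants.

```python
def collect_detail_urls(driver, page_urls):
    """Visit each page and gather the href of every matching detail link."""
    detail_urls = []
    for url in page_urls:
        driver.get(url)
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        # find_elements (plural) returns a list of all matches; the list is
        # empty when nothing matches, so the inner loop simply does nothing.
        links = driver.find_elements("css selector", ".class1 .class2 a")
        for link in links:
            href = link.get_attribute("href")
            if href:  # skip <a> tags that have no href
                detail_urls.append(href)
    return detail_urls


# Usage (assumes an already created WebDriver and the list from the question):
# detail_url_all = collect_detail_urls(driver, next_page_url_all)
```

Because the result list is created once, before the loop, and only appended to inside it, links from earlier pages stay in the list while later pages are processed.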