How to Scrape Each Site from a URL List

Asked 1 year ago, Updated 1 year ago, 73 views

I have code that builds a URL list by scraping a site, and separate code that scrapes only specific elements from an individual site.

The URL list contains the URLs of the individual sites. How can I connect to each URL in the list in order and scrape it? Any advice would be appreciated. Thank you.

·Code to obtain the URL list

import re
import requests, bs4

res = requests.get('https://****')
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, "html.parser")

elems = soup.select('.threadUrlInMetrics')

# Write the selected elements to a text file, one per line
file = r'abcd.txt'
with open(file, "w") as f:
    for elem in elems:
        print(elem, file=f)

# Read the file back and pull out every URL with a regular expression
with open(file) as f:
    text = f.read()
pattern = r"https?://[\w/:%#\$&\?\(\)~\.=\+\-]+"
url_list = re.findall(pattern, text)
print(url_list)
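
A side note on the URL-collection step above: the round trip through abcd.txt is probably not required, because the regular expression can be applied to a string directly. The following is only a minimal sketch under that assumption. The function name `extract_urls` and the constant `URL_PATTERN` are hypothetical names introduced for illustration; the pattern itself is the one from the question.

```python
import re

# Same pattern as in the question, written as a raw string
URL_PATTERN = re.compile(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-]+")


def extract_urls(text):
    """Return the URLs found in text, in order of first appearance, without duplicates."""
    matches = URL_PATTERN.findall(text)
    # dicts preserve insertion order, so this removes duplicates but keeps the order
    return list(dict.fromkeys(matches))
```

With the question's variables, this could be called as `url_list = extract_urls("\n".join(str(elem) for elem in elems))`. Removing duplicates is a design choice, not something the original code does; it avoids requesting the same page twice when a link appears more than once.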

·Code for scraping individual sites

import requests, bs4

res = requests.get('https://***')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
# Select every element with the "container" class and print each one
elements = soup.select('.container')
for elem in elements:
    print(elem)

python python3 web-scraping

2022-09-30 13:47

1 Answer

As @kunif commented, couldn't you simply iterate over url_list with a for loop and run the scraping code on each URL?

# Code to get the URL list
# ...
url_list = re.findall(pattern, text)


# Code for scraping individual sites
import requests, bs4

# Visit each URL in the list in order and scrape it
for url in url_list:
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    elements = soup.select('.container')
    for elem in elements:
        print(elem)
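
One thing to keep in mind with the loop above: `res.raise_for_status()` raises an exception on an HTTP error, so a single bad URL stops the whole run. The sketch below shows one possible way around that, assuming the per-site scraping code is wrapped in a function that takes a URL. The names `scrape_each`, `scrape_one`, and `delay` are hypothetical and not part of the original answer.

```python
import time


def scrape_each(url_list, scrape_one, delay=1.0):
    """Call scrape_one(url) for every URL, in order, and collect the outcomes.

    Returns (results, failures): results maps each URL to the value returned
    by scrape_one, failures maps each URL to the exception it raised.
    """
    results, failures = {}, {}
    for i, url in enumerate(url_list):
        if i > 0:
            time.sleep(delay)  # pause between requests to go easy on the server
        try:
            results[url] = scrape_one(url)
        except Exception as exc:  # record the error so one bad URL does not abort the run
            failures[url] = exc
    return results, failures
```

Here `scrape_one` would be a function containing the `requests.get` / `BeautifulSoup` / `soup.select('.container')` steps from the answer and returning the selected elements. Passing it in as an argument keeps the looping logic separate from the scraping logic, which also makes the loop easy to try out with a stand-in function before touching the network.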


2022-09-30 13:47



© 2024 OneMinuteCode. All rights reserved.