I'm asking because I couldn't solve this even after googling it and reading the official documentation. The problem is as follows.
After parsing, I want to output only one link address (the href value) for each item in the output shown below.
Code:
import requests
from bs4 import BeautifulSoup

site = requests.get("http://www.alba.co.kr/")
alba = BeautifulSoup(site.text, 'html.parser')
brands = list(alba.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))
for b in brands:
    if "http" in b:
        b = b.select('a.href')
        print(b)
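A note on the two lines inside the loop, as a minimal offline sketch (the HTML string is a hypothetical, trimmed copy of one item, not fetched from the site): `"http" in b` tests whether the string "http" is one of the tag's direct child nodes, not whether it appears in the href, and `b.select('a.href')` is a CSS selector for `<a>` elements with the *class* `href` among b's descendants, so it matches nothing here. Reading the attribute with `b.get('href')` is the usual way to get the link.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of one brand item (not the live page)
html = """
<ul class="goodsBox">
  <li class="first impact">
    <a class="goodsBox-info" href="http://barogo.alba.co.kr/"><span class="company">Barogo</span></a>
    <a class="brandHover" href="http://barogo.alba.co.kr/"></a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find("ul", {"class": "goodsBox"}).find_all("a", {"class": "goodsBox-info"})

hrefs = []
for b in links:
    # "http" in b checks b.contents (the child nodes), so it is False here
    print("http" in b)
    # "a.href" means <a class="href"> inside b, so this is an empty list
    print(b.select("a.href"))
    # Read the attribute itself; .get() returns the default if it is missing
    href = b.get("href", "")
    if href.startswith("http"):
        hrefs.append(href)

print(hrefs)
```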
I'm trying to extract the href of the first <a> tag from the parsed output below.
[<li class="first impact"><div class="B_MyAd_"></div>
<a class="goodsBox-info" href="http://barogo.alba.co.kr/">
<span class="logo"><img alt="(Note)" src="//imagelogo.alba.kr/data_image2/logo/brand/20200916174910805.gif"/></span> <span class="company">Barogo</span> <span class="title"><span>Barogo Recruitment &lt;National Riders&gt;</span></span> ... </a>
<a class="brandHover" href="http://barogo.alba.co.kr/"></a></li>, ...]
Inside each <li> tag there are two <a> tags, and each one has an href. In this case, how can I output only one of them?
I'd appreciate any help.
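One way to get a single link per item, sketched offline under the assumption that every `<li>` looks like the output above (one `goodsBox-info` anchor plus one `brandHover` anchor): loop over the `<li>` elements and call `find()` with the class you want, which returns only the first match inside that item. The HTML below is a hypothetical, trimmed stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of two list items
html = """
<ul class="goodsBox">
  <li class="first impact">
    <a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
    <a class="brandHover" href="http://barogo.alba.co.kr/"></a>
  </li>
  <li>
    <a class="goodsBox-info" href="http://dadam.alba.co.kr/">Dadam</a>
    <a class="brandHover" href="http://dadam.alba.co.kr/"></a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
hrefs = []
for li in soup.find("ul", class_="goodsBox").find_all("li"):
    # find() returns only the first matching <a> in this <li>, or None
    link = li.find("a", class_="goodsBox-info")
    if link is not None:
        hrefs.append(link["href"])

for href in hrefs:
    print(href)
```

Filtering by class (rather than taking whichever `<a>` comes first) keeps working even if the two anchors swap order in the markup.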
Check the type of the value that is returned:
a = soup.find_all('a')
print(a)
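A small offline check of what `find_all` returns (the HTML and URLs are made up for illustration): the result is a `ResultSet`, which subclasses `list`, and each element is a `Tag`. Attribute lookups such as `["href"]` work on an individual `Tag`, so you index or loop over the `ResultSet` first.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Made-up HTML with two links
html = '<p><a href="http://example.com/a">A</a><a href="http://example.com/b">B</a></p>'
soup = BeautifulSoup(html, "html.parser")

a = soup.find_all("a")
print(type(a))               # a ResultSet (a list subclass)
print(isinstance(a, list))
print(type(a[0]))            # each element is a Tag
print(a[0]["href"])          # attribute access works on a single Tag
```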
Addendum:
It doesn't work? Are you sure you aren't doing it differently from what I explained earlier?
`aaa` is converted to a list, so how did you end up with a ResultSet as the result?
import requests
from bs4 import BeautifulSoup

a = requests.get("http://www.alba.co.kr/")
aa = BeautifulSoup(a.text, 'html.parser')
aaa = list(aa.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))
for aaaa in aaa:
    print()  # blank line between entries
    print(aaaa['href'])
Since this is Python's Beautiful Soup, there is also a method called find. Look it up.
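A quick sketch of the difference, using a trimmed, hypothetical copy of one `<li>` from the question: `find_all()` returns every match, while `find()` returns only the first matching `Tag`, or `None` when nothing matches (so check for `None` before reading `["href"]`).

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of one <li> from the question
html = """
<li class="first impact">
  <a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
  <a class="brandHover" href="http://barogo.alba.co.kr/"></a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = soup.find_all("a")                    # every matching <a>
first_link = soup.find("a")                       # only the first match
missing = soup.find("a", class_="does-not-exist") # no match -> None

print(len(all_links))
print(first_link["href"])
print(missing)
```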
The parsed HTML I want to extract from:
<a class="goodsBox-info" href="http://dadam.alba.co.kr/">
<span class="logo"> <img alt="Three Great Pigs' Feet" src="//image-logo.alba.kr/data_image2/logo/brand/20211125132816761.gif"/> </span>
<span class="company">Three major pigs' feet</span>
<span class="title">
<span>Recruitment of employees and part-timers nationwide</span></span>
<span class="wrap">
<span class="local">National</span>
<span class="pay"><span class="pay Letter">Check by announcement</span>
<span class="payIcon talk"></span></span> </span> </a>
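Given one anchor like the one above, a minimal sketch of pulling out the link and the readable text separately (the HTML is a trimmed copy of that snippet; which spans you need is an assumption): read `["href"]` for the URL and call `get_text(strip=True)` on the specific `<span>` you want, instead of printing the whole tag.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the anchor shown above
html = """
<a class="goodsBox-info" href="http://dadam.alba.co.kr/">
<span class="logo"><img alt="Three Great Pigs' Feet" src="//image-logo.alba.kr/data_image2/logo/brand/20211125132816761.gif"/></span>
<span class="company">Three major pigs' feet</span>
<span class="title"><span>Recruitment of employees and part-timers nationwide</span></span>
<span class="wrap"><span class="local">National</span></span>
</a>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="goodsBox-info")

href = link["href"]
company = link.find("span", class_="company").get_text(strip=True)
title = link.find("span", class_="title").get_text(strip=True)
print(href, company, title, sep=" | ")
```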
I also tried the revised code. The link addresses print correctly, but, as in the image, the tag text is printed twice.
I posted the question because I thought I needed to do something more inside the tag to avoid the duplicated content.
I don't understand why the tag is duplicated along with the link address.
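The printing code and the image aren't shown, so the cause is a guess. One possibility, visible in the output earlier, is that each `<li>` holds two anchors (`goodsBox-info` and `brandHover`) with the same href, so collecting links from every `<a>` yields each URL twice; another is printing the whole `Tag` as well as its href. A minimal sketch under those assumptions, on hypothetical trimmed HTML: collect only the href strings and drop repeats while keeping their order.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy: each item has two anchors with the same href
html = """
<ul class="goodsBox">
  <li><a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
      <a class="brandHover" href="http://barogo.alba.co.kr/"></a></li>
  <li><a class="goodsBox-info" href="http://dadam.alba.co.kr/">Dadam</a>
      <a class="brandHover" href="http://dadam.alba.co.kr/"></a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Every <a> with an href: each URL shows up twice here
all_hrefs = [a["href"] for a in soup.find("ul", class_="goodsBox").find_all("a", href=True)]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_hrefs = list(dict.fromkeys(all_hrefs))

for href in unique_hrefs:
    print(href)  # print the string only, not the whole Tag
```

If the duplicates come from the two anchor classes, filtering on `class_="goodsBox-info"` alone avoids them without a deduplication step.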
© 2024 OneMinuteCode. All rights reserved.