I'm asking because I couldn't solve this even after googling and reading the official documentation. The problem is as follows.
After parsing, I want to print only one link address (the href value) from the output below.
Code:
import requests
from bs4 import BeautifulSoup

site = requests.get("http://www.alba.co.kr/")
alba = BeautifulSoup(site.text, 'html.parser')
brands = list(alba.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))
for b in brands:
    if "http" in b:  # note: this checks b's child nodes, not its href attribute
        b = b.select('a.href')  # note: matches <a class="href"> inside b, not the href attribute
        print(b)
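Two details in the loop above may explain why nothing useful is printed. In Beautiful Soup, `"http" in tag` tests the tag's child nodes rather than its attributes, and the CSS selector `'a.href'` is a class selector that looks for `<a class="href">`. Reading the attribute itself is done with `tag['href']` or `tag.get('href')`. A minimal sketch on a hard-coded snippet (the sample HTML is an assumption modeled on the output below, not fetched from the live page):

```python
from bs4 import BeautifulSoup

# Sample markup modeled on one brand entry (an assumption, not fetched from the site)
html = """
<ul class="goodsBox">
  <li class="first impact">
    <a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
    <a class="brandHover" href="http://barogo.alba.co.kr/"></a>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a", {"class": "goodsBox-info"})

# `in` on a Tag checks its child nodes (here just the text "Barogo"), not its attributes
in_children = "http" in tag
# `in` on the attribute value checks the URL string itself
in_href = "http" in tag["href"]

# "a.href" is a class selector: it matches <a class="href">, which does not exist here
class_matches = soup.select("a.href")

# Reading the attribute directly gives the URL; .get() returns None instead of raising KeyError
href = tag.get("href")

print(in_children, in_href, class_matches, href)
# False True [] http://barogo.alba.co.kr/
```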
I'm trying to extract the href attribute of the first <a> tag from the parsed output below.
[
<li class="first impact"><div class="B_MyAd_"></div>
<a class="goodsBox-info" href="http://barogo.alba.co.kr/">
<span class="logo"> <img alt="(Note)" src="//image-logo.alba.kr/data_image2/logo/brand/20200916174910805.gif"/> </span> <span class="company">Barogo</span> <span class="title"><span>Barogo Recruitment &lt;National Riders&gt;</span></span> </a>
<a class="brandHover" href="http://barogo.alba.co.kr/"></a></li>, ...
]
Inside each li there are two <a> tags, each with an href. In this case, how can I output only one of them?
Any help would be appreciated.
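When each `<li>` holds two `<a>` tags pointing at the same URL (`goodsBox-info` and `brandHover`), one way to get a single href per item is to loop over the `<li>` elements and take only the first matching anchor with `select_one`. This is a sketch assuming the class names shown in the output above; the two-item HTML and the "Dadam" label are made up for illustration.

```python
from bs4 import BeautifulSoup

# Two made-up brand items, each with a goodsBox-info link and a brandHover link
html = """
<ul class="goodsBox">
  <li><a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
      <a class="brandHover" href="http://barogo.alba.co.kr/"></a></li>
  <li><a class="goodsBox-info" href="http://dadam.alba.co.kr/">Dadam</a>
      <a class="brandHover" href="http://dadam.alba.co.kr/"></a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for li in soup.select("ul.goodsBox > li"):
    # select_one returns only the first match (or None), so each <li> yields at most one link
    link = li.select_one("a.goodsBox-info")
    if link is not None:
        hrefs.append(link["href"])

print(hrefs)
# ['http://barogo.alba.co.kr/', 'http://dadam.alba.co.kr/']
```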
Check what type the call returns:
a = soup.find_all('a')
print(type(a))  # <class 'bs4.element.ResultSet'>, a list of Tag objects
print(a)
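`find_all()` returns a `ResultSet`, which subclasses Python's `list` and holds `Tag` objects, so printing it shows the full markup of every tag. To get a URL you index into it (or loop over it) and then read the attribute. A small sketch on made-up inline HTML:

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Made-up markup with two links
soup = BeautifulSoup('<a href="http://one.example/">1</a><a href="http://two.example/">2</a>', "html.parser")
a = soup.find_all("a")

is_list = isinstance(a, list)           # True: ResultSet is a list subclass
first = a[0]                            # a Tag; printing it shows the whole <a ...>...</a>
first_href = first["href"]              # the attribute value, a plain string
all_hrefs = [tag["href"] for tag in a]  # one URL string per tag

print(type(a).__name__, is_list, first_href)
# ResultSet True http://one.example/
```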
Additional note:
It still doesn't work? Are you sure you followed the approach I explained earlier, and not a different one?
aaa is converted to a list, so how did you end up with a ResultSet as the result?
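import requests
from bs4 import BeautifulSoup

a = requests.get("http://www.alba.co.kr/")
aa = BeautifulSoup(a.text, 'html.parser')
# find_all returns a ResultSet of <a class="goodsBox-info"> tags; list() copies it into a plain list
aaa = list(aa.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))
for aaaa in aaa:
    print()  # blank line between entries
    print(aaaa['href'])  # print only the href attribute, not the whole tag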
If you're using Beautiful Soup in Python, there is also a function called find(); look it up.
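`find()` is the single-result counterpart of `find_all()`: it returns the first matching `Tag`, or `None` when nothing matches, so it avoids indexing into a list when only one link is wanted. A sketch on a made-up fragment shaped like the list item above:

```python
from bs4 import BeautifulSoup

# Made-up fragment: one <li> with two anchors sharing the same URL
html = """
<li class="first impact">
  <a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
  <a class="brandHover" href="http://barogo.alba.co.kr/"></a>
</li>
"""
soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")

first_link = li.find("a")                           # only the first <a>
all_links = li.find_all("a")                        # both <a> tags
missing = li.find("a", {"class": "no-such-class"})  # None when nothing matches

print(first_link["href"], len(all_links), missing)
# http://barogo.alba.co.kr/ 2 None
```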
Here is the parsed HTML I want to extract from:
<a class="goodsBox-info" href="http://dadam.alba.co.kr/">
<span class="logo"> <img alt="Three Great Pigs' Feet" src="//image-logo.alba.kr/data_image2/logo/brand/20211125132816761.gif"/> </span>
<span class="company">Three major pigs' feet</span>
<span class="title">
<span>Recruitment of employees and part-timers nationwide</span></span>
<span class="wrap">
<span class="local">National</span>
<span class="pay"><span class="pay Letter">Check by announcement</span>
<span class="payIcon talk"></span></span> </span> </a>
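Once the right `<a class="goodsBox-info">` tag is found, its href and the text of the nested spans can be read separately, so the URL is never mixed with the tag's text. This sketch works on a trimmed copy of the HTML above; which fields to pull out is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the goodsBox-info entry shown above
html = """
<a class="goodsBox-info" href="http://dadam.alba.co.kr/">
  <span class="company">Three major pigs' feet</span>
  <span class="title"><span>Recruitment of employees and part-timers nationwide</span></span>
  <span class="wrap"><span class="local">National</span></span>
</a>
"""
tag = BeautifulSoup(html, "html.parser").find("a", {"class": "goodsBox-info"})

entry = {
    "href": tag["href"],
    # get_text(strip=True) removes the surrounding whitespace from the span's text
    "company": tag.find("span", {"class": "company"}).get_text(strip=True),
    "title": tag.find("span", {"class": "title"}).get_text(strip=True),
    "local": tag.find("span", {"class": "local"}).get_text(strip=True),
}
print(entry)
```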
I also tried the revised code. The link address now prints correctly, but, as in the image, the tag text is printed twice.
I thought I needed to do something more inside the tag to avoid the duplicate content, so I posted this question.
I don't understand why the tag is duplicated along with the link address.
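If the same URL shows up more than once (for example because both the `goodsBox-info` and `brandHover` anchors point to it, as in the list output near the top), one option is to collect only the href strings and drop repeats while keeping their order. This sketch assumes the duplication comes from repeated links rather than from printing whole tags; the markup and the "Dadam" label are made up.

```python
from bs4 import BeautifulSoup

# Made-up markup: two anchors per item share the same URL
html = """
<li><a class="goodsBox-info" href="http://barogo.alba.co.kr/">Barogo</a>
    <a class="brandHover" href="http://barogo.alba.co.kr/"></a></li>
<li><a class="goodsBox-info" href="http://dadam.alba.co.kr/">Dadam</a>
    <a class="brandHover" href="http://dadam.alba.co.kr/"></a></li>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect only the attribute values (strings), never the whole Tag
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]
# dict keys are unique and keep insertion order (Python 3.7+), so this removes repeats in order
unique_hrefs = list(dict.fromkeys(all_hrefs))

for href in unique_hrefs:
    print(href)
```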
© 2024 OneMinuteCode. All rights reserved.