How to Remove Unnecessary Tags in BeautifulSoup

I want to retrieve only the string inside <li> > <a> in the HTML below, but the <a> element also contains <time>, <span>, and other tags.
How can I get just that string, without the other tags? Please let me know.

Example:

<li><a href="XXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<time><ige></time>
<span>1</span>
<div></div>
</a></li>
・
・
・

python python3 beautifulsoup

2022-11-23 14:24

2 Answers

from bs4 import BeautifulSoup
import sys

html_contents = '''
<li>
  <a href="XXXXXXXXXX"> String you want to retrieve
    <time>tt:mm</time>
    <time><ige></time>
    <span>1</span>
    <div></div>
  </a>
</li>
'''

soup = BeautifulSoup(html_contents, 'html.parser')
# Select the <a> with an href that is a direct child of <li>
elm = soup.select_one('li>a[href]')
if elm is None:
    sys.exit(1)

# contents[0] is the text node that comes before the first child tag
wanted = elm.contents[0].strip()
print(wanted)

# Output:
# String you want to retrieve
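contents[0] works here because the wanted text is the first child of <a>. A minimal sketch of a slightly more general variant, assuming the page has several <li> items (as the "・・・" in the question suggests) and that the wanted text is whatever text sits directly inside each <a>: select() returns every match, and find_all(string=True, recursive=False) keeps only text nodes that are direct children of <a>, so the text inside <time>, <span>, and so on is skipped. The second <li> and its text are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Simplified version of the question's HTML with two <li> items.
# The second item ("Another string") is hypothetical.
html = '''
<ul>
<li><a href="XXXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<span>1</span>
<div></div>
</a></li>
<li><a href="XXXXXXXXXX"> Another string
<time>tt:mm</time>
<span>2</span>
</a></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

wanted = []
for a in soup.select('li > a[href]'):
    # recursive=False limits the search to direct children of <a>,
    # so strings nested in <time>, <span>, <div>, etc. are not returned.
    direct_texts = a.find_all(string=True, recursive=False)
    # Drop whitespace-only text nodes and join what remains.
    wanted.append(' '.join(t.strip() for t in direct_texts if t.strip()))

print(wanted)
```

Unlike contents[0], this still works if the anchor's text appears after a child tag.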

2022-11-23 16:03

from bs4 import BeautifulSoup

html = '''
<li><a href="XXXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<time><ige></time>
<span>1</span>
<div></div>
</a></li>
'''

soup = BeautifulSoup(html, 'html.parser')

a = soup.a  # the first <a> (anchor tag) element
strs = [s for s in a.strings]  # every text node inside <a>, in document order
strs[0]
# ' String you want to retrieve\n'
print(strs[0].strip())
# String you want to retrieve
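Since the question title asks about removing the unnecessary tags, here is a minimal sketch of that approach as well, assuming the unwanted children are exactly <time>, <span>, and <div> (adjust the list for the real page). decompose() deletes a tag and everything inside it from the tree, after which get_text(strip=True) returns only the anchor's own text.

```python
from bs4 import BeautifulSoup

# Simplified version of the question's HTML.
html = '''
<li><a href="XXXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<span>1</span>
<div></div>
</a></li>
'''

soup = BeautifulSoup(html, 'html.parser')
a = soup.a

# Assumed list of unwanted child tags; each is removed with its contents.
for tag in a.find_all(['time', 'span', 'div']):
    tag.decompose()

# strip=True strips each remaining string and skips whitespace-only ones.
wanted = a.get_text(strip=True)
print(wanted)
```

Note that decompose() modifies the parsed tree in place, so parse the HTML again (or use one of the read-only approaches) if you still need the <time> or <span> values afterwards.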

2022-11-23 20:37
