How to Remove Unnecessary Tags in BeautifulSoup

Asked 1 year ago, Updated 1 year ago, 392 views

I want to retrieve only the string directly inside <li> > <a> from the HTML shown below, but the <a> also contains <time>, <span>, and other tags, so I cannot get the string by itself.

Please let me know how to retrieve just that string.

Example:

<li><a href="XXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<time><ige></time>
<span>1</span>
<div></div>
</a></li>
・
・
・

python python3 beautifulsoup

2022-11-23 14:24

2 Answers

from bs4 import BeautifulSoup
import sys

html_contents = '''
<li>
  <a href="XXXXXXXXXX"> String you want to retrieve
    <time>tt:mm</time>
    <time><ige></time>
    <span>1</span>
    <div></div>
  </a>
</li>
'''

soup = BeautifulSoup(html_contents, 'html.parser')
elm = soup.select_one('li>a[href]')
if elm is None:
    sys.exit(1)

# The first child of <a> is the text node that precedes the nested tags
wanted = elm.contents[0].strip()
print(wanted)

# Output:
# String you want to retrieve


2022-11-23 16:03

from bs4 import BeautifulSoup

html = '''
<li><a href="XXXXXXXXXX"> String you want to retrieve
<time>tt:mm</time>
<time><ige></time>
<span>1</span>
<div></div>
</a></li>
'''

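soup = BeautifulSoup(html, 'html.parser')

a = soup.a  # the <a> (anchor) element
# .strings yields every text node under <a>; .string would be None here
# because <a> has more than one child
strs = [s for s in a.strings]
strs[0]
# ' String you want to retrieve\n'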
print(strs[0].strip())
# String you want to retrieve
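
Both answers above take the first text node of a single <a>. Since the question's "・・・" suggests the list has more items, the following is a minimal sketch of the same idea applied to every <li> > <a>, keeping only the text nodes that are direct children of <a> and skipping the nested tags. The helper name direct_text and the sample HTML (hrefs, titles, times) are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data, modeled on the structure in the question
html = '''
<ul>
<li><a href="/first"> First title
<time>10:30</time>
<span>1</span>
<div></div>
</a></li>
<li><a href="/second"> Second title
<time>11:45</time>
<span>2</span>
</a></li>
</ul>
'''


def direct_text(tag):
    """Join only the text nodes that are direct children of tag."""
    # recursive=False limits the search to direct children, so text
    # inside <time>, <span>, and <div> is not picked up
    parts = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only nodes left between the nested tags
    return ' '.join(p.strip() for p in parts if p.strip())


soup = BeautifulSoup(html, 'html.parser')
titles = [direct_text(a) for a in soup.select('li > a[href]')]
print(titles)
# ['First title', 'Second title']
```

Unlike contents[0], this does not depend on the wanted string being the very first child, so it should also cope with an <a> whose text comes after a nested tag.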


2022-11-23 20:37



© 2024 OneMinuteCode. All rights reserved.