<span class="fnt_e07" lang="en">I <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="don" lang="en">don</i>'t <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="have" lang="en">have</i> <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="a" lang="en">a</i> <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="bank account" lang="en">bank <b lang="en">account</b></i>.</span>
I'm trying to get the text item in and the text item in from the code over there I want to bring "I don't have a bank account." I don't know how to bring it because it's cut into tags like that. Help me
python scrapy
Of all the strings,
Remove all strings stacked with <
and <
, including >
.
Simply put, for example,
import re
string = """<span class="fnt_e07" lang="en">I <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="don" lang="en">don</i>'t <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="have" lang="en">have</i> <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="a" lang="en">a</i> <i class="fnt_e08 N=a:smd.words" tabindex="0" lang="en"><input type="hidden" name="assist" value="bank account" lang="en">bank <b lang="en">account</b></i>.</span>"""
text = re.sub(r'<[^>]*?>', '', string)
print(text)
Output: I don't have a bank account.
You can write with it.
Regular expressions can be learned from tryhelloworld - Regular expressions regex101 can be tested
© 2024 OneMinuteCode. All rights reserved.