Scraping newspaper content with Python Beautiful Soup / separating the text by paragraph

Asked 2 years ago, Updated 2 years ago, 18 views

I used Beautiful Soup to collect newspaper content from a site, but I can't figure out how to put a line break between paragraphs.

temp = ""
for n in c:
    temp = temp + str(n.get_text())

The HTML being parsed:
<p>    
    ABC. DEF.   
</p> 
<p> 
    GH. JK.
</p> 
<p> 
    LMN. OPQ. 
</p> 

I got a result using get_text(), but it is not what I wanted.

Result obtained
    ABC. DEF.GH.JK.LMN. OPQ.   <--- The text of each <p></p> is run together with no line break

Desired result
    ABC. DEF.
                        <--- Line break
    GH. JK.
                         <--- Line break
    LMN. OPQ.
                         <--- Line break

python

2022-09-21 09:59

1 Answer

I think you can get the text of each paragraph separately as shown below, store it in a list, and then use it however you like.

from bs4 import BeautifulSoup

html = """<p>    
    ABC. DEF.   
</p> 
<p> 
    GH. JK.
</p> 
<p> 
    LMN. OPQ. 
</p> """

soup = BeautifulSoup(html, "html5lib")

paragraphs = [p.get_text() for p in soup.find_all("p")]

for i, p in enumerate(paragraphs):
    print(f'paragraph {i} -------------')
    print(p)

Output:

paragraph 0 -------------

    ABC. DEF.

paragraph 1 -------------

    GH. JK.

paragraph 2 -------------

    LMN. OPQ.
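
If the goal is the layout shown in the question (trimmed paragraphs with a line break between them), one possible next step is to strip the whitespace from each paragraph and join the list with newlines. This is only a minimal sketch and not part of the original answer: it uses Python's built-in "html.parser" instead of html5lib so that it runs without an extra install, and the variable name text is chosen here just for illustration.

```python
from bs4 import BeautifulSoup

html = """<p>
    ABC. DEF.
</p>
<p>
    GH. JK.
</p>
<p>
    LMN. OPQ.
</p>"""

# Assumption: "html.parser" (bundled with Python) is used here;
# "html5lib" from the answer above should work the same way if installed.
soup = BeautifulSoup(html, "html.parser")

# split() + join() collapses runs of whitespace inside each paragraph
# and removes the leading/trailing spaces and newlines.
paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]

# Put one blank line between paragraphs.
text = "\n\n".join(paragraphs)
print(text)
```

Joining with "\n" instead of "\n\n" would give a single line break between paragraphs with no blank line.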


2022-09-21 09:59
