Scraping newspaper content with Python Beautiful Soup / separating it by paragraph

I used Beautiful Soup to collect newspaper content from a site, but I'm stuck because I can't get a line break between paragraphs.

# c is assumed to be the collection of <p> tags, e.g. from find_all("p")
temp = ""
for n in c:
    temp = temp + str(n.get_text())

The HTML being parsed:

#html 
<p>    
    ABC. DEF.   
</p> 
<p> 
    GH. JK.
</p> 
<p> 
    LMN. OPQ. 
</p>

I got a result using get_text(), but it isn't what I wanted.

Result obtained
    ABC. DEF.GH.JK.LMN. OPQ.   <-- the problem: the text from each <p></p> is joined together with no line break

Desired result
    ABC. DEF.
                        <--- Line break
    GH. JK.
                         <--- Line break
    LMN. OPQ.
                         <--- Line break

python

2022-09-21 09:59

1 Answer

I think you can get the text of each paragraph separately as follows, store it in a list, and then use it however you need.

from bs4 import BeautifulSoup

html = """<p>    
    ABC. DEF.   
</p> 
<p> 
    GH. JK.
</p> 
<p> 
    LMN. OPQ. 
</p> """

soup = BeautifulSoup(html, "html5lib")

paragraphs = [p.get_text() for p in soup.find_all("p")]

for i, p in enumerate(paragraphs):
    print(f'paragraph {i} -------------')
    print(p)

paragraph 0 -------------

    ABC. DEF.

paragraph 1 -------------

    GH. JK.

paragraph 2 -------------

    LMN. OPQ.
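
If you want output close to the desired result in the question, one possible approach is to strip the whitespace around each paragraph and then join the list with blank lines. This is only a sketch: it assumes the built-in "html.parser" instead of "html5lib" (to avoid an extra install), and the names `paragraphs` and `result` are just illustrative.

```python
from bs4 import BeautifulSoup

html = """<p>
    ABC. DEF.
</p>
<p>
    GH. JK.
</p>
<p>
    LMN. OPQ.
</p>"""

# Assumption: "html.parser" ships with Python; "html5lib" also works if installed.
soup = BeautifulSoup(html, "html.parser")

# strip=True removes the leading/trailing whitespace and newlines inside each <p>.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# Join with a blank line so each paragraph is visually separated.
result = "\n\n".join(paragraphs)
print(result)
```

Use "\n" instead of "\n\n" in the join if you only want a single line break between paragraphs.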

2022-09-21 09:59
