How do I get the contents of a specific tag when web crawling with Ruby on Rails?

Asked 1 year ago, Updated 1 year ago, 79 views

I'm working through a web crawling example using the open source library Nokogiri, and I'd like to know how to get the contents of a specific tag. For example:

<div class="story_area">
    <div class="title_area">
        <h4 class="h_story"><strong class="blind"> Plot</strong></h4>
    </div>

    <h5 class="h_tx_story"> Friendship between Toto, a boy whose movie is everything to the world, and Alfredo, a projectionist at an old village theater!<br>Revival of the touching masterpiece that has made the world laugh and cry for 25 years!</h5>

    <p class="con_tx"> Toto (Jacques Ferenc), a famous film director, will visit his hometown after 30 years at the news of the death of Alfredo (Philip Noirre), a projectionist in his hometown. Toto (Salvatore Caszio), a boy whose childhood movies were everything to the world, runs to an old theater called Cinema Heaven in the village square to be friends with Alfredo, a consular engineer, and learns projection skills over his shoulder. One day, Alfredo, who was performing an outdoor screening in the square for the audience, became blind due to a fire accident, and Toto succeeded him as a video engineer for "Cinema Heaven." Alfredo, who became Toto's friend and father after his blindness, was frustrated by the opposition of his beloved girl Elena (Agnez Nano)'s parents, and Toto left his hometown.</p>


    <a href="http://terms.naver.com/ncrEntry.nhn?dicId=moviework_dic&ncrDocId=ef6_290" target="_blank" class="movie_terms"><em class="blind"> View Movie Back </em></a>

</div>

Given markup like the above, I want to extract the movie's plot, which is the content of the p tag. How can I do that?

ruby-on-rails crawler web ruby

2022-09-22 08:38

1 Answer

Use at_css, which returns only the first element matching the CSS selector.

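require 'nokogiri'
require 'open-uri'  # provides URI.open; a bare open() no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('address', 'User-Agent' => USER_AGENT))  # 'address' and USER_AGENT are placeholders
doc.at_css('p.con_tx').text  # text of the first <p class="con_tx">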


2022-09-22 08:38

© 2024 OneMinuteCode. All rights reserved.