I'm studying Ruby on Rails. I'm trying to scrape the titles of four Naver news articles, but it isn't working as I expected, so I'm posting a question. In the code below, if I fetch 0002713773, 0002713772, 0002713771, and 0002713770 one at a time with the ID written directly in the URL instead of #{c}, it works fine, but when I turn this part into a loop, nothing comes out. I wonder why.
@titles = Array.new
0002713773.downto(0002713770) do |c|
  @url = "http://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=105&oid=032&aid=#{c}"
  @page = Nokogiri::HTML(open(@url), nil, 'EUC-KR')
  #@title = @page.search("title").text
  @title = @page.css("#articleTitle")
  @titles << @title.inner_text
end
Also, I have one more question from my studies. I know that curl lets me download a page's HTML and save it locally, but I want to select only certain parts (the article title and the article content) and save those locally as HTML. How should I do that? I wrote code that extracts a specific part, but I don't know how to save it locally.
For example, I want to save article 1 as 1.html by selecting only the title and content, and article 2 as 2.html.
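One possible way to do the saving step, as a rough sketch using only Ruby's standard library. The helper name save_article, the output directory, and the HTML wrapper around the extracted parts are all assumptions made for illustration. The extraction itself is left to the Nokogiri code from the question.

```ruby
require "cgi"
require "fileutils"

# Hypothetical helper (not from the original post): writes one article to
# "<dir>/<number>.html", containing only the title and the content fragment.
def save_article(dir, number, title, content_html)
  FileUtils.mkdir_p(dir) # create the output directory if it is missing
  path = File.join(dir, "#{number}.html")

  # The title is plain text, so it is escaped. The content is assumed to be
  # an HTML fragment already, so it is inserted unchanged.
  html = <<~HTML
    <!DOCTYPE html>
    <html>
    <head><meta charset="UTF-8"><title>#{CGI.escapeHTML(title)}</title></head>
    <body>
    <h1>#{CGI.escapeHTML(title)}</h1>
    #{content_html}
    </body>
    </html>
  HTML

  File.write(path, html, encoding: "UTF-8")
  path # return the path so the caller can log or check it
end

# Example call with placeholder strings:
#   save_article("articles", 1, "Some title", "<p>Some content</p>")
```

Inside the loop, the title could come from @page.css("#articleTitle").inner_text as in the question, and the content from inner_html on whichever selector matches the article body. That selector is not shown in the question, so it would have to be looked up in the page source. A counter that starts at 1 would give 1.html, 2.html, and so on.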
ruby ruby-on-rails crawling html nokogiri
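I figured out why. Integer literals that start with a zero are interpreted as octal, so the #{c} in the loop interpolates a different decimal number with no leading zeros, and the URL no longer points at the intended article. I'll leave the question up in case it helps someone.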
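A small sketch of the octal behaviour and one way around it: loop over plain decimal integers and pad each one back to ten digits when building the URL. The helper name article_ids and the fixed width of 10 are assumptions for illustration. The URL is the one from the question.

```ruby
# A leading zero makes Ruby read an integer literal as octal (base 8),
# so this is not the decimal number 2713773.
octal_id = 0002713773
puts octal_id # prints 759803

# Hypothetical helper: counts down over decimal integers and restores the
# leading zeros as a string, so "#{aid}" interpolates the intended ID.
def article_ids(first, last, width = 10)
  first.downto(last).map { |n| format("%0#{width}d", n) }
end

ids = article_ids(2713773, 2713770)

base = "http://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=105&oid=032&aid="
urls = ids.map { |aid| base + aid }

puts urls
```

Each entry in urls can then be passed to the Nokogiri code from the question in place of @url.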