require 'open-uri'
require 'nokogiri'

class HomeController < ApplicationController
  def index
    @url = "http://news.naver.com/main/read.nhn?mode=LSD&mid=shm&sid1=105&oid=030&aid=0002480868"
    # URI.open is the open-uri entry point; a bare open(url) no longer fetches URLs on Ruby 3.0+.
    @page = Nokogiri::HTML(URI.open(@url), nil, 'EUC-KR')
    @title = @page.search("title").inner_html
    @title_edit = @title.split(':')[0]
    @content = @page.at_css('div.article_body').inner_html
  end
end
I wrote the code above, but all the Korean text in the output comes out garbled. I don't know what to do.
Try text instead of inner_html.
The Korean text is garbled because the source document is encoded as EUC-KR, while the page that displays the crawled data is rendered as UTF-8.
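To make the mismatch concrete, here is a minimal sketch using only Ruby's built-in String methods. The helper names (to_euc_kr, mislabel_as_utf8, transcode_to_utf8) are made up for illustration and are not part of Nokogiri or the original code. It shows that EUC-KR bytes read as UTF-8 are not valid text, while transcoding them recovers the original string.

```ruby
# Hypothetical helpers for illustration only.

# Produces the bytes an EUC-KR page would contain for the given UTF-8 text.
def to_euc_kr(text)
  text.encode("EUC-KR")
end

# Relabels the same bytes as UTF-8 without converting them, which is
# roughly what happens when EUC-KR output lands in a UTF-8 page.
def mislabel_as_utf8(str)
  str.dup.force_encoding("UTF-8")
end

# Converts the bytes to UTF-8 properly.
def transcode_to_utf8(str)
  str.encode("UTF-8")
end

original = "\uD55C\uAE00" # the word "한글", written with escapes so it is always UTF-8
euc_kr   = to_euc_kr(original)

puts original.bytesize                       # 6 bytes in UTF-8
puts euc_kr.bytesize                         # 4 bytes in EUC-KR
puts mislabel_as_utf8(euc_kr).valid_encoding? # false: these bytes are not valid UTF-8
puts transcode_to_utf8(euc_kr) == original    # true: same text after conversion
```

The same characters have different byte sequences in the two encodings, so the bytes have to be converted rather than merely passed through.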
According to the Nokogiri documentation:
"Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return XML (like to_xml, to_html and inner_html) will return a string encoded like the source document."
Therefore, using the text method instead of inner_html solves the problem, because the parsed data and the web page then both use UTF-8.