Linux Kernel (https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.30)
Becky! (http://www.rimarts.jp/downloads/B2/Readme.txt)
For pages that consist only of plain text, such as the ChangeLog and Readme above,
I would like to write a scraping process (with Nokogiri) that extracts only the title and ID, but I cannot parse them with XPath or CSS selectors.
Is there any approach other than extracting them with a regular expression via the scan method?
■Environment
·Windows 10
·Cygwin
·Ruby 2.2.3
·Nokogiri 1.6.6.2
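Nokogiri is an XML/HTML parser and cannot usefully handle plain text: a plain-text file has no tags, so there are no elements for XPath or CSS selectors to match. Process the text with Ruby's own String methods instead, for example by reading it line by line with each_line.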
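As a rough illustration of the line-by-line approach, here is a minimal sketch for the kernel ChangeLog only. It assumes, as is the case in kernel.org ChangeLogs, a git-log-style layout where each entry starts with `commit <hash>` and the first indented, non-blank line after the header is the title. The helper name `parse_changelog`, the `Entry` struct, and the sample text are hypothetical. The Becky! Readme has a different layout and would need its own rules. The sample sticks to syntax that runs on Ruby 2.2 (no `<<~` heredoc).

```ruby
# Minimal sketch (hypothetical helper): extract ID and title from a
# git-log-style plain-text ChangeLog without Nokogiri or String#scan.
# ASSUMPTION: each entry begins with "commit <hash>", and the first
# indented, non-blank line after the header is the commit title.

Entry = Struct.new(:id, :title)

def parse_changelog(text)
  entries = []
  current = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?("commit ")
      # A new entry starts; the ID is the word after "commit".
      current = Entry.new(line.split(" ", 2)[1].strip, nil)
      entries << current
    elsif current && current.title.nil? && line.start_with?(" ") && !line.strip.empty?
      # The first indented, non-blank line after the header is the title.
      # Header lines such as "Author:" and "Date:" are not indented.
      current.title = line.strip
    end
  end
  entries
end

# To fetch the real page instead (stdlib, needs network access):
#   require "net/http"
#   text = Net::HTTP.get(URI("https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.30"))

# Made-up sample in the assumed format.
sample = <<TEXT
commit 1111111111111111111111111111111111111111
Author: Example Author <author@example.com>
Date:   Tue Aug 16 09:38:20 2016 +0200

    example: first hypothetical change

commit 2222222222222222222222222222222222222222
Author: Example Author <author@example.com>
Date:   Mon Aug 15 10:00:00 2016 +0200

    example: second hypothetical change

    Longer description that is not treated as the title.
TEXT

parse_changelog(sample).each { |e| puts "#{e.id}  #{e.title}" }
```

Keying on line prefixes with `start_with?` keeps each rule readable. It also makes the parser easy to adapt when a file's layout differs, which is harder to do with a single large regular expression.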
© 2024 OneMinuteCode. All rights reserved.