Partially extracting a plain-text page with Nokogiri

Asked 2 years ago, Updated 2 years ago, 122 views

Linux Kernel (https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.30)
Becky! (http://www.rimarts.jp/downloads/B2/Readme.txt)

For pages that consist only of plain text, such as the ChangeLog and Readme above,
I would like to use a scraping library (Nokogiri) to extract only the titles and IDs, but I cannot parse them with XPath or CSS selectors.

Is there any way to do this other than extracting with a regular expression passed to the scan method?

■Environment
·Windows 10
·Cygwin
·Ruby 2.2.3
·Nokogiri 1.6.6.2

ruby nokogiri

2022-09-30 19:37

1 Answer

Nokogiri is an XML/HTML parser and cannot handle plain text.
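
Since the page has no markup, plain Ruby string handling is the practical route. Below is a minimal sketch that walks the text line by line instead of using one large scan pattern. It assumes the ChangeLog follows git-log-style formatting (a "commit <40-hex-digit ID>" line, header lines, a blank line, then a message indented by four spaces). The helper name extract_commits and the sample data are made up for illustration and are not taken from the real file.

```ruby
# Hypothetical helper: collects each commit ID together with the first
# non-empty indented line that follows it (treated here as the title).
# Assumes git-log-style text; other layouts need different rules.
def extract_commits(text)
  commits = []
  current = nil

  text.each_line do |raw|
    line = raw.chomp # also strips a trailing "\r\n"

    if (m = line.match(/\Acommit ([0-9a-f]{40})\z/))
      # A new entry starts here; its title is not known yet.
      current = { id: m[1], title: nil }
      commits << current
    elsif current && current[:title].nil? &&
          line.start_with?("    ") && !line.strip.empty?
      # First indented, non-blank line after the headers.
      current[:title] = line.strip
    end
  end

  commits
end

# Made-up sample in the assumed format (not real kernel data).
SAMPLE_CHANGELOG = <<'TEXT'
commit 0123456789abcdef0123456789abcdef01234567
Author: Jane Doe <jane@example.com>
Date:   Mon Jan 1 00:00:00 2024 +0000

    Example: first change

    A longer description that should be ignored.

commit 89abcdef0123456789abcdef0123456789abcdef
Author: John Roe <john@example.com>
Date:   Tue Jan 2 00:00:00 2024 +0000

    Example: second change
TEXT

extract_commits(SAMPLE_CHANGELOG).each do |commit|
  puts "#{commit[:id]} #{commit[:title]}"
end
```

To run this against the real page, the text can be fetched first, for example with Net::HTTP.get(URI(url)) from the standard net/http library, and passed to the helper. The Readme has a different layout, so it would need its own matching rules.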


2022-09-30 19:37


