Partially extracting data from a plain-text page with Nokogiri

Linux Kernel (https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.30)
Becky! (http://www.rimarts.jp/downloads/B2/Readme.txt)

For pages that consist only of plain text, such as the ChangeLog and Readme above,
I would like to scrape just the title and ID with Nokogiri, but I cannot parse them with XPath or CSS selectors.

Is there any way to do this other than extracting the values with a regular expression via the scan method?

■Environment
·Windows 10
·Cygwin
·Ruby 2.2.3
·Nokogiri 1.6.6.2

ruby nokogiri

2022-09-30 19:37

1 Answer

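Nokogiri is an XML/HTML parser and cannot handle plain text. A plain-text page has no element tree, so XPath and CSS selectors have nothing to match; string methods such as scan or each_line are the usual tools for this.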
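As a rough illustration of handling such a page without an HTML parser, here is a minimal sketch that walks the text line by line instead of using a single scan call. It assumes the ChangeLog follows a git-log style layout (a "commit <40 hex digits>" line, then headers, then an indented subject line); that layout, the SAMPLE text, and the extract_commits helper are all assumptions made for illustration and would need adjusting to the real file.

```ruby
# Hypothetical sample text in git-log style. The real ChangeLog layout is
# assumed here, not verified, so adjust the patterns to the actual file.
SAMPLE = <<TEXT
commit 1111111111111111111111111111111111111111
Author: Alice Example <alice@example.com>
Date:   Mon Jan 1 00:00:00 2016 +0000

    First example subject

    Longer body text.

commit 2222222222222222222222222222222222222222
Author: Bob Example <bob@example.com>
Date:   Tue Jan 2 00:00:00 2016 +0000

    Second example subject
TEXT

# Hypothetical helper: returns an array of { id:, title: } hashes.
# The ID is the commit hash; the title is the first non-empty indented
# line that follows it.
def extract_commits(text)
  entries = []
  current = nil
  text.each_line do |line|
    line = line.chomp
    if (m = line.match(/\Acommit (\h{40})\z/))
      # A new entry starts at every "commit <hash>" line.
      current = { id: m[1], title: nil }
      entries << current
    elsif current && current[:title].nil? &&
          line.start_with?("    ") && !line.strip.empty?
      # Only the first indented line after the headers is kept as the title.
      current[:title] = line.strip
    end
  end
  entries
end

extract_commits(SAMPLE).each do |entry|
  puts "#{entry[:id][0, 12]}  #{entry[:title]}"
end
```

To run this against the real page, the text could be fetched as a string first, for example with Net::HTTP.get(URI(url)) from the standard library, and then passed to the helper. A line-by-line state machine like this is still regular-expression based, but it tends to be easier to adjust than one large multi-line pattern.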

2022-09-30 19:37
