= Nokogiri {<img src="https://secure.travis-ci.org/tenderlove/nokogiri.png?rvm=1.9.3" />}[http://travis-ci.org/tenderlove/nokogiri]

* http://nokogiri.org
* http://github.com/tenderlove/nokogiri/wikis
* http://github.com/tenderlove/nokogiri/tree/master
* http://groups.google.com/group/nokogiri-talk
* http://github.com/tenderlove/nokogiri/issues

== DESCRIPTION:

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's
many features is the ability to search documents via XPath or CSS3 selectors.

XML is like violence - if it doesn’t solve your problems, you are not using
enough of it.

== FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder

Nokogiri parses and searches XML/HTML very quickly, and it implements both
CSS3 selectors and XPath correctly.

== SUPPORT:

Before filing a bug report, please read our {submission guidelines}[http://nokogiri.org/tutorials/getting_help.html] at:

* http://nokogiri.org/tutorials/getting_help.html

The Nokogiri {mailing list}[http://groups.google.com/group/nokogiri-talk]
is available here:

* http://groups.google.com/group/nokogiri-talk

The {bug tracker}[http://github.com/tenderlove/nokogiri/issues]
is available here:

* http://github.com/tenderlove/nokogiri/issues

The IRC channel is #nokogiri on freenode.

== SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'

  # Get a Nokogiri::HTML::Document for the page we're interested in...

  doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))

  # Do funky things with it using Nokogiri::XML::Node methods...

  ####
  # Search for nodes by css
  doc.css('h3.r a').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a').each do |link|
    puts link.content
  end

  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a').each do |link|
    puts link.content
  end

== REQUIREMENTS:

* Ruby 1.8 or 1.9
* libxml2
* libxml2-dev
* libxslt
* libxslt-dev

== ENCODING:

Strings are always stored as UTF-8 internally. Methods that return
text values will always return UTF-8 encoded strings. Methods that
return XML (like to_xml, to_html and inner_html) will return a string
in the same encoding as the source document.

*WARNING*

Some documents declare one encoding but actually use a different
one. So, which encoding should the parser choose?

Remember that data is just a stream of bytes. Only we humans add
meaning to that stream. Any particular set of bytes could be valid
characters in multiple encodings, so detecting encoding with 100%
accuracy is not possible. libxml2 does its best, but it can't be right
100% of the time.

If you want Nokogiri to handle the document encoding properly, your
best bet is to explicitly set the encoding. Here is an example of
explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

== INSTALL:

* sudo gem install nokogiri

=== Binary packages

Binary packages are available for:

* SuSE[http://download.opensuse.org/repositories/devel:/languages:/ruby:/extensions/]
* Fedora[http://s390.koji.fedoraproject.org/koji/packageinfo?packageID=6756]

== DEVELOPMENT:

=== Developing on C Ruby (MRI)

Developing Nokogiri requires racc and rexical to generate the parser and
tokenizer. To start development, make sure you have +libxml2+ and +libxslt+
installed.

Then install hoe, rake-compiler, racc, rexical, and minitest:

  $ gem install hoe rake-compiler racc rexical minitest

Then run rake:

  $ rake

=== Developing on JRuby

Currently, development with JRuby depends on CRuby being installed. With
CRuby, install racc and rexical:

  $ gem install racc rexical

Make sure hoe and rake-compiler are installed with JRuby:

  $ jgem install hoe rake-compiler

Then run rake:

  $ jruby -S rake

== LICENSE:

(The MIT License)

Copyright (c) 2008 - 2012:

* {Aaron Patterson}[http://tenderlovemaking.com]
* {Mike Dalessio}[http://mike.daless.io]
* {Charles Nutter}[http://blog.headius.com]
* {Sergio Arbeo}[http://www.serabe.com]
* {Patrick Mahoney}[http://polycrystal.org]
* {Yoko Harada}[http://yokolet.blogspot.com]

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.