Added basic Nokogiri migration guide.
I will expand this over time, but it's a decent start for the time being. This fixes #13.
parent c45d32a37e
commit 0c61749c65

@@ -0,0 +1,169 @@
# Migrating From Nokogiri

If you're parsing XML/HTML documents using Ruby, chances are you're using
[Nokogiri][nokogiri] for this. This guide aims to make it easier to switch from
Nokogiri to Oga.

## Parsing Documents

In Nokogiri there are two de facto ways of parsing documents:

* `Nokogiri.XML()` for XML documents
* `Nokogiri.HTML()` for HTML documents

For example, to parse an XML document you'd use the following:

    Nokogiri::XML('<root>foo</root>')

Oga instead uses the following two methods:

* `Oga.parse_xml`
* `Oga.parse_html`

Their usage is similar:

    Oga.parse_xml('<root>foo</root>')

Nokogiri returns one of two distinct document classes, depending on which method
was used to parse the document:

* `Nokogiri::XML::Document` for XML documents
* `Nokogiri::HTML::Document` for HTML documents

Oga, on the other hand, always returns an `Oga::XML::Document` instance. Oga
currently makes no distinction between XML and HTML documents other than at the
lexer level. This might change in the future if deemed required.

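During a migration it can help to route all parsing through one place, so call
sites don't depend on either library directly. The following is a minimal
sketch of such a seam, not part of Oga itself: `MarkupLoader`, its `backend`
argument and the `:xml`/`:html` format symbols are hypothetical names introduced
here, and the only Oga calls it assumes are the `parse_xml` and `parse_html`
methods described above.

```ruby
# Hypothetical migration seam: the backend is any object that responds to
# `parse_xml` and `parse_html` (for example the `Oga` module itself).
class MarkupLoader
  def initialize(backend)
    @backend = backend
  end

  # Parses `source` as :xml or :html and returns whatever the backend returns.
  def load(source, format = :xml)
    case format
    when :xml  then @backend.parse_xml(source)
    when :html then @backend.parse_html(source)
    else raise ArgumentError, "unsupported format: #{format.inspect}"
    end
  end
end
```

With Oga loaded, `MarkupLoader.new(Oga).load('<root>foo</root>')` would simply
delegate to `Oga.parse_xml`.
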
## Querying Documents

Nokogiri allows one to query documents/elements using both XPath expressions and
CSS selectors. In Nokogiri one queries a document as follows:

    document = Nokogiri::XML('<root><foo>bar</foo></root>')

    document.xpath('root/foo')
    document.css('root foo')

Oga currently only supports XPath expressions; CSS selectors will be added in
the near future. Querying documents works similarly to Nokogiri:

    document = Oga.parse_xml('<root><foo>bar</foo></root>')

    document.xpath('root/foo')

Nokogiri also allows you to query a document and return the first match, as
opposed to an entire node set, using the method `at`. In Nokogiri this method
can be used for both XPath expressions and CSS selectors. Oga has no such
method; instead it provides the following dedicated method:

* `at_xpath`: returns the first node matching an XPath expression

For example:

    document = Oga.parse_xml('<root><foo>bar</foo></root>')

    document.at_xpath('root/foo')

By using a dedicated method Oga doesn't have to guess what type of expression
you're using (XPath or CSS), so it cannot mistake one for the other.

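To see why guessing the expression type can go wrong, consider a naive
heuristic that treats a string as XPath only when it starts with `/`, `./` or
`..`. This is an illustrative sketch, not Nokogiri's actual implementation;
`looks_like_xpath?` is a hypothetical helper made up for this example.

```ruby
# Hypothetical heuristic, for illustration only: treat an expression as XPath
# when it starts with "/", "./" or "..", and as a CSS selector otherwise.
def looks_like_xpath?(expression)
  expression.start_with?('/', './', '..')
end

looks_like_xpath?('/root/foo') # => true
looks_like_xpath?('root foo')  # => false (treated as a CSS selector)
looks_like_xpath?('root/foo')  # => false, although it is a valid XPath expression
```

A relative path such as `root/foo`, the expression used in the examples above,
defeats this kind of heuristic. A dedicated `at_xpath` method avoids the
question entirely.
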
## Retrieving Attribute Values

Nokogiri provides two methods for retrieving attributes and attribute values:

* `Nokogiri::XML::Node#attribute`
* `Nokogiri::XML::Node#attr`

The first method always returns an instance of `Nokogiri::XML::Attr`, while the
second method returns the attribute value as a `String`. This behaviour,
especially given how similar the two names are, is extremely confusing.

Oga on the other hand provides the following two methods:

* `Oga::XML::Element#attribute` (aliased as `attr`)
* `Oga::XML::Element#get`

The first method always returns an `Oga::XML::Attribute` instance; the second
returns the attribute value as a `String`. I deliberately chose `get` for
getting a value to remove the confusion of `attribute` vs `attr`. This also
allows for `attr` to simply be an alias of `attribute`.

As an example, this is how you'd get the value of a `class` attribute in
Nokogiri:

    document = Nokogiri::XML('<root class="foo"></root>')

    document.xpath('root').first.attr('class') # => "foo"

This is how you'd get the same value in Oga:

    document = Oga.parse_xml('<root class="foo"></root>')

    document.xpath('root').first.get('class') # => "foo"

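The naming scheme can be summarised with a toy model. The classes below are a
simplified sketch written for this guide, not Oga's implementation:
`ToyAttribute` and `ToyElement` are hypothetical names, returning `nil` for a
missing attribute is an assumption of the sketch, and only the method names
`attribute`, `attr` and `get` are taken from the text above.

```ruby
# Toy stand-ins to illustrate the naming scheme; not Oga's real classes.
ToyAttribute = Struct.new(:name, :value)

class ToyElement
  def initialize(attributes = {})
    @attributes = attributes.map { |name, value| ToyAttribute.new(name, value) }
  end

  # Returns the attribute object itself, or nil if there is none.
  def attribute(name)
    @attributes.find { |candidate| candidate.name == name }
  end

  # `attr` is just another name for `attribute`.
  alias_method :attr, :attribute

  # Returns the attribute's value as a String, or nil if there is none.
  def get(name)
    found = attribute(name)
    found && found.value
  end
end

element = ToyElement.new('class' => 'foo')

element.attribute('class') # => a ToyAttribute instance
element.attr('class')      # => the same ToyAttribute instance
element.get('class')       # => "foo"
```

Because `attr` and `attribute` are the same method here, there is only one
distinction left to remember: `attribute` for the object, `get` for the value.
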
## Modifying Documents

Modifying documents in Nokogiri is not as convenient as it perhaps could be. For
example, adding an element to a document is done as follows:

    document = Nokogiri::XML('<root></root>')
    root = document.xpath('root').first

    name = Nokogiri::XML::Element.new('name', document)

    name.inner_html = 'Alice'

    root.add_child(name)

The annoying part here is that we have to pass a document into an Element's
constructor. As such, you cannot create elements without first creating a
document. Another thing is that Nokogiri has no method called `inner_text=`;
instead you have to use the method `inner_html=`.

In Oga you'd use the following:

    document = Oga.parse_xml('<root></root>')
    root = document.xpath('root').first

    name = Oga::XML::Element.new(:name => 'name')

    name.inner_text = 'Alice'

    root.children << name

Adding attributes works similarly in both Nokogiri and Oga. For Nokogiri you'd
use the following:

    element.set_attribute('class', 'foo')

Alternatively you can do the following:

    element['class'] = 'foo'

In Oga you'd instead use the method `set`:

    element.set('class', 'foo')

This method automatically creates the attribute if it doesn't exist, including
the namespace if specified:

    element.set('foo:class', 'foo')

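A name such as `foo:class` combines a namespace prefix with a local name. As a
rough sketch of how such a name can be taken apart (a hypothetical helper
written for this guide, not Oga's internal code):

```ruby
# Hypothetical helper: splits "prefix:name" into [prefix, name]. Names without
# a prefix yield [nil, name].
def split_qualified_name(qualified_name)
  prefix, separator, local = qualified_name.partition(':')
  separator.empty? ? [nil, qualified_name] : [prefix, local]
end

split_qualified_name('foo:class') # => ["foo", "class"]
split_qualified_name('class')     # => [nil, "class"]
```
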
## Serializing Documents

Serializing the document back to XML works the same in both libraries: simply
call `to_xml` on a document or element and you'll get a String back containing
the XML. There is one key difference here though: Nokogiri does not return the
exact same output as it was given as input. For example, it adds an XML
declaration:

    Nokogiri::XML('<root></root>').to_xml # => "<?xml version=\"1.0\"?>\n<root/>\n"

Oga on the other hand does not do this:

    Oga.parse_xml('<root></root>').to_xml # => "<root></root>"

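This difference matters when output from both libraries is compared during a
migration, for instance in a test suite. One possible approach, sketched here
under the assumption that only the leading declaration and surrounding
whitespace need to be ignored, is to strip those before comparing.
`strip_xml_declaration` is a hypothetical helper, and the first input is the
Nokogiri output shown above. Note that `<root/>` and `<root></root>` still
differ as strings, so this removes only part of the difference.

```ruby
# Hypothetical helper: removes a leading XML declaration and any surrounding
# whitespace so serialized output is easier to compare.
def strip_xml_declaration(xml)
  xml.sub(/\A\s*<\?xml[^>]*\?>\s*/, '').strip
end

strip_xml_declaration("<?xml version=\"1.0\"?>\n<root/>\n") # => "<root/>"
strip_xml_declaration('<root></root>')                      # => "<root></root>"
```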
Oga also doesn't insert random newlines or other possibly unexpected (or
unwanted) data.

[nokogiri]: http://nokogiri.org/