# Oga

**NOTE:** my spare time is limited, which means I am unable to dedicate a lot of
time to Oga. If you're interested in contributing to FOSS, please take a look at
the open issues and submit a pull request to address them where possible.

Oga is an XML/HTML parser written in Ruby. It provides an easy-to-use API for
parsing, modifying and querying documents (using XPath expressions or CSS
selectors). Oga does not require system libraries such as libxml, making it
easier and faster to install on various platforms. To achieve better
performance, Oga uses a small native extension (C for MRI/Rubinius, Java for
JRuby).

Oga provides an API that allows you to safely parse and query documents in a
multi-threaded environment, without having to worry about your applications
blowing up.

From [Wikipedia][oga-wikipedia]:

> Oga: A large two-person saw used for ripping large boards in the days before
> power saws. One person stood on a raised platform, with the board below him,
> and the other person stood underneath them.

The name is a pun on [Nokogiri][nokogiri].

## Versioning Policy

Oga uses the version format `MAJOR.MINOR` (e.g. `2.1`). An increase of the MAJOR
version indicates backwards incompatible changes were introduced. The MINOR
version is _only_ increased when changes are backwards compatible, regardless of
whether those changes are bugfixes or new features. Up until version 1.0 the
code should be considered unstable, meaning it can change (and break) at any
given moment.

APIs explicitly tagged as private (e.g. using Ruby's `private` keyword or YARD's
`@api private` tag) are not covered by these rules.
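
In practice this policy means a dependency on Oga can usually be pinned to a
single MAJOR version. The sketch below uses RubyGems' `Gem::Requirement` to show
which versions a pessimistic constraint such as `~> 2.1` accepts. The helper
name `satisfies?` is made up for this example and the version numbers are
illustrative only; they do not refer to actual Oga releases.

```ruby
require 'rubygems'

# Hypothetical helper (not part of Oga): returns true when `version` satisfies
# the given RubyGems version constraint.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 2.1" accepts 2.1 and any later MINOR release, but not the next MAJOR.
puts satisfies?('~> 2.1', '2.5') # => true
puts satisfies?('~> 2.1', '3.0') # => false
```

In a Gemfile the same constraint would be written as `gem 'oga', '~> 2.1'`.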

## Examples

Parsing a simple string of XML:

    Oga.parse_xml('<people><person>Alice</person></people>')

Parsing XML using strict mode (disables automatic tag insertion):

    Oga.parse_xml('<people>foo</people>', :strict => true) # works fine
    Oga.parse_xml('<people>foo', :strict => true)          # raises an error

Parsing a simple string of HTML:

    Oga.parse_html('<link rel="stylesheet" href="foo.css">')

Parsing an IO handle pointing to XML (this also works when using
`Oga.parse_html`):

    handle = File.open('path/to/file.xml')

    Oga.parse_xml(handle)

Parsing an IO handle using the pull parser:

    handle = File.open('path/to/file.xml')
    parser = Oga::XML::PullParser.new(handle)

    parser.parse do |node|
      parser.on(:text) do
        puts node.text
      end
    end

Using an Enumerator to download and parse an XML document on the fly:

    enum = Enumerator.new do |yielder|
      HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
        yielder << chunk
      end
    end

    document = Oga.parse_xml(enum)

Parse a string of XML using the SAX parser:

    class ElementNames
      attr_reader :names

      def initialize
        @names = []
      end

      def on_element(namespace, name, attrs = {})
        @names << name
      end
    end

    handler = ElementNames.new

    Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')

    handler.names # => ["foo", "bar"]

Querying a document using XPath:

    document = Oga.parse_xml <<-EOF
    <people>
      <person id="1">
        <name>Alice</name>
        <age>28</age>
      </person>
    </people>
    EOF

    # The "xpath" method returns an enumerable (Oga::XML::NodeSet) that you can
    # iterate over.
    document.xpath('people/person').each do |person|
      puts person.get('id') # => "1"

      # The "at_xpath" method returns a single node from a set; it's the same as
      # person.xpath('name').first.
      puts person.at_xpath('name').text # => "Alice"
    end

Querying the same document using CSS:

    document = Oga.parse_xml <<-EOF
    <people>
      <person id="1">
        <name>Alice</name>
        <age>28</age>
      </person>
    </people>
    EOF

    # The "css" method returns an enumerable (Oga::XML::NodeSet) that you can
    # iterate over.
    document.css('people person').each do |person|
      puts person.get('id') # => "1"

      # The "at_css" method returns a single node from a set; it's the same as
      # person.css('name').first.
      puts person.at_css('name').text # => "Alice"
    end

Modifying a document and serializing it back to XML:

    document = Oga.parse_xml('<people><person>Alice</person></people>')
    name     = document.at_xpath('people/person[1]/text()')

    name.text = 'Bob'

    document.to_xml # => "<people><person>Bob</person></people>"

Querying a document using a namespace:

    document = Oga.parse_xml('<root xmlns:x="foo"><x:div></x:div></root>')
    div      = document.xpath('root/x:div').first

    div.namespace # => Namespace(name: "x" uri: "foo")

## Features

* Support for parsing XML and HTML(5)
  * DOM parsing
  * Stream/pull parsing
  * SAX parsing
* Low memory footprint
* High performance (taking into account most work happens in Ruby)
* Support for XPath 1.0
* CSS3 selector support
* XML namespace support (registering, querying, etc)
* Windows support

## Requirements

| Ruby     | Required      | Recommended |
|:---------|:--------------|:------------|
| MRI      | >= 1.9.3      | >= 2.1.2    |
| Rubinius | >= 2.2        | >= 2.2.10   |
| JRuby    | >= 1.7        | >= 1.7.12   |
| Maglev   | Not supported |             |
| Topaz    | Not supported |             |
| mruby    | Not supported |             |

Maglev and Topaz are not supported due to the lack of a C API (that I know of)
and the lack of active development of these Ruby implementations. mruby is not
supported because it's a very different implementation altogether.

To install Oga on MRI or Rubinius you'll need to have a working compiler such as
gcc or clang. Oga's C extension can be compiled with both. JRuby does not
require a compiler as the native extension is compiled during the Gem building
process and bundled inside the Gem itself.

## Thread Safety

Oga does not use any unsynchronized global mutable state. As a result, you can
parse/create documents concurrently without any problems. Modifying documents
concurrently can lead to bugs as these operations are not synchronized.

Some querying operations will cache data in instance variables, without
synchronization. An example is `Oga::XML::Element#namespace` which will cache an
element's namespace after the first call.

In general it's recommended to _not_ use the same document in multiple threads
at the same time.
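
If a document really does have to be shared between threads, one option is to
serialize all access to it through a lock. The following is a minimal sketch
using Ruby's `Mutex`. `LockedDocument` is a hypothetical helper, not part of
Oga, and it only helps if every thread goes through it instead of touching the
document directly.

```ruby
# Hypothetical helper (not part of Oga): wraps a document and makes sure only
# one thread at a time can read or modify it.
class LockedDocument
  def initialize(document)
    @document = document
    @mutex    = Mutex.new
  end

  # Yields the wrapped document while holding the lock and returns the result
  # of the block.
  def synchronize
    @mutex.synchronize { yield @document }
  end
end
```

Assuming `shared = LockedDocument.new(document)`, a thread would then call
`shared.synchronize { |doc| doc.at_xpath('people/person') }`. Giving each thread
its own document remains the simpler option when that is possible.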

## Namespace Support

Oga fully supports parsing/registering XML namespaces as well as querying them
using XPath. For example, take the following XML:

    <root xmlns="http://example.com">
        <bar>bar</bar>
    </root>

If one were to try and query the `bar` element (e.g. using XPath `root/bar`)
they'd end up with an empty node set. This is due to `<root>` defining an
alternative default namespace. Instead you can query this element using the
following XPath:

    *[local-name() = "root"]/*[local-name() = "bar"]

Alternatively, if you don't really care where the `<bar>` element is located you
can use the following:

    descendant::*[local-name() = "bar"]

And if you want to specify an explicit namespace URI, you can use this:

    descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]
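
Writing `local-name()` predicates by hand gets verbose for longer paths. As a
sketch, such an expression can be generated from a plain path.
`local_name_xpath` below is a hypothetical helper (not part of Oga) and assumes
every path segment is a simple element name.

```ruby
# Hypothetical helper (not part of Oga): turns "root/bar" into an XPath
# expression that matches elements by local name, optionally restricted to a
# namespace URI. Assumes every segment is a plain element name.
def local_name_xpath(path, uri = nil)
  path.split('/').map do |name|
    conditions = [%(local-name() = "#{name}")]
    conditions << %(namespace-uri() = "#{uri}") if uri

    "*[#{conditions.join(' and ')}]"
  end.join('/')
end

puts local_name_xpath('root/bar')
# => *[local-name() = "root"]/*[local-name() = "bar"]
```

The resulting string can be passed to `document.xpath` like any other
expression.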

Unlike Nokogiri, Oga does _not_ provide a way to create "dynamic" namespaces.
That is, Nokogiri allows one to query the above document as follows:

    document = Nokogiri::XML('<root xmlns="http://example.com"><bar>bar</bar></root>')

    document.xpath('x:root/x:bar', :x => 'http://example.com')

Oga does have a small trick you can use to cut down the size of your XPath
queries. Because Oga assigns the name "xmlns" to default namespaces you can use
this in your XPath queries:

    document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')

    document.xpath('xmlns:root/xmlns:bar')

When using this you can still restrict the query to the correct namespace URI:

    document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')

In the future I might add an API to ease this process, although at this time I
have little interest in providing an API similar to Nokogiri.

## HTML5 Support

Oga fully supports HTML5 including the omission of certain tags. For example,
the following is parsed just fine:

    <li>Hello
    <li>World

This is effectively parsed into:

    <li>Hello</li>
    <li>World</li>

One exception Oga makes is that it does _not_ automatically insert `html`,
`head` and `body` tags. Automatically inserting these tags requires a
distinction between documents and fragments, as a user might not always want
these tags to be inserted if left out. This complicates both the user-facing
API and the parsing internals of Oga. As a result I have decided that Oga
_does not_ insert these tags when left out.

A more in-depth explanation can be found here:
<https://gitlab.com/yorickpeterse/oga/issues/98#note_45443992>

## Documentation

The documentation is best viewed [on the documentation website][doc-website].

* {file:CONTRIBUTING Contributing}
* {file:changelog Changelog}
* {file:migrating\_from\_nokogiri Migrating From Nokogiri}
* {Oga::XML::Parser XML Parser}
* {Oga::XML::SaxParser XML SAX Parser}
* {file:xml\_namespaces XML Namespaces}

## Why Another HTML/XML parser?

Currently there are a few existing parsers out there, the most famous one being
[Nokogiri][nokogiri]. Another parser that's becoming more popular these days is
[Ox][ox]. Ruby's standard library also comes with REXML.

The sad truth is that these existing libraries are problematic in their own
ways. Nokogiri, for example, is extremely unstable on Rubinius. On MRI it works
because of the non-concurrent nature of MRI; on JRuby it works because it's
implemented in Java. Nokogiri also uses libxml2, which is a massive beast of a
library, is not thread-safe, and is problematic to install on certain platforms
(apparently). I don't want to compile libxml2 every time I install Nokogiri
either.

To give an example about the issues with Nokogiri on Rubinius (or any other
Ruby implementation that is not MRI or JRuby), take a look at these issues:

* <https://github.com/rubinius/rubinius/issues/2957>
* <https://github.com/rubinius/rubinius/issues/2908>
* <https://github.com/rubinius/rubinius/issues/2462>
* <https://github.com/sparklemotion/nokogiri/issues/1047>
* <https://github.com/sparklemotion/nokogiri/issues/939>

Some of these have been fixed, some have not. The core problem remains:
Nokogiri is written in such a way that there are a large number of places where
it *might* break due to throwing around void pointers and whatnot and expecting
that things magically work. Note that I have nothing against the people running
these projects; I just heavily, *heavily* dislike the resulting codebase one
has to deal with today.

Ox looks very promising but it lacks a rather crucial feature: parsing HTML
(without using a SAX API). It's also again a C extension making debugging more
of a pain (at least for me).

I just want an XML/HTML parser that I can rely on stability-wise and that is
written in Ruby so I can actually debug it. In theory it should also make it
easier for other Ruby developers to contribute.

## License

All source code in this repository is subject to the terms of the Mozilla Public
License, version 2.0 unless stated otherwise. A copy of this license can be
found in the file "LICENSE" or at <https://www.mozilla.org/MPL/2.0/>.

[nokogiri]: https://github.com/sparklemotion/nokogiri
[oga-wikipedia]: https://en.wikipedia.org/wiki/Japanese_saw#Other_Japanese_saws
[ox]: https://github.com/ohler55/ox
[doc-website]: http://code.yorickpeterse.com/oga/latest/