Changelog

This document contains details of the various releases and their release dates. Dates are in the format yyyy-mm-dd.

0.2.0 - Unreleased

XML entities such as &amp; and &lt; are now encoded/decoded by the lexer, string and text nodes. See https://github.com/YorickPeterse/oga/issues/49 for more information.
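The general idea of entity encoding/decoding can be sketched in plain Ruby as shown below. This is an illustration only and not Oga's actual implementation: the constants and the `decode_entities`/`encode_entities` helpers are hypothetical names, and only the predefined XML entities are handled.

```ruby
# Hypothetical helpers for illustration; these are not part of Oga's API.

# Maps the predefined XML entities to the characters they represent.
DECODE_MAP = {
  '&lt;'   => '<',
  '&gt;'   => '>',
  '&amp;'  => '&',
  '&quot;' => '"',
  '&apos;' => "'"
}.freeze

# Maps characters that need escaping in text nodes to their entities.
ENCODE_MAP = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;'
}.freeze

DECODE_REGEXP = Regexp.union(DECODE_MAP.keys)
ENCODE_REGEXP = Regexp.union(ENCODE_MAP.keys)

# Replaces every known entity in a single pass, so "&amp;lt;" becomes
# "&lt;" rather than being decoded twice.
def decode_entities(input)
  input.gsub(DECODE_REGEXP, DECODE_MAP)
end

# Escapes the special characters in a single pass.
def encode_entities(input)
  input.gsub(ENCODE_REGEXP, ENCODE_MAP)
end

decode_entities('a &lt; b &amp;&amp; c') # => "a < b && c"
encode_entities('a < b & c')            # => "a &lt; b &amp; c"
```

Using a single `gsub` with a Hash avoids the double-decoding that chained replacements can cause.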

Source lines are no longer included in error messages generated by the XML parser. This simplifies the code and removes the need to re-read the input (in case of IO/Enumerable inputs).

Newlines in the XML lexer are now counted in native code (C/Java). On MRI and JRuby the improvement is quite small, but on Rubinius it's a massive improvement. See commit 8db77c0a09bf6c996dd2856a6dbe1ad076b1d30a for more information.

Performance for detecting HTML void elements (e.g. <br> and <link>) has been improved by removing String allocations that were not needed.
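One way to check for void elements without allocating new Strings is sketched below. This is an assumption about the general technique, not a copy of Oga's code: the `VOID_ELEMENTS` constant and the `void_element?` helper are hypothetical names. The Set stores both lower-case and upper-case names up front, so a lookup does not need to call `downcase` (which would allocate a new String on every check).

```ruby
require 'set'

# Hypothetical constant for illustration; names taken from the HTML
# list of void elements.
VOID_ELEMENTS = Set.new(
  %w[area base br col embed hr img input link meta param source track wbr]
)

# Store the upper-case variants once, up front, instead of normalising
# the element name on every lookup.
VOID_ELEMENTS.merge(VOID_ELEMENTS.map(&:upcase))
VOID_ELEMENTS.freeze

# Returns true if the given element name is a void element. No new
# String is created here; the name is only hashed and compared.
def void_element?(name)
  VOID_ELEMENTS.include?(name)
end

void_element?('br')  # => true
void_element?('BR')  # => true
void_element?('div') # => false
```

Note that this sketch does not match mixed-case names such as "Br"; supporting those would require either more entries in the Set or a normalisation step.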

0.1.3 - 2014-09-24

This release fixes a problem with serializing attributes using the namespace prefix "xmlns". See https://github.com/YorickPeterse/oga/issues/47 for more information.

0.1.2 - 2014-09-23

SAX API

A SAX parser/API has been added. This API is useful when even the memory overhead of the pull parser is too high. Example:

class ElementNames
  attr_reader :names

  def initialize
    @names = []
  end

  def on_element(namespace, name, attrs = {})
    @names << name
  end
end

handler = ElementNames.new

Oga.sax_parse_xml(handler, '<foo><bar></bar></foo>')

handler.names # => ["foo", "bar"]

Racc Gem

Oga will now always use the Racc gem instead of the version shipped with the Ruby standard library.

Error Reporting

XML parser errors have been made a little bit more user friendly, though they can still be quite cryptic.

Serializing Elements

Elements serialized to XML/HTML will use self-closing tags whenever possible. When parsing HTML documents only HTML void elements will use self-closing tags (e.g. <link> tags). Example:

Oga.parse_xml('<foo></foo>').to_xml        # => "<foo />"
Oga.parse_html('<script></script>').to_xml # => "<script></script>"

Default Namespaces

Namespaces are no longer removed from the attributes list when an element is created.

Default XML namespaces can now be registered using xmlns="...". Previously this would be ignored. Example:

document = Oga.parse_xml('<root xmlns="baz"></root>')
root     = document.children[0]

root.namespace # => Namespace(name: "xmlns" uri: "baz")

Lexing Incomplete Input

Oga can now lex input such as </ without entering an infinite loop. Example:

Oga.parse_xml('</') # => Document(children: NodeSet(Text("</")))

Absolute XPath Paths

Oga can now parse and evaluate the XPath expression "/" (that is, just "/"). This will return the root node (usually a Document instance). Example:

document = Oga.parse_xml('<root></root>')

document.xpath('/') # => NodeSet(Document(children: NodeSet(Element(name: "root"))))

Namespace Ordering

Namespaces available to an element are now returned in the correct order. Previously outer namespaces would take precedence over inner namespaces, instead of it being the other way around. Example:

document = Oga.parse_xml <<-EOF
<root xmlns:foo="bar">
  <container xmlns:foo="baz">
    <foo:text>Text!</foo:text>
  </container>
</root>
EOF

foo = document.at_xpath('root/container/foo:text')

foo.namespace # => Namespace(name: "foo" uri: "baz")

Parsing Capitalized HTML Void Elements

Oga is now capable of parsing capitalized HTML void elements (e.g. <BR>). Previously it could only parse lower-cased void elements. Thanks to Tero Tasanen for fixing this. Example:

Oga.parse_html('<BR>') # => Document(children: NodeSet(Element(name: "BR")))

Node Type Method Removed

The node_type method has been removed and its purpose has been moved into the XML::PullParser class itself. This method was solely used by the pull parser to provide shorthands for node classes. As such it doesn't make sense to expose it as a public method.

0.1.1 - 2014-09-13

This release fixes a problem where element attributes were not separated by spaces. Thanks to Jonathan Rochkind for reporting it and Bill Dueber for providing an initial patch for this problem.

0.1.0 - 2014-09-12

The first public release of Oga. This release contains support for parsing XML, basic support for parsing HTML, support for querying documents using XPath and more.
