Instead of relying on String#count for counting newlines in text nodes, Oga now
does this in C/Java. String#count isn't exactly the fastest way of counting
characters. Performance was measured using
benchmark/xml/lexer/string_average_bench.rb. Before this patch the results were
as following:
MRI: 0.529s
Rbx: 4.965s
JRuby: 0.622s
After this patch:
MRI: 0.424s
Rbx: 1.942s
JRuby: 0.665s => numbers vary a bit, seem roughly the same as before
The commands used for benchmarking:
$ rake clean # to make sure that C exts aren't shared between MRI/Rbx
$ rake generate
$ rake fixtures
$ ruby benchmark/xml/lexer/string_average_bench.rb
The big difference for Rbx is probably due to the implementation of String#count
not being super fast. Some changes were made
(https://github.com/rubinius/rubinius/pull/3133) to the method, but this hasn't
been released yet.
JRuby seems to perform in a similar way, so either it was already optimizing
things for me or I suck at writing well performing Java code.
This fixes#51.
This ensures we only call String#downcase if we can't find an all lowercased
*and* all uppercased version of the element name. This in turn can save as many
object allocations as there are HTML opening tags.
This fixes#52.
Instead of using `namespace.name` lets just use `namespace_name`. This fixes the
problem of serializing attributes where the namespace prefix is "xmlns" as the
namespace for this isn't registered by default.
This fixes#47.
This API is a little bit dodgy (similar to Nokogiri's API) due to the use of
separate parser and handler classes. This is done to ensure that the return
values of callback methods (e.g. on_element) aren't used by Racc for building
AST trees. This also ensures that whatever variables are set by the handler
don't conflict with any variables of the parser.
This fixes#42.
While still a bit cryptic this is probably as best as we can get it. An example:
Oga.parse_xml("<namefoo:bar=\"10\"")
parser.rb:116:in `on_error': Unexpected string on line 1: (Racc::ParseError)
=> 1: <namefoo:bar="10"
This fixes#43.
When an XML element has no child nodes a self-closing tag is used. When parsing
documents/elements in HTML mode this is only done if the element is a so called
"void element" (e.g. <link> tags).
This fixes#46.
When the default namespace is registered (using xmlns="...") Oga now properly
sets the namespace of the container and all child elements.
This fixes#44.
This was originally reported by @jrochkind and partially patched by @billdueber.
My patches are built upon the latter, but without the need of using Array#map,
Array#join, etc. They also contain a few style changes.
This fixes#32 and #33.