This was originally reported by @jrochkind and partially patched by @billdueber.
My patches build upon the latter, but without needing Array#map, Array#join,
etc. They also contain a few style changes.
This fixes #32 and #33.
The methods XML::Element#add_attribute and XML::Element#set can be used to more
easily add attributes to elements. The first method simply adds an Attribute
instance and links it to the element. This allows for fine-grained control over
what data the attribute should contain. The second method ("set") simply sets an
attribute based on a name and value, optionally creating the attribute if it
doesn't already exist.
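A minimal sketch of how the two methods differ. The Attribute and Element
classes below are simplified, hypothetical stand-ins written for illustration;
they are not Oga's actual implementation.

```ruby
# Simplified stand-ins for XML::Attribute and XML::Element (assumptions for
# illustration only).
class Attribute
  attr_accessor :name, :value, :element

  def initialize(name, value = nil)
    @name  = name
    @value = value
  end
end

class Element
  attr_reader :attributes

  def initialize
    @attributes = []
  end

  # Adds an existing Attribute instance and links it to this element.
  def add_attribute(attribute)
    attribute.element = self
    @attributes << attribute
    attribute
  end

  # Sets an attribute by name, creating it if it doesn't exist yet.
  def set(name, value)
    found = @attributes.find { |attr| attr.name == name }

    if found
      found.value = value
    else
      add_attribute(Attribute.new(name, value))
    end

    value
  end
end

element = Element.new
element.add_attribute(Attribute.new('class', 'foo'))
element.set('class', 'bar') # updates the existing attribute
element.set('id', 'main')   # creates a new attribute
```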
By using NodeSet#concat we can further reduce the number of object allocations.
This in turn greatly reduces the time it takes to query large documents using
descendant-or-self.
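The allocation difference can be sketched with plain Ruby arrays, used here as
a stand-in for NodeSet (an assumption for illustration): Array#+ returns a new
object each time, whereas Array#concat appends to the receiver in place.

```ruby
chunks = [[1, 2], [3, 4], [5, 6]]

# Array#+ allocates a new Array on every iteration.
combined = []
chunks.each { |chunk| combined = combined + chunk }

# Array#concat appends to the receiver, so no intermediate Arrays are created.
buffer   = []
original = buffer
chunks.each { |chunk| buffer.concat(chunk) }
```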
This ensures that Oga can lex the following properly:
<input value="" />
Previously Ragel would stop upon finding the empty string. This was caused by
the string rules being declared as follows:
string_dquote = (dquote ^dquote+ dquote);
string_squote = (squote ^squote+ squote);
These rules only match strings _with_ content, not empty ones. Since Ragel stops
consuming input the moment it finds unhandled data, this resulted in incorrect
tokens being emitted.
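The difference between requiring content and allowing empty strings can be
sketched with Ruby regular expressions. These are only an analogue of the Ragel
rules, and using `*` in place of `+` is an assumed fix rather than the exact
change made to the grammar.

```ruby
input = '<input value="" />'

# Analogue of `dquote ^dquote+ dquote`: needs at least one character between
# the quotes, so it cannot match "".
with_content = /"[^"]+"/

# Analogue using `*` instead of `+`: also accepts the empty string.
allow_empty = /"[^"]*"/

strict_match  = input[with_content]
relaxed_match = input[allow_empty]
```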
Previously this wouldn't display anything due to the IO object being exhausted.
To fix this the input has to be wound back to the start, which means re-reading
it. Sadly I can't think of a way around this that doesn't require buffering
lines while parsing them (which massively increases memory usage).
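A rough sketch of the rewind-and-re-read approach, assuming the goal is to show
a single source line once parsing has finished. StringIO stands in for the real
input, and the line number is a hypothetical value the parser would report.

```ruby
require 'stringio'

io = StringIO.new("<a>\n<b>\n</a>\n")

# Stand-in for the parser: reading every line exhausts the IO.
io.each_line { |line| line }
leftover = io.read # nothing is left to read at this point

# Wind the input back to the start and read it again to find the line.
io.rewind

error_line = 2 # hypothetical line number reported by the parser
shown      = nil

io.each_line.with_index(1) do |line, number|
  if number == error_line
    shown = line.chomp
    break
  end
end
```

Only one line is held in memory at a time, at the cost of reading the input
twice.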
This ensures the current context node is set correctly when using the "self"
axis inside a path that's inside a predicate, e.g.
foo/bar[baz/. = "something"]
Here the "self" axis should refer to foo/bar/baz, _not_ foo/bar.
Thanks to some heavy rubberducking with @whitequark the lexer is now a little
bit better at lexing T_TEXT nodes. For example, previously the following could
not be lexed properly:
"foo < bar"
There might still be some tweaking to do but we're getting there.