By using NodeSet#concat we can further reduce the amount of object allocations.
This in turn greatly reduces the time it takes to query large documents using
descendant-or-self.
Previously this wouldn't display anything due to the IO object being exhausted.
To fix this the input has to be wound back to the start, which means re-reading
it. Sadly I can't think of a way around this that doesn't require buffering
lines while parsing them (which massively increases memory usage).
This ensures the current context node is set correctly when using the "self"
axis inside a path that's inside a predicate, e.g.
foo/bar[baz/. = "something"]
Here the "self" axis should refer to foo/bar/baz, _not_ foo/bar.
The XPath number() function should also be capable of converting booleans to
numbers, something it previously was not able to do. In order to do this
reliably we can't rely on the string() function as this would make it impossible
to distinguish between literal string values and booleans. This is due to
true(), which returns a TrueClass, being converted to the string "true". This
string in turn can't be converted to a float.
When an attribute is prefixed with "xml" the default namespace should be used
automatically. This namespace is not registered on element level by default as
this namespace isn't registered manually, instead it's a "magic" namespace. This
also ensures we match the behaviour of libxml more closely, hopefully reducing
confusion.
When calling the string() XPath function floats with zero decimals (10.0, 5.0,
etc) should result in a string without any decimals. Ruby converts 10.0 to
"10.0" whereas XPath expects "10".
The particular case of string(10) having to return "10" instead of "10.0" will
be handled separately. Returning integers breaks behaviour/expectations
elsewhere.
This reverts commit 431a253000.
This allows filtering of nodes by indexes (e.g. using last() or a literal
number) and uses the correct index range (1 to N in XPath). The function
position() is not yet supported as this requires access to the current node,
which isn't passed down the call stack just yet.
Namespaces aren't scoped per document but instead per element, thus this method
doesn't make that much sense. This also fixes the remaining, failing XPath test.
The method XPath::Evaluator#node_matches? now has a special case to handle
"type-test" nodes. This in turn fixes a bunch of failing tests such as those for
the XPath query "parent::node()".
Unlike what I thought before syntax such as "node()" is not a function call.
Instead this is a special node test that tests the *types* of nodes, not their
names.
This separates namespace handling into namespace names and namespace objects.
The namespace objects are retrieved from the element an attribute belongs to.
Once retrieved the namespace is cached, due to the overhead of retrieving
namespaces in large documents.
The old code used for generating Object#inspect values has been ripped out (for
the most part). The result is a non indented but far more compact #inspect
output. The code for this is also easier and doesn't break the signature of
Object#inspect.
Oga won't be handling URIs any time soon. The rationale is that they server zero
purpose when it comes to just parsing XML. Another goal of Oga is to make it
easy to modify and reserialize documents back to XML. If namespaces would also
store the URIs this would make this process more difficult.
This also comes with some small cleanups regarding
XPath::Evaluator#node_matches?. This change removes the need to, every time,
also use can_match_node?() to prevent NoMethodError errors from popping up.
This still uses a stack but at least no longer relies on the call stack. I
decided not to go with the Morris in-order algorithm [1] as it modifies the tree
during a search. This would not work well if a document were to be accessed from
multiple threads at once (which should be possible for read-only operations).
I might change this method to actually perform a search (opposed to just
returning everything). This will require some closer inspection of the
available XPath axes to determine if this is needed.
Tests will also be added once I've taken care of the above.
[1]: http://en.wikipedia.org/wiki/Tree_traversal#Morris_in-order_traversal_using_threading
The previous commit was nonsense as I didn't understand XPath's "following" axis
properly. This commit introduces proper tests and a note for future me so that I
can implement it properly.
The evaluation of axes has been fixed by changing the initial context as well as
the behaviour of some of the handler methods.
The initial context has been changed so that it's simply a NodeSet of whatever
the root object is, either a Document or an Element instance. Previously this
would be set to the child nodes of a Document in case the root object was a
Document. This in turn would prevent "child" axes from operating correctly.
When parsing a bare node test such as "A" this is now parsed as following:
(axis "child" (test nil "A"))
Instead of this:
(test nil "A")
According to the XPath specification both are identical and this simplifies some
of the code in the XPath evaluator.
Upon further investigation this change turned out to be useless. Nokogiri/libxml
does not allow the use of long axes without tests, instead it ends up
lexing/parsing such a value as a simple node test.
This reverts commit f699b0d097.
An axes such as "." is the same as "self::node()". To simplify things on
parser/evaluator level we'll emit the corresponding tokens for a "node()"
function call for these axes.
Instead of using a raw Hash Oga now uses the XML::Attribute class for storing
information about element attributes.
Attributes are stored as an Array of XML::Attribute instances. This allows the
attributes to be more easily modified. If they were stored as a Hash you'd not
only have to update the attributes themselves but also the Hash that contains
them.
While using an Array has a slight runtime cost in most cases the amount of
attributes is small enough that this doesn't really pose a problem. If webscale
performance is desired at some point in the future Oga could most likely cache
the lookup of an attribute. This however is something for the future.