This also comes with some small cleanups to XPath::Evaluator#node_matches?. This
change removes the need to also call can_match_node?() on every use just to
prevent a NoMethodError from popping up.
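Roughly, the guard now lives inside node_matches?() itself. A minimal sketch of
the idea (the parameters and matching logic are illustrative, not the exact
implementation):

    # Illustrative sketch only: the guard is folded into node_matches?() so
    # callers no longer need to combine it with can_match_node?().
    def can_match_node?(node)
      node.respond_to?(:name) && node.respond_to?(:namespace)
    end

    def node_matches?(node, name)
      return false unless can_match_node?(node)

      node.name == name
    end

    node_matches?(Struct.new(:name, :namespace).new('A', nil), 'A') # => true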
The traversal itself still uses a stack, but at least it no longer relies on the call stack. I
decided not to go with the Morris in-order algorithm [1] as it modifies the tree
during a search. This would not work well if a document were to be accessed from
multiple threads at once (which should be possible for read-only operations).
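For reference, the kind of traversal meant here boils down to something like the
following sketch (simplified, assuming nodes expose a children method; this is
not the literal implementation):

    # Yields every node in document order using an explicit stack instead of
    # recursion, so deeply nested documents can't blow up the call stack.
    def each_node(root)
      stack = [root]

      until stack.empty?
        node = stack.pop

        yield node

        # Push children in reverse so the leftmost child is processed first.
        stack.concat(node.children.to_a.reverse)
      end
    end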
I might change this method to actually perform a search (as opposed to just
returning everything). This will require some closer inspection of the
available XPath axes to determine whether this is needed.
Tests will also be added once I've taken care of the above.
[1]: http://en.wikipedia.org/wiki/Tree_traversal#Morris_in-order_traversal_using_threading
The previous commit was nonsense as I didn't understand XPath's "following" axis
properly. This commit introduces proper tests and a note for future me so that I
can implement the axis correctly.
The evaluation of axes has been fixed by changing the initial context as well as
the behaviour of some of the handler methods.
The initial context has been changed so that it's simply a NodeSet of whatever
the root object is, either a Document or an Element instance. Previously it
would be set to the child nodes of a Document whenever the root object was a
Document, which in turn prevented "child" axes from operating correctly.
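In other words, the evaluator now starts out with something along the lines of
the following (simplified; the variable names are illustrative):

    # The initial context is the root object itself wrapped in a NodeSet,
    # regardless of it being a Document or an Element.
    context = XML::NodeSet.new([root])

    # Previously something like `document.children` was used whenever the
    # root was a Document, which broke "child" axes.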
A bare node test such as "A" is now parsed as follows:
(axis "child" (test nil "A"))
Instead of this:
(test nil "A")
According to the XPath specification both are identical and this simplifies some
of the code in the XPath evaluator.
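The idea roughly comes down to the following (using a hypothetical s() helper
and handler name purely for illustration; this is not the actual parser code):

    # Builds a plain Array based AST node, purely for illustration.
    def s(type, *children)
      [type, *children]
    end

    # A bare node test is wrapped in an explicit "child" axis.
    def on_node_test(namespace, name)
      s(:axis, 'child', s(:test, namespace, name))
    end

    on_node_test(nil, 'A') # => [:axis, "child", [:test, nil, "A"]]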
Upon further investigation this change turned out to be useless. Nokogiri/libxml
does not allow the use of long axes without tests; instead it ends up
lexing/parsing such a value as a simple node test.
This reverts commit f699b0d097.
An axis such as "." is the same as "self::node()". To simplify things at the
parser/evaluator level we'll emit the corresponding tokens for a "node()"
function call for these axes.
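Put differently, lexing "." should produce the same tokens as lexing the
expanded form, roughly along these lines (assuming the XPath lexer exposes a
lex() method; the exact API may differ):

    require 'oga'

    # Both of these should produce the same token stream:
    Oga::XPath::Lexer.new('.').lex
    Oga::XPath::Lexer.new('self::node()').lex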
Instead of using a raw Hash, Oga now uses the XML::Attribute class for storing
information about element attributes.
Attributes are stored as an Array of XML::Attribute instances. This allows the
attributes to be more easily modified. If they were stored as a Hash you'd not
only have to update the attributes themselves but also the Hash that contains
them.
While using an Array has a slight runtime cost, in most cases the number of
attributes is small enough that this doesn't really pose a problem. If webscale
performance is desired at some point in the future, Oga could most likely cache
the lookup of an attribute. That, however, is something for the future.
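Looking an attribute up by name then comes down to a linear scan over the
Array, roughly like this (a simplified sketch, assuming Attribute instances
respond to #name):

    def attribute(name)
      attributes.find do |attribute|
        attribute.name == name
      end
    end

A cache, if ever added, could simply be a Hash keyed on the name sitting in
front of this lookup, with the Array remaining the source of truth.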
Instead of keeping track of internal state in @stack and @context, the various
processing methods now take the context as an extra argument and return the
nodes they produce. This makes it easier to recursively call certain methods, a
requirement for processing XPath axes (e.g. the "ancestor" axis).
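The resulting pattern looks roughly like the following (a simplified sketch;
the handler name and NodeSet usage are illustrative rather than the literal
code):

    def on_axis_child(test, context)
      nodes = XML::NodeSet.new

      # Each handler only reads from the context it was given and returns a
      # new set, instead of mutating shared instance state.
      context.each do |context_node|
        context_node.children.each do |child|
          nodes.push(child) if node_matches?(child, test)
        end
      end

      nodes
    end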
The AST has been simplified by adjusting the way (path) nodes are nested.
Operators now also use `paths` instead of `expression` to allow for expressions
such as `/A or /B`. Sadly this introduces quite a few conflicts in the parser,
but we'll deal with those later if needed.
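As an illustration, an expression such as `/A or /B` would then parse into
something along the lines of the following (node names are illustrative):

    (or
      (absolute_path (axis "child" (test nil "A")))
      (absolute_path (axis "child" (test nil "B"))))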
Come to think of it, it might actually be easier to implement the evaluator as
an actual VM. That is, instead of directly running on the AST it would run on
some flavour of bytecode. Alternatively it could run directly on the AST but
behave more like a (stack-based) VM. This would most likely be easier than
passing a cursor to every node processing method.
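For the record, the stack based idea boils down to something like this toy
sketch (not actual Oga code; the instructions and operations are made up purely
to illustrate the evaluation model):

    # Intermediate results live on an explicit value stack instead of being
    # passed to and returned from every processing method.
    def run(instructions)
      stack = []

      instructions.each do |op, arg|
        case op
        when :push  then stack.push(arg)
        when :union then stack.push(stack.pop | stack.pop)
        end
      end

      stack.pop
    end

    run([[:push, %w[A]], [:push, %w[B]], [:union, nil]]) # => ["B", "A"]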