Basically this will process the left-hand side first, assign the result
to a variable and then append this set with the nodes from the
right-hand side.
Fixes#149
Certain entities when decoded will produce a String with an invalid
encoding. This commit ensures that instead of raising an EncodingError
further down the line (e.g. when calling "inspect" on a document) the
entities are preserved as-is.
Fixes#143
HTML identifiers containing colons should be treated in two ways:
* For element names the prefix (= the namespace prefix in case of XML)
should be ignored as HTML doesn't support/use namespaces.
* For attribute names a colon is a valid character, thus "foo:bar:baz"
should be treated as a single attribute name.
This fixes#142.
Re-using the Binding of the XPath::Compiler#compile method would lead to
race conditions, and possibly a memory leak due to the Binding sticking
around for compiled Proc's lifetime.
By using a dedicated class (and its corresponding Binding) we can work
around this. Access to this class is not synchronized as compiled Procs
don't mutate their enclosing environment.
The race condition can be demonstrated using code such as the
following:
xml = <<-EOF
<people>
<person>
<name>Alice</name>
</person>
<person>
<name>Bob</name>
</person>
<person>
<name>Eve</name>
</person>
</people>
EOF
4.times.map do
Thread.new do
10_000.times do
document = Oga.parse_xml(xml)
document.at_xpath('people/person/name').text
end
end
end.each(&:join)
Running this code would result in NoMethodErrors due to "at_xpath"
returning a NilClass opposed to an Oga::XML::Element.
Escaping hash characters and whitespace is _not_ supported as neither
are valid element/attribute names (e.g. <foo#bar /> is invalid
XML/HTML).
Escaping single/double quotes also won't be supported for the time
being. It's quite a pain to get this to work right in not just CSS but
also XPath and XML/HTML, for very little gain. Should there be enough
users with an actual use case (other than "But the spec says ...!") I'll
look into this again.
Fixes#124
This does _not_ support element states such as DISABLED, nor does it
support the special handling of namespaces (e.g. *|*:not(*)). Instead
this selector basically acts as a negation, some examples:
:not(foo) # All but any "foo" nodes
:not(#foo) # Skips nodes with id="foo"
:not(.foo) # Skips nodes with a class "foo"
Fixes#125
This changes the XPath AST so that every segment in a path (e.g.
foo/bar) is parsed as a child node of the node that precedes it. For
example, take the following expression:
foo/bar
This used to be parsed into the following AST:
(path
(axis "child" (test nil "foo"))
(axis "child" (test nil "bar")))
This is now parsed into the following AST:
(axis "child"
(test nil "foo")
(axis "child"
(test nil "bar")))
This new AST is much easier to deal with in the XPath::Compiler class,
especially when trying to ensure that each segment operates on the
correct input.
This commit also fixes parsing of type tests with predicates, such as:
comment()[10]
This used to throw a parser error.