765 Commits

Author SHA1 Message Date
Yorick Peterse 83d0759998 Release 2.1 2016-02-09 20:17:54 +01:00
Yorick Peterse 5bfc2d50f2 Preserve entities that can't be decoded
Certain entities when decoded will produce a String with an invalid
encoding. This commit ensures that instead of raising an EncodingError
further down the line (e.g. when calling "inspect" on a document) the
entities are preserved as-is.

Fixes #143
2016-02-09 19:51:53 +01:00
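A minimal sketch of the idea behind 5bfc2d50f2, assuming only hexadecimal entities. The helper name `decode_hex_entity` is hypothetical and is not Oga's actual API; it only illustrates falling back to the original entity text when decoding yields a String with an invalid encoding.

```ruby
# Hypothetical helper, not Oga's implementation: decode "&#xHH;" style
# entities, but keep the entity as-is when the result is not valid UTF-8.
def decode_hex_entity(entity)
  match = /\A&#x(\h+);\z/.match(entity)

  # Anything that isn't a hexadecimal entity is returned untouched.
  return entity unless match

  decoded = [match[1].to_i(16)].pack('U')

  # Surrogate codepoints such as U+D800 produce an invalid String, so
  # the original entity text is preserved instead.
  decoded.valid_encoding? ? decoded : entity
rescue RangeError
  # Codepoints too large for pack('U') are preserved as well.
  entity
end

puts decode_hex_entity('&#x41;')
puts decode_hex_entity('&#xD800;')
```

Checking `valid_encoding?` up front avoids the EncodingError that would otherwise only surface later, for example when inspecting the document.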
Yorick Peterse fd1570870e Release 2.0.0 2015-12-26 20:46:24 +01:00
Yorick Peterse 66fc4b1dfc Fixed parsing HTML identifiers containing colons
HTML identifiers containing colons should be treated in two ways:

* For element names the prefix (= the namespace prefix in case of XML)
  should be ignored as HTML doesn't support/use namespaces.
* For attribute names a colon is a valid character, thus "foo:bar:baz"
  should be treated as a single attribute name.

This fixes #142.
2015-12-26 20:28:35 +01:00
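The two rules from 66fc4b1dfc could be sketched roughly as follows. Both method names are hypothetical and the real logic lives in Oga's lexer/parser; splitting on the first colon only is also an assumption, as the commit message doesn't cover element names with more than one colon.

```ruby
# Hypothetical helpers illustrating the two rules for HTML identifiers.

# Element names: drop the prefix, since HTML has no namespaces.
def html_element_name(name)
  _prefix, local = name.split(':', 2)

  # Names without a colon have no prefix to strip.
  local || name
end

# Attribute names: a colon is just another character, keep it all.
def html_attribute_name(name)
  name
end

puts html_element_name('foo:bar')
puts html_attribute_name('foo:bar:baz')
```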
Yorick Peterse 9bb908f8b1 Use #== in Conversion.boolean?
On JRuby 9.0.1.0 this is a bit faster than using "is_a?":

    require 'benchmark/ips'

    input = false

    Benchmark.ips do |bench|
      bench.report 'is_a?' do
        input.is_a?(TrueClass) || input.is_a?(FalseClass)
      end

      bench.report '==' do
        input == true || input == false
      end

      bench.compare!
    end

This outputs:

    Calculating -------------------------------------
                   is_a?    86.129k i/100ms
                      ==   112.837k i/100ms
    -------------------------------------------------
                   is_a?      7.375M (±15.3%) i/s -     35.227M
                      ==     10.428M (±12.0%) i/s -     50.889M

    Comparison:
                      ==: 10427617.5 i/s
                   is_a?:  7374666.2 i/s - 1.41x slower

On both MRI 2.2 and Rubinius 2.5.8 there's little to no difference
between these two methods.
2015-09-23 16:35:09 +02:00
Yorick Peterse 0fd6fd8645 Release 1.3.1 2015-09-07 14:11:00 +02:00
Yorick Peterse bd48dc15cc Evaluate compiled blocks in an isolated Binding
Re-using the Binding of the XPath::Compiler#compile method would lead to
race conditions, and possibly a memory leak due to the Binding sticking
around for the compiled Proc's lifetime.

By using a dedicated class (and its corresponding Binding) we can work
around this. Access to this class is not synchronized as compiled Procs
don't mutate their enclosing environment.

The race condition can be demonstrated using code such as the
following:

    xml = <<-EOF
    <people>
      <person>
        <name>Alice</name>
      </person>

      <person>
        <name>Bob</name>
      </person>

      <person>
        <name>Eve</name>
      </person>
    </people>
    EOF

    4.times.map do
      Thread.new do
        10_000.times do
          document = Oga.parse_xml(xml)

          document.at_xpath('people/person/name').text
        end
      end
    end.each(&:join)

Running this code would result in NoMethodErrors due to "at_xpath"
returning nil as opposed to an Oga::XML::Element.
2015-09-07 14:02:31 +02:00
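A rough sketch of the approach taken in bd48dc15cc, assuming a dedicated class whose only job is to provide a Binding. `IsolatedContext` and this `compile` method are hypothetical names used for illustration; they are not taken from the Oga source.

```ruby
# Hypothetical stand-in for a dedicated evaluation class.
class IsolatedContext
  # Evaluates the source in this method's own Binding. Only this
  # method's locals (here: "source") are visible to the evaluated code.
  def evaluate(source)
    eval(source, binding)
  end
end

# Hypothetical compile method: its own locals never end up in the
# Binding captured by the compiled Proc.
def compile(source)
  scratch = 'compiler state' # not visible to the evaluated code

  IsolatedContext.new.evaluate(source)
end

double = compile('lambda { |x| x * 2 }')

puts double.call(21)
puts compile('defined?(scratch)').inspect
```

Every call gets a fresh context, so a compiled Proc keeps only that small Binding alive and never shares one with another thread.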
Yorick Peterse 4c79468091 Release 1.3.0 2015-09-06 19:20:45 +02:00
Yorick Peterse f753f08f18 Revamp CSS parser for better axis support
This makes it possible to parse expressions such as "foo>bar", "> .bar",
"> foo.bar", and similar expressions.

This fixes #126 and fixes #131.
2015-09-04 16:06:20 +02:00
Yorick Peterse c713f6250f Lexer/parser specs for CSS axes without whitespace 2015-09-04 15:13:38 +02:00
Yorick Peterse 37c5b819fa Unicode support for CSS/XPath
Fixes #140
2015-09-03 11:21:45 +02:00
Yorick Peterse 44630c27ff Support escaping dots in CSS identifiers
Escaping hash characters and whitespace is _not_ supported as neither
are valid element/attribute names (e.g. <foo#bar /> is invalid
XML/HTML).

Escaping single/double quotes also won't be supported for the time
being. It's quite a pain to get this to work right in not just CSS but
also XPath and XML/HTML, for very little gain. Should there be enough
users with an actual use case (other than "But the spec says ...!") I'll
look into this again.

Fixes #124
2015-09-02 20:18:52 +02:00
Yorick Peterse aef7c510c2 Basic support for the CSS :not pseudo class
This does _not_ support element states such as DISABLED, nor does it
support the special handling of namespaces (e.g. *|*:not(*)). Instead
this selector basically acts as a negation, some examples:

    :not(foo)  # All but any "foo" nodes
    :not(#foo) # Skips nodes with id="foo"
    :not(.foo) # Skips nodes with a class "foo"

Fixes #125
2015-09-01 22:05:46 +02:00
Yorick Peterse b7b38255d3 Fixed YARD formatting 2015-09-01 20:03:56 +02:00
Yorick Peterse 94f8ed5421 Removed start/end comments of YARD blocks 2015-09-01 19:59:52 +02:00
Yorick Peterse 929a521641 Added better docs/examples to XML::Querying 2015-09-01 10:12:17 +02:00
Yorick Peterse 604d0d9337 Case insensitive matching of nodes
This re-applies the patch added in #134 to the new XPath compiler.

Fixes #135.
2015-08-30 18:30:04 +02:00
Yorick Peterse 67ada1168e Fix starts-with() for JRuby 1.7
''.start_with?('') returns false on JRuby 1.7. While I'd love to drop
support for shit like this, JRuby 1.7 is still in common use today, so
let's just work around this for now.
2015-08-30 02:10:49 +02:00
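One possible shape for the workaround in 67ada1168e, assuming an explicit check for an empty needle is enough. The method name `starts_with?` is hypothetical; the commit message doesn't show the actual change.

```ruby
# Hypothetical helper: treat an empty needle as always matching, so the
# result no longer depends on how String#start_with? handles ''.
def starts_with?(haystack, needle)
  needle.empty? || haystack.start_with?(needle)
end

puts starts_with?('', '')
puts starts_with?('foobar', 'foo')
puts starts_with?('foobar', 'bar')
```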
Yorick Peterse bf0ca7c907 Alias Ruby::Node#to_ary to #to_a
JRuby 1.7 uses to_ary as opposed to to_a.
2015-08-30 02:06:10 +02:00
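A small sketch of what the alias in bf0ca7c907 enables. `SketchNode` is a hypothetical class, and the commit message doesn't say which code path calls to_ary on JRuby 1.7; destructuring assignment is simply one place where Ruby calls to_ary implicitly.

```ruby
# Hypothetical node class exposing its contents as an Array.
class SketchNode
  def initialize(type, children)
    @type = type
    @children = children
  end

  def to_a
    [@type, *@children]
  end

  # Implicit conversions (e.g. destructuring) look for to_ary.
  alias_method :to_ary, :to_a
end

type, first_child = SketchNode.new(:send, [:foo, :bar])

puts type.inspect
puts first_child.inspect
```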
Yorick Peterse 435115c454 Removed various unused variables 2015-08-30 01:46:52 +02:00
Yorick Peterse 1b62dd3256 Revamped compiler type test specs 2015-08-30 01:45:51 +02:00
Yorick Peterse 001c57e0ad Tag XPath::Conversion's API as private 2015-08-30 01:26:40 +02:00
Yorick Peterse 31a574e7f8 Removed the XPath::Evaluator class 2015-08-30 01:26:03 +02:00
Yorick Peterse e4919d7c31 Use XPath::Compiler in XML::Querying 2015-08-30 01:22:33 +02:00
Yorick Peterse 5a736aa25c Removed Compiler#node_literal 2015-08-28 17:00:21 +02:00
Yorick Peterse 4ad4b89860 Revamp compiler specs for "self"
This also includes a fix for node() so that it matches attributes.
2015-08-28 16:57:24 +02:00
Yorick Peterse e8377b360a Revamp compiler "preceding" specs
This also includes some fixes to make this axis behave correctly when
evaluated relative to a document.
2015-08-28 16:49:59 +02:00
Yorick Peterse 6b2874c507 Revamped compiler "preceding-sibling" specs 2015-08-28 16:30:26 +02:00
Yorick Peterse 84a9315b24 Revamped compiler specs for "parent" 2015-08-28 16:22:49 +02:00
Yorick Peterse 07658dadb1 Added Attribute#parent 2015-08-28 16:22:42 +02:00
Yorick Peterse a1e7d2d07f Revamp compiler "following" specs 2015-08-28 15:53:57 +02:00
Yorick Peterse 824c897467 Revamp compiler specs for following-sibling 2015-08-28 15:48:03 +02:00
Yorick Peterse aa3fbcf522 Revamp descendant compiler specs 2015-08-28 15:29:09 +02:00
Yorick Peterse 70bea2071c Fixed ancestor-or-self relative to attributes
Per libxml behaviour this axis shouldn't match attributes when using
"ancestor-or-self::*".
2015-08-27 10:49:32 +02:00
Yorick Peterse d5aad9c1c9 Revamp descendant-or-self compiler specs 2015-08-27 10:34:25 +02:00
Yorick Peterse ed31b9f1d3 Revamp compiler specs for the attribute axis 2015-08-26 22:51:04 +02:00
Yorick Peterse 5e3b0a4023 Started updating Compiler for the new XPath AST
This also includes fixes for ancestor and ancestor-or-self so that these
axes can be used relative to documents and attributes.
2015-08-26 22:40:00 +02:00
Yorick Peterse 9899a419b7 Added Attribute#each_ancestor 2015-08-26 22:26:46 +02:00
Yorick Peterse 083d048e63 Remove (path) usage from the CSS parser
This updates the CSS parser to make it compatible with the XPath AST
changes introduced in commit 365a9e9fa9.
This also, finally, means I can get rid of some of the hacks that were
used for "+ foo" selectors and building (path) nodes.
2015-08-26 19:15:12 +02:00
Yorick Peterse 365a9e9fa9 Replace (path) nodes with nested nodes
This changes the XPath AST so that every segment in a path (e.g.
foo/bar) is parsed as a child node of the node that precedes it. For
example, take the following expression:

    foo/bar

This used to be parsed into the following AST:

    (path
      (axis "child" (test nil "foo"))
      (axis "child" (test nil "bar")))

This is now parsed into the following AST:

    (axis "child"
      (test nil "foo")
      (axis "child"
        (test nil "bar")))

This new AST is much easier to deal with in the XPath::Compiler class,
especially when trying to ensure that each segment operates on the
correct input.

This commit also fixes parsing of type tests with predicates, such as:

    comment()[10]

This used to throw a parser error.
2015-08-26 10:16:48 +02:00
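To make the AST change in 365a9e9fa9 concrete, here is a sketch that converts the old flat shape into the new nested one, with plain Arrays standing in for AST nodes. The `nest_path` helper is hypothetical: the parser itself emits the nested form directly and doesn't run a transformation like this.

```ruby
# Hypothetical helper: turn (path step1 step2 ...) into nested steps,
# with every step appended as the last child of the one before it.
def nest_path(path)
  _type, *steps = path

  # Build from the innermost (last) step outwards.
  steps.reverse.reduce(nil) do |inner, step|
    inner ? step + [inner] : step
  end
end

flat = [
  :path,
  [:axis, 'child', [:test, nil, 'foo']],
  [:axis, 'child', [:test, nil, 'bar']]
]

p nest_path(flat)
```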
Yorick Peterse 866044f94f Removed useless block passes in the XPath compiler 2015-08-22 14:28:20 +02:00
Yorick Peterse c5b30d1eae Refactor XPath compiler support for predicates
Handling of predicates is delegated to 3 different methods:

* on_predicate_direct: for predicates such as foo[bar] and foo[x < y].
* on_predicate_temporary: for predicates that use the last() function
  somewhere.
* on_predicate_index: for predicates that only contain a literal index,
  foo[10] being an example.

This enables the compiler to use more optimized code depending on the
type of predicate while still being able to support last() and
position().

The code is currently still quite rough around the edges; this will be
taken care of in following commits.
2015-08-20 01:01:30 +02:00
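The three-way split described in c5b30d1eae might be pictured like this, with plain Arrays standing in for AST nodes. The handler names come from the commit message, while `predicate_handler`, `uses_last?`, the node shapes and the order of the checks are assumptions made for this sketch.

```ruby
# Hypothetical check: does the predicate call last() anywhere?
def uses_last?(node)
  return false unless node.is_a?(Array)
  return true if node[0] == :call && node[1] == 'last'

  node.any? { |child| uses_last?(child) }
end

# Hypothetical dispatcher picking one of the three handlers.
def predicate_handler(predicate)
  if predicate[0] == :int
    :on_predicate_index      # foo[10]
  elsif uses_last?(predicate)
    :on_predicate_temporary  # foo[position() = last()]
  else
    :on_predicate_direct     # foo[bar], foo[x < y]
  end
end

p predicate_handler([:int, 10])
p predicate_handler([:eq, [:call, 'position'], [:call, 'last']])
p predicate_handler([:axis, 'child', [:test, nil, 'bar']])
```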
Yorick Peterse ee44907ca5 Started removing usage of backup_variable
This was a hack that never should've made it into the code in the first
place. This breaks some of the CSS specs, but this should be taken care
of when I refactor predicate support.
2015-08-19 20:14:24 +02:00
Yorick Peterse 15ecca4a64 Removed extra throw from following-sibling()
I can't quite recall why this throw was present in the Evaluator in the
first place, but removing it doesn't seem to break anything (in the
Evaluator). In fact, removing it actually fixes some of the CSS specs
that were broken when using the Compiler.
2015-08-19 20:14:24 +02:00
Yorick Peterse d72f5eb07f Removed left-over assignment 2015-08-19 20:14:24 +02:00
Yorick Peterse f41f6ff0c8 Only wrap followed_by nodes in begin/end 2015-08-19 20:14:24 +02:00
Yorick Peterse a661bf8c12 Predicate support for all on_call handlers 2015-08-19 20:14:24 +02:00
Yorick Peterse a30cdba8d0 Fixed XPath compiler support for not() 2015-08-19 20:14:24 +02:00
Yorick Peterse 7cbc53c64e XPath compiler support for not() in predicates 2015-08-19 20:14:24 +02:00
Yorick Peterse 0dcee637d3 Added Ruby::Node#if_false 2015-08-19 20:14:24 +02:00
Yorick Peterse 4b50a161ed Wrap all compiler assignments in begin/end
This is much safer than having to explicitly call "wrap" in a potentially
large number of places.
2015-08-19 20:14:24 +02:00
Yorick Peterse b217aab2cb XPath compiler support for translate() 2015-08-19 20:14:24 +02:00
Yorick Peterse 106d83e780 XPath compiler support for sum() 2015-08-19 20:14:24 +02:00
Yorick Peterse e7e5b123cf XPath compiler support for substring() 2015-08-19 20:14:24 +02:00
Yorick Peterse 32dba554d7 Ruby generator support for Ranges 2015-08-19 20:14:23 +02:00
Yorick Peterse ec19875530 XPath compiler support for substring-before() 2015-08-19 20:14:23 +02:00
Yorick Peterse eeab14af4d XPath compiler support for substring-after() 2015-08-19 20:14:23 +02:00
Yorick Peterse d6b3461a9a XPath compiler support for string() 2015-08-19 20:14:23 +02:00
Yorick Peterse 728bd45e48 XPath compiler support for string-length() 2015-08-19 20:14:23 +02:00
Yorick Peterse 0e3451881b Specs for contains/starts-with in a predicate 2015-08-19 20:14:23 +02:00
Yorick Peterse 37a410a012 XPath compiler support for starts-with() 2015-08-19 20:14:23 +02:00
Yorick Peterse 64d9ecfd53 XPath compiler support for number() 2015-08-19 20:14:23 +02:00
Yorick Peterse 58aa8f0833 Boolean support for Conversion.to_float 2015-08-19 20:14:23 +02:00
Yorick Peterse 43dab548e9 XPath compiler support for not() 2015-08-19 20:14:23 +02:00
Yorick Peterse 2585fbd0b7 XPath compiler support for normalize-space() 2015-08-19 20:14:23 +02:00
Yorick Peterse e677a5abdf XPath compiler support for namespace-uri() 2015-08-19 20:14:23 +02:00
Yorick Peterse 9cb589cb34 XPath compiler support for name() 2015-08-19 20:14:23 +02:00
Yorick Peterse d408989499 Added expanded_name for Element and Attribute 2015-08-19 20:14:23 +02:00
Yorick Peterse 25e2f57a8d XPath compiler support for local-name() 2015-08-19 20:14:23 +02:00
Yorick Peterse 49a7c2c782 Fixed incorrect spelling of "predicate" 2015-08-19 20:14:23 +02:00
Yorick Peterse db93f845f3 Renamed on_predicate methods
This ensures they all start with "on_predicate".
2015-08-19 20:14:23 +02:00
Yorick Peterse c026244f6e XPath compiler support for id() in predicates 2015-08-19 20:14:23 +02:00
Yorick Peterse 49196c285f XPath compiler support for lang() 2015-08-19 20:14:23 +02:00
Yorick Peterse eb6cf68140 Tidied up YARD documentation of the XPath compiler 2015-08-19 20:14:23 +02:00
Yorick Peterse 54473d9865 Allow followed_by to take a block
This removes the need for a lot of local variables in the Compiler
class, at the cost of some extra indentation levels.
2015-08-19 20:14:22 +02:00
Yorick Peterse 7bcd462e22 XPath compiler support for id()
This has been largely ported over from the Evaluator implementation.
2015-08-19 20:14:22 +02:00
Yorick Peterse 2ff1f9ab4f XPath compiler support for a bucket of functions
This includes the following functions:

* boolean()
* ceiling()
* floor()
* round()
* concat()
* contains()
* count()
2015-08-19 20:14:22 +02:00
Yorick Peterse e3b45fddfc to_float support for non String values 2015-08-19 20:14:22 +02:00
Yorick Peterse f3f3c7d31c Cleaned up literal usage in the XPath compiler 2015-08-19 20:14:22 +02:00
Yorick Peterse 44bd0751bc Removed commented out code
Something something I should just get a decent debugging mode.
2015-08-19 20:14:22 +02:00
Yorick Peterse e4e777ac4a XPath compiler support for preceding-sibling 2015-08-19 20:14:22 +02:00
Yorick Peterse 7362a783b8 XPath compiler support for the preceding axis 2015-08-19 20:14:22 +02:00
Yorick Peterse 67bc338474 XPath compiler support for the namespace axis 2015-08-19 20:14:22 +02:00
Yorick Peterse 07b52fb48a Added Ruby::Node#not
This is a shortcut for "!foo". Using this method one doesn't have to
worry about how the "!" operator binds. For example, this:

    !foo.or(bar)

would be parsed/evaluated as this:

    !(foo.or(bar))

when instead we want it to be this:

    (!foo).or(bar)

Using explicit parentheses leads to ugly code, so now we can do this
instead:

    foo.not.or(bar)
2015-08-19 20:14:22 +02:00
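The precedence problem described in 07b52fb48a can be reproduced with a tiny source-building class. `Expr` is a hypothetical stand-in for Ruby::Node; it only records the Ruby source each call would generate.

```ruby
# Hypothetical stand-in for a node that builds Ruby source strings.
class Expr
  attr_reader :source

  def initialize(source)
    @source = source
  end

  # Explicit negation that can be chained like any other method.
  def not
    Expr.new("!#{source}")
  end

  def or(other)
    Expr.new("(#{source} || #{other.source})")
  end

  # The "!" operator, defined so the precedence problem is visible.
  def !
    Expr.new("!#{source}")
  end
end

foo = Expr.new('foo')
bar = Expr.new('bar')

puts (!foo.or(bar)).source  # "!" applies to the result of foo.or(bar)
puts foo.not.or(bar).source # negation applies to foo only
```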
Yorick Peterse 4698e98632 XPath compiler support for the following axis 2015-08-19 20:14:22 +02:00
Yorick Peterse e04ca9be35 XPath compiler support for following-sibling
This is a direct port of the same handler used in the Evaluator class.
The code is a bit rough around the edges but this will be cleaned up in
upcoming commits.
2015-08-19 20:14:22 +02:00
Yorick Peterse 02ee35122a XPath compiler support for the self axis 2015-08-19 20:14:22 +02:00
Yorick Peterse bc49af02a2 XPath compiler support for the parent axis 2015-08-19 20:14:22 +02:00
Yorick Peterse ca7930cbf4 XPath compiler support for "descendant" 2015-08-19 20:14:22 +02:00
Yorick Peterse f8671a96b7 Fixed compiler support for descendant-or-self 2015-08-19 20:14:22 +02:00
Yorick Peterse 23379d6467 Fixed node() type test for the XPath compiler
This ensures the handler behaves the same as in the old XPath evaluator.
2015-08-19 20:14:22 +02:00
Yorick Peterse 8a49d9c0ee Basic compiler support for descendant-or-self
The generated code isn't entirely correct, which, considering the tests
do pass, means the tests need to be fixed too.
2015-08-19 20:14:22 +02:00
Yorick Peterse d50f89cdf1 Use "node" for axis variables
This ensures that any nested code uses the right variables.
2015-08-19 20:14:22 +02:00
Yorick Peterse 045cfe4ab8 XPath compiler support for operators in predicates
Previously the operator methods would ignore any blocks set by the
on_predicate family of methods.
2015-08-19 20:14:22 +02:00
Yorick Peterse aa8386e6f3 Cleaned up compiler handling of axes/calls
Function call handlers don't receive a single AST node, instead they
receive the XPath arguments as separate method arguments.
2015-08-19 20:14:21 +02:00
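The calling convention from aa8386e6f3 could look something like the sketch below. `SketchCompiler`, the Array-based call node and the dash-to-underscore naming are assumptions for illustration; only the idea of passing the XPath arguments as separate method arguments comes from the commit message.

```ruby
# Hypothetical compiler fragment showing the calling convention.
class SketchCompiler
  # Splits a (call name arg1 arg2 ...) node and passes the arguments
  # individually to the matching handler.
  def on_call(node)
    _type, name, *args = node

    send("on_call_#{name.tr('-', '_')}", *args)
  end

  # Handlers declare the arguments they expect instead of digging
  # them out of an AST node.
  def on_call_starts_with(haystack, needle)
    "#{haystack}.start_with?(#{needle})"
  end
end

puts SketchCompiler.new.on_call([:call, 'starts-with', 'a', 'b'])
```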
Yorick Peterse bfc970a95a XPath compiler support for type tests 2015-08-19 20:14:21 +02:00
Yorick Peterse ef59f160c7 XPath compiler support for true()/false() 2015-08-19 20:14:21 +02:00
Yorick Peterse 2eb12eced6 XPath compiler support for all operators
Some specs still fail due to true()/false() not being implemented but
the operators themselves should work just fine.
2015-08-19 20:14:21 +02:00
Yorick Peterse 3a18d23792 to_boolean support for truthy Ruby values 2015-08-19 20:14:21 +02:00
Yorick Peterse 06ae1503d4 nodes/attributes support in to_compatible_types
This extends XPath::Conversion.to_compatible_types so that it can also
take XML::Node and XML::Attribute objects as input.
2015-08-19 20:14:21 +02:00