When tokenising CSS expressions we now strip leading and trailing
whitespace from the input string. This is performed without any checks
as a check + `String#strip` ended up being slower compared to just
running `String#strip`. On top of that we cache expressions anyway, so
the overhead of `String#strip` is very small.
Fixes https://gitlab.com/yorickpeterse/oga/issues/187
This ensures that Oga is able to tokenize input such as the following:
<script<script>foo</script>
Oga will now treat this as:
<script>foo</script>
This is based on libxml behaviour, which seems to differ a bit from
Chromium which treats the node as a text node. This however would
require complex look-ahead logic (as far as I can tell) that I really
don't want to implement in Oga.
Fixes#186
As the community progressively moves to a useful practice of enabling
ruby warnings on tests, knowingly redefining a method produces a
distracting warning that has to be special-cased when running automated
tests. We thus skip dynamic definitions of methods we know will be
redefined right after.
As the community progressively moves to a useful practice of enabling
ruby warnings on tests, assigning an instance variable before use becomes
a necessary practice. Here we set some variables at initialization that
were previously lazily or conditionally set:
- `decoded` is assigned false which seems to make more semantic sense
than than using nil
- `namespace` is assigned nil, its value being lazily computed later
- `available_namespaces` is assigned nil so as to respect the cache
invalidation mechanism
In Ruby 2.4, Fixnum is deprecated and the following produces a warning:
$ irb
irb(main):001:0> puts RUBY_VERSION
2.4.0
=> nil
irb(main):002:0> require 'oga'
=> true
irb(main):003:0> Oga.parse_xml('<people><person>Alice</person></people>').css(':nth-child(1)')
/Users/pcheah/.rbenv/versions/2.4.0/lib/ruby/gems/2.4.0/gems/oga-2.7/lib/oga/xpath/conversion.rb:79: warning: constant ::Fixnum is deprecated
=> NodeSet(Element(name: "people" children: NodeSet(Element(name: "person" children: NodeSet(Text("Alice"))))), Element(name: "person" children: NodeSet(Text("Alice"))))
irb(main):004:0>
$
So oga/xpath/conversion.rb needs to test for Integer instead of Fixnum.
Also, it appears that json 1.8.3 no longer builds under Ruby 2.4, so the gemfile has to be upgraded to json 2.0.
This commit fixes two problems:
1. Doctypes introducing too many newlines
2. Elements with siblings and a common parent not being closed properly
== Doctypes
When generating the XML for a doctype the XML::Generator class would
append a trailing newline. This however meant that if the next text node
was also a newline you'd now have two newlines. In previous versions of
Oga this worked because the old XML generation code would call
String#strip on the XML to add after the doctype.
To support this in the new version we perform a lookahead in
XML::Generator#on_doctype to remove any trailing newlines added by this
method in case the first child node is a newline text node.
== Closing Elements
When an element has a sibling following it _and_ does not have any child
nodes it would not be closed properly when generating XML. This is due
to the "until next_node = ..." expression evaluating to true, thus never
executing its body.
There's probably some way to work around this by using the "loop"
method, but considering it's 02:09 I think the current approach is good
enough. Future me will probably hate me for it.
This allows Oga to parse documents that contain an XML declaration at a
place other than at the document root. Oga still only assigns the XML
declaration to the document whenever it is at the top-level. This
matches libxml/XML specification behaviour as far as I can tell.