Yorick Peterse
e6d2ba4e0e
Warm up caches in the XPath big XML benchmark
2015-04-12 20:25:21 +02:00
Yorick Peterse
b42f9aaf32
Cache output of Element#available_namespaces
This cache is flushed whenever Element#register_namespace is called.
When this cache is flushed it's also recursively flushed for all child
elements. This makes calls to Element#register_namespace a bit more
expensive but in turn calls to Element#available_namespaces will be a
lot faster.
2015-04-12 20:22:33 +02:00
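The invalidation scheme described in b42f9aaf32 can be sketched roughly as follows. This is a minimal, hypothetical model: `SketchElement` and its hash-based namespace storage are assumptions, not Oga's code. Each element memoizes the merged namespaces of itself and its ancestors. Registering a namespace clears that memo on the element and, recursively, on all of its descendants.

```ruby
# Hypothetical, simplified stand-in for an XML element; not Oga's actual class.
class SketchElement
  attr_reader :parent, :children

  def initialize(parent = nil)
    @parent               = parent
    @children             = []
    @namespaces           = {}  # prefix => URI declared on this element only
    @available_namespaces = nil # memoized merge of ancestors + own namespaces
    parent.children << self if parent
  end

  def register_namespace(prefix, uri)
    @namespaces[prefix] = uri
    flush_namespaces_cache # invalidate this element and every descendant
  end

  # Namespaces declared here or on any ancestor; inner declarations win.
  def available_namespaces
    @available_namespaces ||= begin
      inherited = parent ? parent.available_namespaces : {}
      inherited.merge(@namespaces) # merge returns a new hash, parent is untouched
    end
  end

  def flush_namespaces_cache
    @available_namespaces = nil
    children.each(&:flush_namespaces_cache)
  end
end

root  = SketchElement.new
child = SketchElement.new(root)
root.register_namespace('x', 'urn:x')
child.available_namespaces # only "x" is visible, and the result is now cached
root.register_namespace('y', 'urn:y')
child.available_namespaces # recomputed: both "x" and "y"
```

The trade-off matches the commit message: a registration walks the whole subtree, while repeated lookups become a single instance-variable read.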
Yorick Peterse
fa838154fc
Flush Element#namespace cache
When setting a new namespace name using Element#namespace_name=, the
cache used by Element#namespace is flushed properly.
2015-04-11 19:20:50 +02:00
Yorick Peterse
b0359b37e5
Cache Node#html? and Node#root_node
The results of these methods are now cached until a Node is moved into
another NodeSet. This reduces the time spent in the
xpath/evaluator/big_xml_average_bench.rb benchmark from roughly 10
seconds to roughly 5 seconds per iteration.
2015-04-11 19:12:26 +02:00
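A rough sketch of the memoization described in b0359b37e5, under stated assumptions: `SketchDocument`, `SketchNodeSet` and `SketchNode` are hypothetical names, and a node set is reduced to a struct holding its owner. Moving a node into another set resets both caches. `html?` uses `nil` as "not computed yet", so a cached `false` is reused too.

```ruby
# Hypothetical, simplified model of memoized root_node / html? lookups;
# none of these classes are Oga's real ones.
SketchNodeSet = Struct.new(:owner) # owner: a document or a parent node

class SketchDocument
  attr_reader :type # :html or :xml

  def initialize(type)
    @type = type
  end
end

class SketchNode
  attr_reader :node_set

  def initialize
    @node_set  = nil
    @root_node = nil
    @html      = nil
  end

  # Moving the node into another set resets its cached lookups. In this
  # sketch, moving an ancestor does not reset a descendant's cache.
  def node_set=(set)
    @node_set  = set
    @root_node = nil
    @html      = nil
  end

  # Walks up through the owning node sets until it reaches a non-node.
  def root_node
    @root_node ||= begin
      owner = node_set && node_set.owner
      owner = owner.node_set && owner.node_set.owner while owner.is_a?(SketchNode)
      owner
    end
  end

  def html?
    if @html.nil?
      root  = root_node
      @html = root.is_a?(SketchDocument) && root.type == :html
    end
    @html
  end
end

doc   = SketchDocument.new(:html)
outer = SketchNode.new
outer.node_set = SketchNodeSet.new(doc)
inner = SketchNode.new
inner.node_set = SketchNodeSet.new(outer)
inner.html? # true; the root lookup is now cached

inner.node_set = SketchNodeSet.new(SketchDocument.new(:xml))
inner.html? # false; moving the node reset both caches
```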
Yorick Peterse
421e6e910b
Release 0.3.1
2015-04-08 14:58:53 +02:00
Yorick Peterse
4bdc8a3fdc
Don't convert entities in script/style elements
In HTML the text of a script/style tag should be left untouched: no
entities should be converted. Doing so would break JavaScript such as the
following:
foo&&bar;
Such code is often the result of minifiers doing their dirty business.
2015-04-08 14:32:09 +02:00
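The rule from 4bdc8a3fdc amounts to a guard before decoding. The sketch below is hypothetical: `decode_text`, `ENTITY_MAP` and `RAW_TEXT_ELEMENTS` are illustrative names, and the entity table is deliberately tiny. Text whose parent is a script or style element is returned verbatim. Everything else has its known entities replaced.

```ruby
# Hypothetical sketch: skip entity decoding for the text of script/style.
ENTITY_MAP = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"' }.freeze
RAW_TEXT_ELEMENTS = %w[script style].freeze

def decode_text(text, parent_name)
  # Text inside <script> or <style> is returned exactly as it was written.
  return text if RAW_TEXT_ELEMENTS.include?(parent_name.downcase)

  # Other elements get their known entities replaced via a lookup hash.
  text.gsub(/&(?:amp|lt|gt|quot);/, ENTITY_MAP)
end

decode_text('a &amp;&amp; b', 'script') # => "a &amp;&amp; b"
decode_text('a &amp;&amp; b', 'p')      # => "a && b"
```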
Yorick Peterse
6a1010c287
Fixed decoding entities in attribute values
This was broken by the introduction of lazy decoding of XML/HTML
entities. The new setup works similarly to how XML::Text#text decodes any
entities that may be present.
Fixes #91
2015-04-07 21:18:22 +02:00
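Lazy decoding, as referred to in 6a1010c287, means keeping the raw attribute value and decoding it only on first read. The sketch below is a hypothetical illustration, not Oga's code. `SketchAttribute`, the entity tables and the `entity_table` helper are assumptions, and `entity_table` merely plays the role of choosing a decoder, the concern Oga::EntityDecoder was added for.

```ruby
# Hypothetical sketch of lazy entity decoding for attribute values.
XML_ENTITIES  = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
                  '&quot;' => '"', '&apos;' => "'" }.freeze
HTML_ENTITIES = XML_ENTITIES.merge('&nbsp;' => "\u00A0").freeze

# Chooses the decoding table from the document type (illustrative helper).
def entity_table(html)
  html ? HTML_ENTITIES : XML_ENTITIES
end

class SketchAttribute
  def initialize(raw_value, html: false)
    @raw_value = raw_value # kept exactly as it appeared in the source
    @html      = html
    @value     = nil
  end

  # Decodes on first access only and memoizes the result.
  def value
    @value ||= @raw_value.gsub(/&[a-z]+;/) do |entity|
      entity_table(@html).fetch(entity, entity) # unknown entities stay as-is
    end
  end
end

SketchAttribute.new('Tom &amp; Jerry').value # => "Tom & Jerry"
```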
Yorick Peterse
ef7f50137a
Added Oga::EntityDecoder
This module removes some of the code duplication needed to determine
what entity decoder to use.
2015-04-07 21:18:15 +02:00
Yorick Peterse
3f6aa04e91
Release 0.3.0
2015-04-03 21:11:15 +02:00
Yorick Peterse
0800654c96
Support lexing of carriage returns
Fixes #89.
2015-04-03 00:46:37 +02:00
Yorick Peterse
602e231840
XPath benchmark for querying large XML documents
2015-03-31 22:17:48 +02:00
Yorick Peterse
3176459307
Ignore declared namespaces in HTML documents
The HTML spec states that any declared namespaces, including the default
namespace, are to be ignored.
This fixes #85
2015-03-26 22:38:39 +01:00
Yorick Peterse
5adeae18d0
XPath queries match nodes in the default namespace
When querying an XML document that explicitly defines the default XML
namespace, the XPath evaluator now correctly matches all nodes within
that namespace if no namespace prefix is given in the query. Previously
this would always return an empty set.
2015-03-26 01:13:55 +01:00
Yorick Peterse
f175414917
Added XML::Element#default_namespace?
2015-03-26 01:10:20 +01:00
Yorick Peterse
b6fcd326ef
Added XML::Node#html? and XML::Node#xml?
The former has been moved over from XML::Text, the latter just inverts
html?.
2015-03-26 01:02:32 +01:00
Yorick Peterse
4ad502958d
Added XML::Attribute#==
Overriding this method makes it easier to check if a given namespace
equals the default XML (and soon HTML) namespace.
2015-03-26 00:53:16 +01:00
Yorick Peterse
f2d69af33b
Distinguish default attribute/element namespaces
The previous commit messed this up because I wasn't fully awake.
2015-03-26 00:43:50 +01:00
Yorick Peterse
68ada997a8
Moved default namespace into Oga::XML
The default namespace is now located at Oga::XML::DEFAULT_NAMESPACE
instead of Oga::XML::Attribute::DEFAULT_NAMESPACE.
2015-03-26 00:35:28 +01:00
Yorick Peterse
5802d9d62c
Use RbConfig::CONFIG['CC'] instead of 'cc'
2015-03-23 19:46:44 +01:00
Yorick Peterse
62488e7291
Download Bundler manually on AppVeyor
2015-03-23 13:34:38 +01:00
Yorick Peterse
3cdcdf6daa
Corrected YARD formatting
2015-03-23 00:31:56 +01:00
Yorick Peterse
66fa9f62ef
Added LRU#maximum=/maximum
This allows one to change the maximum number of keys stored in the
XPath/CSS caches, for example:
Oga::XPath::Parser::CACHE.maximum = 2056
2015-03-23 00:26:48 +01:00
Yorick Peterse
12aa21fb50
Use parse_with_cache when querying xpath/css
2015-03-23 00:23:46 +01:00
Yorick Peterse
2c4e490614
Added CSS/XPath Parser.parse_with_cache
This method parses and caches ASTs using Oga::LRU. Currently the default
of 1024 keys is used.
See #71 for more information.
2015-03-23 00:22:59 +01:00
Yorick Peterse
67d7d9af88
Added thread-safe LRU class
This class will be used for storing parsed XPath/CSS ASTs.
See #71 for more information.
2015-03-23 00:21:52 +01:00
Yorick Peterse
45d84d31da
Renamed rspec helper files
2015-03-22 22:50:03 +01:00
Yorick Peterse
3bb67ddf28
XPath evaluation bench to test parsing times
This benchmark is simple enough that evaluation is not much more expensive
than parsing. This makes it suitable for benchmarking the
performance increase of caching XPath ASTs.
2015-03-22 18:31:42 +01:00
Yorick Peterse
ad26446a5f
Removed Ragel Ruby patches
These are included in Ragel 6.9.
2015-03-22 15:32:29 +01:00
Yorick Peterse
88f3dfc749
Set up devkit for rake-compiler/Windows
2015-03-22 15:26:59 +01:00
Yorick Peterse
9efe1e7a41
CI section in the contributing guide
2015-03-22 14:04:51 +01:00
Yorick Peterse
ffe24bd991
Added AppVeyor config file
2015-03-22 14:04:04 +01:00
Yorick Peterse
2bf5fe3061
Updated extconf.rb for Windows support
2015-03-22 14:03:37 +01:00
Yorick Peterse
076d82507a
Added require 'bundler' for Bundler 1.9
See https://github.com/bundler/bundler/issues/3492 for more info.
2015-03-21 20:24:07 +01:00
Yorick Peterse
3969b5ef51
Expanded XPath/CSS examples in the README
Fixes #83
2015-03-21 01:36:10 +01:00
Yorick Peterse
31e93e54f9
Removed Mutex usage from XML::Text
Instead of trying to make this class thread-safe I'm going with the
option of simply declaring it unsafe to mutate instances of XML::Text
while reading it in parallel. This removes the need for Mutex
allocations and keeps the code simple.
Fixes #82
2015-03-21 01:27:00 +01:00
Yorick Peterse
c647f064b5
Remove remaining Racc parsing bits
2015-03-21 01:23:00 +01:00
Yorick Peterse
ed14981044
Ported the CSS parser to ruby-ll
2015-03-21 01:23:00 +01:00
Yorick Peterse
70e4942d3e
CSS parser spec for "+ b"
2015-03-21 01:23:00 +01:00
Yorick Peterse
2714dbe419
Use the ? operator in the XPath parser
2015-03-21 01:23:00 +01:00
Yorick Peterse
3b74a55d73
Use the ? operator in the XML parser
2015-03-21 01:23:00 +01:00
Yorick Peterse
a4be89aca7
Use ruby-ll 2.1 or newer
2015-03-21 01:23:00 +01:00
Yorick Peterse
2bbb7d2b10
Use new operators in the XML parser
This allows the removal of quite a bit of recursion-based code.
2015-03-21 01:23:00 +01:00
Yorick Peterse
02da47c1f0
Replaced some XPath parser recursion with *
2015-03-21 01:23:00 +01:00
Yorick Peterse
3b06780802
Removed Racc-based XPath parser
2015-03-21 01:23:00 +01:00
Yorick Peterse
588c225c53
Proper XPath operator parsing precedence
2015-03-21 01:23:00 +01:00
Yorick Peterse
6039e1dbeb
XPath parsing spec for axes with predicates
2015-03-21 01:23:00 +01:00
Yorick Peterse
7b8c596ccc
Require ruby-ll 2.0 or newer
2015-03-21 01:23:00 +01:00
Yorick Peterse
62fa2a9cc5
Spec for XPath functions inside predicates.
2015-03-21 01:23:00 +01:00
Yorick Peterse
0fa9d4df88
Ported remaining XPath parsing bits to ruby-ll.
Currently all operators are left-associative with no particular precedence. This
causes a few specs to fail for now. Outside of that the new parser should be
able to parse the same input as the Racc-based parser.
2015-03-21 01:22:59 +01:00
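The message of 0fa9d4df88 notes that all operators were initially parsed left-associatively with no precedence. 588c225c53 (Proper XPath operator parsing precedence) addressed this. As an illustration only, the sketch below shows precedence climbing over an already-tokenized expression. It uses the XPath 1.0 binding order, from loosest to tightest: `or`, `and`, equality, relational, additive, multiplicative, union. It is not Oga's ruby-ll grammar. `PRECEDENCE` and `parse_expression` are hypothetical names, and tokenization and unary minus are left out.

```ruby
# Hypothetical precedence-climbing sketch; not Oga's ruby-ll grammar.
# Binding power per binary operator, following XPath 1.0 (higher binds tighter).
PRECEDENCE = {
  'or' => 1, 'and' => 2,
  '=' => 3, '!=' => 3,
  '<' => 4, '<=' => 4, '>' => 4, '>=' => 4,
  '+' => 5, '-' => 5,
  '*' => 6, 'div' => 6, 'mod' => 6,
  '|' => 7
}.freeze

# tokens: alternating operands and operator strings, e.g. [1, '+', 2, '*', 3].
# Consumes the array and returns nested [operator, left, right] arrays.
def parse_expression(tokens, min_prec = 1)
  left = tokens.shift
  while (op = tokens.first) && PRECEDENCE.fetch(op, 0) >= min_prec
    tokens.shift
    # Parsing the right side with a strictly higher minimum makes operators
    # of equal precedence group to the left (left-associativity).
    right = parse_expression(tokens, PRECEDENCE[op] + 1)
    left  = [op, left, right]
  end
  left
end

parse_expression([1, '+', 2, '*', 3]) # => ["+", 1, ["*", 2, 3]]
parse_expression([1, '-', 2, '-', 3]) # => ["-", ["-", 1, 2], 3]
```

Because the token array is consumed, callers that still need the tokens should pass a copy (`tokens.dup`).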
Yorick Peterse
194d981996
XPath specs for paths with multiple members.
2015-03-21 01:22:59 +01:00