165 Commits

Author SHA1 Message Date
Yorick Peterse 5073056831 Added require() for StringIO.
This is needed since the lexer checks if the input is a StringIO instance.
2014-07-01 09:37:52 +02:00
Yorick Peterse 8314d24435 First pass at using the NodeSet class.
The documentation still leaves a lot to be desired and so does the API. There
also appears to be a problem where NodeSet#remove doesn't properly remove all
nodes from a set. Outside of that we're making slow progress towards a proper
DOM API.
2014-06-30 23:03:48 +02:00
Yorick Peterse e71fe3d6fa Indexing of NodeSet instances.
Similar to arrays the NodeSet class now allows one to retrieve nodes for a
given index.
2014-06-29 23:59:27 +02:00
Yorick Peterse 253575dc37 Basic docs for the NodeSet class. 2014-06-29 21:34:18 +02:00
Yorick Peterse d2e74d8a0b Specs for the NodeSet class. 2014-06-26 19:52:03 +02:00
Yorick Peterse eb9d4fbccc Changed NodeSet to behave more like an Array. 2014-06-26 09:37:54 +02:00
Yorick Peterse a98f50b63b NodeSet#push should not take node ownership. 2014-06-26 09:37:30 +02:00
Yorick Peterse 15fa7a2068 Remove explicit index tracking of NodeSet.
The various nodes can use NodeSet#index (aka Array#index) instead. This has a
slight performance overhead on very large sets (millions of nodes) but should
be fine in most other cases.
2014-06-25 09:41:58 +02:00
Yorick Peterse 884dbd9563 Rough sketch for a NodeSet class. 2014-06-24 19:06:45 +02:00
Yorick Peterse b8cc6b5031 XPath::Parser#parse returns an XPath::Node instance 2014-06-23 20:23:08 +02:00
Yorick Peterse 03be9f241c Docs on Racc operator precedence. 2014-06-23 10:36:42 +02:00
Yorick Peterse 96a7a40fdc Disable code coverage for JRuby specific code. 2014-06-23 09:42:14 +02:00
Yorick Peterse d2f15e37d0 Corrected XPath operator precedence.
The previous commit didn't fully change the operator precedence according to
the XPath 1.0 specification. Also thanks to @whitequark for clearing up a few
things about Racc's operator precedence system.
2014-06-23 00:30:42 +02:00
Yorick Peterse a440d3f003 Fixed XPath operator precedence.
Apparently using multiple `left` rules with T_AND and T_OR being separate
solves this problem. Riiiiight....
2014-06-23 00:15:43 +02:00
Yorick Peterse 6a2f4fa82d Parsing support for more XPath operators.
This still messes up some tests due to botched token precedence (by the looks
of it).
2014-06-22 21:27:53 +02:00
Yorick Peterse 514c342cab Lex wildcards as T_IDENT instead of T_STAR. 2014-06-20 20:37:34 +02:00
Yorick Peterse 45db337c76 Use Array#unshift for multiple xpath call args. 2014-06-17 20:12:25 +02:00
Yorick Peterse b3ffc28cc7 Removed shift/reduce conflict in the xpath parser. 2014-06-17 20:09:44 +02:00
Yorick Peterse 497f57ccd2 Basic parser setup for XPath function calls. 2014-06-17 19:57:17 +02:00
Yorick Peterse 894de7f909 Lex all XPath expressions in a single machine.
This allows literal values such as strings and numbers to be used as function
arguments.
2014-06-17 19:56:57 +02:00
Yorick Peterse bb7af98257 Updated used ASTs for all XPath parser specs. 2014-06-17 18:51:33 +02:00
Yorick Peterse 2298ef618b Reworked handling of relative vs absolute XPaths. 2014-06-16 20:19:39 +02:00
Yorick Peterse eba2d9954d Support for parsing basic XPath expressions. 2014-06-12 00:20:46 +02:00
Yorick Peterse 70f3b7fa92 Lex XPath operators using individual tokens.
Instead of lexing every operator as T_OP they now use individual tokens such as
T_EQ and T_LT.
2014-06-09 23:35:54 +02:00
Yorick Peterse 7244e28eec Corrected docs of the xpath lexer. 2014-06-04 19:33:46 +02:00
Yorick Peterse 1d2f9e6db6 Added T_STAR as an XPath parser token. 2014-06-02 09:27:25 +02:00
Yorick Peterse e11b9ed32c Basic XPath parser setup. 2014-06-01 23:02:28 +02:00
Yorick Peterse 54de2df0c7 Support for lexing XPath wildcard expressions.
To support this we need to require whitespace around the "*" operator. This is
not ideal but it will do for now.
2014-06-01 23:01:24 +02:00
Yorick Peterse 8dd8d7a519 Basic working XPath lexer.
This doesn't lex everything of the XPath specification just yet and needs more
tests.
2014-06-01 19:24:35 +02:00
Yorick Peterse a50b76a2d8 Cleaned up XPath lexer boilerplate a bit. 2014-05-29 19:25:49 +02:00
Yorick Peterse e0b07332d9 Boilerplate for the XPath lexer. 2014-05-29 19:25:49 +02:00
Yorick Peterse be3f8fb494 Removed the on_newline XML lexer callback. 2014-05-29 14:21:48 +02:00
Yorick Peterse ead5c71fee Cleaned up the XML parser grammar.
This resolves all shift/reduce and reduce/reduce conflicts that were previously
present.
2014-05-29 01:37:19 +02:00
Yorick Peterse 49780e2b04 Fix for useless XML parser rules.
Something tells me that using : and | in your syntax might not be the best
decision.
2014-05-28 21:36:06 +02:00
Yorick Peterse 28edc7726f Rewind IO input upon resetting the lexer. 2014-05-26 00:33:20 +02:00
Yorick Peterse 629dcd3fe6 Support for IO inputs in the lexer.
Using IO/StringIO objects one can parse large XML files without first having to
read the entire file into memory. This can potentially save a lot of memory at
the cost of a slightly slower runtime.

For IO-like instances the lexer will consume the input line by line. If a
String is given it's consumed as a whole instead. A small side effect of
reading the input line by line is that text such as "foo\nbar" will be lexed as
two tokens instead of one.

Fixes #19.
2014-05-26 00:30:39 +02:00
Yorick Peterse 6b9d65923a Use a method for getting input in the XML lexer.
Instead of directly accessing the `data` instance variable, the C/Java code now
uses the method `read_data`. This is one of the steps required to allow Oga to
read data from IO-like instances. It also means I can freely change the name of
the instance variable without also having to change the C/Java code.
2014-05-21 00:27:23 +02:00
Yorick Peterse cd0f3380c4 Merge multiple CDATA tokens into a single token.
The tokens T_CDATA_START, T_TEXT and T_CDATA_END have been merged together into
T_CDATA.
2014-05-19 09:36:19 +02:00
Yorick Peterse a4fb5c1299 Merge multiple comment tokens into a single one.
The tokens T_COMMENT_START, T_TEXT and T_COMMENT_END have been merged into a
single token: T_COMMENT. This simplifies both the lexer and the parser.
2014-05-19 09:30:30 +02:00
Yorick Peterse c891dd88cb Removed useless code from the XML parser. 2014-05-18 23:30:26 +02:00
Yorick Peterse 81a81f0ab0 Don't create Arrays when not needed. 2014-05-16 17:05:42 +02:00
Yorick Peterse fd2f727183 Only set explicit ivars in the lexer. 2014-05-15 19:48:18 +02:00
Yorick Peterse 44bf1dd1ca Split up handling of element names/namespaces.
This is now split up on Ragel level, simplifying the corresponding Ruby code.
2014-05-15 10:22:05 +02:00
Yorick Peterse 723a273e4f Enforce symbols for element attributes.
This comes with a little bit of memory overhead but this should be minor in
most cases.
2014-05-15 01:04:26 +02:00
Yorick Peterse f4b9bbd4ac Removed lazy way of setting instance variables.
This process is quite a bit slower compared to setting instance variables
directly.
2014-05-15 00:43:13 +02:00
Yorick Peterse 19f04f98f7 Support for lexing/parsing inline doctypes. 2014-05-10 00:28:11 +02:00
Yorick Peterse fe74d60138 Manually bootstrap JRuby after all.
After discussing this with @headius I've decided to do this the manual way
anyway. Apparently the basic load service stuff is deprecated and not very
reliable.
2014-05-07 22:32:34 +02:00
Yorick Peterse b8efed5177 Renamed on_start_doctype to on_doctype_start. 2014-05-06 23:18:44 +02:00
Yorick Peterse 2053018d07 Slap JRuby so that it can load the .jar file. 2014-05-06 20:45:26 +02:00
Yorick Peterse 6e685378e0 Setup Ragel for JRuby and load things the hard way 2014-05-06 19:06:04 +02:00
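
The IO input behaviour described in commit 629dcd3fe6 (and the rewind on reset from 28edc7726f) can be illustrated with a small Ruby sketch. This is not Oga's actual lexer: the `ToyLexer` class, its `lex` and `reset` methods and the `:T_TEXT` token shape are hypothetical, and only the `read_data` method name is taken from the commit messages above. It assumes String input is consumed as a whole while IO-like input is consumed line by line, which is why "foo\nbar" ends up as one token in the first case and two in the second.

```ruby
require 'stringio'

# Hypothetical toy lexer for illustration only; not Oga's implementation.
class ToyLexer
  def initialize(data)
    @data = data
  end

  # Yields the input in chunks: a String is yielded as a whole, while
  # IO-like objects (IO, StringIO) are yielded line by line.
  def read_data
    if @data.is_a?(String)
      yield @data
    elsif @data.respond_to?(:each_line)
      @data.each_line { |line| yield line }
    end
  end

  # Emits one [:T_TEXT, value] token per chunk returned by read_data.
  def lex
    tokens = []
    read_data { |chunk| tokens << [:T_TEXT, chunk] }
    tokens
  end

  # Rewinds IO-like input so the same lexer can be run again.
  def reset
    @data.rewind if @data.respond_to?(:rewind)
  end
end

# String input: consumed as a whole, so a single text token.
string_tokens = ToyLexer.new("foo\nbar").lex

# IO input: consumed line by line, so one text token per line.
io_lexer  = ToyLexer.new(StringIO.new("foo\nbar"))
io_tokens = io_lexer.lex
```

Without the call to `reset`, a second `io_lexer.lex` would return no tokens, since the StringIO is already at the end of its input.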