core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	ed14981044	Ported the CSS parser to ruby-ll	2015-03-21 01:23:00 +01:00
Yorick Peterse	2714dbe419	Use the ? operator in the XPath parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	3b74a55d73	Use the ? operator in the XML parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	2bbb7d2b10	Use new operators in the XML parser This allows the removal of quite a bit of recursion based code.	2015-03-21 01:23:00 +01:00
Yorick Peterse	02da47c1f0	Replaced some XPath parser recursion with *	2015-03-21 01:23:00 +01:00
Yorick Peterse	3b06780802	Removed Racc based XPath parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	588c225c53	Proper XPath operator parsing precedence	2015-03-21 01:23:00 +01:00
Yorick Peterse	0fa9d4df88	Ported remaining XPath parsing bits to ruby-ll. Currently all operators are left-associative with no particular precedence. This causes a few specs to fail for now. Outside of that the new parser should be able to parse the same input as the Racc based parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	4ebfc849a4	Start porting the XPath parser to ruby-ll. There are still a few bits left to do such as supporting parenthesis and assigning the correct precedence to the others.	2015-03-21 01:22:59 +01:00
Yorick Peterse	cbdaeb21f4	Unwrap a few lines in the XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	cfc6749556	Use splat instead of Array#unshift for attributes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d210c9fb57	Compacted a few XML parser rules.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a5cd75cb7e	Removed useless string allocs from the XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	fdcd712ffe	Don't use Array#uniq in NodeSet#initialize. Removing this makes the process of parsing larger XML documents a bit faster. The downside is that NodeSet#initialize will no longer filter out duplicate nodes, though this is not something Oga itself relies upon. Methods such as NodeSet#push still do ignore elements already present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	c36b35ac0f	Skip ownership iteration when there's no owner. There's no point in iterating over all the nodes and assigning ownership if there's no owner to begin with.	2015-03-21 01:22:59 +01:00
Yorick Peterse	006ef4d51a	Port over most of the old XML error handling. Some messages are a bit different due to ruby-ll's error handling, other than that it's largely the same stuff as before.	2015-03-21 01:22:59 +01:00
Yorick Peterse	1a326fc516	Remove Racc based XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d8b9725b82	Fixed SAX parsing of XML attributes. This was utterly broken, mainly due to me overlooking it. There are now 2 new callbacks to handle this properly: * on_attribute: to handle a single attribute/value pair * on_attributes: to handle a collection of attributes (as returned by on_attribute) By default on_attribute returns a Hash, on_attributes in turn merges all attribute hashes into a single one. This ensures that on_element _actually_ receives the attributes as a Hash, instead of an Array with random nil/XML::Attribute values.	2015-03-21 01:22:59 +01:00
Yorick Peterse	dd626c10d3	Use Array#unshift in the LL XML grammar. Using Array#+ for large sets (e.g. in the benchmarks) is _really_ slow. Interesting enough Array#unshift uses as much memory as the Racc parser and is about as fast, even though it has to move memory around.	2015-03-21 01:22:59 +01:00
Yorick Peterse	f94407ee9d	Parser callback for XML attributes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a023b35e78	Fixed the pull parser for the XML LL parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	5eed0d31d6	Ported over most of the XML parser to ruby-ll. This is still missing the error handling previously present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	15a3ab9ba5	ruby-ll: full support for parsing doctypes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	71aefb53cc	Started porting the XML parser to ruby-ll This is far from done.	2015-03-21 01:22:59 +01:00
Yorick Peterse	2ec91f130f	Lazy decoding of XML/HTML entities. Instead of decoding entities in the lexer we'll do this whenever XML::Text#text is called. This removes the overhead from the parsing phase and ensures the process is only triggered when actually needed. Note that calling #to_xml and/or the #inspect methods on a Text (or parent) instance will also trigger the entity conversion process. The new entity decoding API supports both regular entities (e.g. &amp;) as well as codepoint based entities (both regular and hexadecimal codepoints). To allow safe read-only access to Text instances from multiple threads a mutex is used. This mutex ensures that only 1 thread can trigger the conversion process. Fixes #68	2015-03-05 23:00:43 +01:00
Yorick Peterse	3e05593536	Release 0.2.3	2015-03-04 11:56:23 +01:00
Yorick Peterse	78e40b55c0	Handle parsing of HTML <style> tags. This basically re-applies the technique used for HTML <script> tags. With this extra addition I decided to rename/normalize a few things so it's easier to add any extra tags in the future. One downside of this setup is that the following will not be parsed by Oga: <style> </script> </style> The same applies to script tags containing a literal </style> tag. Since this particular case is rather unlikely to occur I'm OK with not supporting it as it _does_ simplify the lexer quite a bit. Fixes #80	2015-03-03 16:28:05 +01:00
Yorick Peterse	73534375d5	Release 0.2.2	2015-03-03 13:36:32 +01:00
Yorick Peterse	142b467277	Set parent of nodes set using Element#inner_text= This ensures that any text nodes created using Element#inner_text= have their parent node set correctly.	2015-03-03 13:13:05 +01:00
Yorick Peterse	503efc32cd	Release 0.2.1	2015-03-02 22:12:49 +01:00
Yorick Peterse	874d7124af	Don't convert <script> text to XML entities. Fixes #79.	2015-03-02 17:32:19 +01:00
Yorick Peterse	9a586363e9	Added XML::Document#html?	2015-03-02 16:39:40 +01:00
Yorick Peterse	ba2177e2cf	Lex contents of <script> tags as plain text. When lexing input in HTML mode the lexer has to treat _all_ content of a <script> tag as plain text. This ensures that the lexer can process input such as "x <y" and "// <foo>" correctly. Fixes #70.	2015-03-02 16:22:09 +01:00
Yorick Peterse	e138aa15ac	Removed stray comment in the XPath parser.	2014-12-28 23:55:33 +01:00
Yorick Peterse	746c8052dd	Remove all nodes when calling Element#inner_text= This fixes #64.	2014-12-14 23:32:43 +01:00
Dmitry Krasnoukhov	26baf89440	Add missing entities to the decode/encode lists	2014-11-21 01:53:11 +02:00
Yorick Peterse	cbb2815146	Support for inline doctype rules plus newlines. This adds support for lexing/parsing XML documents that use an IO as input _and_ contain doctype rules with newlines in them. This fixes #63.	2014-11-18 20:02:55 +01:00
Yorick Peterse	922cee913d	Release 0.2.0	2014-11-17 23:26:19 +01:00
Yorick Peterse	ad4f650c5d	Fixed XML entity encoding/decoding ordering. Thanks to @krasnoukhov for providing the initial patch, which this commit is largely based on. This fixes #49.	2014-11-17 22:39:43 +01:00
Yorick Peterse	cd86d5d294	Allow removal of element attributes.	2014-11-17 09:00:40 +01:00
Yorick Peterse	804646cc5e	Don't modify raw namespaces. When calling Element#available_namespaces the list of namespaces returned by Element#namespaces must not be modified.	2014-11-17 00:01:16 +01:00
Yorick Peterse	6753d6a26d	Slightly better docs for the XPath/CSS parsers.	2014-11-16 23:40:19 +01:00
Yorick Peterse	57adabc068	Ensure SAX after_element receives meaningful args This changes the behaviour of after_element when parsing documents using the SAX parsing API. Previously it would always receive a nil argument, which is kinda pointless. This commit changes that by making sure it receives a namespace name (if any) and the element name. This fixes #54.	2014-11-16 23:32:32 +01:00
Yorick Peterse	23b408fe4f	Cleaned up CSS parser code for counting siblings.	2014-11-15 18:31:08 +01:00
Yorick Peterse	b464815577	Fixed AST generation for nth-(first|last)-of-type.	2014-11-15 18:27:15 +01:00
Yorick Peterse	9eead81a7c	Fixed AST for :only-of-type	2014-11-15 18:08:26 +01:00
Yorick Peterse	1c301d40e2	Properly fixed AST for first-of-type/last-of-type This requires keeping track of the current element being processed. This in turn allows the usage of count() + preceding-sibling/following-sibling.	2014-11-15 17:58:56 +01:00
Yorick Peterse	f1d574f342	Evaluate XPath predicates for every context node. Instead of evaluating a predicate once for all context nodes, they should instead be evaluated separately per context node.	2014-11-15 00:31:44 +01:00
Yorick Peterse	6daa3e7a00	Reverted AST changes for first-of-type Functions can't be used in combination with axes, so I'll just need to fix the position() function to work properly.	2014-11-14 23:51:46 +01:00
Yorick Peterse	2d6a2be2e8	Revert "Fixed XPath AST for :last-of-type" Axes can't be used in combination with functions. This reverts commit `b0b572a584`.	2014-11-14 23:49:49 +01:00

492 Commits