core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	70e4942d3e	CSS parser spec for "+ b"	2015-03-21 01:23:00 +01:00
Yorick Peterse	2714dbe419	Use the ? operator in the XPath parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	3b74a55d73	Use the ? operator in the XML parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	a4be89aca7	Use ruby-ll 2.1 or newer	2015-03-21 01:23:00 +01:00
Yorick Peterse	2bbb7d2b10	Use new operators in the XML parser. This allows the removal of quite a bit of recursion based code.	2015-03-21 01:23:00 +01:00
Yorick Peterse	02da47c1f0	Replaced some XPath parser recursion with *	2015-03-21 01:23:00 +01:00
Yorick Peterse	3b06780802	Removed Racc based XPath parser	2015-03-21 01:23:00 +01:00
Yorick Peterse	588c225c53	Proper XPath operator parsing precedence	2015-03-21 01:23:00 +01:00
Yorick Peterse	6039e1dbeb	XPath parsing spec for axes with predicates	2015-03-21 01:23:00 +01:00
Yorick Peterse	7b8c596ccc	Require ruby-ll 2.0 or newer	2015-03-21 01:23:00 +01:00
Yorick Peterse	62fa2a9cc5	Spec for XPath functions inside predicates.	2015-03-21 01:23:00 +01:00
Yorick Peterse	0fa9d4df88	Ported remaining XPath parsing bits to ruby-ll. Currently all operators are left-associative with no particular precedence. This causes a few specs to fail for now. Outside of that the new parser should be able to parse the same input as the Racc based parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	194d981996	XPath specs for paths with multiple members.	2015-03-21 01:22:59 +01:00
Yorick Peterse	4ebfc849a4	Start porting the XPath parser to ruby-ll. There are still a few bits left to do such as supporting parentheses and assigning the correct precedence to the others.	2015-03-21 01:22:59 +01:00
Yorick Peterse	cbdaeb21f4	Unwrap a few lines in the XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	cfc6749556	Use splat instead of Array#unshift for attributes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	c528a0d6a7	Build all branches on Travis.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d210c9fb57	Compacted a few XML parser rules.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a5cd75cb7e	Removed useless string allocs from the XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	fdcd712ffe	Don't use Array#uniq in NodeSet#initialize. Removing this makes the process of parsing larger XML documents a bit faster. The downside is that NodeSet#initialize will no longer filter out duplicate nodes, though this is not something Oga itself relies upon. Methods such as NodeSet#push still do ignore elements already present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	c36b35ac0f	Skip ownership iteration when there's no owner. There's no point in iterating over all the nodes and assigning ownership if there's no owner to begin with.	2015-03-21 01:22:59 +01:00
Yorick Peterse	f83c03aaec	Fixed typo in NodeSet spec.	2015-03-21 01:22:59 +01:00
Yorick Peterse	9621fe1fc8	Moved changelog to the root directory.	2015-03-21 01:22:59 +01:00
Yorick Peterse	006ef4d51a	Port over most of the old XML error handling. Some messages are a bit different due to ruby-ll's error handling, other than that it's largely the same stuff as before.	2015-03-21 01:22:59 +01:00
Yorick Peterse	1a326fc516	Remove Racc based XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	1b9a4db268	Depend on ruby-ll 1.1 or newer.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d8b9725b82	Fixed SAX parsing of XML attributes. This was utterly broken, mainly due to me overlooking it. There are now 2 new callbacks to handle this properly: * on_attribute: to handle a single attribute/value pair * on_attributes: to handle a collection of attributes (as returned by on_attribute) By default on_attribute returns a Hash, on_attributes in turn merges all attribute hashes into a single one. This ensures that on_element _actually_ receives the attributes as a Hash, instead of an Array with random nil/XML::Attribute values.	2015-03-21 01:22:59 +01:00
Yorick Peterse	605d565104	Use sax_parse_html for HTML documents. I suspect the only reason this test ever passed was due to Racc's error handling. Either way this was using the wrong method.	2015-03-21 01:22:59 +01:00
Yorick Peterse	dd626c10d3	Use Array#unshift in the LL XML grammar. Using Array#+ for large sets (e.g. in the benchmarks) is _really_ slow. Interestingly enough Array#unshift uses as much memory as the Racc parser and is about as fast, even though it has to move memory around.	2015-03-21 01:22:59 +01:00
Yorick Peterse	f94407ee9d	Parser callback for XML attributes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a023b35e78	Fixed the pull parser for the XML LL parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	5eed0d31d6	Ported over most of the XML parser to ruby-ll. This is still missing the error handling previously present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	15a3ab9ba5	ruby-ll: full support for parsing doctypes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	71aefb53cc	Started porting the XML parser to ruby-ll. This is far from done.	2015-03-21 01:22:59 +01:00
Yorick Peterse	2f67399784	Use 72 characters for Git instead of 80. This follows the Linux/universal Git guidelines more closely.	2015-03-16 14:58:20 +01:00
Yorick Peterse	2ec91f130f	Lazy decoding of XML/HTML entities. Instead of decoding entities in the lexer we'll do this whenever XML::Text#text is called. This removes the overhead from the parsing phase and ensures the process is only triggered when actually needed. Note that calling #to_xml and/or the #inspect methods on a Text (or parent) instance will also trigger the entity conversion process. The new entity decoding API supports both regular entities (e.g. &amp;) as well as codepoint based entities (both regular and hexadecimal codepoints). To allow safe read-only access to Text instances from multiple threads a mutex is used. This mutex ensures that only 1 thread can trigger the conversion process. Fixes #68	2015-03-05 23:00:43 +01:00
Yorick Peterse	7409257702	Replaced HTML benchmark fixtures. The new fixture is the HTML of a person article which contains a few HTML entities.	2015-03-05 22:58:22 +01:00
Yorick Peterse	7e847a0ae9	Make C90 happy.	2015-03-05 22:57:51 +01:00
Yorick Peterse	33c46a1841	Use ID instead of VALUE for callback names in C.	2015-03-05 22:57:51 +01:00
Yorick Peterse	3e05593536	Release 0.2.3	2015-03-04 11:56:23 +01:00
Yorick Peterse	aa42cc9ce7	Updated changelog for 0.2.3	2015-03-04 11:49:24 +01:00
Yorick Peterse	3b2055a30b	Refactored handling of literal HTML elements. This ensures newlines can appear in <style> / <script> tags when using IOs as input.	2015-03-04 11:44:31 +01:00
Yorick Peterse	78e40b55c0	Handle parsing of HTML <style> tags. This basically re-applies the technique used for HTML <script> tags. With this extra addition I decided to rename/normalize a few things so it's easier to add any extra tags in the future. One downside of this setup is that the following will not be parsed by Oga: <style> </script> </style> The same applies to script tags containing a literal </style> tag. Since this particular case is rather unlikely to occur I'm OK with not supporting it as it _does_ simplify the lexer quite a bit. Fixes #80	2015-03-03 16:28:05 +01:00
Yorick Peterse	73534375d5	Release 0.2.2	2015-03-03 13:36:32 +01:00
Yorick Peterse	142b467277	Set parent of nodes set using Element#inner_text= This ensures that any text nodes created using Element#inner_text= have their parent node set correctly.	2015-03-03 13:13:05 +01:00
Yorick Peterse	503efc32cd	Release 0.2.1	2015-03-02 22:12:49 +01:00
Yorick Peterse	bc74d31bb5	Updated changelog for 0.2.1.	2015-03-02 17:44:08 +01:00
Yorick Peterse	874d7124af	Don't convert <script> text to XML entities. Fixes #79.	2015-03-02 17:32:19 +01:00
Yorick Peterse	9a586363e9	Added XML::Document#html?	2015-03-02 16:39:40 +01:00
Yorick Peterse	ba2177e2cf	Lex contents of <script> tags as plain text. When lexing input in HTML mode the lexer has to treat _all_ content of a <script> tag as plain text. This ensures that the lexer can process input such as "x <y" and "// <foo>" correctly. Fixes #70.	2015-03-02 16:22:09 +01:00
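
The lazy entity decoding described in commit 2ec91f130f can be illustrated with a minimal sketch. This is not Oga's actual code: the class name LazyText, the NAMED table and the ENTITY pattern are hypothetical names chosen for illustration, and the entity table is deliberately tiny. It only shows the general idea of deferring decoding until #text is called and guarding that step with a mutex so a single thread performs the conversion.

```ruby
# Hypothetical stand-in for a text node; NOT Oga's XML::Text implementation.
class LazyText
  # Illustrative subset of named entities (assumption: a real decoder
  # would cover far more names than these five).
  NAMED = {
    'amp'  => '&',
    'lt'   => '<',
    'gt'   => '>',
    'quot' => '"',
    'apos' => "'"
  }.freeze

  # Matches hexadecimal codepoints (&#x3C;), decimal codepoints (&#60;)
  # and named entities (&amp;).
  ENTITY = /&(?:#x([0-9a-fA-F]+)|#(\d+)|(\w+));/

  def initialize(raw)
    @raw     = raw
    @decoded = nil
    @mutex   = Mutex.new
  end

  # Decodes the raw input on first access only. The mutex ensures that
  # only one thread runs the conversion; later calls reuse the result.
  def text
    @mutex.synchronize do
      @decoded ||= decode(@raw)
    end
  end

  private

  def decode(input)
    input.gsub(ENTITY) do
      match = Regexp.last_match

      if match[1]
        [match[1].to_i(16)].pack('U')
      elsif match[2]
        [match[2].to_i].pack('U')
      else
        # Unknown names are left untouched.
        NAMED.fetch(match[3], match[0])
      end
    end
  end
end
```

Nothing is decoded when the object is constructed, so input whose text is never read pays no decoding cost, which matches the motivation given in the commit message.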

881 Commits

All Branches