core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	96a7a40fdc	Disable code coverage for JRuby specific code.	2014-06-23 09:42:14 +02:00
Yorick Peterse	e11b9ed32c	Basic XPath parser setup.	2014-06-01 23:02:28 +02:00
Yorick Peterse	e0b07332d9	Boilerplate for the XPath lexer.	2014-05-29 19:25:49 +02:00
Yorick Peterse	fe74d60138	Manually bootstrap JRuby after all. After discussing this with @headius I've decided to do this the manual way anyway. Apparently the basic load service stuff is deprecated and not very reliable.	2014-05-07 22:32:34 +02:00
Yorick Peterse	2053018d07	Slap JRuby so that it can load the .jar file.	2014-05-06 20:45:26 +02:00
Yorick Peterse	6e685378e0	Setup Ragel for JRuby and load things the hard way	2014-05-06 19:06:04 +02:00
Yorick Peterse	2689d3f65a	Initial setup using a C extension. While I've tried to keep Oga pure Ruby for as long as possible the performance of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of XML would take 5 to 6 seconds at least. Nokogiri on the other hand can parse that same XML into a DOM document in about 300 milliseconds. Such a big performance difference is not acceptable. To work around this the XML/HTML lexer will be implemented in C for MRI/Rubinius and Java for JRuby. For now there's only a C extension as I haven't read up yet on the JRuby API. The end goal is to provide some sort of Ragel "template" that can be used to generate the corresponding C/Java extension code. This would remove the need for duplicating the grammar and associated code. The native extension setup is a hybrid between native and Ruby. The raw Ragel stuff happens in C/Java while the actual logic of actions happens in Ruby. This adds a small amount of overhead but makes it much easier to maintain the lexer. Even with this extra overhead the performance is much better than pure Ruby. The 10MB of XML mentioned above is lexed in about 600 milliseconds. In other words, it's 10 times faster.	2014-05-05 00:31:28 +02:00
Yorick Peterse	030a0068bd	Basic pull parsing setup. This parser extends the regular DOM parser but delegates certain nodes to a block instead of building a DOM tree. The API is a bit raw in its current form but I'll extend it and make it a bit more user friendly in the following commits. In particular I want to make it easier to figure out if a certain node is nested inside another node.	2014-04-28 17:22:17 +02:00
Yorick Peterse	ecf6851711	Revert "Move linking of child nodes to a dedicated mixin." This doesn't actually make things any easier. It also introduces a weirdly named mixin. This reverts commit `0968465f0c`.	2014-04-24 21:16:31 +02:00
Yorick Peterse	0968465f0c	Move linking of child nodes to a dedicated mixin.	2014-04-24 09:43:50 +02:00
Yorick Peterse	08d412da7e	First shot at removing the AST layer. The AST layer is being removed because it doesn't really serve a useful purpose. In particular when creating a streaming parser the AST nodes would only introduce extra overhead. As a result of this the parser will emit a DOM tree directly instead of first emitting an AST.	2014-04-21 23:05:39 +02:00
Yorick Peterse	25edd2de00	Use a Set for storing void element names.	2014-04-10 12:28:47 +02:00
Yorick Peterse	c077988dd6	Tree building of doctypes.	2014-04-03 22:44:00 +02:00
Yorick Peterse	bdb76cefc5	Dedicated handling of XML declaration nodes.	2014-04-02 22:30:45 +02:00
Yorick Peterse	6d866523b8	Renamed XML::Builder to XML::TreeBuilder.	2014-03-28 16:37:37 +01:00
Yorick Peterse	e141c084f9	Dedicated DOM builder class for CDATA tags.	2014-03-28 09:27:53 +01:00
Yorick Peterse	2b250bbf42	Rough DOM building setup.	2014-03-28 08:59:48 +01:00
Yorick Peterse	6ae52c1b12	Initial rough sketches for the DOM API.	2014-03-26 18:12:00 +01:00
Yorick Peterse	79818eb349	Added a convenience class for parsing HTML. This removes the need for users to set the `:html` option themselves.	2014-03-25 09:40:24 +01:00
Yorick Peterse	eae13d21ed	Namespaced the lexer/parser under Oga::XML. With the upcoming XPath and CSS selector lexers/parsers it will be confusing to keep these in the root namespace.	2014-03-25 09:34:38 +01:00
Yorick Peterse	8ce76be050	Moved the parser class to Oga::Parser. Oga will use the same parser for XML and HTML so it doesn't make sense to separate the two into different namespaces (at least for now).	2014-03-11 22:01:50 +01:00
Yorick Peterse	e764ba640a	Basic parser setup without tests. Who needs tests anyway!	2014-02-26 22:17:47 +01:00
Yorick Peterse	5755c325bd	Imported a half-assed lexer.	2014-02-26 19:54:11 +01:00
Yorick Peterse	702477ca28	Basic project layout.	2014-02-26 19:50:16 +01:00

24 Commits