27 Commits

Author SHA1 Message Date
Yorick Peterse c12afdc477 Rake task for generating docs. 2014-07-22 21:29:16 +02:00
Yorick Peterse 713d8a092b Rake task for displaying TODO/FIXME notes. 2014-07-22 16:38:05 +02:00
Yorick Peterse e11b9ed32c Basic XPath parser setup. 2014-06-01 23:02:28 +02:00
Yorick Peterse e0b07332d9 Boilerplate for the XPath lexer. 2014-05-29 19:25:49 +02:00
Yorick Peterse ba3d96c819 Re-build lexers when base_lexer.rl changes.
Thanks to @avdi for pointing out how to do this when using rule() blocks.
2014-05-10 00:28:23 +02:00
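
The dependency described in this commit comes down to a timestamp comparison: a generated lexer is stale when either its own grammar or the shared base_lexer.rl is newer than the output. A minimal plain-Ruby sketch of that check follows. The `stale?` helper is hypothetical and this is not the project's actual Rakefile.

```ruby
# Hypothetical helper: returns true when `target` is missing or older than
# any of its `sources` (the same check a file-based build rule performs).
def stale?(target, sources)
  return true unless File.exist?(target)

  target_time = File.mtime(target)

  sources.any? { |source| File.mtime(source) > target_time }
end
```

Listing base_lexer.rl among the sources of every generated lexer is what forces a rebuild when only the shared grammar changes.
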
Yorick Peterse 90fabe3f21 Compile when running `rake generate`. 2014-05-07 20:07:31 +02:00
Yorick Peterse 3c621bf22e Removed the manifest file + task.
Using a Dir.glob() is much easier when dealing with a bunch of generated files.
2014-05-07 11:11:29 +02:00
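
Building the file list from glob patterns instead of a manifest could look roughly like the sketch below. The `package_files` helper and the patterns are illustrative assumptions, not the project's actual gemspec.

```ruby
# Hypothetical helper: collects the files to package by expanding glob
# patterns relative to `root`, so newly generated files are picked up
# without maintaining a manifest by hand.
def package_files(root, patterns = ['lib/**/*.rb', 'ext/**/*.{c,h,java,rb}'])
  Dir.chdir(root) do
    patterns.flat_map { |pattern| Dir.glob(pattern) }
            .select { |path| File.file?(path) }
            .uniq
            .sort
  end
end
```

Because the patterns are evaluated each time, files produced by a generate step are included as long as that step runs first.
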
Yorick Peterse f25f8a3d15 Break up the Ragel C grammar.
The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code.
2014-05-07 00:50:34 +02:00
Yorick Peterse 9abc5c1c92 Separated the Java and C ext codebases. 2014-05-07 00:29:10 +02:00
Yorick Peterse aea8378fbb Removed Cliver from the parser task. 2014-05-06 15:25:57 +02:00
Yorick Peterse 64c9e18651 Setup for Java and Ragel. 2014-05-06 10:24:07 +02:00
Yorick Peterse 2652bc0103 Removed Cliver as a dependency.
Since I'm not using any Ragel version specific features it's not really needed
to check for the version.
2014-05-06 10:18:52 +02:00
Yorick Peterse b9cb7c2d7c Corrected various extension paths. 2014-05-06 08:47:02 +02:00
Yorick Peterse c30d3a7627 Half-assed JRuby boilerplate.
Blowing my brains out over getting this fat pig to do what I want but we're
getting there.
2014-05-06 00:23:07 +02:00
Yorick Peterse 2689d3f65a Initial setup using a C extension.
While I've tried to keep Oga pure Ruby for as long as possible the performance
of Ragel's Ruby output was not worth the trouble. For example, lexing 10 MB of
XML would take 5 to 6 seconds at least. Nokogiri on the other hand can parse
that same XML into a DOM document in about 300 milliseconds. Such a big
performance difference is not acceptable.

To work around this the XML/HTML lexer will be implemented in C for
MRI/Rubinius and Java for JRuby. For now there's only a C extension as I
haven't read up yet on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need to duplicate the grammar and
associated code.

The native extension setup is a hybrid between native and Ruby. The raw Ragel
stuff happens in C/Java while the actual logic of actions happens in Ruby. This
adds a small amount of overhead but makes it much easier to maintain the lexer.
Even with this extra overhead the performance is much better than pure Ruby.
The 10 MB of XML mentioned above is lexed in about 600 milliseconds. In other
words, it's roughly 10 times faster.
2014-05-05 00:31:28 +02:00
Yorick Peterse 57255012b7 Patch the Ragel lexer after generating it.
This further increases throughput of the lexer. On MRI this seems to save
around one second or so. It now sits at ~6.8 seconds in the big XML benchmark.

On JRuby, combined with some JIT options and invokedynamic enabled, this can
reduce the average lexing time to around 3.5 seconds. Rubinius, also with a
few aggressive JIT options, seems to stick around 9 seconds.
2014-05-02 00:40:10 +02:00
Yorick Peterse 503b254216 Generate files before generating the manifest. 2014-04-29 20:41:02 +02:00
Yorick Peterse 59dae873e4 Don't rely on Git for generating the MANIFEST.
When using Git the resulting Gem will contain far too many useless files. For
example, the profile/ and spec/ directories are not needed when building Gems.
2014-04-29 20:39:20 +02:00
Yorick Peterse c8c9da2922 Track the XML fixture in Git.
To make running benchmarks easier we'll track the XML file in Git in its
compressed form. I also decreased the size of the XML file from ~50 MB to
~10 MB.
2014-04-19 01:03:14 +02:00
Yorick Peterse 97d8450cba Removed the `regenerate` task. 2014-04-19 00:59:09 +02:00
Yorick Peterse 6f1ce17b31 Benchmark for lexer lines/second.
This benchmark uses a fixture file that is automatically downloaded.
2014-04-17 20:06:24 +02:00
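
A lines/second measurement of this kind can be sketched as follows. The `lines_per_second` helper is hypothetical, and the block passed to it stands in for a call to the real lexer; this is not the project's benchmark script.

```ruby
require 'benchmark'

# Hypothetical helper: times the given block over `input` and returns how
# many lines per second it processed. The block stands in for a lexer call.
def lines_per_second(input)
  line_count = input.lines.length
  elapsed    = Benchmark.realtime { yield input }

  line_count / elapsed
end
```

For example, `lines_per_second(xml) { |data| data.scan(/<[^>]+>/) }` would time a crude regex scan over the fixture in place of an actual lexer.
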
Yorick Peterse c366a96ce8 Rake task for generating code coverage. 2014-03-28 16:33:47 +01:00
Yorick Peterse 7c03de0e2f Renamed HTML_PARSER to PARSER_OUTPUT.
This keeps it consistent with the lexer.
2014-03-25 09:35:48 +01:00
Yorick Peterse 422832fd68 Lowered the required Ragel version to 6.7. 2014-03-18 00:12:21 +01:00
Yorick Peterse e764ba640a Basic parser setup without tests.
Who needs tests anyway!
2014-02-26 22:17:47 +01:00
Yorick Peterse d32888f803 Basic lexer setup/tests.
Too lazy to do this the right way. ᕕ(ᐛ)ᕗ
2014-02-26 21:36:30 +01:00
Yorick Peterse 702477ca28 Basic project layout. 2014-02-26 19:50:16 +01:00