9 Commits

Author SHA1 Message Date
Yorick Peterse eeeeb0efad Don't track the generated Java lexer. 2014-05-06 10:11:19 +02:00
Yorick Peterse 2689d3f65a Initial setup using a C extension.
While I've tried to keep Oga pure Ruby for as long as possible, the performance
of Ragel's Ruby output was not good enough to be worth the trouble. For example,
lexing 10MB of XML would take at least 5 to 6 seconds. Nokogiri, on the other
hand, can parse that same XML into a DOM document in about 300 milliseconds.
Such a big performance difference is not acceptable.

To work around this, the XML/HTML lexer will be implemented in C for
MRI/Rubinius and in Java for JRuby. For now there's only a C extension, as I
haven't yet read up on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need for duplicating the grammar and
associated code.

The native extension setup is a hybrid of native code and Ruby. The raw Ragel
state machine runs in C/Java while the logic of the actions lives in Ruby. This
adds a small amount of overhead but makes the lexer much easier to maintain.
Even with this extra overhead, the performance is much better than pure Ruby:
the 10MB of XML mentioned above is lexed in about 600 milliseconds. In other
words, it's roughly 8 to 10 times faster.
2014-05-05 00:31:28 +02:00
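The hybrid split described in this commit can be illustrated in plain Ruby. In this minimal sketch the hypothetical `FakeNativeScanner` stands in for the Ragel-generated C/Java code (a regex replaces the real Ragel state machine) and only decides where tokens start and end; every action is a method call on a Ruby handler object. The callback names (`on_element_start`, `on_text`, `on_element_end`) and the token symbols are assumptions for illustration, not Oga's actual API.

```ruby
# Stand-in for the Ragel-generated C/Java code (hypothetical): it only finds
# token boundaries and hands each token to a Ruby callback. A regex is used
# here instead of a real Ragel state machine.
class FakeNativeScanner
  TOKEN = %r{<(/?)([A-Za-z][\w-]*)>|([^<]+)}

  def initialize(handler)
    @handler = handler
  end

  def advance(input)
    input.scan(TOKEN) do |closing, name, text|
      if text
        @handler.on_text(text)
      elsif closing.empty?
        @handler.on_element_start(name)
      else
        @handler.on_element_end(name)
      end
    end
  end
end

# The Ruby side: all action logic lives here, so changing what a token
# produces does not require touching the native code.
class TokenCollector
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  def on_element_start(name)
    @tokens << [:element_start, name]
  end

  def on_element_end(name)
    @tokens << [:element_end, name]
  end

  def on_text(text)
    @tokens << [:text, text]
  end
end

collector = TokenCollector.new
FakeNativeScanner.new(collector).advance('<p>Hello</p>')
p collector.tokens
# => [[:element_start, "p"], [:text, "Hello"], [:element_end, "p"]]
```

In a real C extension each callback would be a Ruby method call made from C (for example via `rb_funcall`), which is where the small per-token overhead mentioned above comes from.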
Yorick Peterse d5e59c38ac Profiling setup for the DOM parser. 2014-04-29 13:47:55 +02:00
Yorick Peterse 53c45c621b Basic memory profiling setup.
This makes it a bit easier to profile memory usage of certain components and
plot the results using Gnuplot. In the past I would write one-off scripts for
this and throw them away, only to find out I needed them again later on.

Profiling samples are written to profile/samples and can be plotted using the
corresponding Gnuplot scripts found in profile/plot. The latter requires
Gnuplot to be installed.
2014-04-29 13:38:56 +02:00
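A minimal sketch of how such samples might be recorded, assuming a hypothetical `profile_memory` helper rather than the actual scripts in the repository. To stay portable it samples `GC.stat(:heap_live_slots)` (live Ruby object slots) instead of process RSS, and writes whitespace-separated "seconds value" rows that Gnuplot can plot directly.

```ruby
require 'fileutils'

# Hypothetical helper: runs +work+ in a thread and, while it runs, writes one
# sample per +interval+ seconds to +path+. Each row is
# "<elapsed seconds> <live object slots>", a format Gnuplot reads as-is.
def profile_memory(path, interval: 0.05, &work)
  FileUtils.mkdir_p(File.dirname(path))
  start  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  worker = Thread.new(&work)

  File.open(path, 'w') do |file|
    loop do
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      file.puts(format('%.3f %d', elapsed, GC.stat(:heap_live_slots)))
      break unless worker.alive?

      sleep(interval)
    end
  end

  # Thread#value joins the thread, returning the block's result or
  # re-raising any exception it raised.
  worker.value
end
```

For example, `profile_memory('profile/samples/strings.txt') { Array.new(500_000) { |i| "item #{i}" } }` would produce a file that a Gnuplot script could render with `plot "profile/samples/strings.txt" using 1:2 with lines`.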
Yorick Peterse 6f1ce17b31 Benchmark for lexer lines/second.
This benchmark uses a fixture file that is automatically downloaded.
2014-04-17 20:06:24 +02:00
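A lines-per-second figure can be computed by timing the lexer over the whole fixture and dividing the total number of lines processed by the elapsed time. The sketch below assumes a hypothetical `lines_per_second` helper that takes the fixture contents as a string (the download step is left out) and a block that performs the lexing; the regex-based block in the example merely stands in for a real lexer.

```ruby
# Hypothetical helper: runs the block +iterations+ times over +input+ and
# returns the average throughput in lines per second, measured with a
# monotonic clock so wall-clock adjustments don't skew the result.
def lines_per_second(input, iterations: 5)
  line_count = input.lines.size
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield input }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

  (line_count * iterations) / elapsed
end

# Stand-in input and "lexer"; a real benchmark would read the downloaded
# fixture and call the actual lexer instead.
xml = "<p>Hello</p>\n" * 10_000
rate = lines_per_second(xml) { |source| source.scan(/<[^>]+>|[^<]+/) }
puts format('%.0f lines/second', rate)
```

Repeating the run a few times and averaging smooths out one-off noise such as a garbage collection pass landing in a single iteration.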
Yorick Peterse 4a48647d1e Removed generated lexer/parser.
I am a dumbass.
2014-03-25 21:47:40 +01:00
Yorick Peterse c07edc767b Updated the gitignore entry for the parser. 2014-03-11 22:03:02 +01:00
Yorick Peterse 4f04fa0d30 Untrack Racc generated files.
Yorick, you can stop being bad now.
2014-02-26 22:18:33 +01:00
Yorick Peterse 2dede8725b Added Git ignore rules. 2014-02-26 19:51:08 +01:00