oga/lib
Yorick Peterse 70516b7447 Yield tokens in the lexer and parser.
After some digging I found out that Racc has a method called `yyparse`. Using
this method (and a custom callback method) you can `yield` tokens as a form of
input. This makes it a lot easier to feed tokens as a stream from the lexer.

Sadly the current performance of the lexer is still total garbage. Most of the
memory usage also comes from using String#unpack, especially on large XML
inputs (e.g. 100 MB of XML). It looks like the resulting memory usage is about
10x the input size.

One option might be some kind of wrapper around String. This wrapper would have
a sliding window of, say, 1024 bytes. When you create it the first 1024 bytes
of the input would be unpacked. When seeking through the input this window
would move forward.

In theory this means that you'd only end up with having only 1024 Fixnum
instances around at any given time instead of "a very big number". I have to
test how efficient this is in practise.
2014-04-17 00:39:41 +02:00
..
oga Yield tokens in the lexer and parser. 2014-04-17 00:39:41 +02:00
oga.rb Use a Set for storing void element names. 2014-04-10 12:28:47 +02:00