Commit Graph

15 Commits

Author SHA1 Message Date
Yorick Peterse 27d877ccce Updated the Gem description. 2014-09-12 14:40:01 +02:00
Yorick Peterse 8601cf6e74 Removed manifest from the Gemspec. 2014-09-04 14:02:56 +02:00
Yorick Peterse c69d77109b Require benchmark-ips 2.0 or newer. 2014-09-02 09:58:14 +02:00
Yorick Peterse bcbdf5e4e7 Require at least Racc 1.4.12.
This release contains proper JRuby support.
2014-08-26 20:53:36 +02:00
Yorick Peterse 114bc0d6e8 Upgrade to RSpec 3.0.
For this I've enabled both the old expectation and stubbing/mocking syntax. The
old syntax is much more compact and to me reads nicer. For example, consider
the following:

    lex('<foo></foo>').should == [...]

To me this reads much nicer than this:

    expect(lex('<foo></foo>')).to eq([...])
2014-06-02 12:09:38 +02:00
Yorick Peterse 48bf1a0628 Tweak Gemspec file list a bit.
This ensures it also takes files such as "Rakefile" into account when needed.
2014-06-01 22:16:55 +02:00
Yorick Peterse 3c621bf22e Removed the manifest file + task.
Using a Dir.glob() is much easier when dealing with a bunch of generated files.
2014-05-07 11:11:29 +02:00
Yorick Peterse 9abc5c1c92 Separated the Java and C ext codebases. 2014-05-07 00:29:10 +02:00
Yorick Peterse 2652bc0103 Removed Cliver as a dependency.
Since I'm not using any Ragel version specific features it's not really needed
to check for the version.
2014-05-06 10:18:52 +02:00
Yorick Peterse b9cb7c2d7c Corrected various extension paths. 2014-05-06 08:47:02 +02:00
Yorick Peterse c30d3a7627 Half-assed JRuby boilerplate.
Blowing my brains out over getting this fat pig to do what I want but we're
getting there.
2014-05-06 00:23:07 +02:00
Yorick Peterse 2689d3f65a Initial setup using a C extension.
While I've tried to keep Oga pure Ruby for as long as possible the performance
of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of
XML would take 5 to 6 seconds at least. Nokogiri on the other hand can parse
that same XML into a DOM document in about 300 miliseconds. Such a big
performance difference is not acceptable.

To work around this the XML/HTML lexer will be implemented in C for
MRI/Rubinius and Java for JRuby. For now there's only a C extension as I
haven't read up yet on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need of duplicating the grammar and
associated code.

The native extension setup is a hybrid between native and Ruby. The raw Ragel
stuff happens in C/Java while the actual logic of actions happens in Ruby. This
adds a small amount of overhead but makes it much easier to maintain the lexer.
Even with this extra overhead the performance is much better than pure Ruby.
The 10MB of XML mentioned above is lexed in about 600 miliseconds. In other
words, it's 10 times faster.
2014-05-05 00:31:28 +02:00
Yorick Peterse 08d412da7e First shot at removing the AST layer.
The AST layer is being removed because it doesn't really serve a useful
purpose. In particular when creating a streaming parser the AST nodes would
only introduce extra overhead.

As a result of this the parser will instead emit a DOM tree directly instead of
first emitting an AST.
2014-04-21 23:05:39 +02:00
Yorick Peterse 2852afce9b Benchmark for measuring CDATA lexing. 2014-03-21 16:59:44 +01:00
Yorick Peterse 702477ca28 Basic project layout. 2014-02-26 19:50:16 +01:00