Yorick Peterse
44bf1dd1ca
Split up handling of element names/namespaces.
This is now split up at the Ragel level, simplifying the corresponding Ruby code.
2014-05-15 10:22:05 +02:00
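For context, the kind of Ruby-side work that moving this split into the Ragel grammar makes unnecessary is breaking a qualified element name such as `foo:bar` into a namespace prefix and a local name. The helper below is a hypothetical illustration (`split_element_name` is not part of Oga), not the code this commit removed.

```ruby
# Hypothetical helper: splits a qualified element name such as "foo:bar"
# into [namespace_prefix, local_name]. The prefix is nil when the name
# contains no colon.
def split_element_name(qname)
  prefix, local = qname.split(':', 2)

  local ? [prefix, local] : [nil, prefix]
end

split_element_name('foo:bar') # => ["foo", "bar"]
split_element_name('bar')     # => [nil, "bar"]
```

Once the grammar itself emits the prefix and the name as separate tokens, presumably no string splitting of this sort is needed in the Ruby callbacks.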
Yorick Peterse
723a273e4f
Enforce symbols for element attributes.
This comes with a little memory overhead, but it should be minor in most
cases.
2014-05-15 01:04:26 +02:00
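A minimal sketch of what "symbols for element attributes" could look like, assuming the lexer hands over attribute names and values as String pairs. The `symbolize_attributes` helper is hypothetical and not Oga's API.

```ruby
# Hypothetical helper: turns raw name/value pairs produced by a lexer into
# an attribute Hash keyed by Symbols instead of Strings.
def symbolize_attributes(pairs)
  pairs.each_with_object({}) do |(name, value), attributes|
    attributes[name.to_sym] = value
  end
end

attrs = symbolize_attributes([['class', 'foo'], ['id', 'bar']])
attrs[:class] # => "foo"
```

Each distinct attribute name becomes a single interned Symbol shared by every element that uses it. On Ruby versions before 2.2, Symbols created at runtime with `to_sym` are never garbage collected, which is one plausible source of the memory overhead mentioned above.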
Yorick Peterse
f4b9bbd4ac
Removed lazy way of setting instance variables.
This process is quite a bit slower than setting instance variables directly.
2014-05-15 00:43:13 +02:00
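The commit doesn't show the removed code, so the sketch below assumes the "lazy way" was a loop calling `instance_variable_set` for every option. Both class names are hypothetical.

```ruby
# "Lazy" style: copy every option into an instance variable by name.
# Each call builds an "@name" String that Ruby must then turn into an
# instance variable name at runtime.
class LazyNode
  attr_reader :name, :attributes

  def initialize(options = {})
    options.each do |key, value|
      instance_variable_set("@#{key}", value)
    end
  end
end

# Direct style: assign each known instance variable explicitly.
class DirectNode
  attr_reader :name, :attributes

  def initialize(options = {})
    @name       = options[:name]
    @attributes = options[:attributes] || {}
  end
end

DirectNode.new(name: 'p').name # => "p"
```

Besides avoiding the per-call String building, the direct version only sets the variables the class actually knows about. Ruby's standard `Benchmark` module is a convenient way to compare the two on a given Ruby implementation.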
Yorick Peterse
043ea9a366
Fall back to ps in the profiler.
If the /proc filesystem doesn't exist we'll fall back to using the `ps` shell
command.
2014-05-11 21:15:33 +02:00
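A sketch of the fallback, assuming the profiler samples the resident memory of the current process. The method name `memory_usage_kb` is hypothetical; `VmRSS` in `/proc/<pid>/status` and `ps -o rss=` both report kilobytes.

```ruby
# Returns the resident set size (RSS) of a process in kilobytes.
# Reads /proc/<pid>/status when it exists (Linux) and otherwise shells
# out to `ps` (e.g. on OS X, which has no /proc filesystem).
def memory_usage_kb(pid = Process.pid)
  status_file = "/proc/#{pid}/status"

  if File.exist?(status_file)
    line = File.foreach(status_file).find { |l| l.start_with?('VmRSS:') }

    # The line looks like "VmRSS:     12345 kB".
    return line.split[1].to_i if line
  end

  `ps -o rss= -p #{pid}`.strip.to_i
end
```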
Yorick Peterse
1b58723e7d
Removed stdio.h #include.
This header is also not needed.
2014-05-11 21:06:55 +02:00
Yorick Peterse
e2b9fc75ca
Removed #include for malloc.h
Apparently some operating systems move this to malloc/malloc.h. Since it's not
needed, let's just get rid of it.
2014-05-11 21:06:02 +02:00
Yorick Peterse
ba3d96c819
Re-build lexers when base_lexer.rl changes.
Thanks to @avdi for bringing up how to do this when using rule() blocks.
2014-05-10 00:28:23 +02:00
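One way to express this with a Rake `rule`, sketched under assumptions: the file paths and the `ragel` flags are hypothetical, and this is not necessarily the exact Rakefile change. In a rule's prerequisite list, an extension such as `'.rl'` is derived from the target name, while a prerequisite containing a slash is used verbatim, so every generated lexer also depends on the shared grammar.

```ruby
require 'rake' # Not needed inside a Rakefile.

# Hypothetical path of the grammar shared by the C and Java lexers.
BASE_GRAMMAR = 'ext/ragel/base_lexer.rl'

# ext/c/lexer.c depends on ext/c/lexer.rl *and* on the base grammar, so
# editing base_lexer.rl marks every generated lexer as out of date.
rule '.c' => ['.rl', BASE_GRAMMAR] do |task|
  # task.source is the first prerequisite, i.e. the .rl file.
  sh "ragel -C -G2 #{task.source} -o #{task.name}"
end
```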
Yorick Peterse
19f04f98f7
Support for lexing/parsing inline doctypes.
2014-05-10 00:28:11 +02:00
Yorick Peterse
a92023fe94
Removed outdated paragraph from the README.
Ironically, Oga now uses native extensions for the lexer.
2014-05-09 00:34:25 +02:00
Yorick Peterse
a8bf6be00e
Added a contributing guide.
2014-05-09 00:32:44 +02:00
Yorick Peterse
2dd5d996c4
Travis: don't notify for every failure.
2014-05-08 10:20:35 +02:00
Yorick Peterse
c472ceac6f
Docs for the shared Ragel grammar.
2014-05-08 00:21:23 +02:00
Yorick Peterse
98db796205
Updated editor configuration.
2014-05-08 00:17:12 +02:00
Yorick Peterse
51c1f3c32d
Updated the README.
2014-05-08 00:15:54 +02:00
Yorick Peterse
fe74d60138
Manually bootstrap JRuby after all.
After discussing this with @headius I've decided to do this the manual way
anyway. Apparently the basic load service stuff is deprecated and not very
reliable.
2014-05-07 22:32:34 +02:00
Yorick Peterse
90fabe3f21
Compile when running `rake generate`.
2014-05-07 20:07:31 +02:00
Yorick Peterse
3c621bf22e
Removed the manifest file + task.
Using a Dir.glob() is much easier when dealing with a bunch of generated files.
2014-05-07 11:11:29 +02:00
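A sketch of the glob-based approach, assuming the file list feeds a gemspec. The patterns and the `gem_files` helper are hypothetical; `Dir.glob` accepts an Array of patterns, and the `base:` keyword requires Ruby 2.5 or newer.

```ruby
# Hypothetical glob patterns; adjust them to the real project layout.
GEM_FILE_PATTERNS = %w[
  lib/**/*.rb
  ext/**/*.{c,h,rl,java}
  LICENSE
  README.md
].freeze

# Returns the files to package, relative to +root+. Generated files that
# match a pattern are picked up automatically, so there is no manifest to
# keep in sync.
def gem_files(root = Dir.pwd)
  Dir.glob(GEM_FILE_PATTERNS, base: root).sort
end
```

Inside a gemspec this could be used as `s.files = gem_files(__dir__)`. Build artifacts such as object files are left out simply because no pattern matches them.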
Yorick Peterse
ee78b2c382
Don't redefine namespaces in C.
The Oga::XML namespace should be set up by Ruby, not by C.
2014-05-07 10:52:06 +02:00
Yorick Peterse
bbdc7966db
Documentation for the JRuby extension.
2014-05-07 10:24:24 +02:00
Yorick Peterse
3afef5f7cc
Lexer support for JRuby.
JRuby now passes all tests. Benchmark-wise, it completes the big XML benchmark
in about 500-600 milliseconds.
2014-05-07 09:40:22 +02:00
Yorick Peterse
b9a4038e42
Callback boilerplate for the Java lexer.
2014-05-07 01:01:24 +02:00
Yorick Peterse
e271298984
Use macros in the C lexer.
2014-05-07 00:57:25 +02:00
Yorick Peterse
f25f8a3d15
Break up the Ragel C grammar.
The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code.
2014-05-07 00:50:34 +02:00
Yorick Peterse
49939fa687
Updated editor configuration.
2014-05-07 00:33:24 +02:00
Yorick Peterse
9abc5c1c92
Separated the Java and C ext codebases.
2014-05-07 00:29:10 +02:00
Yorick Peterse
b8efed5177
Renamed on_start_doctype to on_doctype_start.
2014-05-06 23:18:44 +02:00
Yorick Peterse
f39fe5d857
JRuby lexer boilerplate with actual input.
This doesn't actually lex anything just yet but at least the input from Ruby is
in place.
2014-05-06 22:43:55 +02:00
Yorick Peterse
fea5ec7946
Removed the package line in LibogaService.java
2014-05-06 20:52:42 +02:00
Yorick Peterse
2053018d07
Slap JRuby so that it can load the .jar file.
2014-05-06 20:45:26 +02:00
Yorick Peterse
6e685378e0
Setup Ragel for JRuby and load things the hard way
2014-05-06 19:06:04 +02:00
Yorick Peterse
aea8378fbb
Removed Cliver from the parser task.
2014-05-06 15:25:57 +02:00
Yorick Peterse
00e778d0d9
Removed unused cliver require.
Dumbass.
2014-05-06 13:59:26 +02:00
Yorick Peterse
127aea5ca6
Remove the Java output when cleaning.
2014-05-06 10:24:57 +02:00
Yorick Peterse
64c9e18651
Setup for Java and Ragel.
2014-05-06 10:24:07 +02:00
Yorick Peterse
2652bc0103
Removed Cliver as a dependency.
Since I'm not using any Ragel version-specific features, there's no real need
to check the version.
2014-05-06 10:18:52 +02:00
Yorick Peterse
eeeeb0efad
Don't track the generated Java lexer.
2014-05-06 10:11:19 +02:00
Yorick Peterse
d2742cfdde
Use 4 spaces for C/Java code.
2014-05-06 09:41:36 +02:00
Yorick Peterse
4e2dca2fd9
Updated the list of files to clean.
2014-05-06 09:29:02 +02:00
Yorick Peterse
b9cb7c2d7c
Corrected various extension paths.
2014-05-06 08:47:02 +02:00
Yorick Peterse
01a4a53a53
Merge branch 'native-ext' of github.com:YorickPeterse/oga into native-ext
2014-05-06 08:44:57 +02:00
Yorick Peterse
c30d3a7627
Half-assed JRuby boilerplate.
Blowing my brains out over getting this fat pig to do what I want but we're
getting there.
2014-05-06 00:23:07 +02:00
Yorick Peterse
2b3a6be24d
Use liboga as a prefix in the C code.
Namespaces? What are those?
2014-05-05 21:19:50 +02:00
Yorick Peterse
ee756037e7
Removed unused YARD tag.
2014-05-05 09:45:10 +02:00
Yorick Peterse
aeab885a7f
Docs for the Ruby part of the XML lexer.
2014-05-05 09:44:35 +02:00
Yorick Peterse
57fd4dff64
Docs for the C lexer.
2014-05-05 09:40:08 +02:00
Yorick Peterse
335f3cc6d6
Use rb_enc_str_new instead of rb_enc_str_new_cstr.
The latter in combination with strndup() would leak large amounts of memory.
2014-05-05 00:34:19 +02:00
Yorick Peterse
2689d3f65a
Initial setup using a C extension.
While I've tried to keep Oga pure Ruby for as long as possible, the performance
of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of
XML would take at least 5 to 6 seconds. Nokogiri, on the other hand, can parse
that same XML into a DOM document in about 300 milliseconds. Such a big
performance difference is not acceptable.
To work around this the XML/HTML lexer will be implemented in C for
MRI/Rubinius and Java for JRuby. For now there's only a C extension as I
haven't read up yet on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need for duplicating the grammar and
associated code.
The native extension setup is a hybrid between native and Ruby. The raw Ragel
stuff happens in C/Java while the actual logic of actions happens in Ruby. This
adds a small amount of overhead but makes it much easier to maintain the lexer.
Even with this extra overhead the performance is much better than pure Ruby.
The 10MB of XML mentioned above is lexed in about 600 milliseconds. In other
words, it's roughly 10 times faster.
2014-05-05 00:31:28 +02:00
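To illustrate the hybrid split described above, here is a pure-Ruby sketch in which a regular expression stands in for the Ragel-generated C/Java scanner. The "native" half only finds token boundaries and calls one Ruby method per event; the Ruby half decides what to do with each event. The class, callback and token names are hypothetical, not Oga's actual API, and in a real extension the scanner would invoke the callbacks from C (for example via `rb_funcall`) or from Java.

```ruby
class TinyLexer
  def initialize(data)
    @data   = data
    @tokens = []
  end

  def lex
    native_scan
    @tokens
  end

  # Callbacks invoked by the scanner; all token logic lives in Ruby.
  def on_element_start(name)
    @tokens << [:T_ELEM_START, name]
  end

  def on_text(text)
    @tokens << [:T_TEXT, text]
  end

  def on_element_end
    @tokens << [:T_ELEM_END, nil]
  end

  private

  # Stand-in for the native state machine: it only locates boundaries
  # and dispatches to the matching callback.
  def native_scan
    @data.scan(%r{<(/?)([\w:-]+)>|([^<]+)}) do |closing, name, text|
      if text
        on_text(text)
      elsif closing.empty?
        on_element_start(name)
      else
        on_element_end
      end
    end
  end
end

TinyLexer.new('<p>Hi</p>').lex
# => [[:T_ELEM_START, "p"], [:T_TEXT, "Hi"], [:T_ELEM_END, nil]]
```

The cost of this design is one Ruby method call per event, which matches the "small amount of overhead" mentioned above; the benefit is that changing what a token looks like only touches Ruby code.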
Yorick Peterse
baaa24a760
Indentation fix in the lexer.
2014-05-04 18:06:43 +02:00
Yorick Peterse
f18e8893de
Removed the buffering crap from the lexer.
2014-05-04 17:39:08 +02:00
Yorick Peterse
57255012b7
Patch the Ragel lexer after generating it.
This further increases throughput of the lexer. On MRI this seems to save
around one second. It now sits at ~6.8 seconds in the big XML benchmark.
On JRuby, combined with some JIT options and invokedynamic enabled, this can
reduce the average lexing time to around 3.5 seconds. Rubinius, also with a
few aggressive JIT options, seems to stick around 9 seconds.
2014-05-02 00:40:10 +02:00