Yorick Peterse
28edc7726f
Rewind IO input upon resetting the lexer.
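This is a minimal sketch of what rewinding on reset means. It assumes a hypothetical `ToyLexer` wrapper, not Oga's actual lexer class. An IO that has been read to the end is rewound so the next run starts from the beginning again. A String needs no rewinding.

```ruby
require 'stringio'

# Hypothetical lexer wrapper; only illustrates the reset/rewind behaviour.
class ToyLexer
  def initialize(input)
    @input = input
  end

  # Returns the whole String, or whatever is still unread in an IO.
  def read_all
    @input.is_a?(String) ? @input : @input.read
  end

  # Rewind IO-like inputs so they can be consumed again after a reset.
  def reset
    @input.rewind if @input.respond_to?(:rewind)
  end
end

lexer = ToyLexer.new(StringIO.new("<a>foo</a>"))
lexer.read_all # => "<a>foo</a>"
lexer.read_all # => "" (the IO is exhausted)
lexer.reset
lexer.read_all # => "<a>foo</a>"
```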
2014-05-26 00:33:20 +02:00
Yorick Peterse
c81c6db74e
Benchmarks/profilers for IO inputs in the lexer.
2014-05-26 00:31:15 +02:00
Yorick Peterse
629dcd3fe6
Support for IO inputs in the lexer.
Using IO/StringIO objects one can parse large XML files without first having to
read the entire file into memory. This can potentially save a lot of memory at
the cost of a slightly slower runtime.
For IO-like instances the lexer will consume the input line by line. If a
String is given it's consumed as a whole instead. A small side effect of
reading the input line by line is that text such as "foo\nbar" will be lexed as
two tokens instead of one.
Fixes #19.
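The input handling described in this commit can be sketched as follows. `each_input_chunk` is a hypothetical helper, not Oga's API. It yields Strings whole and IO-like objects (anything responding to `each_line`, such as `File` or `StringIO`) one line at a time.

```ruby
require 'stringio'

# Hypothetical helper: yields the input in the chunks a lexer would process.
# Strings are yielded whole; IO-like objects are yielded line by line.
def each_input_chunk(input)
  return enum_for(:each_input_chunk, input) unless block_given?

  if input.is_a?(String)
    yield input
  else
    input.each_line { |line| yield line }
  end
end

p each_input_chunk("foo\nbar").to_a               # prints ["foo\nbar"]
p each_input_chunk(StringIO.new("foo\nbar")).to_a # prints ["foo\n", "bar"]
```

The second call shows where the side effect comes from. The text "foo\nbar" arrives in two separate chunks, so a lexer working chunk by chunk emits two text tokens for it.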
2014-05-26 00:30:39 +02:00
Yorick Peterse
6b9d65923a
Use a method for getting input in the XML lexer.
Instead of directly accessing the `data` instance variable the C/Java code now
uses the method `read_data`. This is one of the various steps required to allow
Oga to read data from IO-like instances. It also means I can freely change the
name of the instance variable without also having to change the C/Java code.
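Here is a sketch of the Ruby side of this indirection. `DataLexer`, `@input` and `native_lex` are hypothetical. `native_lex` stands in for the C/Java code and only knows the method name `read_data`. Because of that, the instance variable can be renamed without touching it.

```ruby
# Hypothetical lexer; only the read_data indirection mirrors the commit.
class DataLexer
  def initialize(data)
    # The ivar name is an internal detail and can be renamed freely.
    @input = data
  end

  # Native (C/Java) code calls this method instead of reading the ivar.
  def read_data
    @input
  end
end

# Stand-in for the native side: it depends on the method, not the ivar.
def native_lex(lexer)
  lexer.send(:read_data).length
end

native_lex(DataLexer.new("<p>hi</p>")) # => 9
```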
2014-05-21 00:27:23 +02:00
Yorick Peterse
418b4ef498
Cleaned up documentation of the XML lexer.
2014-05-21 00:21:21 +02:00
Yorick Peterse
3a8582030d
Removed remaining fhold call in the XML lexer.
There's no longer any need for this fhold call, so we're getting rid of
it.
2014-05-21 00:11:39 +02:00
Yorick Peterse
4542f06d0f
Replaced fcall/fret with fnext in the XML lexer.
With the rules being cleaned up/moved around a bit we can drop the use of
fcall/fret. This removes the need to maintain a stack (and its position).
2014-05-21 00:08:48 +02:00
Yorick Peterse
c56b0395e4
Moved various rules around for the XML lexer.
This moves the element related rules to the element_head machine (where they
belong). This in turn makes it possible to lex ">" as a text node, which was
previously impossible.
2014-05-21 00:04:53 +02:00
Yorick Peterse
feaf28d423
Remove dedicated string machine in the XML lexer.
This removes the need for another fcall/fret combination.
2014-05-19 20:26:07 +02:00
Yorick Peterse
93b9718406
Cleaned up the XML lexer documentation.
2014-05-19 09:39:35 +02:00
Yorick Peterse
cd0f3380c4
Merge multiple CDATA tokens into a single token.
The tokens T_CDATA_START, T_TEXT and T_CDATA_END have been merged together into
T_CDATA.
2014-05-19 09:36:19 +02:00
Yorick Peterse
a4fb5c1299
Merge multiple comment tokens into a single one.
The tokens T_COMMENT_START, T_TEXT and T_COMMENT_END have been merged into a
single token: T_COMMENT. This simplifies both the lexer and the parser.
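This sketch shows the token streams before and after the merge. It is a hypothetical post-processing helper and assumes tokens are `[type, value]` pairs. The actual change presumably emits `T_COMMENT` straight from the lexer. The helper only illustrates that three tokens become one.

```ruby
# Hypothetical: collapse START/TEXT/END sequences into a single token.
# Tokens are assumed to be [type, value] pairs.
def merge_tokens(tokens, start_type, end_type, merged_type)
  result = []
  buffer = nil # collects text while inside a start/end pair

  tokens.each do |type, value|
    if type == start_type
      buffer = +''
    elsif type == end_type && buffer
      result << [merged_type, buffer]
      buffer = nil
    elsif buffer
      buffer << value.to_s
    else
      result << [type, value]
    end
  end

  result
end

old = [[:T_COMMENT_START, nil], [:T_TEXT, ' foo '], [:T_COMMENT_END, nil]]
merge_tokens(old, :T_COMMENT_START, :T_COMMENT_END, :T_COMMENT)
# => [[:T_COMMENT, " foo "]]
```

The same helper covers the CDATA change when called with `:T_CDATA_START`, `:T_CDATA_END` and `:T_CDATA`.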
2014-05-19 09:30:30 +02:00
Yorick Peterse
c891dd88cb
Removed useless code from the XML parser.
2014-05-18 23:30:26 +02:00
Yorick Peterse
31ec76c90a
Fixed guard in the lexer header.
2014-05-18 16:51:17 +02:00
Yorick Peterse
81a81f0ab0
Don't create Arrays when not needed.
2014-05-16 17:05:42 +02:00
Yorick Peterse
854936f30b
Added average benchmarks for the parser.
2014-05-16 16:38:27 +02:00
Yorick Peterse
ad67cd708f
Only include debug info when DEBUG is set.
2014-05-15 20:43:48 +02:00
Yorick Peterse
fd2f727183
Only set explicit ivars in the lexer.
2014-05-15 19:48:18 +02:00
Yorick Peterse
44bf1dd1ca
Split up handling of element names/namespaces.
This is now split up at the Ragel level, simplifying the corresponding Ruby code.
2014-05-15 10:22:05 +02:00
Yorick Peterse
723a273e4f
Enforce symbols for element attributes.
This comes with a little memory overhead, but this should be minor in
most cases.
2014-05-15 01:04:26 +02:00
Yorick Peterse
f4b9bbd4ac
Removed lazy way of setting instance variables.
This approach is quite a bit slower than setting instance variables
directly.
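The commit doesn't show the removed code. A common "lazy" pattern loops over an options Hash and calls `instance_variable_set` for each key. The sketch below assumes that pattern; `LazyNode` and `ExplicitNode` are hypothetical classes. It compares the two approaches with Ruby's standard `Benchmark` module, and no timings are claimed here.

```ruby
require 'benchmark'

# Assumed "lazy" pattern: ivar names are built from option keys at runtime.
class LazyNode
  def initialize(options = {})
    options.each do |key, value|
      instance_variable_set("@#{key}", value)
    end
  end

  attr_reader :name, :value
end

# Explicit pattern: every ivar is assigned directly.
class ExplicitNode
  def initialize(options = {})
    @name  = options[:name]
    @value = options[:value]
  end

  attr_reader :name, :value
end

opts = { name: 'foo', value: 'bar' }

Benchmark.bm(10) do |bm|
  bm.report('lazy')     { 100_000.times { LazyNode.new(opts) } }
  bm.report('explicit') { 100_000.times { ExplicitNode.new(opts) } }
end
```

Besides skipping a String allocation per key, the explicit version makes the full set of instance variables visible in one place.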
2014-05-15 00:43:13 +02:00
Yorick Peterse
043ea9a366
Fall back to ps in the profiler.
If the /proc filesystem doesn't exist we'll fall back to using the `ps` shell
command.
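Here is a sketch of such a fallback for reading a process's resident memory. The method name `memory_usage_kb` is hypothetical. On Linux it reads the `VmRSS` line from `/proc/<pid>/status`. Elsewhere, such as on macOS/BSD, it shells out to `ps -o rss=`, which prints the RSS in kilobytes without a header.

```ruby
# Hypothetical profiler helper: resident set size in kilobytes.
def memory_usage_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"

  if File.exist?(status)
    # Linux: the relevant line looks like "VmRSS:     12345 kB".
    line = File.foreach(status).find { |l| l.start_with?('VmRSS:') }
    return line.split[1].to_i if line
  end

  # No /proc filesystem: fall back to the ps shell command.
  `ps -o rss= -p #{pid}`.strip.to_i
end

puts "#{memory_usage_kb} KB"
```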
2014-05-11 21:15:33 +02:00
Yorick Peterse
1b58723e7d
Removed stdio.h #include.
This header is also not needed.
2014-05-11 21:06:55 +02:00
Yorick Peterse
e2b9fc75ca
Removed #include for malloc.h
Apparently some OSes move this to malloc/malloc.h. Since it's not needed, let's
just get rid of it.
2014-05-11 21:06:02 +02:00
Yorick Peterse
ba3d96c819
Re-build lexers when base_lexer.rl changes.
Thanks to @avdi for pointing out how to do this when using rule() blocks.
2014-05-10 00:28:23 +02:00
Yorick Peterse
19f04f98f7
Support for lexing/parsing inline doctypes.
2014-05-10 00:28:11 +02:00
Yorick Peterse
a92023fe94
Removed outdated paragraph from the README.
Ironically Oga now uses native extensions for the lexer.
2014-05-09 00:34:25 +02:00
Yorick Peterse
a8bf6be00e
Added a contributing guide.
2014-05-09 00:32:44 +02:00
Yorick Peterse
2dd5d996c4
Travis: don't notify for every failure.
2014-05-08 10:20:35 +02:00
Yorick Peterse
c472ceac6f
Docs for the shared Ragel grammar.
2014-05-08 00:21:23 +02:00
Yorick Peterse
98db796205
Updated editor configuration.
2014-05-08 00:17:12 +02:00
Yorick Peterse
51c1f3c32d
Updated the README.
2014-05-08 00:15:54 +02:00
Yorick Peterse
fe74d60138
Manually bootstrap JRuby after all.
After discussing this with @headius I've decided to do this the manual way
anyway. Apparently the basic load service stuff is deprecated and not very
reliable.
2014-05-07 22:32:34 +02:00
Yorick Peterse
90fabe3f21
Compile when running `rake generate`.
2014-05-07 20:07:31 +02:00
Yorick Peterse
3c621bf22e
Removed the manifest file + task.
Using a Dir.glob() is much easier when dealing with a bunch of generated files.
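Here is a sketch of what replacing a manifest with `Dir.glob()` can look like in a gemspec. The gem name, version and glob patterns are illustrative placeholders, not Oga's actual file layout. Generated files, such as Ragel output, are picked up as long as they exist when the gemspec is evaluated.

```ruby
require 'rubygems'

# Hypothetical gemspec: the file list comes from Dir.glob, not a MANIFEST.
spec = Gem::Specification.new do |s|
  s.name    = 'example'
  s.version = '0.0.1'
  s.summary = 'Dir.glob file list example'
  s.authors = ['Example Author']

  # Dir.glob accepts an Array of patterns and supports {a,b} alternation.
  s.files = Dir.glob(['lib/**/*.rb', 'ext/**/*.{c,h,rl,java}', 'README.md'])
end

puts spec.files.size
```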
2014-05-07 11:11:29 +02:00
Yorick Peterse
ee78b2c382
Don't redefine namespaces in C.
The Oga::XML namespace should be set up by Ruby, not by C.
2014-05-07 10:52:06 +02:00
Yorick Peterse
bbdc7966db
Documentation for the JRuby extension.
2014-05-07 10:24:24 +02:00
Yorick Peterse
3afef5f7cc
Lexer support for JRuby.
JRuby now passes all tests. Benchmark-wise it completes the big XML benchmark
in about 500-600 milliseconds.
2014-05-07 09:40:22 +02:00
Yorick Peterse
b9a4038e42
Callback boilerplate for the Java lexer.
2014-05-07 01:01:24 +02:00
Yorick Peterse
e271298984
Use macros in the C lexer.
2014-05-07 00:57:25 +02:00
Yorick Peterse
f25f8a3d15
Break up the Ragel C grammar.
The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code.
2014-05-07 00:50:34 +02:00
Yorick Peterse
49939fa687
Updated editor configuration.
2014-05-07 00:33:24 +02:00
Yorick Peterse
9abc5c1c92
Separated the Java and C ext codebases.
2014-05-07 00:29:10 +02:00
Yorick Peterse
b8efed5177
Renamed on_start_doctype to on_doctype_start.
2014-05-06 23:18:44 +02:00
Yorick Peterse
f39fe5d857
JRuby lexer boilerplate with actual input.
This doesn't actually lex anything just yet but at least the input from Ruby is
in place.
2014-05-06 22:43:55 +02:00
Yorick Peterse
fea5ec7946
Removed the package line in LibogaService.java
2014-05-06 20:52:42 +02:00
Yorick Peterse
2053018d07
Slap JRuby so that it can load the .jar file.
2014-05-06 20:45:26 +02:00
Yorick Peterse
6e685378e0
Set up Ragel for JRuby and load things the hard way
2014-05-06 19:06:04 +02:00
Yorick Peterse
aea8378fbb
Removed Cliver from the parser task.
2014-05-06 15:25:57 +02:00
Yorick Peterse
00e778d0d9
Removed unused cliver require.
Dumbass.
2014-05-06 13:59:26 +02:00