core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	1400a859ce	Make sure C strings always end with a NULL. Haven't bumped into any problems just yet. However, in theory all sorts of evil could happen here. Which is part of the problem of C: so much shit is undefined behaviour that you can take a single step and fall in 15 holes at the same time. In theory, because nobody bothered to actually specify it properly.	2014-09-28 22:28:55 +02:00
Yorick Peterse	8db77c0a09	Count newlines of text nodes in native code. Instead of relying on String#count for counting newlines in text nodes, Oga now does this in C/Java. String#count isn't exactly the fastest way of counting characters. Performance was measured using benchmark/xml/lexer/string_average_bench.rb. Before this patch the results were as following: MRI: 0.529s, Rbx: 4.965s, JRuby: 0.622s. After this patch: MRI: 0.424s, Rbx: 1.942s, JRuby: 0.665s (numbers vary a bit, seem roughly the same as before). The commands used for benchmarking: `rake clean` (to make sure that C exts aren't shared between MRI/Rbx), `rake generate`, `rake fixtures`, `ruby benchmark/xml/lexer/string_average_bench.rb`. The big difference for Rbx is probably due to the implementation of String#count not being super fast. Some changes were made (https://github.com/rubinius/rubinius/pull/3133) to the method, but this hasn't been released yet. JRuby seems to perform in a similar way, so either it was already optimizing things for me or I suck at writing well performing Java code. This fixes #51.	2014-09-25 22:49:11 +02:00
Yorick Peterse	81edce2eb8	Fixed lexing of XML comments. The previous setup would consume too much. For example, the following HTML: `<a><!--foo--><b><!--bar--></b></a>` would result in the following T_COMMENT token: "foo--><b><!--bar". The new setup requires the marking of a start position. I'm not a huge fan of this but there doesn't appear to be a way around this.	2014-08-15 20:42:32 +02:00
Yorick Peterse	629dcd3fe6	Support for IO inputs in the lexer. Using IO/StringIO objects one can parse large XML files without first having to read the entire file into memory. This can potentially save a lot of memory at the cost of a slightly slower runtime. For IO like instances the lexer will consume the input line by line. If a String is given it's consumed as a whole instead. A small side effect of reading the input line by line is that text such as "foo\nbar" will be lexed as two tokens instead of one. Fixes #19.	2014-05-26 00:30:39 +02:00
Yorick Peterse	6b9d65923a	Use a method for getting input in the XML lexer. Instead of directly accessing the `data` instance variable the C/Java code now uses the method `read_data`. This is part of one of the various steps required to allow Oga to read data from IO like instances. It also means I can freely change the name of the instance variable without also having to change the C/Java code.	2014-05-21 00:27:23 +02:00
Yorick Peterse	4542f06d0f	Replaced fcall/fret with fnext in the XML lexer. With the rules being cleaned up/moved around a bit we can drop the use of fcall/fret. This saves the need of having to maintain a stack (position).	2014-05-21 00:08:48 +02:00
Yorick Peterse	ee78b2c382	Don't redefine namespaces in C. The Oga::XML namespace should be set up by Ruby, not by C.	2014-05-07 10:52:06 +02:00
Yorick Peterse	e271298984	Use macros in the C lexer.	2014-05-07 00:57:25 +02:00
Yorick Peterse	f25f8a3d15	Break up the Ragel C grammar. The grammar is now broken up into a base lexer and a C lexer. This allows the same grammar to also be used in the Java code.	2014-05-07 00:50:34 +02:00
Yorick Peterse	9abc5c1c92	Separated the Java and C ext codebases.	2014-05-07 00:29:10 +02:00

10 Commits
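
The line-by-line behaviour described in 629dcd3fe6 can be illustrated with a small Ruby sketch. This is not Oga's lexer: `input_chunks` is a hypothetical helper that only shows how the input might be split before lexing, assuming IO-like objects are consumed via `each_line` and Strings are consumed as a whole.

```ruby
require 'stringio'

# Hypothetical helper, not part of Oga: returns the chunks of input that a
# lexer would receive, one chunk per read.
def input_chunks(input)
  if input.is_a?(String)
    # A String is handed over as a whole.
    [input]
  else
    # IO-like objects (File, StringIO, ...) are consumed line by line.
    input.each_line.to_a
  end
end

string_chunks = input_chunks("foo\nbar")
io_chunks     = input_chunks(StringIO.new("foo\nbar"))
```

With a String there is a single chunk, so "foo\nbar" can end up in one text token. With a StringIO there are two chunks ("foo\n" and "bar"), which is why the same text would be lexed as two tokens.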