854936f30b 
								
							 
						 
						
							
							
								
								Added average benchmarks for the parser.  
							
							
							
						 
						
							2014-05-16 16:38:27 +02:00  
				
					
						
							
							
								 
						
							
								ad67cd708f 
								
							 
						 
						
							
							
								
								Only include debug info when DEBUG is set.  
							
							
							
						 
						
							2014-05-15 20:43:48 +02:00  
				
					
						
							
							
								 
						
							
								fd2f727183 
								
							 
						 
						
							
							
								
								Only set explicit ivars in the lexer.  
							
							
							
						 
						
							2014-05-15 19:48:18 +02:00  
				
					
						
							
							
								 
						
							
								44bf1dd1ca 
								
							 
						 
						
							
							
								
								Split up handling of element names/namespaces.  
							
							
							
							
							This is now split up on Ragel level, simplifying the corresponding Ruby code. 
							
						 
						
							2014-05-15 10:22:05 +02:00  
				
					
						
							
							
								 
						
							
								723a273e4f 
								
							 
						 
						
							
							
								
								Enforce symbols for element attributes.  
							
							
							
							
							This comes with a little bit of memory overhead but this should be minor in
most cases. 
							
						 
						
							2014-05-15 01:04:26 +02:00  
				
					
						
							
							
								 
						
							
								f4b9bbd4ac 
								
							 
						 
						
							
							
								
								Removed lazy way of setting instance variables.  
							
							
							
							
							This process is quite a bit slower compared to setting instance variables
directly. 
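
The commit itself does not show the code involved. As a rough sketch, assuming the "lazy way" was a loop that sets instance variables dynamically from an options Hash, the two approaches could look like this (both classes are hypothetical and not taken from Oga):

```ruby
# Hypothetical classes for illustration; neither is taken from Oga.

# Generic approach: loop over an options Hash and set each ivar dynamically.
class LazyNode
  attr_reader :name, :value

  def initialize(options = {})
    options.each do |key, value|
      instance_variable_set("@#{key}", value) if respond_to?(key)
    end
  end
end

# Direct approach: assign each known ivar explicitly.
class DirectNode
  attr_reader :name, :value

  def initialize(options = {})
    @name  = options[:name]
    @value = options[:value]
  end
end

lazy   = LazyNode.new(:name => 'class', :value => 'foo')
direct = DirectNode.new(:name => 'class', :value => 'foo')
```

Both objects end up with the same state. The direct version skips the string interpolation and the `respond_to?` lookup for every key, which is a plausible reason for it being faster.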
							
						 
						
							2014-05-15 00:43:13 +02:00  
				
					
						
							
							
								 
						
							
								043ea9a366 
								
							 
						 
						
							
							
								
								Fall back to ps in the profiler.  
							
							
							
							
							If the /proc filesystem doesn't exist we'll fall back to using the `ps` shell
command. 
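
A minimal sketch of such a fallback, assuming the profiler samples the resident memory of the current process (the commit does not say what is measured, and `memory_usage_kb` is a hypothetical name):

```ruby
# Hypothetical helper, not the profiler code from the repository.
# Returns the resident set size of a process in kilobytes.
def memory_usage_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"

  if File.exist?(status)
    # Linux: the VmRSS line looks like "VmRSS:     12345 kB".
    line = File.foreach(status).find { |l| l.start_with?('VmRSS:') }

    return line.split[1].to_i if line
  end

  # No usable /proc (e.g. OS X or BSD): ask ps for the RSS instead.
  `ps -o rss= -p #{pid}`.strip.to_i
end
```

Reading `/proc` avoids spawning a process for every sample, so it makes sense to prefer it and only shell out to `ps` when it is missing.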
							
						 
						
							2014-05-11 21:15:33 +02:00  
				
					
						
							
							
								 
						
							
								1b58723e7d 
								
							 
						 
						
							
							
								
								Removed stdio.h #include.  
							
							
							
							
							This header is also not needed. 
							
						 
						
							2014-05-11 21:06:55 +02:00  
				
					
						
							
							
								 
						
							
								e2b9fc75ca 
								
							 
						 
						
							
							
								
								Removed #include for malloc.h  
							
							
							
							
							Apparently some OSes move this to malloc/malloc.h. Since it's not needed, let's
just get rid of it. 
							
						 
						
							2014-05-11 21:06:02 +02:00  
				
					
						
							
							
								 
						
							
								ba3d96c819 
								
							 
						 
						
							
							
								
								Re-build lexers when base_lexer.rl changes.  
							
							
							
							
							Thanks to @avdi for pointing out how to do this when using rule() blocks. 
							
						 
						
							2014-05-10 00:28:23 +02:00  
				
					
						
							
							
								 
						
							
								19f04f98f7 
								
							 
						 
						
							
							
								
								Support for lexing/parsing inline doctypes.  
							
							
							
						 
						
							2014-05-10 00:28:11 +02:00  
				
					
						
							
							
								 
						
							
								a92023fe94 
								
							 
						 
						
							
							
								
								Removed outdated paragraph from the README.  
							
							
							
							
							Ironically Oga now uses native extensions for the lexer. 
							
						 
						
							2014-05-09 00:34:25 +02:00  
				
					
						
							
							
								 
						
							
								a8bf6be00e 
								
							 
						 
						
							
							
								
								Added a contributing guide.  
							
							
							
						 
						
							2014-05-09 00:32:44 +02:00  
				
					
						
							
							
								 
						
							
								2dd5d996c4 
								
							 
						 
						
							
							
								
								Travis: don't notify for every failure.  
							
							
							
						 
						
							2014-05-08 10:20:35 +02:00  
				
					
						
							
							
								 
						
							
								c472ceac6f 
								
							 
						 
						
							
							
								
								Docs for the shared Ragel grammar.  
							
							
							
						 
						
							2014-05-08 00:21:23 +02:00  
				
					
						
							
							
								 
						
							
								98db796205 
								
							 
						 
						
							
							
								
								Updated editor configuration.  
							
							
							
						 
						
							2014-05-08 00:17:12 +02:00  
				
					
						
							
							
								 
						
							
								51c1f3c32d 
								
							 
						 
						
							
							
								
								Updated the README.  
							
							
							
						 
						
							2014-05-08 00:15:54 +02:00  
				
					
						
							
							
								 
						
							
								fe74d60138 
								
							 
						 
						
							
							
								
								Manually bootstrap JRuby after all.  
							
							
							
							
							After discussing this with @headius I've decided to do this the manual way
anyway. Apparently the basic load service stuff is deprecated and not very
reliable. 
							
						 
						
							2014-05-07 22:32:34 +02:00  
				
					
						
							
							
								 
						
							
								90fabe3f21 
								
							 
						 
						
							
							
								
								Compile when running `rake generate`.  
							
							
							
						 
						
							2014-05-07 20:07:31 +02:00  
				
					
						
							
							
								 
						
							
								3c621bf22e 
								
							 
						 
						
							
							
								
								Removed the manifest file + task.  
							
							
							
							
							Using a Dir.glob() is much easier when dealing with a bunch of generated files. 
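
As an illustration of the glob-based approach, a file list can be built from a few patterns instead of a hand-maintained manifest. The helper name and the patterns below are assumptions, not the project's actual gemspec contents:

```ruby
# Hypothetical helper: returns the files matching the given glob patterns,
# as sorted paths relative to `root`. Directories are skipped.
def package_files(root, patterns)
  Dir.chdir(root) do
    Dir.glob(patterns).select { |path| File.file?(path) }.sort
  end
end

# Example call with made-up patterns:
#
#   package_files('.', ['lib/**/*.rb', 'ext/**/*.{c,h,java}', 'README.md'])
```

Generated files are picked up automatically as long as they match one of the patterns, which is what makes this easier than keeping a manifest in sync.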
							
						 
						
							2014-05-07 11:11:29 +02:00  
				
					
						
							
							
								 
						
							
								ee78b2c382 
								
							 
						 
						
							
							
								
								Don't redefine namespaces in C.  
							
							
							
							
							The Oga::XML namespace should be set up by Ruby, not by C. 
							
						 
						
							2014-05-07 10:52:06 +02:00  
				
					
						
							
							
								 
						
							
								bbdc7966db 
								
							 
						 
						
							
							
								
								Documentation for the JRuby extension.  
							
							
							
						 
						
							2014-05-07 10:24:24 +02:00  
				
					
						
							
							
								 
						
							
								3afef5f7cc 
								
							 
						 
						
							
							
								
								Lexer support for JRuby.  
							
							
							
							
							JRuby now passes all tests. Benchmark-wise, it completes the big XML benchmark
in about 500-600 milliseconds. 
							
						 
						
							2014-05-07 09:40:22 +02:00  
				
					
						
							
							
								 
						
							
								b9a4038e42 
								
							 
						 
						
							
							
								
								Callback boilerplate for the Java lexer.  
							
							
							
						 
						
							2014-05-07 01:01:24 +02:00  
				
					
						
							
							
								 
						
							
								e271298984 
								
							 
						 
						
							
							
								
								Use macros in the C lexer.  
							
							
							
						 
						
							2014-05-07 00:57:25 +02:00  
				
					
						
							
							
								 
						
							
								f25f8a3d15 
								
							 
						 
						
							
							
								
								Break up the Ragel C grammar.  
							
							
							
							
							The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code. 
							
						 
						
							2014-05-07 00:50:34 +02:00  
				
					
						
							
							
								 
						
							
								49939fa687 
								
							 
						 
						
							
							
								
								Updated editor configuration.  
							
							
							
						 
						
							2014-05-07 00:33:24 +02:00  
				
					
						
							
							
								 
						
							
								9abc5c1c92 
								
							 
						 
						
							
							
								
								Separated the Java and C ext codebases.  
							
							
							
						 
						
							2014-05-07 00:29:10 +02:00  
				
					
						
							
							
								 
						
							
								b8efed5177 
								
							 
						 
						
							
							
								
								Renamed on_start_doctype to on_doctype_start.  
							
							
							
						 
						
							2014-05-06 23:18:44 +02:00  
				
					
						
							
							
								 
						
							
								f39fe5d857 
								
							 
						 
						
							
							
								
								JRuby lexer boilerplate with actual input.  
							
							
							
							
							This doesn't actually lex anything just yet but at least the input from Ruby is
in place. 
							
						 
						
							2014-05-06 22:43:55 +02:00  
				
					
						
							
							
								 
						
							
								fea5ec7946 
								
							 
						 
						
							
							
								
								Removed the package line in LibogaService.java  
							
							
							
						 
						
							2014-05-06 20:52:42 +02:00  
				
					
						
							
							
								 
						
							
								2053018d07 
								
							 
						 
						
							
							
								
								Slap JRuby so that it can load the .jar file.  
							
							
							
						 
						
							2014-05-06 20:45:26 +02:00  
				
					
						
							
							
								 
						
							
								6e685378e0 
								
							 
						 
						
							
							
								
								Setup Ragel for JRuby and load things the hard way  
							
							
							
						 
						
							2014-05-06 19:06:04 +02:00  
				
					
						
							
							
								 
						
							
								aea8378fbb 
								
							 
						 
						
							
							
								
								Removed Cliver from the parser task.  
							
							
							
						 
						
							2014-05-06 15:25:57 +02:00  
				
					
						
							
							
								 
						
							
								00e778d0d9 
								
							 
						 
						
							
							
								
								Removed unused cliver require.  
							
							
							
							
							Dumbass. 
							
						 
						
							2014-05-06 13:59:26 +02:00  
				
					
						
							
							
								 
						
							
								127aea5ca6 
								
							 
						 
						
							
							
								
								Remove the Java output when cleaning.  
							
							
							
						 
						
							2014-05-06 10:24:57 +02:00  
				
					
						
							
							
								 
						
							
								64c9e18651 
								
							 
						 
						
							
							
								
								Setup for Java and Ragel.  
							
							
							
						 
						
							2014-05-06 10:24:07 +02:00  
				
					
						
							
							
								 
						
							
								2652bc0103 
								
							 
						 
						
							
							
								
								Removed Cliver as a dependency.  
							
							
							
							
							Since I'm not using any Ragel version specific features it's not really needed
to check for the version. 
							
						 
						
							2014-05-06 10:18:52 +02:00  
				
					
						
							
							
								 
						
							
								eeeeb0efad 
								
							 
						 
						
							
							
								
								Don't track the generated Java lexer.  
							
							
							
						 
						
							2014-05-06 10:11:19 +02:00  
				
					
						
							
							
								 
						
							
								d2742cfdde 
								
							 
						 
						
							
							
								
								Use 4 spaces for C/Java code.  
							
							
							
						 
						
							2014-05-06 09:41:36 +02:00  
				
					
						
							
							
								 
						
							
								4e2dca2fd9 
								
							 
						 
						
							
							
								
								Updated the list of files to clean.  
							
							
							
						 
						
							2014-05-06 09:29:02 +02:00  
				
					
						
							
							
								 
						
							
								b9cb7c2d7c 
								
							 
						 
						
							
							
								
								Corrected various extension paths.  
							
							
							
						 
						
							2014-05-06 08:47:02 +02:00  
				
					
						
							
							
								 
						
							
								01a4a53a53 
								
							 
						 
						
							
							
								
								Merge branch 'native-ext' of github.com:YorickPeterse/oga into native-ext  
							
							
							
						 
						
							2014-05-06 08:44:57 +02:00  
				
					
						
							
							
								 
						
							
								c30d3a7627 
								
							 
						 
						
							
							
								
								Half-assed JRuby boilerplate.  
							
							
							
							
							Blowing my brains out over getting this fat pig to do what I want but we're
getting there. 
							
						 
						
							2014-05-06 00:23:07 +02:00  
				
					
						
							
							
								 
						
							
								2b3a6be24d 
								
							 
						 
						
							
							
								
								Use liboga as a prefix in the C code.  
							
							
							
							
							Namespaces? What are those? 
							
						 
						
							2014-05-05 21:19:50 +02:00  
				
					
						
							
							
								 
						
							
								ee756037e7 
								
							 
						 
						
							
							
								
								Removed unused YARD tag.  
							
							
							
						 
						
							2014-05-05 09:45:10 +02:00  
				
					
						
							
							
								 
						
							
								aeab885a7f 
								
							 
						 
						
							
							
								
								Docs for the Ruby part of the XML lexer.  
							
							
							
						 
						
							2014-05-05 09:44:35 +02:00  
				
					
						
							
							
								 
						
							
								57fd4dff64 
								
							 
						 
						
							
							
								
								Docs for the C lexer.  
							
							
							
						 
						
							2014-05-05 09:40:08 +02:00  
				
					
						
							
							
								 
						
							
								335f3cc6d6 
								
							 
						 
						
							
							
								
								Use rb_enc_str_new instead of rb_enc_str_new_cstr.  
							
							
							
							
							The latter in combination with strndup() would leak large amounts of memory. 
							
						 
						
							2014-05-05 00:34:19 +02:00  
				
					
						
							
							
								 
						
							
								2689d3f65a 
								
							 
						 
						
							
							
								
								Initial setup using a C extension.  
							
							
							
							
							While I've tried to keep Oga pure Ruby for as long as possible the performance
of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of
XML would take 5 to 6 seconds at least. Nokogiri on the other hand can parse
that same XML into a DOM document in about 300 milliseconds. Such a big
performance difference is not acceptable.

To work around this the XML/HTML lexer will be implemented in C for
MRI/Rubinius and Java for JRuby. For now there's only a C extension as I
haven't read up yet on the JRuby API. The end goal is to provide some sort of
Ragel "template" that can be used to generate the corresponding C/Java
extension code. This would remove the need to duplicate the grammar and
associated code.

The native extension setup is a hybrid between native and Ruby. The raw Ragel
stuff happens in C/Java while the actual logic of actions happens in Ruby. This
adds a small amount of overhead but makes it much easier to maintain the lexer.
Even with this extra overhead the performance is much better than pure Ruby.
The 10MB of XML mentioned above is lexed in about 600 milliseconds. In other
words, it's roughly 10 times faster. 
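
The hybrid split described above can be sketched in plain Ruby. This is only an illustration: `SketchLexer`, `advance_native`, `on_text` and the token names are hypothetical, and a regex scan stands in for the Ragel-generated C/Java scanner. Only the callback name `on_doctype_start` appears elsewhere in this log.

```ruby
# Illustrative sketch of the native/Ruby split; not the actual Oga lexer.
class SketchLexer
  def initialize(data)
    @data   = data
    @tokens = []
  end

  # Runs the scanner and returns the collected tokens.
  def lex
    advance_native

    @tokens
  end

  private

  # Stand-in for the native part: it only recognises input and calls back
  # into Ruby. In the real setup this step would live in C or Java.
  def advance_native
    @data.scan(/<!DOCTYPE[^>]*>|[^<]+/) do |match|
      if match.start_with?('<!DOCTYPE')
        on_doctype_start
      else
        on_text(match)
      end
    end
  end

  # The action logic stays in Ruby, where it is easy to change.
  def on_doctype_start
    @tokens << [:T_DOCTYPE_START, nil]
  end

  def on_text(value)
    @tokens << [:T_TEXT, value]
  end
end

tokens = SketchLexer.new('<!DOCTYPE html>hello').lex
```

Each callback costs a method call from the scanner into Ruby, which is the small overhead the message refers to. In exchange, the token-building logic is written once and shared by both native implementations.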
							
						 
						
							2014-05-05 00:31:28 +02:00