core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	f25f8a3d15	Break up the Ragel C grammar. The grammar is now broken up into a base lexer and a C lexer. This allows the same grammar to also be used in the Java code.	2014-05-07 00:50:34 +02:00
Yorick Peterse	49939fa687	Updated editor configuration.	2014-05-07 00:33:24 +02:00
Yorick Peterse	9abc5c1c92	Separated the Java and C ext codebases.	2014-05-07 00:29:10 +02:00
Yorick Peterse	b8efed5177	Renamed on_start_doctype to on_doctype_start.	2014-05-06 23:18:44 +02:00
Yorick Peterse	f39fe5d857	JRuby lexer boilerplate with actual input. This doesn't actually lex anything just yet but at least the input from Ruby is in place.	2014-05-06 22:43:55 +02:00
Yorick Peterse	fea5ec7946	Removed the package line in LibogaService.java	2014-05-06 20:52:42 +02:00
Yorick Peterse	2053018d07	Slap JRuby so that it can load the .jar file.	2014-05-06 20:45:26 +02:00
Yorick Peterse	6e685378e0	Setup Ragel for JRuby and load things the hard way	2014-05-06 19:06:04 +02:00
Yorick Peterse	aea8378fbb	Removed Cliver from the parser task.	2014-05-06 15:25:57 +02:00
Yorick Peterse	00e778d0d9	Removed unused cliver require. Dumbass.	2014-05-06 13:59:26 +02:00
Yorick Peterse	127aea5ca6	Remove the Java output when cleaning.	2014-05-06 10:24:57 +02:00
Yorick Peterse	64c9e18651	Setup for Java and Ragel.	2014-05-06 10:24:07 +02:00
Yorick Peterse	2652bc0103	Removed Cliver as a dependency. Since I'm not using any Ragel version specific features it's not really needed to check for the version.	2014-05-06 10:18:52 +02:00
Yorick Peterse	eeeeb0efad	Don't track the generated Java lexer.	2014-05-06 10:11:19 +02:00
Yorick Peterse	d2742cfdde	Use 4 spaces for C/Java code.	2014-05-06 09:41:36 +02:00
Yorick Peterse	4e2dca2fd9	Updated the list of files to clean.	2014-05-06 09:29:02 +02:00
Yorick Peterse	b9cb7c2d7c	Corrected various extension paths.	2014-05-06 08:47:02 +02:00
Yorick Peterse	01a4a53a53	Merge branch 'native-ext' of github.com:YorickPeterse/oga into native-ext	2014-05-06 08:44:57 +02:00
Yorick Peterse	c30d3a7627	Half-assed JRuby boilerplate. Blowing my brains out over getting this fat pig to do what I want but we're getting there.	2014-05-06 00:23:07 +02:00
Yorick Peterse	2b3a6be24d	Use liboga as a prefix in the C code. Namespaces? What are those?	2014-05-05 21:19:50 +02:00
Yorick Peterse	ee756037e7	Removed unused YARD tag.	2014-05-05 09:45:10 +02:00
Yorick Peterse	aeab885a7f	Docs for the Ruby part of the XML lexer.	2014-05-05 09:44:35 +02:00
Yorick Peterse	57fd4dff64	Docs for the C lexer.	2014-05-05 09:40:08 +02:00
Yorick Peterse	335f3cc6d6	Use rb_enc_str_new instead of rb_enc_str_new_cstr. The latter in combination with strndup() would leak large amounts of memory.	2014-05-05 00:34:19 +02:00
Yorick Peterse	2689d3f65a	Initial setup using a C extension. While I've tried to keep Oga pure Ruby for as long as possible the performance of Ragel's Ruby output was not worth the trouble. For example, lexing 10MB of XML would take 5 to 6 seconds at least. Nokogiri on the other hand can parse that same XML into a DOM document in about 300 milliseconds. Such a big performance difference is not acceptable. To work around this the XML/HTML lexer will be implemented in C for MRI/Rubinius and Java for JRuby. For now there's only a C extension as I haven't read up yet on the JRuby API. The end goal is to provide some sort of Ragel "template" that can be used to generate the corresponding C/Java extension code. This would remove the need for duplicating the grammar and associated code. The native extension setup is a hybrid between native and Ruby. The raw Ragel stuff happens in C/Java while the actual logic of actions happens in Ruby. This adds a small amount of overhead but makes it much easier to maintain the lexer. Even with this extra overhead the performance is much better than pure Ruby. The 10MB of XML mentioned above is lexed in about 600 milliseconds. In other words, it's 10 times faster.	2014-05-05 00:31:28 +02:00
Yorick Peterse	baaa24a760	Indentation fix in the lexer.	2014-05-04 18:06:43 +02:00
Yorick Peterse	f18e8893de	Removed the buffering crap from the lexer.	2014-05-04 17:39:08 +02:00
Yorick Peterse	57255012b7	Patch the Ragel lexer after generating it. This further increases throughput of the lexer. On MRI this seems to save around one second or so. It now sits at ~6,8 seconds in the big XML benchmark. On JRuby, combined with some JIT options and invoke dynamic enabled, this can reduce the average lexing time to around 3,5 seconds. Rubinius, also with a few aggressive JIT options, seems to stick around 9 seconds.	2014-05-02 00:40:10 +02:00
Yorick Peterse	9dfdefee47	Removed XML::Lexer#buffering? Instead of wrapping a predicate method around the ivar we'll just access it directly. This reduces average lexing times in the big XML benchmark from 7,5 to ~7 seconds.	2014-05-01 22:59:56 +02:00
Yorick Peterse	b854f737cd	Run memory profiling for 60 seconds.	2014-05-01 21:47:51 +02:00
Yorick Peterse	676a5333c0	Use a default gnuplot script.	2014-05-01 21:27:08 +02:00
Yorick Peterse	3344f373bd	Plot time offsets on X axes when profiling.	2014-05-01 21:26:05 +02:00
Yorick Peterse	f4a71d7f63	Use wx as a gnuplot terminal. This allows users to zoom in and such, which doesn't work on the qt terminal for some reason.	2014-05-01 21:01:25 +02:00
Yorick Peterse	e33bb6f901	Remove sample files when running rake clean.	2014-05-01 20:57:17 +02:00
Yorick Peterse	1c35317165	Revamped the profiling setup. This removes the need for dozens of standalone gnuplot scripts, adds extra profiling data and makes the actual profiling easier.	2014-05-01 20:54:25 +02:00
Yorick Peterse	e54d77fc2f	Cleaned up the average timing benchmark.	2014-05-01 13:43:33 +02:00
Yorick Peterse	203aea6b1a	Cleaned up benchmarking code.	2014-05-01 13:08:44 +02:00
Yorick Peterse	ebf9099f0e	Dropped the benchmark_ prefixes. These files reside in a benchmark/ directory. Gee, I wonder what they do.	2014-05-01 13:03:21 +02:00
Yorick Peterse	20f2f256f6	Benchmark for measuring average lexing times.	2014-05-01 13:01:52 +02:00
Yorick Peterse	f607cf50dc	Use local variables for Ragel. Instead of using instance variables for ts, te, etc we'll use local variables. Grand wizard overlord @whitequark suggested that this would be quite a bit faster, which turns out to be true. For example, the big XML lexer benchmark would, prior to this commit, complete in about 9 - 9,3 seconds. With this commit that hovers around 8,5 seconds.	2014-05-01 13:00:29 +02:00
Yorick Peterse	e26d5a8664	Removed unused variable in a lexer benchmark.	2014-05-01 12:25:49 +02:00
Yorick Peterse	2f36692abe	Fixed the big XML lexer benchmark.	2014-04-30 09:28:28 +02:00
Yorick Peterse	83f6d5437e	Contextual pull parsing. This adds the ability to more easily act upon specific node types and nestings when using the pull parsing API. A basic example of this API looks like the following (only including relevant code): parser.parse do |node| parser.on(:element, %w{people person}) do people << {:name => nil, :age => nil} end parser.on(:text, %w{people person name}) do people.last[:name] = node.text end parser.on(:text, %w{people person age}) do people.last[:age] = node.text.to_i end end This fixes #6.	2014-04-29 23:05:49 +02:00
Yorick Peterse	1a413998a3	Track the current node in the pull parser. The current node is tracked in the instance method `node`.	2014-04-29 21:21:05 +02:00
Yorick Peterse	d0b3653785	Updated the manifest, again.	2014-04-29 20:42:17 +02:00
Yorick Peterse	5339664f33	Include .yardopts in the Gem.	2014-04-29 20:42:09 +02:00
Yorick Peterse	8522a82cf9	Updated the manifest.	2014-04-29 20:41:11 +02:00
Yorick Peterse	503b254216	Generate files before generating the manifest.	2014-04-29 20:41:02 +02:00
Yorick Peterse	586c8f1d46	Generated an initial manifest.	2014-04-29 20:40:34 +02:00
Yorick Peterse	59dae873e4	Don't rely on Git for generating the MANIFEST. When using Git the resulting Gem will contain far too many useless files. For example, the profile/ and spec/ directories are not needed when building Gems.	2014-04-29 20:39:20 +02:00
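
The contextual pull parsing example in commit 83f6d5437e is flattened onto one line in the table above. The sketch below lays it out again and makes it runnable without Oga. `SketchPullParser`, `Node` and the pre-tokenized `events` array are hypothetical stand-ins, not Oga's implementation. The rule that `on` fires only when the nesting matches exactly is also an assumption of this sketch.

```ruby
# Hypothetical stand-in node type; the real parser yields its own node classes.
Node = Struct.new(:type, :name, :text)

# Hypothetical minimal pull parser driven by a pre-tokenized event list.
class SketchPullParser
  # The node currently being processed (compare commit 1a413998a3).
  attr_reader :node

  # events: array of [:open, name], [:text, string] and [:close, name].
  def initialize(events)
    @events  = events
    @nesting = []
    @node    = nil
  end

  # Yields every element and text node in document order.
  def parse
    @events.each do |kind, value|
      case kind
      when :open
        @nesting << value
        @node = Node.new(:element, value, nil)
        yield @node
      when :text
        @node = Node.new(:text, nil, value)
        yield @node
      when :close
        @nesting.pop
      end
    end
  end

  # Runs the block only if the current node has the given type and the
  # element nesting equals `nesting` (assumption: exact match).
  def on(type, nesting)
    yield if @node.type == type && @nesting == nesting
  end
end

events = [
  [:open, 'people'],
  [:open, 'person'],
  [:open, 'name'], [:text, 'Alice'], [:close, 'name'],
  [:open, 'age'],  [:text, '28'],    [:close, 'age'],
  [:close, 'person'],
  [:close, 'people']
]

parser = SketchPullParser.new(events)
people = []

# Same shape as the example in the commit message.
parser.parse do |node|
  parser.on(:element, %w{people person}) do
    people << {:name => nil, :age => nil}
  end

  parser.on(:text, %w{people person name}) do
    people.last[:name] = node.text
  end

  parser.on(:text, %w{people person age}) do
    people.last[:age] = node.text.to_i
  end
end
```

After the run, `people` holds one hash per `person` element. Keeping the nesting check inside `on` means the calling block needs no state tracking of its own.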

204 Commits
