core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	c528a0d6a7	Build all branches on Travis.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d210c9fb57	Compacted a few XML parser rules.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a5cd75cb7e	Removed useless string allocs from the XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	fdcd712ffe	Don't use Array#uniq in NodeSet#initialize. Removing this makes the process of parsing larger XML documents a bit faster. The downside is that NodeSet#initialize will no longer filter out duplicate nodes, though this is not something Oga itself relies upon. Methods such as NodeSet#push still do ignore elements already present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	c36b35ac0f	Skip ownership iteration when there's no owner. There's no point in iterating over all the nodes and assigning ownership if there's no owner to begin with.	2015-03-21 01:22:59 +01:00
Yorick Peterse	f83c03aaec	Fixed typo in NodeSet spec.	2015-03-21 01:22:59 +01:00
Yorick Peterse	9621fe1fc8	Moved changelog to the root directory.	2015-03-21 01:22:59 +01:00
Yorick Peterse	006ef4d51a	Port over most of the old XML error handling. Some messages are a bit different due to ruby-ll's error handling, other than that it's largely the same stuff as before.	2015-03-21 01:22:59 +01:00
Yorick Peterse	1a326fc516	Remove Racc based XML parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	1b9a4db268	Depend on ruby-ll 1.1 or newer.	2015-03-21 01:22:59 +01:00
Yorick Peterse	d8b9725b82	Fixed SAX parsing of XML attributes. This was utterly broken, mainly due to me overlooking it. There are now 2 new callbacks to handle this properly: * on_attribute: to handle a single attribute/value pair * on_attributes: to handle a collection of attributes (as returned by on_attribute) By default on_attribute returns a Hash, on_attributes in turn merges all attribute hashes into a single one. This ensures that on_element _actually_ receives the attributes as a Hash, instead of an Array with random nil/XML::Attribute values.	2015-03-21 01:22:59 +01:00
Yorick Peterse	605d565104	Use sax_parse_html for HTML documents. I suspect the only reason this test ever passed is due to Racc's error handling. Either way this was using the wrong method.	2015-03-21 01:22:59 +01:00
Yorick Peterse	dd626c10d3	Use Array#unshift in the LL XML grammar. Using Array#+ for large sets (e.g. in the benchmarks) is _really_ slow. Interestingly enough, Array#unshift uses as much memory as the Racc parser and is about as fast, even though it has to move memory around.	2015-03-21 01:22:59 +01:00
Yorick Peterse	f94407ee9d	Parser callback for XML attributes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	a023b35e78	Fixed the pull parser for the XML LL parser.	2015-03-21 01:22:59 +01:00
Yorick Peterse	5eed0d31d6	Ported over most of the XML parser to ruby-ll. This is still missing the error handling previously present.	2015-03-21 01:22:59 +01:00
Yorick Peterse	15a3ab9ba5	ruby-ll: full support for parsing doctypes.	2015-03-21 01:22:59 +01:00
Yorick Peterse	71aefb53cc	Started porting the XML parser to ruby-ll. This is far from done.	2015-03-21 01:22:59 +01:00
Yorick Peterse	2f67399784	Use 72 characters for Git instead of 80. This follows the Linux/universal Git guidelines more closely.	2015-03-16 14:58:20 +01:00
Yorick Peterse	2ec91f130f	Lazy decoding of XML/HTML entities. Instead of decoding entities in the lexer we'll do this whenever XML::Text#text is called. This removes the overhead from the parsing phase and ensures the process is only triggered when actually needed. Note that calling #to_xml and/or the #inspect methods on a Text (or parent) instance will also trigger the entity conversion process. The new entity decoding API supports both regular entities (e.g. &amp;) as well as codepoint based entities (both regular and hexadecimal codepoints). To allow safe read-only access to Text instances from multiple threads a mutex is used. This mutex ensures that only 1 thread can trigger the conversion process. Fixes #68	2015-03-05 23:00:43 +01:00
Yorick Peterse	7409257702	Replaced HTML benchmark fixtures. The new fixture is the HTML of a person article which contains a few HTML entities.	2015-03-05 22:58:22 +01:00
Yorick Peterse	7e847a0ae9	Make C90 happy.	2015-03-05 22:57:51 +01:00
Yorick Peterse	33c46a1841	Use ID instead of VALUE for callback names in C.	2015-03-05 22:57:51 +01:00
Yorick Peterse	3e05593536	Release 0.2.3	2015-03-04 11:56:23 +01:00
Yorick Peterse	aa42cc9ce7	Updated changelog for 0.2.3	2015-03-04 11:49:24 +01:00
Yorick Peterse	3b2055a30b	Refactored handling of literal HTML elements. This ensures newlines can appear in <style> / <script> tags when using IOs as input.	2015-03-04 11:44:31 +01:00
Yorick Peterse	78e40b55c0	Handle parsing of HTML <style> tags. This basically re-applies the technique used for HTML <script> tags. With this extra addition I decided to rename/normalize a few things so it's easier to add any extra tags in the future. One downside of this setup is that the following will not be parsed by Oga: <style> </script> </style> The same applies to script tags containing a literal </style> tag. Since this particular case is rather unlikely to occur I'm OK with not supporting it as it _does_ simplify the lexer quite a bit. Fixes #80	2015-03-03 16:28:05 +01:00
Yorick Peterse	73534375d5	Release 0.2.2	2015-03-03 13:36:32 +01:00
Yorick Peterse	142b467277	Set parent of nodes set using Element#inner_text= This ensures that any text nodes created using Element#inner_text= have their parent node set correctly.	2015-03-03 13:13:05 +01:00
Yorick Peterse	503efc32cd	Release 0.2.1	2015-03-02 22:12:49 +01:00
Yorick Peterse	bc74d31bb5	Updated changelog for 0.2.1.	2015-03-02 17:44:08 +01:00
Yorick Peterse	874d7124af	Don't convert <script> text to XML entities. Fixes #79.	2015-03-02 17:32:19 +01:00
Yorick Peterse	9a586363e9	Added XML::Document#html?	2015-03-02 16:39:40 +01:00
Yorick Peterse	ba2177e2cf	Lex contents of <script> tags as plain text. When lexing input in HTML mode the lexer has to treat _all_ content of a <script> tag as plain text. This ensures that the lexer can process input such as "x <y" and "// <foo>" correctly. Fixes #70.	2015-03-02 16:22:09 +01:00
Yorick Peterse	351b5ac004	Added spec for lexing inline HTML script tags. Related issue: #70	2015-03-02 16:20:06 +01:00
Yorick Peterse	8fdf27dcef	Removed unused C lexer macros.	2015-03-02 15:43:47 +01:00
Yorick Peterse	8b910c700d	Updated EditorConfig file for ruby-ll files.	2015-02-13 09:38:29 +01:00
Yorick Peterse	c68b038e53	Added benchmark for the CSS parser.	2015-02-13 09:36:24 +01:00
Yorick Peterse	f94461a9ca	Upload docs to S3.	2015-01-17 18:00:05 +01:00
Yorick Peterse	2d03ce8e51	Run tests on MRI 2.2.	2015-01-09 21:37:09 +01:00
Yorick Peterse	47a3c5e7f8	Use describe/it instead of context/example. This keeps things consistent with the general testing guidelines in the Ruby community. This in turn should hopefully make my life easier as I don't have to tell people to use this rather odd style I was using before.	2015-01-08 23:01:53 +01:00
Yorick Peterse	e138aa15ac	Removed stray comment in the XPath parser.	2014-12-28 23:55:33 +01:00
Yorick Peterse	746c8052dd	Remove all nodes when calling Element#inner_text= This fixes #64.	2014-12-14 23:32:43 +01:00
Yorick Peterse	739f885078	Use ID instead of VALUE for C Symbols. Thanks to @cremno for bringing this up.	2014-11-29 12:53:55 +01:00
Yorick Peterse	b006289c5f	Removed extra space in c/lexer.rl	2014-11-23 22:12:18 +01:00
Yorick Peterse	5e24a3d1e5	Short docs on lexer callback names.	2014-11-23 20:20:14 +01:00
Yorick Peterse	4fa88fcbde	Cache rb_intern/symbol lookups in the lexer. For JRuby this has little to no benefits as it uses strings for method names. However, both MRI and Rubinius will perform a Symbol lookup whenever rb_intern() is called. By doing this once for all callback names and caching the resulting VALUE objects the lexer timings can be reduced by about 25%. In case of the benchmark benchmark/xml/lexer/string_average_bench.rb this means it runs in around 500ms instead of 700ms.	2014-11-22 01:53:37 +01:00
Yorick Peterse	a10fe855d7	Merge pull request #67 from krasnoukhov/xml-entities Add missing entities to the decode/encode lists	2014-11-21 01:12:24 +01:00
Dmitry Krasnoukhov	26baf89440	Add missing entities to the decode/encode lists	2014-11-21 01:53:11 +02:00
Yorick Peterse	81c49b5101	Contributing notes on thread-safety/require usage.	2014-11-20 20:09:41 +01:00
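
Commit 2ec91f130f above describes decoding entities lazily, on the first call to XML::Text#text, with a mutex so that only one thread triggers the conversion. The snippet below is a minimal sketch of that general pattern in plain Ruby, not Oga's actual code. The class name LazyText, the ENTITIES table, the PATTERN regexp and the decode_count counter are all hypothetical names introduced for illustration.

```ruby
# Hypothetical stand-in for a text node; this is NOT Oga's XML::Text class.
class LazyText
  # Small illustrative entity table (an assumption, not Oga's full list).
  ENTITIES = {
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&quot;' => '"'
  }.freeze

  # Named entities plus decimal (&#60;) and hexadecimal (&#x3E;) codepoints.
  PATTERN = /&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);/

  # Counts how often decoding ran; only here to make the laziness visible.
  attr_reader :decode_count

  def initialize(raw)
    @raw          = raw
    @decoded      = nil
    @decode_count = 0
    @mutex        = Mutex.new
  end

  # Decodes on first access and caches the result. The mutex ensures that
  # only one thread performs the conversion.
  def text
    @mutex.synchronize do
      @decoded ||= decode(@raw)
    end
  end

  private

  def decode(input)
    @decode_count += 1

    input.gsub(PATTERN) do |match|
      if match.start_with?('&#x')
        [match[3..-2].to_i(16)].pack('U') # hexadecimal codepoint
      elsif match.start_with?('&#')
        [match[2..-2].to_i].pack('U')     # decimal codepoint
      else
        ENTITIES.fetch(match, match)      # unknown names are left as-is
      end
    end
  end
end

node    = LazyText.new('a &amp; b &#60; c &#x3E; d')
threads = 4.times.map { Thread.new { node.text } }

threads.each(&:join)

puts node.text
```

Nothing is decoded when the object is created. The work happens once, on the first call to text, and later calls from any thread return the cached string.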

1115 Commits