core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	2ec91f130f	Lazy decoding of XML/HTML entities. Instead of decoding entities in the lexer we'll do this whenever XML::Text#text is called. This removes the overhead from the parsing phase and ensures the process is only triggered when actually needed. Note that calling #to_xml and/or the #inspect methods on a Text (or parent) instance will also trigger the entity conversion process. The new entity decoding API supports both regular entities (e.g. &amp;) as well as codepoint-based entities (both decimal and hexadecimal codepoints). To allow safe read-only access to Text instances from multiple threads a mutex is used. This mutex ensures that only one thread can trigger the conversion process. Fixes #68	2015-03-05 23:00:43 +01:00
Yorick Peterse	7409257702	Replaced HTML benchmark fixtures. The new fixture is the HTML of a person article which contains a few HTML entities.	2015-03-05 22:58:22 +01:00
Yorick Peterse	7e847a0ae9	Make C90 happy.	2015-03-05 22:57:51 +01:00
Yorick Peterse	33c46a1841	Use ID instead of VALUE for callback names in C.	2015-03-05 22:57:51 +01:00
Yorick Peterse	3e05593536	Release 0.2.3	2015-03-04 11:56:23 +01:00
Yorick Peterse	aa42cc9ce7	Updated changelog for 0.2.3	2015-03-04 11:49:24 +01:00
Yorick Peterse	3b2055a30b	Refactored handling of literal HTML elements. This ensures newlines can appear in <style> / <script> tags when using IOs as input.	2015-03-04 11:44:31 +01:00
Yorick Peterse	78e40b55c0	Handle parsing of HTML <style> tags. This basically re-applies the technique used for HTML <script> tags. With this extra addition I decided to rename/normalize a few things so it's easier to add any extra tags in the future. One downside of this setup is that the following will not be parsed by Oga: <style> </script> </style> The same applies to script tags containing a literal </style> tag. Since this particular case is rather unlikely to occur I'm OK with not supporting it as it _does_ simplify the lexer quite a bit. Fixes #80	2015-03-03 16:28:05 +01:00
Yorick Peterse	73534375d5	Release 0.2.2	2015-03-03 13:36:32 +01:00
Yorick Peterse	142b467277	Set parent of nodes set using Element#inner_text=. This ensures that any text nodes created using Element#inner_text= have their parent node set correctly.	2015-03-03 13:13:05 +01:00
Yorick Peterse	503efc32cd	Release 0.2.1	2015-03-02 22:12:49 +01:00
Yorick Peterse	bc74d31bb5	Updated changelog for 0.2.1.	2015-03-02 17:44:08 +01:00
Yorick Peterse	874d7124af	Don't convert <script> text to XML entities. Fixes #79.	2015-03-02 17:32:19 +01:00
Yorick Peterse	9a586363e9	Added XML::Document#html?	2015-03-02 16:39:40 +01:00
Yorick Peterse	ba2177e2cf	Lex contents of <script> tags as plain text. When lexing input in HTML mode the lexer has to treat _all_ content of a <script> tag as plain text. This ensures that the lexer can process input such as "x <y" and "// <foo>" correctly. Fixes #70.	2015-03-02 16:22:09 +01:00
Yorick Peterse	351b5ac004	Added spec for lexing inline HTML script tags. Related issue: #70	2015-03-02 16:20:06 +01:00
Yorick Peterse	8fdf27dcef	Removed unused C lexer macros.	2015-03-02 15:43:47 +01:00
Yorick Peterse	8b910c700d	Updated EditorConfig file for ruby-ll files.	2015-02-13 09:38:29 +01:00
Yorick Peterse	c68b038e53	Added benchmark for the CSS parser.	2015-02-13 09:36:24 +01:00
Yorick Peterse	f94461a9ca	Upload docs to S3.	2015-01-17 18:00:05 +01:00
Yorick Peterse	2d03ce8e51	Run tests on MRI 2.2.	2015-01-09 21:37:09 +01:00
Yorick Peterse	47a3c5e7f8	Use describe/it instead of context/example. This keeps things consistent with the general testing guidelines in the Ruby community. This in turn should hopefully make my life easier as I don't have to tell people to use this rather odd style I was using before.	2015-01-08 23:01:53 +01:00
Yorick Peterse	e138aa15ac	Removed stray comment in the XPath parser.	2014-12-28 23:55:33 +01:00
Yorick Peterse	746c8052dd	Remove all nodes when calling Element#inner_text=. This fixes #64.	2014-12-14 23:32:43 +01:00
Yorick Peterse	739f885078	Use ID instead of VALUE for C Symbols. Thanks to @cremno for bringing this up.	2014-11-29 12:53:55 +01:00
Yorick Peterse	b006289c5f	Removed extra space in c/lexer.rl	2014-11-23 22:12:18 +01:00
Yorick Peterse	5e24a3d1e5	Short docs on lexer callback names.	2014-11-23 20:20:14 +01:00
Yorick Peterse	4fa88fcbde	Cache rb_intern/symbol lookups in the lexer. For JRuby this has little to no benefit as it uses strings for method names. However, both MRI and Rubinius will perform a Symbol lookup whenever rb_intern() is called. By doing this once for all callback names and caching the resulting VALUE objects the lexer timings can be reduced by about 25%. In case of the benchmark benchmark/xml/lexer/string_average_bench.rb this means it runs in around 500ms instead of 700ms.	2014-11-22 01:53:37 +01:00
Yorick Peterse	a10fe855d7	Merge pull request #67 from krasnoukhov/xml-entities Add missing entities to the decode/encode lists	2014-11-21 01:12:24 +01:00
Dmitry Krasnoukhov	26baf89440	Add missing entities to the decode/encode lists	2014-11-21 01:53:11 +02:00
Yorick Peterse	81c49b5101	Contributing notes on thread-safety/require usage.	2014-11-20 20:09:41 +01:00
Yorick Peterse	cbb2815146	Support for inline doctype rules plus newlines. This adds support for lexing/parsing XML documents that use an IO as input _and_ contain doctype rules with newlines in them. This fixes #63.	2014-11-18 20:02:55 +01:00
Yorick Peterse	f88df486ba	README example on using Enumerator for input.	2014-11-17 23:59:30 +01:00
Yorick Peterse	b8f9d04b17	Added checksums for v0.2.0	2014-11-17 23:31:40 +01:00
Yorick Peterse	ae17e7f137	Clean before building any Gem.	2014-11-17 23:28:57 +01:00
Yorick Peterse	922cee913d	Release 0.2.0	2014-11-17 23:26:19 +01:00
Yorick Peterse	6f50e79f15	Added changelog link to the CSS ticket.	2014-11-17 23:24:35 +01:00
Yorick Peterse	c4a5e8d4b4	Updated the changelog.	2014-11-17 23:23:13 +01:00
Yorick Peterse	72c8cafcb1	Added README example on using CSS selectors.	2014-11-17 23:21:55 +01:00
Yorick Peterse	c253254e24	Prefix version tags with "v". This makes them stand out as versions a bit more.	2014-11-17 22:42:16 +01:00
Yorick Peterse	ad4f650c5d	Fixed XML entity encoding/decoding ordering. Thanks to @krasnoukhov for providing the initial patch, which this commit is largely based on. This fixes #49.	2014-11-17 22:39:43 +01:00
Yorick Peterse	675eb562e2	Basic docs on manually creating documents.	2014-11-17 09:13:13 +01:00
Yorick Peterse	cd86d5d294	Allow removal of element attributes.	2014-11-17 09:00:40 +01:00
Yorick Peterse	104576b4ad	Removed .ruby-version. Oga doesn't really require an exact version during development, so let's get rid of this.	2014-11-17 00:12:01 +01:00
Yorick Peterse	804646cc5e	Don't modify raw namespaces. When calling Element#available_namespaces the list of namespaces returned by Element#namespaces must not be modified.	2014-11-17 00:01:16 +01:00
Yorick Peterse	a4459c866f	Basic docs on using XML namespaces.	2014-11-16 23:59:05 +01:00
Yorick Peterse	6753d6a26d	Slightly better docs for the XPath/CSS parsers.	2014-11-16 23:40:19 +01:00
Yorick Peterse	57adabc068	Ensure SAX after_element receives meaningful args. This changes the behaviour of after_element when parsing documents using the SAX parsing API. Previously it would always receive a nil argument, which is kinda pointless. This commit changes that by making sure it receives a namespace name (if any) and the element name. This fixes #54.	2014-11-16 23:32:32 +01:00
Yorick Peterse	8c8ecce447	Added README note on default namespaces. This closes #57.	2014-11-16 23:08:32 +01:00
Yorick Peterse	67abe7731e	Added CSS selectors to the list of features.	2014-11-16 22:55:44 +01:00

796 Commits
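Commit 2ec91f130f moves entity decoding out of the lexer: text is decoded on the first call to XML::Text#text, a mutex ensures only one thread performs the conversion, and both named and codepoint-based entities are supported. The following Ruby sketch illustrates that pattern under stated assumptions: LazyText, ENTITIES, ENTITY_PATTERN and decoded? are hypothetical names, the entity table is deliberately tiny, and none of this is Oga's actual code.

```ruby
# Sketch of lazy, thread-safe entity decoding. LazyText and its constants
# are hypothetical names; this is NOT Oga's XML::Text implementation.
class LazyText
  # Tiny, hypothetical table of named entities.
  ENTITIES = {
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&quot;' => '"',
    '&apos;' => "'"
  }.freeze

  # Named entities, decimal codepoints (&#169;) and hex codepoints (&#xA9;).
  ENTITY_PATTERN = /&(?:[a-zA-Z]+|#[0-9]+|#x[0-9a-fA-F]+);/

  def initialize(raw)
    @raw   = raw # parsing only stores the raw, undecoded string
    @text  = nil
    @mutex = Mutex.new
  end

  # Decodes on the first call only. The mutex ensures a single thread runs
  # the conversion; every later call returns the cached string.
  def text
    @mutex.synchronize { @text ||= decode(@raw) }
  end

  # True once #text has been called (read without the lock; sketch only).
  def decoded?
    !@text.nil?
  end

  private

  def decode(input)
    input.gsub(ENTITY_PATTERN) do |entity|
      # Invalid codepoints (e.g. surrogates) would raise RangeError here;
      # a real decoder would have to handle them.
      if entity.start_with?('&#x')
        entity[3..-2].to_i(16).chr(Encoding::UTF_8)
      elsif entity.start_with?('&#')
        entity[2..-2].to_i.chr(Encoding::UTF_8)
      else
        # Unknown named entities are left untouched.
        ENTITIES.fetch(entity, entity)
      end
    end
  end
end

node = LazyText.new('x &lt; y &amp;&amp; &#169; &#xA9; &nbsp;')
p node.decoded? # prints false
puts node.text  # prints x < y && © © &nbsp;
p node.decoded? # prints true
```

Caching the decoded string after the first call means repeated reads only cost a mutex acquisition; a production decoder would also need a complete entity table.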
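Commits ba2177e2cf and 78e40b55c0 describe lexing the contents of HTML <script> and <style> tags as plain text, so that input such as "x <y" or "// <foo>" is not mistaken for markup. The following Ruby sketch shows the general idea using the standard library's StringScanner. lex_html and LITERAL_ELEMENTS are hypothetical names; the toy only understands lowercase tags without attributes and ends literal content at the first closing tag with the same name, so it does not reproduce Oga's lexer or the limitation noted in 78e40b55c0.

```ruby
require 'strscan'

# Toy HTML lexer sketch. lex_html and LITERAL_ELEMENTS are hypothetical
# names; only lowercase tags without attributes are recognised.
LITERAL_ELEMENTS = %w[script style].freeze

def lex_html(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    if scanner.scan(/<([a-z]+)>/)
      name = scanner[1]
      tokens << [:open, name]
      next unless LITERAL_ELEMENTS.include?(name)

      # Inside <script>/<style>, everything up to the matching closing tag
      # is plain text, so "x <y" or "// <foo>" is never lexed as markup.
      body = scanner.scan_until(%r{(?=</#{name}>)})
      if body.nil?
        # No closing tag: treat the rest of the input as text.
        body = scanner.rest
        scanner.terminate
      end
      tokens << [:text, body] unless body.empty?
    elsif scanner.scan(%r{</([a-z]+)>})
      tokens << [:close, scanner[1]]
    else
      # Ordinary text runs until the next "<"; a stray "<" becomes text.
      tokens << [:text, scanner.scan(/[^<]+/) || scanner.getch]
    end
  end

  tokens
end

p lex_html('<p>a</p><script>if (x <y) { s = "</p>"; } // <foo></script>')
```

In this example the "</p>" and "<foo>" inside the script body stay part of a single text token instead of producing close or open tokens.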