Yorick Peterse
25edd2de00
Use a Set for storing void element names.
2014-04-10 12:28:47 +02:00
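A minimal sketch of why a Set suits this kind of lookup: membership checks are hash-based rather than a linear scan of an Array. The constant name, the helper method and the exact list of element names are assumptions made for illustration and are not taken from the commit.

```ruby
require 'set'

# Hypothetical list of void element names; the list used by the library may
# differ.
VOID_ELEMENTS = Set.new(
  %w[area base br col embed hr img input link meta param source track wbr]
).freeze

# Returns true when the given element name is a void element, i.e. an element
# that never has a closing tag. The name is downcased so that "BR" and "br"
# are treated the same.
def void_element?(name)
  VOID_ELEMENTS.include?(name.downcase)
end
```

For example, `void_element?('br')` is true while `void_element?('div')` is false.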
Yorick Peterse
b96f7c4852
Lex attributes with namespaces.
...
These are lexed as just the name instead of two separate tokens.
2014-04-10 11:01:49 +02:00
Yorick Peterse
c974b96b88
Truncate lines in parser errors.
...
The offending lines of code displayed in the error message are truncated to 80
characters. This should make reading the error messages less of a pain when
dealing with very long lines of HTML/XML.
2014-04-10 10:08:51 +02:00
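The truncation described above could look roughly like the following. This is only a sketch: the constant and method names are hypothetical, and the parser's actual error formatting is not shown in the commit message.

```ruby
# Maximum number of characters of an offending line to display (taken from
# the commit message above).
MAX_LINE_LENGTH = 80

# Returns the line without its trailing newline, cut off after `max`
# characters. Lines that are already short enough are returned unchanged.
def truncate_line(line, max = MAX_LINE_LENGTH)
  line = line.chomp

  line.length <= max ? line : line[0, max]
end
```

A 200 character line of HTML would thus be displayed as its first 80 characters only.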
Yorick Peterse
292a98d7f6
Basic benchmarks for the Parser class.
2014-04-10 10:05:04 +02:00
Yorick Peterse
8ca7781842
Updated the lexer benchmarks.
...
These had to be updated for the API changes of Oga::XML::Lexer.
2014-04-10 10:01:11 +02:00
Yorick Peterse
8237d5791d
Stream tokens when lexing.
...
Instead of returning the tokens as a whole they are now streamed using
XML::Lexer#advance. This method returns the next token upon every call. It uses
a small buffer in case a particular block of text results in multiple tokens.
2014-04-09 22:08:13 +02:00
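A rough illustration of the streaming approach, assuming a toy lexer that splits chunks of text on whitespace. The class name, the chunked input and the `tokenize` helper are hypothetical; only the idea of an `advance` method backed by a small buffer comes from the commit message.

```ruby
# Toy lexer that hands out one token per call to #advance. A single chunk of
# input can produce several tokens, so the extra ones are kept in a buffer
# until they are requested.
class StreamingLexer
  def initialize(chunks)
    @chunks = chunks.dup
    @buffer = []
  end

  # Returns the next token, or nil once all input has been consumed.
  def advance
    # Refill the buffer until it holds a token or the input runs out. Chunks
    # that produce no tokens (e.g. empty strings) are skipped.
    while @buffer.empty? && !@chunks.empty?
      @buffer.concat(tokenize(@chunks.shift))
    end

    @buffer.shift
  end

  private

  # Hypothetical tokenizer: every whitespace separated word becomes a token.
  def tokenize(chunk)
    chunk.split.map { |word| [:T_TEXT, word] }
  end
end
```

A consumer can then pull tokens on demand, for example with `while (token = lexer.advance)`, instead of waiting for the full token list to be built.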
Yorick Peterse
e9bb97d261
First steps towards making the lexer stream tokens
2014-04-09 19:32:06 +02:00
Yorick Peterse
10d0ec1573
Specs for parsing various empty nodes.
2014-04-07 21:33:23 +02:00
Yorick Peterse
cb74c7edf9
Specs for XML parser errors.
2014-04-07 21:31:36 +02:00
Yorick Peterse
915d3ee505
Expanded tests for XML::Document#inspect.
2014-04-07 20:11:12 +02:00
Yorick Peterse
e9412c9c4e
Tests for various inspect methods.
2014-04-07 09:58:31 +02:00
Yorick Peterse
54ef125637
Basic docs for everything under Oga::XML.
2014-04-04 17:48:36 +02:00
Yorick Peterse
13a9228563
Properly indent doctype/XML decl inspect values.
2014-04-04 11:13:39 +02:00
Yorick Peterse
37a12722cb
Rough setup for a custom #inspect format.
...
This format is a lot more readable than the default Ruby #inspect format
(mostly due to not including previous/next/parent nodes).
2014-04-04 00:41:29 +02:00
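To illustrate the point about readability: Ruby's default #inspect prints every instance variable, so a node that references its parent drags the surrounding tree into the output. A custom #inspect can leave those references out. The class below and its output format are made up for this sketch and do not reflect the format used by the library.

```ruby
# Hypothetical node class with a parent reference and a list of children.
class ExampleNode
  attr_accessor :name, :parent, :children

  def initialize(name)
    @name     = name
    @children = []
  end

  # Adds a child node and sets its parent reference.
  def add(child)
    child.parent = self
    @children << child

    child
  end

  # Only the name and the number of children are shown; the parent is left
  # out on purpose to keep the output short.
  def inspect
    "Node(name: #{name.inspect}, children: #{children.length})"
  end
end
```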
Yorick Peterse
a2c525dd7c
Insert newlines after XML decl/doctypes.
2014-04-03 23:04:21 +02:00
Yorick Peterse
230fafa2d3
Document should not inherit from Node.
...
A document is not an XML node in itself. If logic has to be shared between the
Document and the Node class I'll resort to using mixins for this.
2014-04-03 22:45:40 +02:00
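A small sketch of sharing logic through a mixin rather than inheritance. The module name and its method are hypothetical; the commit only states that mixins would be used if sharing becomes necessary.

```ruby
# Hypothetical mixin with behaviour needed by both classes.
module ChildCounting
  # Returns the number of direct children.
  def child_count
    children.length
  end
end

class Node
  include ChildCounting

  attr_reader :children

  def initialize(children = [])
    @children = children
  end
end

# Document does not inherit from Node, it only includes the shared module.
class Document
  include ChildCounting

  attr_reader :children

  def initialize(children = [])
    @children = children
  end
end
```

With this setup `Document.new([Node.new]).child_count` works, while `Document.new.is_a?(Node)` is false.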
Yorick Peterse
c077988dd6
Tree building of doctypes.
2014-04-03 22:44:00 +02:00
Yorick Peterse
81b1155af3
Lex/parse doctype names separately.
2014-04-03 21:59:57 +02:00
Yorick Peterse
8185656c1e
Fixed typ.
2014-04-03 21:41:31 +02:00
Yorick Peterse
6cf906e500
Lexer tests for single quoted attributes.
2014-04-03 18:50:07 +02:00
Yorick Peterse
30c01a5aee
Tests for XML::TreeBuilder#handler_missing.
2014-04-03 09:43:30 +02:00
Yorick Peterse
0f129ceac9
Tests for XML::TreeBuilder#on_comment.
2014-04-03 09:38:18 +02:00
Yorick Peterse
bdb76cefc5
Dedicated handling of XML declaration nodes.
2014-04-02 22:30:45 +02:00
Yorick Peterse
d6c0a1f3f3
Lex/parse XML declaration attributes.
2014-04-02 22:01:17 +02:00
Yorick Peterse
fa2e71c790
Tests for TreeBuilder#on_document.
2014-03-28 18:52:08 +01:00
Yorick Peterse
f99c13b516
Tests + docs for the TreeBuilder class.
2014-03-28 17:11:54 +01:00
Yorick Peterse
6d866523b8
Renamed XML::Builder to XML::TreeBuilder.
2014-03-28 16:37:37 +01:00
Yorick Peterse
331726b2ca
Tests for the various XML node types.
2014-03-28 16:34:30 +01:00
Yorick Peterse
c366a96ce8
Rake task for generating code coverage.
2014-03-28 16:33:47 +01:00
Yorick Peterse
e141c084f9
Dedicated DOM builder class for CDATA tags.
2014-03-28 09:27:53 +01:00
Yorick Peterse
2b250bbf42
Rough DOM building setup.
2014-03-28 08:59:48 +01:00
Yorick Peterse
6ae52c1b12
Initial rough sketches for the DOM API.
2014-03-26 18:12:00 +01:00
Yorick Peterse
6c661f3ee9
Removed the donations section.
...
I gave this some thought and I've removed it for two reasons:
1. My Dogecoin wallet takes *forever* to sync with the network (13 weeks
   behind), so I uninstalled it. I can't be bothered waiting forever for a
   gimmick.
2. I don't like asking for donations/money. I'd much rather have people send me
   an email thanking me for my work than have them donate money. The former
   means much more to me.
2014-03-25 23:55:10 +01:00
Yorick Peterse
4a48647d1e
Removed generated lexer/parser.
...
I am a dumbass.
2014-03-25 21:47:40 +01:00
Yorick Peterse
fb626278a8
Re-wrapped comments in the XML lexer.
2014-03-25 10:12:39 +01:00
Yorick Peterse
8ebd72158c
Renamed XML::Lexer#t to #emit().
2014-03-25 09:42:52 +01:00
Yorick Peterse
79818eb349
Added a convenience class for parsing HTML.
...
This removes the need for users having to set the `:html` option themselves.
2014-03-25 09:40:24 +01:00
Yorick Peterse
58009614f6
Moved XML specs into spec/oga/xml.
2014-03-25 09:36:39 +01:00
Yorick Peterse
7c03de0e2f
Renamed HTML_PARSER to PARSER_OUTPUT.
...
This keeps it consistent with the lexer.
2014-03-25 09:35:48 +01:00
Yorick Peterse
eae13d21ed
Namespaced the lexer/parser under Oga::XML.
...
With the upcoming XPath and CSS selector lexers/parsers it will be confusing to
keep these in the root namespace.
2014-03-25 09:34:38 +01:00
Yorick Peterse
2259061c89
Don't require the 2nd Lexer#add_token argument.
2014-03-24 21:35:47 +01:00
Yorick Peterse
641c54261e
Simplified lexer output for comments.
2014-03-24 21:34:30 +01:00
Yorick Peterse
eaf1669b07
Simplified lexer output for CDATA tags.
2014-03-24 21:33:05 +01:00
Yorick Peterse
470be5a839
Simplified the lexer output for doctypes.
2014-03-24 21:32:16 +01:00
Yorick Peterse
ac775918ee
Lexing/parsing of XML declaration tags.
...
This closes #12.
2014-03-24 21:30:19 +01:00
Yorick Peterse
b695ecf0df
Renamed element lexer tags.
...
T_ELEM_OPEN has been renamed to T_ELEM_START, T_ELEM_CLOSE has been renamed to
T_ELEM_END. This keeps the token names consistent with the other ones (e.g.
T_COMMENT_START).
2014-03-24 20:32:43 +01:00
Yorick Peterse
0b6ba6e6b5
Fixed typ.
2014-03-24 20:20:19 +01:00
Yorick Peterse
ca66339a08
README entry on donations.
2014-03-24 20:13:16 +01:00
Yorick Peterse
52abc9d29e
Basic documentation for Oga::Parser.
2014-03-23 21:29:57 +01:00
Yorick Peterse
19c1d66287
Use String#unpack instead of String#codepoints.
...
The latter returns an Enumerable, which on Ruby 1.9.3 doesn't have #length
available. Besides this, it's better to just return an Array since we'll iterate
over every character anyway.
2014-03-23 21:21:27 +01:00
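As a quick illustration, String#unpack with the `U*` directive decodes a UTF-8 string into an Array of codepoints, so Array methods such as #length are available right away. The sample string is made up for this example.

```ruby
# "h\u00e9llo" is "héllo"; the escape avoids depending on the encoding of the
# source file.
text = "h\u00e9llo"

# "U*" decodes every UTF-8 character into its codepoint and returns an Array.
codepoints = text.unpack('U*')

# Arrays respond to #length on every Ruby version.
codepoint_count = codepoints.length
```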