Yorick Peterse
0a396043f8
Support for parsing CDATA tags.
2014-03-11 22:22:02 +01:00
Yorick Peterse
c9592856f0
Updated parsing of doctypes.
...
The resulting nodes now separate the type, public and system IDs in to separate
string values.
2014-03-11 22:08:21 +01:00
Yorick Peterse
4a41894e2c
Updated the doctype parser specs.
2014-03-11 22:02:26 +01:00
Yorick Peterse
8ce76be050
Moved the parser class to Oga::Parser.
...
Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now).
2014-03-11 22:01:50 +01:00
Yorick Peterse
eacd9b88cf
Reworked token generation for elements.
...
This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output.
2014-03-10 23:50:39 +01:00
Yorick Peterse
cd53d5e426
Fixed advancing column numbers.
...
In a bunch of cases the column number would not be increased correctly.
2014-03-07 23:54:56 +01:00
Yorick Peterse
1c9a6c8b76
Tests for nested tags/text nodes.
...
Well guess what, apparently that did work. That was slightly unexpected.
2014-03-03 22:13:29 +01:00
Yorick Peterse
a5a3b8db3f
Basic lexing of HTML tags.
...
The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve.
2014-03-03 22:08:46 +01:00
Yorick Peterse
d9ef33e1f8
Lexing of comments.
...
This fixes #4 .
2014-02-28 23:27:23 +01:00
Yorick Peterse
ca6f422036
Lexing of doctypes.
...
This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.
Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether a ID is a public or system ID based on the order.
This fixes #1
2014-02-28 23:08:55 +01:00
Yorick Peterse
2294bf19f4
Better lexing of CDATA tags.
...
This means the lexer is now capable of lexing CDATA tags that contain text such
as ]].
2014-02-28 20:05:12 +01:00
Yorick Peterse
4883ac7384
Lexing of CDATA tags.
2014-02-28 00:03:37 +01:00
Yorick Peterse
c011e2faaa
Moved the lexer specs to spec/oga/lexer.
...
I accidently moved these inside the parser specs.
2014-02-27 21:30:10 +01:00
Yorick Peterse
cdaa14a28e
Broke up lexer specs into separate files.
2014-02-27 20:55:29 +01:00
Yorick Peterse
2c82f88f6c
Basic lexing + parsing of doctypes.
...
We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes.
2014-02-27 01:27:51 +01:00
Yorick Peterse
c4e0406ed9
Lexing of CDATA tags.
2014-02-26 22:01:07 +01:00
Yorick Peterse
0a336e76d3
Renamed T_EXCLAMATION to T_BANG.
...
This is way easier to type.
2014-02-26 21:54:27 +01:00
Yorick Peterse
684eccd3e2
Lex dashes as T_DASH instead of T_TEXT.
2014-02-26 21:52:32 +01:00
Yorick Peterse
39bbe5afc4
Expanded lexer tag/attribute tests.
2014-02-26 21:48:46 +01:00
Yorick Peterse
d32888f803
Basic lexer setup/tests.
...
Too lazy to do this the right way. ᕕ(ᐛ)ᕗ
2014-02-26 21:36:30 +01:00
Yorick Peterse
5755c325bd
Imported a half-assed lexer.
2014-02-26 19:54:11 +01:00
Yorick Peterse
702477ca28
Basic project layout.
2014-02-26 19:50:16 +01:00