71 Commits

Author SHA1 Message Date
Yorick Peterse c9592856f0 Updated parsing of doctypes.
The resulting nodes now store the type, public ID and system ID as separate
string values.
2014-03-11 22:08:21 +01:00
Yorick Peterse 4a41894e2c Updated the doctype parser specs. 2014-03-11 22:02:26 +01:00
Yorick Peterse 8ce76be050 Moved the parser class to Oga::Parser.
Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now).
2014-03-11 22:01:50 +01:00
Yorick Peterse eacd9b88cf Reworked token generation for elements.
This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output.
2014-03-10 23:50:39 +01:00
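
As a rough illustration of the token split described in the commit above, the sketch below turns a raw tag name into separate tokens. It is not Oga's lexer (which is written in Ragel): the helper name `element_tokens` and the `[type, value]` token layout are assumptions made for this example; only the token names T_ELEMENT_OPEN, T_ELEMENT_NS and T_ELEMENT_NAME come from the commit message.

```ruby
# Hypothetical helper, not part of Oga: shows how emitting the start tag and
# the name as separate tokens leaves room for an optional namespace token.
def element_tokens(raw_name)
  # Split "ns:name" into its two parts; a plain name has no namespace.
  ns, name = raw_name.include?(':') ? raw_name.split(':', 2) : [nil, raw_name]

  tokens = [[:T_ELEMENT_OPEN, nil]]
  tokens << [:T_ELEMENT_NS, ns] if ns
  tokens << [:T_ELEMENT_NAME, name]
  tokens
end

p element_tokens('p')
p element_tokens('foo:p')
```

With a single combined token, the namespace would have to be packed into the same value as the name; with separate tokens it is simply one more entry in the stream.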
Yorick Peterse cd53d5e426 Fixed advancing column numbers.
In a bunch of cases the column number would not be increased correctly.
2014-03-07 23:54:56 +01:00
Yorick Peterse 1c9a6c8b76 Tests for nested tags/text nodes.
Well guess what, apparently that did work. That was slightly unexpected.
2014-03-03 22:13:29 +01:00
Yorick Peterse a5a3b8db3f Basic lexing of HTML tags.
The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve.
2014-03-03 22:08:46 +01:00
Yorick Peterse d9ef33e1f8 Lexing of comments.
This fixes #4.
2014-02-28 23:27:23 +01:00
Yorick Peterse ca6f422036 Lexing of doctypes.
This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.

Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether an ID is a public or system ID based on the order.

This fixes #1
2014-02-28 23:08:55 +01:00
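
The note about ordering in the commit above could look roughly like the following on the parser side. This is a minimal sketch under assumptions: the helper `classify_doctype_ids`, the `[type, value]` token layout and the `:T_DOCTYPE_START` token are hypothetical, and it assumes the first T_STRING is the public ID and the second the system ID. Oga's actual parser may handle this differently.

```ruby
# Hypothetical helper, not part of Oga: the lexer emits every doctype ID as a
# generic T_STRING, so the IDs are told apart purely by their position.
def classify_doctype_ids(tokens)
  strings = tokens.select { |type, _| type == :T_STRING }.map { |_, value| value }

  # Assumption: first string is the public ID, second is the system ID.
  { public_id: strings[0], system_id: strings[1] }
end

tokens = [
  [:T_DOCTYPE_START, nil], # hypothetical token name
  [:T_STRING, '-//W3C//DTD XHTML 1.0 Strict//EN'],
  [:T_STRING, 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd']
]

p classify_doctype_ids(tokens)
```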
Yorick Peterse 2294bf19f4 Better lexing of CDATA tags.
This means the lexer is now capable of lexing CDATA tags that contain text such
as ]].
2014-02-28 20:05:12 +01:00
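
To show why text such as ]] inside a CDATA tag needs care, here is a small sketch that uses a plain Ruby regular expression instead of Ragel. The constant `CDATA_PATTERN` and the helper `cdata_body` are made up for this example and do not reflect how Oga's lexer is implemented; the point is only that the body has to run until the full `]]>` terminator rather than stopping at the first `]]`.

```ruby
# Hypothetical sketch, not Oga's Ragel grammar: the non-greedy body stops at
# the first full "]]>" terminator, so a bare "]]" inside the body is kept.
CDATA_PATTERN = /<!\[CDATA\[(.*?)\]\]>/m

def cdata_body(input)
  match = CDATA_PATTERN.match(input)
  match && match[1]
end

p cdata_body('<![CDATA[foo]]bar]]>') # => "foo]]bar"
```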
Yorick Peterse 4883ac7384 Lexing of CDATA tags. 2014-02-28 00:03:37 +01:00
Yorick Peterse c011e2faaa Moved the lexer specs to spec/oga/lexer.
I accidentally moved these inside the parser specs.
2014-02-27 21:30:10 +01:00
Yorick Peterse cdaa14a28e Broke up lexer specs into separate files. 2014-02-27 20:55:29 +01:00
Yorick Peterse 2c82f88f6c Basic lexing + parsing of doctypes.
We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes.
2014-02-27 01:27:51 +01:00
Yorick Peterse c4e0406ed9 Lexing of CDATA tags. 2014-02-26 22:01:07 +01:00
Yorick Peterse 0a336e76d3 Renamed T_EXCLAMATION to T_BANG.
This is way easier to type.
2014-02-26 21:54:27 +01:00
Yorick Peterse 684eccd3e2 Lex dashes as T_DASH instead of T_TEXT. 2014-02-26 21:52:32 +01:00
Yorick Peterse 39bbe5afc4 Expanded lexer tag/attribute tests. 2014-02-26 21:48:46 +01:00
Yorick Peterse d32888f803 Basic lexer setup/tests.
Too lazy to do this the right way. ᕕ(ᐛ)ᕗ
2014-02-26 21:36:30 +01:00
Yorick Peterse 5755c325bd Imported a half-assed lexer. 2014-02-26 19:54:11 +01:00
Yorick Peterse 702477ca28 Basic project layout. 2014-02-26 19:50:16 +01:00