477 Commits

Author SHA1 Message Date
Yorick Peterse c9592856f0 Updated parsing of doctypes.
The resulting nodes now store the type, public ID and system ID as separate
string values.
2014-03-11 22:08:21 +01:00
Yorick Peterse c07edc767b Updated the gitignore entry for the parser. 2014-03-11 22:03:02 +01:00
Yorick Peterse 8ce76be050 Moved the parser class to Oga::Parser.
Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now).
2014-03-11 22:01:50 +01:00
Yorick Peterse 77b40d2e81 Use a separate machine for closing tags.
This makes it easier to advance column numbers for whitespace as well as to
capture and emit tokens for the closing tag.
2014-03-11 21:55:36 +01:00
Yorick Peterse eacd9b88cf Reworked token generation for elements.
This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output.
2014-03-10 23:50:39 +01:00
Yorick Peterse cd53d5e426 Fixed advancing column numbers.
In a bunch of cases the column number would not be increased correctly.
2014-03-07 23:54:56 +01:00
Yorick Peterse a5a3b8db3f Basic lexing of HTML tags.
The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve.
2014-03-03 22:08:46 +01:00
Yorick Peterse d9ef33e1f8 Lexing of comments.
This fixes #4.
2014-02-28 23:27:23 +01:00
Yorick Peterse 92ae48f905 Use fcall + fret instead of fgoto.
This removes the hardcoded return to the main machine.
2014-02-28 23:19:31 +01:00
Yorick Peterse 30d3e455d1 Use squote/dquote everywhere in the lexer. 2014-02-28 23:18:23 +01:00
Yorick Peterse 970ce27283 Cleanup of buffering text/strings.
This removes the need to use ||= and such, which should speed things up a bit
and keeps the code cleaner.
2014-02-28 23:16:01 +01:00
Yorick Peterse ca6f422036 Lexing of doctypes.
This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.

Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether an ID is a public or system ID based on the order.

This fixes #1
2014-02-28 23:08:55 +01:00
Yorick Peterse 3c825afee0 Cleaned up lexer rules a bit.
There's no benefit to adding variables for angle brackets and such; it's much
easier to grok when they are used directly.
2014-02-28 20:09:13 +01:00
Yorick Peterse 2294bf19f4 Better lexing of CDATA tags.
This means the lexer is now capable of lexing CDATA tags that contain text such
as ]].
2014-02-28 20:05:12 +01:00
Yorick Peterse 6138945d53 Moved some of the CDATA docs around. 2014-02-28 00:04:44 +01:00
Yorick Peterse 4883ac7384 Lexing of CDATA tags. 2014-02-28 00:03:37 +01:00
Yorick Peterse 2c82f88f6c Basic lexing + parsing of doctypes.
We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes.
2014-02-27 01:27:51 +01:00
Yorick Peterse 91f416f035 Moved ending tags into their own racc rule. 2014-02-26 22:20:11 +01:00
Yorick Peterse 4f04fa0d30 Untrack Racc generated files.
Yorick, you can stop being bad now.
2014-02-26 22:18:33 +01:00
Yorick Peterse e764ba640a Basic parser setup without tests.
Who needs tests anyway!
2014-02-26 22:17:47 +01:00
Yorick Peterse c4e0406ed9 Lexing of CDATA tags. 2014-02-26 22:01:07 +01:00
Yorick Peterse 0a336e76d3 Renamed T_EXCLAMATION to T_BANG.
This is way easier to type.
2014-02-26 21:54:27 +01:00
Yorick Peterse 684eccd3e2 Lex dashes as T_DASH instead of T_TEXT. 2014-02-26 21:52:32 +01:00
Yorick Peterse 39bbe5afc4 Expanded lexer tag/attribute tests. 2014-02-26 21:48:46 +01:00
Yorick Peterse d32888f803 Basic lexer setup/tests.
Too lazy to do this the right way. ᕕ(ᐛ)ᕗ
2014-02-26 21:36:30 +01:00
Yorick Peterse 5755c325bd Imported a half-assed lexer. 2014-02-26 19:54:11 +01:00
Yorick Peterse 702477ca28 Basic project layout. 2014-02-26 19:50:16 +01:00