Yorick Peterse
cb75edc30d
Basic support for lexing/parsing HTML5.
This will need a bunch of extra tests before I'll consider closing #7.
2014-03-16 23:42:24 +01:00
Yorick Peterse
ce8bbdb64a
Parsing support for multiple nested nodes.
2014-03-15 20:19:54 +01:00
Yorick Peterse
05ee3c13c9
Parsing support for nested element/text nodes.
2014-03-14 00:44:11 +01:00
Yorick Peterse
6b2f682c5c
Tests for lexing a basic HTML document.
This also comes with some changes to the lexer so that it advances column/line
numbers correctly.
2014-03-13 23:55:18 +01:00
Yorick Peterse
34f8779c94
Lexing of bare regular text.
This is currently a bit of a hack but at least we're slowly getting there.
2014-03-13 00:42:12 +01:00
Yorick Peterse
2fbca93ae8
Support for parsing nested elements.
2014-03-12 23:13:28 +01:00
Yorick Peterse
8cfa81aed9
Basic support for parsing elements.
This includes support for elements with namespaces and attributes. Nested
elements are not yet supported.
2014-03-12 23:02:54 +01:00
Yorick Peterse
5ce515d224
Small line wrapping change in the lexer.
2014-03-12 22:42:13 +01:00
Yorick Peterse
98b3443e7f
Lexing of element attributes without values.
2014-03-12 22:41:17 +01:00
Yorick Peterse
ed9d8c05a2
Added support for parsing comments.
2014-03-12 22:20:12 +01:00
Yorick Peterse
0a396043f8
Support for parsing CDATA tags.
2014-03-11 22:22:02 +01:00
Yorick Peterse
c9592856f0
Updated parsing of doctypes.
The resulting nodes now split the type, public and system IDs into separate
string values.
2014-03-11 22:08:21 +01:00
Yorick Peterse
c07edc767b
Updated the gitignore entry for the parser.
2014-03-11 22:03:02 +01:00
Yorick Peterse
8ce76be050
Moved the parser class to Oga::Parser.
Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now).
2014-03-11 22:01:50 +01:00
Yorick Peterse
77b40d2e81
Use a separate machine for closing tags.
This makes it easier to advance column numbers for whitespace as well as
capturing and emitting tokens for the closing tag.
2014-03-11 21:55:36 +01:00
Yorick Peterse
eacd9b88cf
Reworked token generation for elements.
This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output.
2014-03-10 23:50:39 +01:00
Yorick Peterse
cd53d5e426
Fixed advancing column numbers.
In a bunch of cases the column number would not be increased correctly.
2014-03-07 23:54:56 +01:00
Yorick Peterse
a5a3b8db3f
Basic lexing of HTML tags.
The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve.
2014-03-03 22:08:46 +01:00
Yorick Peterse
d9ef33e1f8
Lexing of comments.
This fixes #4.
2014-02-28 23:27:23 +01:00
Yorick Peterse
92ae48f905
Use fcall + fret instead of fgoto.
This removes the hardcoded return to the main machine.
2014-02-28 23:19:31 +01:00
Yorick Peterse
30d3e455d1
Use squote/dquote everywhere in the lexer.
2014-02-28 23:18:23 +01:00
Yorick Peterse
970ce27283
Cleanup of buffering text/strings.
This removes the need to use ||= and such, which should speed things up a bit
and keeps the code cleaner.
2014-02-28 23:16:01 +01:00
Yorick Peterse
ca6f422036
Lexing of doctypes.
This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.
Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether an ID is a public or system ID based on the order.
This fixes #1.
2014-02-28 23:08:55 +01:00
Yorick Peterse
3c825afee0
Cleaned up lexer rules a bit.
There's no benefit to adding variables for angle brackets and such; it's much
easier to grok when they are used directly.
2014-02-28 20:09:13 +01:00
Yorick Peterse
2294bf19f4
Better lexing of CDATA tags.
This means the lexer is now capable of lexing CDATA tags that contain text such
as ]].
2014-02-28 20:05:12 +01:00
Yorick Peterse
6138945d53
Moved some of the CDATA docs around.
2014-02-28 00:04:44 +01:00
Yorick Peterse
4883ac7384
Lexing of CDATA tags.
2014-02-28 00:03:37 +01:00
Yorick Peterse
2c82f88f6c
Basic lexing + parsing of doctypes.
We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes.
2014-02-27 01:27:51 +01:00
Yorick Peterse
91f416f035
Moved ending tags into their own racc rule.
2014-02-26 22:20:11 +01:00
Yorick Peterse
4f04fa0d30
Untrack Racc generated files.
Yorick, you can stop being bad now.
2014-02-26 22:18:33 +01:00
Yorick Peterse
e764ba640a
Basic parser setup without tests.
Who needs tests anyway!
2014-02-26 22:17:47 +01:00
Yorick Peterse
c4e0406ed9
Lexing of CDATA tags.
2014-02-26 22:01:07 +01:00
Yorick Peterse
0a336e76d3
Renamed T_EXCLAMATION to T_BANG.
This is way easier to type.
2014-02-26 21:54:27 +01:00
Yorick Peterse
684eccd3e2
Lex dashes as T_DASH instead of T_TEXT.
2014-02-26 21:52:32 +01:00
Yorick Peterse
39bbe5afc4
Expanded lexer tag/attribute tests.
2014-02-26 21:48:46 +01:00
Yorick Peterse
d32888f803
Basic lexer setup/tests.
Too lazy to do this the right way. ᕕ(ᐛ)ᕗ
2014-02-26 21:36:30 +01:00
Yorick Peterse
5755c325bd
Imported a half-assed lexer.
2014-02-26 19:54:11 +01:00
Yorick Peterse
702477ca28
Basic project layout.
2014-02-26 19:50:16 +01:00