core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	cb75edc30d	Basic support for lexing/parsing HTML5. This will need a bunch of extra tests before I'll consider closing #7.	2014-03-16 23:42:24 +01:00
Yorick Peterse	ce8bbdb64a	Parsing support for multiple nested nodes.	2014-03-15 20:19:54 +01:00
Yorick Peterse	05ee3c13c9	Parsing support for nested element/text nodes.	2014-03-14 00:44:11 +01:00
Yorick Peterse	6b2f682c5c	Tests for lexing a basic HTML document. This also comes with some changes to the lexer so that it advances column/line numbers correctly.	2014-03-13 23:55:18 +01:00
Yorick Peterse	34f8779c94	Lexing of bare regular text. This is currently a bit of a hack but at least we're slowly getting there.	2014-03-13 00:42:12 +01:00
Yorick Peterse	2fbca93ae8	Support for parsing nested elements.	2014-03-12 23:13:28 +01:00
Yorick Peterse	8cfa81aed9	Basic support for parsing elements. This includes support for elements with namespaces and attributes. Nested elements are not yet supported.	2014-03-12 23:02:54 +01:00
Yorick Peterse	5ce515d224	Small line wrapping change in the lexer.	2014-03-12 22:42:13 +01:00
Yorick Peterse	98b3443e7f	Lexing of element attributes without values.	2014-03-12 22:41:17 +01:00
Yorick Peterse	ed9d8c05a2	Added support for parsing comments.	2014-03-12 22:20:12 +01:00
Yorick Peterse	0a396043f8	Support for parsing CDATA tags.	2014-03-11 22:22:02 +01:00
Yorick Peterse	c9592856f0	Updated parsing of doctypes. The resulting nodes now separate the type, public and system IDs into separate string values.	2014-03-11 22:08:21 +01:00
Yorick Peterse	c07edc767b	Updated the gitignore entry for the parser.	2014-03-11 22:03:02 +01:00
Yorick Peterse	8ce76be050	Moved the parser class to Oga::Parser. Oga will use the same parser for XML and HTML so it doesn't make sense to separate the two into different namespaces (at least for now).	2014-03-11 22:01:50 +01:00
Yorick Peterse	77b40d2e81	Use a separate machine for closing tags. This makes it easier to advance column numbers for whitespace as well as capturing and emitting tokens for the closing tag.	2014-03-11 21:55:36 +01:00
Yorick Peterse	eacd9b88cf	Reworked token generation for elements. This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name (T_ELEMENT_NAME). This makes it easier to include the namespace of an element (T_ELEMENT_NS) in the output.	2014-03-10 23:50:39 +01:00
Yorick Peterse	cd53d5e426	Fixed advancing column numbers. In a bunch of cases the column number would not be increased correctly.	2014-03-07 23:54:56 +01:00
Yorick Peterse	a5a3b8db3f	Basic lexing of HTML tags. The current implementation is a bit messy. In particular the counting of column numbers is not entirely the way it should be. There are also some problems with nested tags/text that I still have to resolve.	2014-03-03 22:08:46 +01:00
Yorick Peterse	d9ef33e1f8	Lexing of comments. This fixes #4.	2014-02-28 23:27:23 +01:00
Yorick Peterse	92ae48f905	Use fcall + fret instead of fgoto. This removes the hardcoded return to the main machine.	2014-02-28 23:19:31 +01:00
Yorick Peterse	30d3e455d1	Use squote/dquote everywhere in the lexer.	2014-02-28 23:18:23 +01:00
Yorick Peterse	970ce27283	Cleanup of buffering text/strings. This removes the need to use ||= and such, which should speed things up a bit and keeps the code cleaner.	2014-02-28 23:16:01 +01:00
Yorick Peterse	ca6f422036	Lexing of doctypes. This comes with various structural changes to the lexer as I'm slowly starting to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of software. Note that the doctype public/system IDs are lexed as T_STRING. The parser will figure out whether an ID is a public or system ID based on the order. This fixes #1.	2014-02-28 23:08:55 +01:00
Yorick Peterse	3c825afee0	Cleaned up lexer rules a bit. There's no benefit to adding variables for angle brackets and such, it's much easier to grok to just use them directly.	2014-02-28 20:09:13 +01:00
Yorick Peterse	2294bf19f4	Better lexing of CDATA tags. This means the lexer is now capable of lexing CDATA tags that contain text such as ]].	2014-02-28 20:05:12 +01:00
Yorick Peterse	6138945d53	Moved some of the CDATA docs around.	2014-02-28 00:04:44 +01:00
Yorick Peterse	4883ac7384	Lexing of CDATA tags.	2014-02-28 00:03:37 +01:00
Yorick Peterse	2c82f88f6c	Basic lexing + parsing of doctypes. We're doing these the lazy way. I can't be bothered writing patterns/rules for 4 different formats for something such as doctypes.	2014-02-27 01:27:51 +01:00
Yorick Peterse	91f416f035	Moved ending tags into their own racc rule.	2014-02-26 22:20:11 +01:00
Yorick Peterse	4f04fa0d30	Untrack Racc generated files. Yorick, you can stop being bad now.	2014-02-26 22:18:33 +01:00
Yorick Peterse	e764ba640a	Basic parser setup without tests. Who needs tests anyway!	2014-02-26 22:17:47 +01:00
Yorick Peterse	c4e0406ed9	Lexing of CDATA tags.	2014-02-26 22:01:07 +01:00
Yorick Peterse	0a336e76d3	Renamed T_EXCLAMATION to T_BANG. This is way easier to type.	2014-02-26 21:54:27 +01:00
Yorick Peterse	684eccd3e2	Lex dashes as T_DASH instead of T_TEXT.	2014-02-26 21:52:32 +01:00
Yorick Peterse	39bbe5afc4	Expanded lexer tag/attribute tests.	2014-02-26 21:48:46 +01:00
Yorick Peterse	d32888f803	Basic lexer setup/tests. Too lazy to do this the right way. ᕕ(ᐛ)ᕗ	2014-02-26 21:36:30 +01:00
Yorick Peterse	5755c325bd	Imported a half-assed lexer.	2014-02-26 19:54:11 +01:00
Yorick Peterse	702477ca28	Basic project layout.	2014-02-26 19:50:16 +01:00


638 Commits