core/oga - oga

Commit Graph

Author	SHA1	Message	Date
Yorick Peterse	74bc11a239	Rip out column counting. This makes both the lexer and parser quite a bit easier to use. Counting column numbers also isn't really needed when parsing XML/HTML.	2014-03-20 19:44:28 +01:00
Yorick Peterse	70a39042e7	Removed useless rules from the parser.	2014-03-20 18:58:32 +01:00
Yorick Peterse	03774f2788	Documented the lexer.	2014-03-19 22:05:57 +01:00
Yorick Peterse	192ba9bb54	Expanded the lexer comment tests.	2014-03-19 21:44:57 +01:00
Yorick Peterse	f1fcdfbacb	Cleaned up the Ragel bits of the lexer. This removes some of the complexity that existed before (e.g. too many state machines) and fixes a bunch of problems with nested data.	2014-03-19 21:44:10 +01:00
Yorick Peterse	7271e74396	Revert "Compacter parser AST." Although this AST is compacter it will result in conflicts between (text), (attributes) and (attribute) nodes in regular XML documents. This is due to XML allowing elements with these names (unlike in HTML). This reverts commit `8898d08831`.	2014-03-18 18:55:16 +01:00
Yorick Peterse	9687dd379f	Added a .ruby-version file.	2014-03-18 18:08:25 +01:00
Yorick Peterse	56f22c311e	Allow JRuby to fail for now.	2014-03-18 00:13:33 +01:00
Yorick Peterse	422832fd68	Lowered the required Ragel version to 6.7.	2014-03-18 00:12:21 +01:00
Yorick Peterse	091e32c17a	Install Ragel on Travis CI.	2014-03-18 00:09:16 +01:00
Yorick Peterse	8d4d3999b5	Configuration file for Travis CI.	2014-03-17 21:52:24 +01:00
Yorick Peterse	9975c9c430	Removed the emit_text_buffer Ragel action.	2014-03-17 21:49:49 +01:00
Yorick Peterse	274ab359ba	Don't use separate tokens/nodes for newlines. Newlines are now lexed together with regular text. The line numbers are advanced based on the number of "\n" sequences in a text buffer.	2014-03-17 21:26:21 +01:00
Yorick Peterse	8898d08831	Compacter parser AST. The AST no longer uses the generic `element` type for element nodes but instead changes the type based on the element type. That is, a <p> element now results in a (p) node, <link> in (link), etc.	2014-03-17 21:03:54 +01:00
Yorick Peterse	8d3f3f15d7	Renamed parse_html() to parse().	2014-03-16 23:46:20 +01:00
Yorick Peterse	cb75edc30d	Basic support for lexing/parsing HTML5. This will need a bunch of extra tests before I'll consider closing #7.	2014-03-16 23:42:24 +01:00
Yorick Peterse	ce8bbdb64a	Parsing support for multiple nested nodes.	2014-03-15 20:19:54 +01:00
Yorick Peterse	05ee3c13c9	Parsing support for nested element/text nodes.	2014-03-14 00:44:11 +01:00
Yorick Peterse	6b2f682c5c	Tests for lexing a basic HTML document. This also comes with some changes to the lexer so that it advances column/line numbers correctly.	2014-03-13 23:55:18 +01:00
Yorick Peterse	edf2e4112b	Added a test for parsing bare text tokens.	2014-03-13 00:42:58 +01:00
Yorick Peterse	34f8779c94	Lexing of bare regular text. This is currently a bit of a hack but at least we're slowly getting there.	2014-03-13 00:42:12 +01:00
Yorick Peterse	2fbca93ae8	Support for parsing nested elements.	2014-03-12 23:13:28 +01:00
Yorick Peterse	8cfa81aed9	Basic support for parsing elements. This includes support for elements with namespaces and attributes. Nested elements are not yet supported.	2014-03-12 23:02:54 +01:00
Yorick Peterse	5ce515d224	Small line wrapping change in the lexer.	2014-03-12 22:42:13 +01:00
Yorick Peterse	98b3443e7f	Lexing of element attributes without values.	2014-03-12 22:41:17 +01:00
Yorick Peterse	ed9d8c05a2	Added support for parsing comments.	2014-03-12 22:20:12 +01:00
Yorick Peterse	0a396043f8	Support for parsing CDATA tags.	2014-03-11 22:22:02 +01:00
Yorick Peterse	c9592856f0	Updated parsing of doctypes. The resulting nodes now separate the type, public and system IDs into separate string values.	2014-03-11 22:08:21 +01:00
Yorick Peterse	c07edc767b	Updated the gitignore entry for the parser.	2014-03-11 22:03:02 +01:00
Yorick Peterse	4a41894e2c	Updated the doctype parser specs.	2014-03-11 22:02:26 +01:00
Yorick Peterse	8ce76be050	Moved the parser class to Oga::Parser. Oga will use the same parser for XML and HTML so it doesn't make sense to separate the two into different namespaces (at least for now).	2014-03-11 22:01:50 +01:00
Yorick Peterse	77b40d2e81	Use a separate machine for closing tags. This makes it easier to advance column numbers for whitespace as well as capturing and emitting tokens for the closing tag.	2014-03-11 21:55:36 +01:00
Yorick Peterse	eacd9b88cf	Reworked token generation for elements. This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name (T_ELEMENT_NAME). This makes it easier to include the namespace of an element (T_ELEMENT_NS) in the output.	2014-03-10 23:50:39 +01:00
Yorick Peterse	cd53d5e426	Fixed advancing column numbers. In a bunch of cases the column number would not be increased correctly.	2014-03-07 23:54:56 +01:00
Yorick Peterse	1c9a6c8b76	Tests for nested tags/text nodes. Well guess what, apparently that did work. That was slightly unexpected.	2014-03-03 22:13:29 +01:00
Yorick Peterse	a5a3b8db3f	Basic lexing of HTML tags. The current implementation is a bit messy. In particular the counting of column numbers is not entirely the way it should be. There are also some problems with nested tags/text that I still have to resolve.	2014-03-03 22:08:46 +01:00
Yorick Peterse	d9ef33e1f8	Lexing of comments. This fixes #4.	2014-02-28 23:27:23 +01:00
Yorick Peterse	92ae48f905	Use fcall + fret instead of fgoto. This removes the hardcoded return to the main machine.	2014-02-28 23:19:31 +01:00
Yorick Peterse	30d3e455d1	Use squote/dquote everywhere in the lexer.	2014-02-28 23:18:23 +01:00
Yorick Peterse	970ce27283	Cleanup of buffering text/strings. This removes the need to use ||= and such, which should speed things up a bit and keeps the code cleaner.	2014-02-28 23:16:01 +01:00
Yorick Peterse	ca6f422036	Lexing of doctypes. This comes with various structural changes to the lexer as I'm slowly starting to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of software. Note that the doctype public/system IDs are lexed as T_STRING. The parser will figure out whether an ID is a public or system ID based on the order. This fixes #1.	2014-02-28 23:08:55 +01:00
Yorick Peterse	3c825afee0	Cleaned up lexer rules a bit. There's no benefit to adding variables for angle brackets and such, it's much easier to grok to just use them directly.	2014-02-28 20:09:13 +01:00
Yorick Peterse	2294bf19f4	Better lexing of CDATA tags. This means the lexer is now capable of lexing CDATA tags that contain text such as ]].	2014-02-28 20:05:12 +01:00
Yorick Peterse	6138945d53	Moved some of the CDATA docs around.	2014-02-28 00:04:44 +01:00
Yorick Peterse	4883ac7384	Lexing of CDATA tags.	2014-02-28 00:03:37 +01:00
Yorick Peterse	c011e2faaa	Moved the lexer specs to spec/oga/lexer. I accidentally moved these inside the parser specs.	2014-02-27 21:30:10 +01:00
Yorick Peterse	cdaa14a28e	Broke up lexer specs into separate files.	2014-02-27 20:55:29 +01:00
Yorick Peterse	2c82f88f6c	Basic lexing + parsing of doctypes. We're doing these the lazy way. I can't be bothered writing patterns/rules for 4 different formats for something such as doctypes.	2014-02-27 01:27:51 +01:00
Yorick Peterse	d7d20b4c23	Added a license.	2014-02-26 22:20:47 +01:00
Yorick Peterse	91f416f035	Moved ending tags into their own racc rule.	2014-02-26 22:20:11 +01:00

413 Commits