9fa694ad4f 
								
							 
						 
						
							
							
								
								Use index based buffers for text nodes.  
							
							
							
							
							Instead of appending single characters to a String buffer the lexer now uses a
start and end position to figure out what the buffer is. This is a lot faster
than constantly appending to a String. 
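
The difference between the two buffering strategies can be sketched as follows. This is a minimal illustration, not the lexer's actual code: the variable names (`text_start`, `text_end`) and the sample input are assumptions.

```ruby
# Hypothetical sketch, not Oga's actual lexer: compare building a text buffer
# character by character with slicing it out of the input by position.
input = 'hello world<p>'

# Approach 1: append every character to a String buffer.
appended = String.new
input.each_char do |char|
  break if char == '<'
  appended << char
end

# Approach 2: remember where the text starts and ends, then slice once.
text_start = 0
text_end   = input.index('<') || input.length
sliced     = input[text_start...text_end]

puts appended
puts sliced
```

Both approaches yield the same text; the second one only tracks two integers while scanning and allocates the String once.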
							
						 
						
							2014-03-21 17:32:07 +01:00  
				
					
						
							
							
								 
						
							
								55f116124c 
								
							 
						 
						
							
							
								
								Fix for showing lines in parser errors.  
							
							
							
						 
						
							2014-03-21 00:16:20 +01:00  
				
					
						
							
							
								 
						
							
								7749f4abce 
								
							 
						 
						
							
							
								
								Corrected a comment in the parser.  
							
							
							
						 
						
							2014-03-21 00:10:20 +01:00  
				
					
						
							
							
								 
						
							
								a20ec0000a 
								
							 
						 
						
							
							
								
								Show up to 5 surrounding lines in parser errors.  
							
							
							
						 
						
							2014-03-20 23:40:25 +01:00  
				
					
						
							
							
								 
						
							
								91fb7523fd 
								
							 
						 
						
							
							
								
								Lex open tags with newlines in them.  
							
							
							
						 
						
							2014-03-20 23:39:29 +01:00  
				
					
						
							
							
								 
						
							
								ba17996bfc 
								
							 
						 
						
							
							
								
								Fancier error messages for the parser.  
							
							
							
							
							The error messages of the parser now contain surrounding lines of code instead
of only the offending line of code. This should make debugging a bit easier.
Line numbers are also shown for each line. 
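
A rough sketch of how such an error excerpt could be built is shown below. The helper name `surrounding_lines`, its `context` parameter and the `=>` marker are assumptions made for illustration, not the parser's real API.

```ruby
# Hypothetical helper: returns the lines around `line_number` (1-based),
# each prefixed with its line number. The offending line gets a "=>" marker.
def surrounding_lines(source, line_number, context = 2)
  lines = source.lines.map(&:chomp)
  first = [line_number - context, 1].max
  last  = [line_number + context, lines.length].min

  (first..last).map do |number|
    marker = number == line_number ? '=>' : '  '
    format('%s %d: %s', marker, number, lines[number - 1])
  end.join("\n")
end

source = "<a>\n<b>\n<c\n</b>\n</a>\n"
puts surrounding_lines(source, 3, 1)
```

The range is clamped to the first and last line so that errors near the start or end of the input do not index out of bounds.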
							
						 
						
							2014-03-20 23:30:24 +01:00  
				
					
						
							
							
								 
						
							
								74bc11a239 
								
							 
						 
						
							
							
								
								Rip out column counting.  
							
							
							
							
							This makes both the lexer and parser quite a bit easier to use. Counting column
numbers also isn't really needed when parsing XML/HTML. 
							
						 
						
							2014-03-20 19:44:28 +01:00  
				
					
						
							
							
								 
						
							
								70a39042e7 
								
							 
						 
						
							
							
								
								Removed useless rules from the parser.  
							
							
							
						 
						
							2014-03-20 18:58:32 +01:00  
				
					
						
							
							
								 
						
							
								03774f2788 
								
							 
						 
						
							
							
								
								Documented the lexer.  
							
							
							
						 
						
							2014-03-19 22:05:57 +01:00  
				
					
						
							
							
								 
						
							
								f1fcdfbacb 
								
							 
						 
						
							
							
								
								Cleaned up the Ragel bits of the lexer.  
							
							
							
							
							This removes some of the complexity that existed before (e.g. too many state
machines) and fixes a bunch of problems with nested data. 
							
						 
						
							2014-03-19 21:44:10 +01:00  
				
					
						
							
							
								 
						
							
								7271e74396 
								
							 
						 
						
							
							
								
								Revert "Compacter parser AST."  
							
							
							
							
							Although this AST is more compact it will result in conflicts between (text),
(attributes) and (attribute) nodes in regular XML documents. This is due to XML
allowing elements with these names (unlike in HTML).
This reverts commit 8898d08831 
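
The conflict can be illustrated with plain arrays standing in for AST nodes. The node layouts and the helper names `generic_node` and `compact_node` are assumptions for this sketch, not the parser's real node classes.

```ruby
# Hypothetical AST shapes using plain arrays; not Oga's real node classes.

# Generic form: the node type is always :element, the name is just data.
def generic_node(name, children = [])
  [:element, name, children]
end

# Compact form: the element name becomes the node type itself.
def compact_node(name, children = [])
  [name.to_sym, children]
end

text_node = [:text, 'hello']

# An XML document may legally contain an element named <text>.
compact = compact_node('text')
generic = generic_node('text')

puts compact.first == text_node.first # the element and the text node share a type
puts generic.first == text_node.first # the types stay distinct
```

With the compact form, code that dispatches on the node type can no longer tell a `<text>` element from a text node, which is the conflict described above.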
							
						 
						
							2014-03-18 18:55:16 +01:00  
				
					
						
							
							
								 
						
							
								9975c9c430 
								
							 
						 
						
							
							
								
								Removed the emit_text_buffer Ragel action.  
							
							
							
						 
						
							2014-03-17 21:49:49 +01:00  
				
					
						
							
							
								 
						
							
								274ab359ba 
								
							 
						 
						
							
							
								
								Don't use separate tokens/nodes for newlines.  
							
							
							
							
							Newlines are now lexed together with regular text. The line numbers are
advanced based on the number of "\n" sequences in a text buffer. 
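
Counting newlines per text buffer could look roughly like this. The `line` counter and the sample chunks are made up for illustration; the real lexer's bookkeeping may differ.

```ruby
# Hypothetical sketch: `line` and the text chunks are invented for this example.
line   = 1
chunks = ["first\nsecond", "\n\nfourth", 'no newline here']

chunks.each do |text|
  # Advance the line counter by the number of newlines in this chunk.
  line += text.count("\n")
end

puts line
```

`String#count` scans each buffer once, so no separate newline tokens are needed to keep the line number up to date.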
							
						 
						
							2014-03-17 21:26:21 +01:00  
				
					
						
							
							
								 
						
							
								8898d08831 
								
							 
						 
						
							
							
								
								Compacter parser AST.  
							
							
							
							
							The AST no longer uses the generic `element` type for element nodes but instead
changes the type based on the element type. That is, a <p> element now results
in a (p) node, <link> in (link), etc. 
							
						 
						
							2014-03-17 21:03:54 +01:00  
				
					
						
							
							
								 
						
							
								cb75edc30d 
								
							 
						 
						
							
							
								
								Basic support for lexing/parsing HTML5.  
							
							
							
							
							This will need a bunch of extra tests before I'll consider closing #7 . 
							
						 
						
							2014-03-16 23:42:24 +01:00  
				
					
						
							
							
								 
						
							
								ce8bbdb64a 
								
							 
						 
						
							
							
								
								Parsing support for multiple nested nodes.  
							
							
							
						 
						
							2014-03-15 20:19:54 +01:00  
				
					
						
							
							
								 
						
							
								05ee3c13c9 
								
							 
						 
						
							
							
								
								Parsing support for nested element/text nodes.  
							
							
							
						 
						
							2014-03-14 00:44:11 +01:00  
				
					
						
							
							
								 
						
							
								6b2f682c5c 
								
							 
						 
						
							
							
								
								Tests for lexing a basic HTML document.  
							
							
							
							
							This also comes with some changes to the lexer so that it advances column/line
numbers correctly. 
							
						 
						
							2014-03-13 23:55:18 +01:00  
				
					
						
							
							
								 
						
							
								34f8779c94 
								
							 
						 
						
							
							
								
								Lexing of bare regular text.  
							
							
							
							
							This is currently a bit of a hack but at least we're slowly getting there. 
							
						 
						
							2014-03-13 00:42:12 +01:00  
				
					
						
							
							
								 
						
							
								2fbca93ae8 
								
							 
						 
						
							
							
								
								Support for parsing nested elements.  
							
							
							
						 
						
							2014-03-12 23:13:28 +01:00  
				
					
						
							
							
								 
						
							
								8cfa81aed9 
								
							 
						 
						
							
							
								
								Basic support for parsing elements.  
							
							
							
							
							This includes support for elements with namespaces and attributes. Nested
elements are not yet supported. 
							
						 
						
							2014-03-12 23:02:54 +01:00  
				
					
						
							
							
								 
						
							
								5ce515d224 
								
							 
						 
						
							
							
								
								Small line wrapping change in the lexer.  
							
							
							
						 
						
							2014-03-12 22:42:13 +01:00  
				
					
						
							
							
								 
						
							
								98b3443e7f 
								
							 
						 
						
							
							
								
								Lexing of element attributes without values.  
							
							
							
						 
						
							2014-03-12 22:41:17 +01:00  
				
					
						
							
							
								 
						
							
								ed9d8c05a2 
								
							 
						 
						
							
							
								
								Added support for parsing comments.  
							
							
							
						 
						
							2014-03-12 22:20:12 +01:00  
				
					
						
							
							
								 
						
							
								0a396043f8 
								
							 
						 
						
							
							
								
								Support for parsing CDATA tags.  
							
							
							
						 
						
							2014-03-11 22:22:02 +01:00  
				
					
						
							
							
								 
						
							
								c9592856f0 
								
							 
						 
						
							
							
								
								Updated parsing of doctypes.  
							
							
							
							
							The resulting nodes now separate the type, public and system IDs into separate
string values. 
							
						 
						
							2014-03-11 22:08:21 +01:00  
				
					
						
							
							
								 
						
							
								c07edc767b 
								
							 
						 
						
							
							
								
								Updated the gitignore entry for the parser.  
							
							
							
						 
						
							2014-03-11 22:03:02 +01:00  
				
					
						
							
							
								 
						
							
								8ce76be050 
								
							 
						 
						
							
							
								
								Moved the parser class to Oga::Parser.  
							
							
							
							
							Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now). 
							
						 
						
							2014-03-11 22:01:50 +01:00  
				
					
						
							
							
								 
						
							
								77b40d2e81 
								
							 
						 
						
							
							
								
								Use a separate machine for closing tags.  
							
							
							
							
							This makes it easier to advance column numbers for whitespace as well as
capturing and emitting tokens for the closing tag. 
							
						 
						
							2014-03-11 21:55:36 +01:00  
				
					
						
							
							
								 
						
							
								eacd9b88cf 
								
							 
						 
						
							
							
								
								Reworked token generation for elements.  
							
							
							
							
							This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output. 
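
A rough idea of the resulting token stream is sketched below. The token names come from the message above, but the `[type, value]` layout, the token order, the regular expression and the `lex_open_tag` helper are all assumptions; the real lexer is generated by Ragel.

```ruby
# Hypothetical sketch: emits separate tokens for the start of a tag, the
# optional namespace and the element name. Not the Ragel-based lexer itself.
def lex_open_tag(input)
  match = input.match(/\A<(?:([\w-]+):)?([\w-]+)/)
  return [] unless match

  tokens = [[:T_ELEMENT_OPEN, nil]]
  tokens << [:T_ELEMENT_NS, match[1]] if match[1]
  tokens << [:T_ELEMENT_NAME, match[2]]
  tokens
end

p lex_open_tag('<svg:rect')
p lex_open_tag('<p')
```

Because the namespace is its own token, elements without one simply omit it instead of carrying an empty value inside a combined token.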
							
						 
						
							2014-03-10 23:50:39 +01:00  
				
					
						
							
							
								 
						
							
								cd53d5e426 
								
							 
						 
						
							
							
								
								Fixed advancing column numbers.  
							
							
							
							
							In a bunch of cases the column number would not be increased correctly. 
							
						 
						
							2014-03-07 23:54:56 +01:00  
				
					
						
							
							
								 
						
							
								a5a3b8db3f 
								
							 
						 
						
							
							
								
								Basic lexing of HTML tags.  
							
							
							
							
							The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve. 
							
						 
						
							2014-03-03 22:08:46 +01:00  
				
					
						
							
							
								 
						
							
								d9ef33e1f8 
								
							 
						 
						
							
							
								
								Lexing of comments.  
							
							
							
							
							This fixes  #4 . 
							
						 
						
							2014-02-28 23:27:23 +01:00  
				
					
						
							
							
								 
						
							
								92ae48f905 
								
							 
						 
						
							
							
								
								Use fcall + fret instead of fgoto.  
							
							
							
							
							This removes the hardcoded return to the main machine. 
							
						 
						
							2014-02-28 23:19:31 +01:00  
				
					
						
							
							
								 
						
							
								30d3e455d1 
								
							 
						 
						
							
							
								
								Use squote/dquote everywhere in the lexer.  
							
							
							
						 
						
							2014-02-28 23:18:23 +01:00  
				
					
						
							
							
								 
						
							
								970ce27283 
								
							 
						 
						
							
							
								
								Cleanup of buffering text/strings.  
							
							
							
							
							This removes the need to use ||= and such, which should speed things up a bit
and keeps the code cleaner. 
							
						 
						
							2014-02-28 23:16:01 +01:00  
				
					
						
							
							
								 
						
							
								ca6f422036 
								
							 
						 
						
							
							
								
								Lexing of doctypes.  
							
							
							
							
							This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.
Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether an ID is a public or system ID based on the order.
This fixes  #1  
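
Resolving the doctype IDs by position could be sketched as follows. The `[type, value]` token layout, the `doctype_ids` helper, the hash keys and the assumption that the first string is the public ID and the second the system ID are all illustrative guesses, not the parser's confirmed behaviour.

```ruby
# Hypothetical sketch: both IDs arrive as T_STRING tokens, so the parser has
# to tell them apart by their order. Assumes public ID first, system ID second.
def doctype_ids(tokens)
  strings = tokens.select { |type, _| type == :T_STRING }.map { |_, value| value }

  { public_id: strings[0], system_id: strings[1] }
end

tokens = [
  [:T_STRING, '-//W3C//DTD XHTML 1.0 Strict//EN'],
  [:T_STRING, 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd']
]

p doctype_ids(tokens)
```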
							
						 
						
							2014-02-28 23:08:55 +01:00  
				
					
						
							
							
								 
						
							
								3c825afee0 
								
							 
						 
						
							
							
								
								Cleaned up lexer rules a bit.  
							
							
							
							
							There's no benefit to adding variables for angle brackets and such, it's much
easier to grok to just use them directly. 
							
						 
						
							2014-02-28 20:09:13 +01:00  
				
					
						
							
							
								 
						
							
								2294bf19f4 
								
							 
						 
						
							
							
								
								Better lexing of CDATA tags.  
							
							
							
							
							This means the lexer is now capable of lexing CDATA tags that contain text such
as ]]. 
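
The idea of only ending a CDATA section at the full `]]>` terminator can be sketched with Ruby's `StringScanner`. This swaps in a different technique from the Ragel rules the lexer actually uses, and the `lex_cdata` helper is hypothetical.

```ruby
require 'strscan'

# Hypothetical sketch using StringScanner instead of Ragel: returns the body
# of a CDATA section, or nil when the input is not a complete CDATA section.
def lex_cdata(input)
  scanner = StringScanner.new(input)
  return nil unless scanner.scan(/<!\[CDATA\[/)

  # scan_until consumes up to and including the first "]]>", so a bare "]]"
  # inside the body does not end the section.
  body = scanner.scan_until(/\]\]>/)
  return nil unless body

  body[0...-3]
end

puts lex_cdata('<![CDATA[a ]] b]]>')
```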
							
						 
						
							2014-02-28 20:05:12 +01:00  
				
					
						
							
							
								 
						
							
								6138945d53 
								
							 
						 
						
							
							
								
								Moved some of the CDATA docs around.  
							
							
							
						 
						
							2014-02-28 00:04:44 +01:00  
				
					
						
							
							
								 
						
							
								4883ac7384 
								
							 
						 
						
							
							
								
								Lexing of CDATA tags.  
							
							
							
						 
						
							2014-02-28 00:03:37 +01:00  
				
					
						
							
							
								 
						
							
								2c82f88f6c 
								
							 
						 
						
							
							
								
								Basic lexing + parsing of doctypes.  
							
							
							
							
							We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes. 
							
						 
						
							2014-02-27 01:27:51 +01:00  
				
					
						
							
							
								 
						
							
								91f416f035 
								
							 
						 
						
							
							
								
								Moved ending tags into their own racc rule.  
							
							
							
						 
						
							2014-02-26 22:20:11 +01:00  
				
					
						
							
							
								 
						
							
								4f04fa0d30 
								
							 
						 
						
							
							
								
								Untrack Racc generated files.  
							
							
							
							
							Yorick, you can stop being bad now. 
							
						 
						
							2014-02-26 22:18:33 +01:00  
				
					
						
							
							
								 
						
							
								e764ba640a 
								
							 
						 
						
							
							
								
								Basic parser setup without tests.  
							
							
							
							
							Who needs tests anyway! 
							
						 
						
							2014-02-26 22:17:47 +01:00  
				
					
						
							
							
								 
						
							
								c4e0406ed9 
								
							 
						 
						
							
							
								
								Lexing of CDATA tags.  
							
							
							
						 
						
							2014-02-26 22:01:07 +01:00  
				
					
						
							
							
								 
						
							
								0a336e76d3 
								
							 
						 
						
							
							
								
								Renamed T_EXCLAMATION to T_BANG.  
							
							
							
							
							This is way easier to type. 
							
						 
						
							2014-02-26 21:54:27 +01:00  
				
					
						
							
							
								 
						
							
								684eccd3e2 
								
							 
						 
						
							
							
								
								Lex dashes as T_DASH instead of T_TEXT.  
							
							
							
						 
						
							2014-02-26 21:52:32 +01:00  
				
					
						
							
							
								 
						
							
								39bbe5afc4 
								
							 
						 
						
							
							
								
								Expanded lexer tag/attribute tests.  
							
							
							
						 
						
							2014-02-26 21:48:46 +01:00  
				
					
						
							
							
								 
						
							
								d32888f803 
								
							 
						 
						
							
							
								
								Basic lexer setup/tests.  
							
							
							
							
							Too lazy to do this the right way. ᕕ(ᐛ)ᕗ 
							
						 
						
							2014-02-26 21:36:30 +01:00