91fb7523fd 
								
							 
						 
						
							
							
								
								Lex open tags with newlines in them.  
							
							
							
						 
						
							2014-03-20 23:39:29 +01:00  
				
					
						
							
							
								 
						
							
								74bc11a239 
								
							 
						 
						
							
							
								
								Rip out column counting.  
							
							
							
							
							This makes both the lexer and parser quite a bit easier to use. Counting column
numbers also isn't really needed when parsing XML/HTML. 
							
						 
						
							2014-03-20 19:44:28 +01:00  
				
					
						
							
							
								 
						
							
								192ba9bb54 
								
							 
						 
						
							
							
								
								Expanded the lexer comment tests.  
							
							
							
						 
						
							2014-03-19 21:44:57 +01:00  
				
					
						
							
							
								 
						
							
								7271e74396 
								
							 
						 
						
							
							
								
								Revert "Compacter parser AST."  
							
							
							
							
							Although this AST is more compact, it results in conflicts between (text),
(attributes) and (attribute) nodes in regular XML documents. This is because XML
allows elements with these names (unlike HTML).
This reverts commit 8898d08831. 
							
						 
						
							2014-03-18 18:55:16 +01:00  
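The conflict described in 7271e74396 can be illustrated with a small sketch. It assumes nodes are plain arrays of the form `[type, *children]`; the helpers `compact_node` and `generic_node` are hypothetical and are not Oga's actual AST code.

```ruby
# Hypothetical node layout: [type, *children]. Not Oga's real AST classes.

# Compact AST: the node type is derived from the element name.
def compact_node(name, *children)
  [name.to_sym, *children]
end

# Generic AST: every element uses the :element type and keeps its name as data.
def generic_node(name, *children)
  [:element, name, *children]
end

text_node = [:text, "hello"]                 # a regular text node
compact   = compact_node("text", text_node)  # an XML element named <text>
generic   = generic_node("text", text_node)

# In the compact form the element and the text node share the :text type,
# so a consumer can no longer tell them apart by type alone.
puts compact.first == text_node.first
puts generic.first == text_node.first
```

With the generic form the element name is only data, so an XML element called `text`, `attributes` or `attribute` cannot collide with the node types of the same name.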
				
					
						
							
							
								 
						
							
								274ab359ba 
								
							 
						 
						
							
							
								
								Don't use separate tokens/nodes for newlines.  
							
							
							
							
							Newlines are now lexed together with regular text. The line numbers are
advanced based on the number of "\n" sequences in a text buffer. 
							
						 
						
							2014-03-17 21:26:21 +01:00  
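A minimal sketch of the line tracking described in 274ab359ba. The `LineTracker` class is a hypothetical helper written for illustration, not the lexer's actual code; it only shows the idea of advancing the line number by the number of "\n" characters in each text buffer.

```ruby
# Hypothetical helper, not part of Oga: tracks the current line number by
# counting newline characters in each lexed text buffer.
class LineTracker
  attr_reader :line

  def initialize
    @line = 1
  end

  # Advances the line number by the number of "\n" characters in +text+.
  def advance(text)
    @line += text.count("\n")
  end
end

tracker = LineTracker.new
tracker.advance("foo\nbar\n") # two newlines
tracker.advance("baz")        # no newlines
puts tracker.line
```

Because newlines are counted inside the text buffer, no separate newline token is needed to keep line numbers accurate.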
				
					
						
							
							
								 
						
							
								8898d08831 
								
							 
						 
						
							
							
								
								Compacter parser AST.  
							
							
							
							
							The AST no longer uses the generic `element` type for element nodes but instead
changes the type based on the element type. That is, a <p> element now results
in a (p) node, <link> in (link), etc. 
							
						 
						
							2014-03-17 21:03:54 +01:00  
				
					
						
							
							
								 
						
							
								8d3f3f15d7 
								
							 
						 
						
							
							
								
								Renamed parse_html() to parse().  
							
							
							
						 
						
							2014-03-16 23:46:20 +01:00  
				
					
						
							
							
								 
						
							
								cb75edc30d 
								
							 
						 
						
							
							
								
								Basic support for lexing/parsing HTML5.  
							
							
							
							
							This will need a bunch of extra tests before I'll consider closing #7. 
							
						 
						
							2014-03-16 23:42:24 +01:00  
				
					
						
							
							
								 
						
							
								ce8bbdb64a 
								
							 
						 
						
							
							
								
								Parsing support for multiple nested nodes.  
							
							
							
						 
						
							2014-03-15 20:19:54 +01:00  
				
					
						
							
							
								 
						
							
								05ee3c13c9 
								
							 
						 
						
							
							
								
								Parsing support for nested element/text nodes.  
							
							
							
						 
						
							2014-03-14 00:44:11 +01:00  
				
					
						
							
							
								 
						
							
								6b2f682c5c 
								
							 
						 
						
							
							
								
								Tests for lexing a basic HTML document.  
							
							
							
							
							This also comes with some changes to the lexer so that it advances column/line
numbers correctly. 
							
						 
						
							2014-03-13 23:55:18 +01:00  
				
					
						
							
							
								 
						
							
								edf2e4112b 
								
							 
						 
						
							
							
								
								Added a test for parsing bare text tokens.  
							
							
							
						 
						
							2014-03-13 00:42:58 +01:00  
				
					
						
							
							
								 
						
							
								34f8779c94 
								
							 
						 
						
							
							
								
								Lexing of bare regular text.  
							
							
							
							
							This is currently a bit of a hack but at least we're slowly getting there. 
							
						 
						
							2014-03-13 00:42:12 +01:00  
				
					
						
							
							
								 
						
							
								2fbca93ae8 
								
							 
						 
						
							
							
								
								Support for parsing nested elements.  
							
							
							
						 
						
							2014-03-12 23:13:28 +01:00  
				
					
						
							
							
								 
						
							
								8cfa81aed9 
								
							 
						 
						
							
							
								
								Basic support for parsing elements.  
							
							
							
							
							This includes support for elements with namespaces and attributes. Nested
elements are not yet supported. 
							
						 
						
							2014-03-12 23:02:54 +01:00  
				
					
						
							
							
								 
						
							
								98b3443e7f 
								
							 
						 
						
							
							
								
								Lexing of element attributes without values.  
							
							
							
						 
						
							2014-03-12 22:41:17 +01:00  
				
					
						
							
							
								 
						
							
								ed9d8c05a2 
								
							 
						 
						
							
							
								
								Added support for parsing comments.  
							
							
							
						 
						
							2014-03-12 22:20:12 +01:00  
				
					
						
							
							
								 
						
							
								0a396043f8 
								
							 
						 
						
							
							
								
								Support for parsing CDATA tags.  
							
							
							
						 
						
							2014-03-11 22:22:02 +01:00  
				
					
						
							
							
								 
						
							
								c9592856f0 
								
							 
						 
						
							
							
								
								Updated parsing of doctypes.  
							
							
							
							
							The resulting nodes now store the type, public ID and system ID as separate
string values. 
							
						 
						
							2014-03-11 22:08:21 +01:00  
				
					
						
							
							
								 
						
							
								4a41894e2c 
								
							 
						 
						
							
							
								
								Updated the doctype parser specs.  
							
							
							
						 
						
							2014-03-11 22:02:26 +01:00  
				
					
						
							
							
								 
						
							
								8ce76be050 
								
							 
						 
						
							
							
								
								Moved the parser class to Oga::Parser.  
							
							
							
							
							Oga will use the same parser for XML and HTML so it doesn't make sense to
separate the two into different namespaces (at least for now). 
							
						 
						
							2014-03-11 22:01:50 +01:00  
				
					
						
							
							
								 
						
							
								eacd9b88cf 
								
							 
						 
						
							
							
								
								Reworked token generation for elements.  
							
							
							
							
							This emits separate tokens for the start tag (T_ELEMENT_OPEN) and name
(T_ELEMENT_NAME). This makes it easier to include the namespace of an element
(T_ELEMENT_NS) in the output. 
							
						 
						
							2014-03-10 23:50:39 +01:00  
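A rough sketch of the token layout described in eacd9b88cf. Only the token names T_ELEMENT_OPEN, T_ELEMENT_NAME and T_ELEMENT_NS come from the commit message; the `[type, value]` pair format, the order of the namespace and name tokens, and the `element_open_tokens` helper are assumptions made for illustration.

```ruby
# Hypothetical sketch: builds the token list for an opening tag from a
# possibly namespaced name such as "svg:rect". Not the actual lexer code.
def element_open_tokens(qualified_name)
  namespace, name = qualified_name.split(":", 2)

  # Without a namespace prefix, split returns a single item: the name.
  name, namespace = namespace, nil if name.nil?

  tokens = [[:T_ELEMENT_OPEN, nil]]
  tokens << [:T_ELEMENT_NS, namespace] if namespace
  tokens << [:T_ELEMENT_NAME, name]
  tokens
end

p element_open_tokens("p")
p element_open_tokens("svg:rect")
```

Keeping the start tag and the name as separate tokens means the namespace becomes an optional extra token rather than something packed into a single element token.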
				
					
						
							
							
								 
						
							
								cd53d5e426 
								
							 
						 
						
							
							
								
								Fixed advancing column numbers.  
							
							
							
							
							In a bunch of cases the column number would not be increased correctly. 
							
						 
						
							2014-03-07 23:54:56 +01:00  
				
					
						
							
							
								 
						
							
								1c9a6c8b76 
								
							 
						 
						
							
							
								
								Tests for nested tags/text nodes.  
							
							
							
							
							Well guess what, apparently that did work. That was slightly unexpected. 
							
						 
						
							2014-03-03 22:13:29 +01:00  
				
					
						
							
							
								 
						
							
								a5a3b8db3f 
								
							 
						 
						
							
							
								
								Basic lexing of HTML tags.  
							
							
							
							
							The current implementation is a bit messy. In particular the counting of column
numbers is not entirely the way it should be. There are also some problems with
nested tags/text that I still have to resolve. 
							
						 
						
							2014-03-03 22:08:46 +01:00  
				
					
						
							
							
								 
						
							
								d9ef33e1f8 
								
							 
						 
						
							
							
								
								Lexing of comments.  
							
							
							
							
							This fixes #4. 
							
						 
						
							2014-02-28 23:27:23 +01:00  
				
					
						
							
							
								 
						
							
								ca6f422036 
								
							 
						 
						
							
							
								
								Lexing of doctypes.  
							
							
							
							
							This comes with various structural changes to the lexer as I'm slowly starting
to get the hang of Ragel. Ragel is a beast but damn it's an awesome piece of
software.
Note that the doctype public/system IDs are lexed as T_STRING. The parser will
figure out whether an ID is a public or system ID based on the order.
This fixes #1. 
							
						 
						
							2014-02-28 23:08:55 +01:00  
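The order-based handling mentioned in ca6f422036 could look roughly like the sketch below. Only T_STRING is named in the commit message; the `[type, value]` token pairs, the T_DOCTYPE_TYPE token, the `doctype_ids` helper, and the rule "first string is the public ID, second is the system ID" are assumptions for illustration.

```ruby
# Hypothetical sketch, not the actual parser: picks the public and system IDs
# out of a doctype token stream purely by the order of the T_STRING tokens.
def doctype_ids(tokens)
  strings = tokens
    .select { |type, _value| type == :T_STRING }
    .map { |_type, value| value }

  # Assumption: the first string is the public ID, the second the system ID.
  { public_id: strings[0], system_id: strings[1] }
end

tokens = [
  [:T_DOCTYPE_TYPE, "PUBLIC"], # hypothetical token type
  [:T_STRING, "-//W3C//DTD HTML 4.01//EN"],
  [:T_STRING, "http://www.w3.org/TR/html4/strict.dtd"]
]

p doctype_ids(tokens)
```

This keeps the lexer simple: it emits the same token type for both IDs and leaves the interpretation to the parser.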
				
					
						
							
							
								 
						
							
								2294bf19f4 
								
							 
						 
						
							
							
								
								Better lexing of CDATA tags.  
							
							
							
							
							This means the lexer is now capable of lexing CDATA tags that contain text such
as ]]. 
							
						 
						
							2014-02-28 20:05:12 +01:00  
				
					
						
							
							
								 
						
							
								4883ac7384 
								
							 
						 
						
							
							
								
								Lexing of CDATA tags.  
							
							
							
						 
						
							2014-02-28 00:03:37 +01:00  
				
					
						
							
							
								 
						
							
								c011e2faaa 
								
							 
						 
						
							
							
								
								Moved the lexer specs to spec/oga/lexer.  
							
							
							
							
							I accidentally moved these inside the parser specs. 
							
						 
						
							2014-02-27 21:30:10 +01:00  
				
					
						
							
							
								 
						
							
								cdaa14a28e 
								
							 
						 
						
							
							
								
								Broke up lexer specs into separate files.  
							
							
							
						 
						
							2014-02-27 20:55:29 +01:00  
				
					
						
							
							
								 
						
							
								2c82f88f6c 
								
							 
						 
						
							
							
								
								Basic lexing + parsing of doctypes.  
							
							
							
							
							We're doing these the lazy way. I can't be bothered writing patterns/rules for
4 different formats for something such as doctypes. 
							
						 
						
							2014-02-27 01:27:51 +01:00  
				
					
						
							
							
								 
						
							
								c4e0406ed9 
								
							 
						 
						
							
							
								
								Lexing of CDATA tags.  
							
							
							
						 
						
							2014-02-26 22:01:07 +01:00  
				
					
						
							
							
								 
						
							
								0a336e76d3 
								
							 
						 
						
							
							
								
								Renamed T_EXCLAMATION to T_BANG.  
							
							
							
							
							This is way easier to type. 
							
						 
						
							2014-02-26 21:54:27 +01:00  
				
					
						
							
							
								 
						
							
								684eccd3e2 
								
							 
						 
						
							
							
								
								Lex dashes as T_DASH instead of T_TEXT.  
							
							
							
						 
						
							2014-02-26 21:52:32 +01:00  
				
					
						
							
							
								 
						
							
								39bbe5afc4 
								
							 
						 
						
							
							
								
								Expanded lexer tag/attribute tests.  
							
							
							
						 
						
							2014-02-26 21:48:46 +01:00  
				
					
						
							
							
								 
						
							
								d32888f803 
								
							 
						 
						
							
							
								
								Basic lexer setup/tests.  
							
							
							
							
							Too lazy to do this the right way. ᕕ(ᐛ)ᕗ 
							
						 
						
							2014-02-26 21:36:30 +01:00  
				
					
						
							
							
								 
						
							
								5755c325bd 
								
							 
						 
						
							
							
								
								Imported a half-assed lexer.  
							
							
							
						 
						
							2014-02-26 19:54:11 +01:00  
				
					
						
							
							
								 
						
							
								702477ca28 
								
							 
						 
						
							
							
								
								Basic project layout.  
							
							
							
						 
						
							2014-02-26 19:50:16 +01:00