26 Commits

Author SHA1 Message Date
Yorick Peterse 00579eaa8a Changed text action from @{} to %{}.
This ensures the action is only run at the end, as opposed to in any non-final
state.
2014-09-23 22:58:20 +02:00
Yorick Peterse ad2e040f05 Handle lexing of input such as just "</".
Previously this would cause the lexer to go into an infinite loop in the "text"
state machine.

This fixes #37.
2014-09-15 17:20:06 +02:00
Yorick Peterse 9b8e9f49c6 Support for lexing empty attribute values.
This ensures that Oga can lex the following properly:

    <input value="" />

Previously Ragel would stop upon finding the empty string. This was caused by
the string rules being declared as follows:

    string_dquote = (dquote ^dquote+ dquote);
    string_squote = (squote ^squote+ squote);

These rules only match strings _with_ content, not without. Since Ragel stops
consuming input the moment it finds unhandled data, this resulted in incorrect
tokens being emitted.
2014-09-03 23:10:50 +02:00
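
The difference between "one or more" and "zero or more" in 9b8e9f49c6 can be
illustrated with plain Ruby regular expressions. This is only an analogy, not
Oga's lexer code: the variable names are hypothetical, and it assumes the `+`
and `*` repetition operators behave here like their Ragel counterparts.

```ruby
# Analogy only: Ruby regexes standing in for the Ragel string rules.
# with_content mirrors `dquote ^dquote+ dquote` (one or more characters).
with_content = /"[^"]+"/

# allow_empty uses `*` (zero or more), so an empty string is accepted too.
allow_empty = /"[^"]*"/

input = '<input value="" />'

old_match = input[with_content] # nil: nothing with content to match
new_match = input[allow_empty]  # the two quote characters
```

With the stricter pattern the empty attribute value is never recognised as a
string, which matches the symptom described in the commit message.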
Yorick Peterse 49ddebf358 Tighten lexing of T_TEXT nodes.
Thanks to some heavy rubberducking with @whitequark the lexer is now a little
bit better at lexing T_TEXT nodes. For example, previously the following could
not be lexed properly:

    "foo < bar"

There might still be some tweaking to do but we're getting there.
2014-09-03 00:51:13 +02:00
Yorick Peterse 96b7296910 Ragel variable of element closing tags. 2014-09-02 22:50:21 +02:00
Yorick Peterse 56341b5585 Cleaned up lexing of comments/cdata.
Thanks to @whitequark for suggesting the use of the "--" operator.
2014-08-16 16:03:55 +02:00
Yorick Peterse 2c488f92be Cleaned up marking of comments/cdata tags. 2014-08-15 22:05:09 +02:00
Yorick Peterse 8f4eaf3823 Lexing of XML processing instructions. 2014-08-15 22:04:45 +02:00
Yorick Peterse 4e8cca258c Fixed lexing of XML CDATA tags. 2014-08-15 20:47:58 +02:00
Yorick Peterse 81edce2eb8 Fixed lexing of XML comments.
The previous setup would consume too much. For example the following HTML:

    <a><!--foo--><b><!--bar--></b></a>

would result in the following T_COMMENT token:

    "foo--><b><!--bar"

The new setup requires the marking of a start position. I'm not a huge fan of
this but there doesn't appear to be a way around this.
2014-08-15 20:42:32 +02:00
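
The over-consumption described in 81edce2eb8 resembles what a greedy match
does in a regular expression. The sketch below reproduces the same symptom in
plain Ruby; it is an analogy under that assumption and does not show how the
Ragel grammar itself was changed.

```ruby
html = '<a><!--foo--><b><!--bar--></b></a>'

# Greedy: runs up to the LAST "-->", swallowing the markup in between.
greedy = html[/<!--(.*)-->/, 1]

# Non-greedy: stops at the first "-->" that follows each "<!--".
lazy = html.scan(/<!--(.*?)-->/).flatten
```

Here `greedy` holds the same oversized comment body quoted in the commit
message, while `lazy` holds the two comment bodies separately.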
Yorick Peterse d5569ead0b Use XML::Attribute for element attributes.
Instead of using a raw Hash Oga now uses the XML::Attribute class for storing
information about element attributes.

Attributes are stored as an Array of XML::Attribute instances. This allows the
attributes to be more easily modified. If they were stored as a Hash you'd not
only have to update the attributes themselves but also the Hash that contains
them.

While using an Array has a slight runtime cost, in most cases the number of
attributes is small enough that this doesn't really pose a problem. If webscale
performance is desired at some point in the future Oga could most likely cache
the lookup of an attribute. This however is something for the future.
2014-07-20 07:29:37 +02:00
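
The trade-off described in d5569ead0b can be sketched as follows. The Struct
below is a hypothetical stand-in for XML::Attribute (the real class may look
different), and `find_attribute` is an illustrative helper, not part of Oga.

```ruby
# Hypothetical stand-in for XML::Attribute; Oga's real class may differ.
attribute = Struct.new(:name, :value)

attributes = [
  attribute.new('class', 'intro'),
  attribute.new('id', 'header')
]

# Linear lookup by name: O(n), which is cheap for the handful of
# attributes a typical element has.
find_attribute = ->(name) { attributes.find { |attr| attr.name == name } }

# Renaming only touches the attribute itself. With a Hash keyed by name
# the old key would also have to be removed and the new one inserted.
found = find_attribute.call('class')
found.name = 'data-class'

renamed = find_attribute.call('data-class')
```

After the rename, `renamed` is the same object as `found`, and looking up the
old name returns nil without any extra bookkeeping.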
Yorick Peterse f660b11e47 Parsing of closing XML nodes with namespaces. 2014-07-09 19:54:45 +02:00
Yorick Peterse be3f8fb494 Removed the on_newline XML lexer callback. 2014-05-29 14:21:48 +02:00
Yorick Peterse 418b4ef498 Cleaned up documentation of the XML lexer. 2014-05-21 00:21:21 +02:00
Yorick Peterse 3a8582030d Removed remaining fhold call in the XML lexer.
There's no particular need any more for this fhold call so we're getting rid of
it.
2014-05-21 00:11:39 +02:00
Yorick Peterse 4542f06d0f Replaced fcall/fret with fnext in the XML lexer.
With the rules being cleaned up/moved around a bit we can drop the use of
fcall/fret. This removes the need to maintain a stack (position).
2014-05-21 00:08:48 +02:00
Yorick Peterse c56b0395e4 Moved various rules around for the XML lexer.
This moves the element related rules to the element_head machine (where they
belong). This in turn makes it possible to lex ">" as a text node; previously
this was impossible.
2014-05-21 00:04:53 +02:00
Yorick Peterse feaf28d423 Remove dedicated string machine in the XML lexer.
This removes the need for another fcall/fret combination.
2014-05-19 20:26:07 +02:00
Yorick Peterse 93b9718406 Cleaned up the XML lexer documentation. 2014-05-19 09:39:35 +02:00
Yorick Peterse cd0f3380c4 Merge multiple CDATA tokens into a single token.
The tokens T_CDATA_START, T_TEXT and T_CDATA_END have been merged together into
T_CDATA.
2014-05-19 09:36:19 +02:00
Yorick Peterse a4fb5c1299 Merge multiple comment tokens into a single one.
The tokens T_COMMENT_START, T_TEXT and T_COMMENT_END have been merged into a
single token: T_COMMENT. This simplifies both the lexer and the parser.
2014-05-19 09:30:30 +02:00
Yorick Peterse 44bf1dd1ca Split up handling of element names/namespaces.
This is now split up at the Ragel level, simplifying the corresponding Ruby code.
2014-05-15 10:22:05 +02:00
Yorick Peterse 19f04f98f7 Support for lexing/parsing inline doctypes. 2014-05-10 00:28:11 +02:00
Yorick Peterse c472ceac6f Docs for the shared Ragel grammar. 2014-05-08 00:21:23 +02:00
Yorick Peterse e271298984 Use macros in the C lexer. 2014-05-07 00:57:25 +02:00
Yorick Peterse f25f8a3d15 Break up the Ragel C grammar.
The grammar is now broken up into a base lexer and a C lexer. This allows the
same grammar to also be used in the Java code.
2014-05-07 00:50:34 +02:00