The previous setup would consume too much. For example the following HTML:
<a><!--foo--><b><!--bar--></b></a>
would result in the following T_COMMENT token:
"foo--><b><!--bar"
The new setup requires the marking of a start position. I'm not a huge fan of
this but there doesn't appear to be a way around this.
Instead of using a raw Hash Oga now uses the XML::Attribute class for storing
information about element attributes.
Attributes are stored as an Array of XML::Attribute instances. This allows the
attributes to be more easily modified. If they were stored as a Hash you'd not
only have to update the attributes themselves but also the Hash that contains
them.
While using an Array has a slight runtime cost in most cases the amount of
attributes is small enough that this doesn't really pose a problem. If webscale
performance is desired at some point in the future Oga could most likely cache
the lookup of an attribute. This however is something for the future.
This moves the element related rules to the element_head machine (where they
belong). This in turn makes it possible to lex ">" as a text node, previously
this was impossible.