This allows for parsing of HTML such as:
<a href=lol("javascript")></a>
Here the "href" attribute would have its value set to:
lol("javascript")
Fixes#119
```
element = Oga::XML::Element.new(:name => 'div')
some_node.replace(element)
```
You can also pass a `String` to `replace` and it will be replaced with
a `Oga::XML::Text` node
```
some_node.replace('this will replace the current node with a text node')
```
closes#115
Currently this only disabled the automatic insertion of closing tags, in
the future this may also disable other features if deemed worth the
effort.
Fixes#107
This allows the lexer to process input such as:
<a href=foo"></a>
For XML input the lexer still expects properly opened/closed attribute
values.
Fixes#109
It pains me that I have to write this in the first place (I was hoping
people would figure this out). Sadly history has shown it's required to
document this properly.
This ensures that entities such as "½" are decoded properly.
Previously this would be ignored as the regular expression used for this
only matched [a-zA-Z].
This was adapted from PR #111.
Previous HTML such as this would be lexed incorrectly:
<div>
<ul>
<li>foo
</ul>
inside div
</div>
outside div
The lexer would see this as the following instead:
<div>
<ul>
<li>foo</li>
inside div
</ul>
outside div
</div>
This commit exposes the name of the closing tag to
XML::Lexer#on_element_end (omitted for self closing tags). This can be
used to automatically close nested tags that were left open, ensuring
the above HTML is lexer correctly.
The new setup ignores namespace prefixes as these are not used in HTML,
XML in turn won't even run the code to begin with since it doesn't allow
one to leave out closing tags.
By encoding single/double quotes we can potentially break input, so lets
stop doing this. This now ensures that this:
<foo>a"b</foo>
Is actually serialized back into the exact same instead of being
serialized into:
<foo>a"b</foo>
This allows for more fine grained control over when to close certain
elements. For example, an unclosed <tr> element should be closed first
when bumping into any element other than <td> or <th>. Using the old
NodeNameSet this would mean having to list every possible HTML element
out there. Using this new setup one can just create a whitelist of the
<td> and <th> elements.