8acc7fc743
Instead of using a single token (T_CDATA) for a CDATA tag, the lexer now uses 3 tokens:

1. T_CDATA_START
2. T_CDATA_BODY
3. T_CDATA_END

The T_CDATA_BODY token can occur multiple times and is turned into a single value in the XML parser. This is similar to the way strings are lexed.

By changing the way CDATA tags are lexed, Oga can now lex CDATA tags containing newlines when using an IO as input. For example, this would previously fail:

    Oga.parse_xml(StringIO.new("<![CDATA[\nfoo]]>"))

Because IO input is read one line at a time, the lexer would receive the input as the following two chunks:

    "<![CDATA[\n"
    "foo]]>"

Related issues: #93
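The start/body/end scheme can be illustrated with a small toy sketch. It is not Oga's actual Ragel-based lexer or its parser: `ToyCdataLexer`, `lex_chunk` and `parse_cdata` are hypothetical names made up for this example, and the sketch assumes that neither `<![CDATA[` nor `]]>` is itself split across two chunks. The point it shows is that the lexer keeps "inside CDATA" state between chunks, emits one T_CDATA_BODY per chunk, and leaves it to the parser to join the bodies into a single value.

```ruby
require 'stringio'

# Hypothetical toy lexer, NOT Oga's real implementation. It only
# recognises CDATA tags and keeps state between chunks so a tag may
# span several lines read from an IO. Assumes the start/end markers
# themselves are never split across chunks.
class ToyCdataLexer
  START_MARK = '<![CDATA['
  END_MARK   = ']]>'

  def initialize
    @in_cdata = false
  end

  # Lexes one chunk (e.g. one line) and returns an Array of tokens.
  def lex_chunk(chunk)
    tokens = []
    rest = chunk

    until rest.empty?
      if @in_cdata
        idx = rest.index(END_MARK)
        if idx
          # Body text before the end marker, then the end token.
          tokens << [:T_CDATA_BODY, rest[0...idx]] if idx > 0
          tokens << [:T_CDATA_END, nil]
          @in_cdata = false
          rest = rest[(idx + END_MARK.length)..-1]
        else
          # No end marker in this chunk: emit it all as a body token.
          tokens << [:T_CDATA_BODY, rest]
          rest = ''
        end
      else
        idx = rest.index(START_MARK)
        break unless idx # text outside CDATA is ignored in this toy

        tokens << [:T_CDATA_START, nil]
        @in_cdata = true
        rest = rest[(idx + START_MARK.length)..-1]
      end
    end

    tokens
  end
end

# Hypothetical parser step: joins all T_CDATA_BODY values between a
# start and end token into one String per CDATA tag.
def parse_cdata(tokens)
  values = []
  parts = nil

  tokens.each do |type, value|
    case type
    when :T_CDATA_START then parts = []
    when :T_CDATA_BODY  then parts << value
    when :T_CDATA_END   then values << parts.join
    end
  end

  values
end

# Feed the input line by line, as happens with IO input.
io = StringIO.new("<![CDATA[\nfoo]]>")
lexer = ToyCdataLexer.new
tokens = []
io.each_line { |line| tokens.concat(lexer.lex_chunk(line)) }

p parse_cdata(tokens) # => ["\nfoo"]
```

With a single T_CDATA token the first chunk could not be lexed on its own because its closing `]]>` only arrives in the next line; splitting the tag into three token types removes that dependency.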