Updated the README.
parent fe74d60138
commit 51c1f3c32d
68
README.md
@@ -1,11 +1,12 @@
 # Oga
 
-Oga is (or will be) a pure Ruby, thread-safe HTML (and XML in the future)
-parser that doesn't trigger segmentation faults on Ruby implementations other
-than MRI. Oga will initially **not** focus on performance but instead will
-focus on proper handling of encodings, stability and a sane API. More
-importantly it will be pure Ruby **only**. No C extensions, no Java, no x86 64
-assembly, just Ruby.
+Oga is an XML/HTML parser written in Ruby. Oga aims to provide an easy to use
+and high performance API for all your XML/HTML parsing needs. Oga requires
+nothing other than Ruby, it does not depend on libxml and the likes.
+
+To achieve high performance Oga uses a C or Java extension depending on your
+Ruby platform. Pure Ruby is sadly not fast enough to process large amounts of
+text in reasonable time.
 
 From [Wikipedia][oga-wikipedia]:
 
@@ -13,49 +14,38 @@ From [Wikipedia][oga-wikipedia]:
 > power saws. One person stood on a raised platform, with the board below him,
 > and the other person stood underneath them.
 
-## Planned Features
-
-* Full support for HTML(5)
-* Full support for XML, DTDs will probably be ignored.
-* Support for xpath and CSS selector based queries
-* SAX/pull parsing APIs that don't make you want to cut yourself
-
 ## Features
 
-* A README
+* Support for parsing XML and HTML(5)
+* DOM parsing
+* Stream/pull parsing
+* High performance and low memory usage (depending on the parsing API)
+* Support for XPath 1.0 (planned)
+* CSS selectors support (planned)
 
 ## Requirements
 
-* Ruby
+Oga runs on MRI 1.9.3 or newer, Rubinius 2.2 or newer and JRuby 1.7 or newer.
+Ruby 1.8.7 is not supported. Maglev, Topaz and mruby are currently not
+supported.
 
-Development requirements:
+To install Oga on MRI or Rubinius you'll need to have a working compiler such
+as gcc or clang. Oga's C extension can be compiled with any capable C compiler.
 
-* Ragel
-* Racc
-* Other stuff
+## Native Extension Setup
 
-## Usage
+The native extensions can be found in `ext/` and are divided into a C and Java
+extension. These extensions are only used for the XML lexer built using Ragel.
+The grammar for this lexer is shared between C and Java and can be found in
+`ext/ragel/base_lexer.rl`.
 
-Basic DOM parsing example:
+The extensions delegate most of their work back to Ruby code. As a result of
+this maintenance of this codebase is much easier. If one wants to change the
+grammar they only have to do so in one place and they don't have to worry about
+C and/or Java specific details.
 
-    require 'oga'
-
-    parser = Oga::Parser::DOM.new
-    document = parser.parse('<p>Hello</p>')
-
-    puts document.css('p').first.text # => "Hello"
-
-Pull parsing:
-
-    require 'oga'
-
-    parser = Oga::Parser::Pull.new('<p>Hello</p>')
-
-    parser.each do |node|
-        puts node.text
-    end
-
-These examples will probably change once I actually start writing some code.
+For more details on calling Ruby methods from Ragel see the source
+documentation in `ext/ragel/base_lexer.rl`.
 
 ## Why Another HTML/XML parser?
 