oga/doc/css_selectors.md

15 KiB

CSS Selectors Specification

This document acts as an alternative specification to the official W3 CSS3 Selectors Specification. This document specifies only the selectors supported by Oga itself. Only CSS3 selectors are covered, CSS4 is not part of this specification.

This document is best viewed in the YARD generated documentation or any other Markdown viewer that supports the Kramdown syntax. Alternatively it can be viewed in its raw form.

Abstract

The official W3 specification on CSS selectors is anything but pleasant to read. A lack of good examples and unspecified behaviour are just two of many problems. This document was written as a reference guide for myself as well as a way for others to more easily understand how CSS selectors work.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Syntax

To describe syntax elements of CSS selectors this document uses the same grammar as Ragel. For example, an integer would be defined as following:

integer = [0-9]+;

In turn an integer that can optionally be prefixed by + or - would be defined as following:

integer = ('+' | '-')* [0-9]+;

A quick and basic crash course of the Ragel grammar:

  • *: zero or one instance of the preceding token(s)
  • +: one or more instances of the preceding token(s)
  • ( and ): used for grouping expressions together
  • ^: inverts a match, thus ^[0-9] means "anything but a single digit"
  • "..." or '...': a literal character, "x" would match the literal "x"
  • |: the OR operator, x | y translates to "x OR y"
  • [...]: used to define a sequence, [0-9] translates to "0 OR 1 OR 2 OR 3..." all the way upto 9

Semicolons are used to terminate lines. While not strictly required in this specification they are included in order to produce a Ragel syntax compatible grammar.

See the Ragel documentation for more information on the grammar.

Terminology

local name
The name of an element without a namespace. For the element <strong> the local name is strong.
namespace prefix
The namespace prefix of an element. For the element <foo:strong> the namespace prefix is foo.
expression
A single or multiple selectors used together to retrieve a set of elements from a document.

Selector Scoping

Whenever a selector is used to match an element the selector applies to all nodes in the context. For example, the selector foo would match all foo elements at any position in the document. On the other hand, the selector foo bar only matches any bar elements that are a descedant of any foo element.

In XPath the corresponding axis for this is descendant-or-self. In other words, this CSS expression:

foo

is the same as this XPath expression:

descendant-or-self::foo

In turn this CSS expression:

foo bar

is the same as this XPath expression:

descendant-or-self::foo/descendant-or-self::bar

Note that in the various XPath examples the descendant-or-self axis is omitted in order to enhance readability.

Syntax

A CSS expression is made up of multiple selectors separated by one or more spaces. There MUST be at least 1 space between two selectors, there MAY be more than one. Multiple spaces do not alter the behaviour of the expression in any way.

Universal Selector

W3 chapter: http://www.w3.org/TR/css3-selectors/#universal-selector

The universal selector * (also known as the "wildcard selector") can be used to match any element, regardless of its local name or namespace prefix.

Example XML:

<root>
    <foo></foo>
    <bar></bar>
</root>

CSS:

root *

This would return a set containing two elements: <foo> and <bar>

The corresponding XPath is also *.

Element Selector

W3 chapter: http://www.w3.org/TR/css3-selectors/#type-selectors

The element selector (known as "Type selector" in the official W3 specification) can be used to match a set of elements by their local name or namespace. The selector foo is used to match all elements with the local name being set to foo.

Example XML:

<root>
    <foo />
    <bar />
</root>

CSS:

root foo

This would return a set with only the <foo> element.

This selector can be used in combination with the Universal Selector. This allows one to select elements using both a given local name and namespace. The syntax for this is as following:

ns-prefix|local-name

Here the pipe (|) character separates the namespace prefix and the local name. Both can either be an identifier or a wildcard. For example, the selector rb|foo matches all elements with local name foo and namespace prefix rb.

The namespace prefix MAY be left out producing the selector |local-name. In this case the selector only matches elements without a namespace prefix.

If a namespace prefix is given and it's not a wildcard then elements without a namespace prefix will not be matched.

The corresponding XPath expression for such a selector is ns-prefix:local-name. For example, rb|foo in CSS is the same as rb:foo in XPath.

Syntax

The syntax for just the local name is as following:

identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;

The wildcard is put in place to allow a single rule to be used for both names and wildcards.

The syntax for selecting an element including a namespace prefix is as following:

ns_plus_local_name = identifier* '|' identifier

This would match |foo, *|foo and foo|bar. In order to match foo the regular identifier rule declared above can be used.

Attribute Selectors

W3 chapter: http://www.w3.org/TR/css3-selectors/#attribute-selectors

Attribute selectors can be used to further narrow down a set of elements based on their attribute list. In XPath these selectors are known as "predicates". For example, the selector foo[bar] matches all foo elements that have a bar attribute, regardless of the value of said attribute.

Example XML:

<root>
    <foo number="1" />
    <bar />
</root>

CSS:

root foo[number]

This would return a set containing only the <foo> element since the <bar> element has no attributes.

For the CSS expression foo[number] the corresponding XPath expression is the following:

foo[@number]

When specifying an attribute you MAY include an operator and a value to match. In this case you MUST include an attribute value surrounded by either single or double quotes (but not a combination of the two).

There are 6 operators available:

  • =: equals operator
  • ~=: whitespace-in operator
  • ^=: starts-with operator
  • $=: ends-with operator
  • *=: contains operator
  • |=: hyphen-starts-with operator

Equals Operator

The equals operator matches an element if a given attribute value equals the value specified. For example, foo[number="1"] matches all foo elements that have a number attribute who's value is exactly "1".

Example XML:

<root>
    <foo number="1" />
    <foo number="2" />
</root>

CSS:

root foo[number="1"]

This would return a set containing only the first <foo> element.

The corresponding XPath expression is quite similar. For foo[number="1"] this would be:

foo[@number="1"]

Whitespace-in Operator

This operator matches an element if the given attribute value consists out of space separated values of which one is exactly the given value. For example, foo[numbers~="1"] matches all foo elements that have the value "1" in the numbers attribute.

Example XML:

<root>
    <foo numbers="1 2 3" />
    <foo numbers="4 bar 6" />
</root>

CSS:

root foo[numbers~="1"]

This would return a set containing only the first foo element. On the other hand, if one were to use the expression root foo[numbers~="bar"] instead then only the second <foo> element would be matched.

The corresponding XPath expression is quite complex, foo[numbers~="1"] is translated into the following XPath expression:

foo[contains(concat(" ", @numbers, " "), concat(" ", "1", " "))]

The concat calls are used to ensure the expression doesn't match the substring of an attrbitue value and that the expression matches elements of which the attribute only has a single value. If foo[contains(@numbers, ' 1 ')] were to be used then attributes such as <foo numbers="1" /> would not be matched.

Software implementing this selector are free to decide how they concatenate spaces around the value to match. Both Oga and Nokogiri use an extra call to concat but the following would be perfectly valid too:

foo[contains(concat(" ", @numbers, " "), " 1 ")]

Starts-with Operator

This operator matches elements of which the attribute value starts exactly with the given value. For example, foo[numbers^="1"] would match the element <foo numbers="1 2 3" /> but not the element <foo numbers="2 3 1" />.

For foo[numbers^="1"] the corresponding XPath expression is as following:

foo[starts-with(@numbers, "1")]

Ends-with Operator

This operator matches elements of which the attribute value ends exactly with the given value. For example, foo[numbers$="3"] would match the element <foo numbers="1 2 3" /> but not the element <foo numbers="2 3 1" />.

The corresponding XPath expression is quite complex due to a lack of a ends-with function in XPath. Instead one has to resort to using the substring() function. As such the corresponding XPath expression for foo[bar="baz"] is as following:

foo[substring(@bar, string-length(@bar) - string-length("baz") + 1, string-length("baz")) = "baz"]

Contains Operator

This operator matches elements of which the attribute value contains the given value. For example, foo[bar*="baz"] would match both <foo bar="bazzzz" /> and <foo bar="hello baz" />.

For foo[bar*="baz"] the corresponding XPath expression is as following:

foo[contains(@bar, "baz")]

Hyphen-starts-with Operator

This operator matches elements of which the attribute value is a hyphen separated list of values that starts exactly with the given value. For example, foo[numbers|="1"] matches <foo numbers="1-2-3" /> but not <foo numbers="2-1-3" />.

For foo[numbers|="1"] the corresponding XPath expression is as following:

foo[@numbers = "1" or starts-with(@numbers, concat("1", "-"))]

Note that this selector will also match elements such as <foo numbers="1- foo bar" />.

Syntax

The syntax of the various attribute selectors can be described as following:

# Strings are used for the attribute values

dquote = '"';
squote = "'";

string_dquote = dquote ^dquote* dquote;
string_squote = squote ^squote* squote;

string = string_dquote | string_squote;

# The `identifier` rule is the same as the one used for matching element
# names.
attr_test = identifier '[' space* identifier (space* '=' space* string)* space* ']';

Whitespace inside the brackets does not affect the behaviour of the selector.

Pseudo Classes

W3 chapter: http://www.w3.org/TR/css3-selectors/#structural-pseudos

Pseudo classes can be used to further narrow down elements besides just their names and attribute values. In essence they are a combination of XPath function calls and axes. Some pseudo classes can take an argument to alter their behaviour.

Pseudo classes are often applied to element selectors. For example:

foo:bar

Here :bar would be a pseudo class applied to the foo element. Some pseudo classes (e.g. the :root pseudo class) can also be used on their own, for example:

:root

:root

The :root pseudo class selects an element only if it's the top-level element in a document.

Example XML:

<root>
    <foo />
</root>

Using the CSS expression root foo:root we'd get an empty set as the <foo> element is not the root element. On the other hand, root:root would return a set containing only the <root> element.

This selector can both be applied to an element selector as well as being used on its own.

For the selector foo:root the corresponding XPath expression is as following:

foo[not(parent::*)]

For :root the XPath expression is:

*[not(parent::*)]

:nth-child(n)

The :nth-child(n) selector can be used to select a set of elements based on their position and/or an interval. Here n is an argument that can be used to specify one of the following:

  1. A literal node set index
  2. A node interval used to match every N nodes
  3. A node interval plus an initial offset

The first element in a node set for :nth-child() is located at position 1, not position 0 (unlike most programming languages). As a result :nth-child(1) matches the first element, not the second.

Besides using a literal index argument you can also use an interval, optionally with an offset. This can be used to for example match every 2nd element, or every 2nd element starting at element number 4.

The syntax of this argument is as following:

integer  = ('+' | '-')* [0-9]+;
interval = ('n' | '-n' | integer 'n') integer;

Here interval would match any of the following:

n
-n
2n
2n+5
2n-5
-2n+5
-2n-5

Due to integer also matching the + and - it will be part of the same token. If this is not desired the following grammar can be used instead:

integer  = [0-9]+;
modifier = '+' | '-';
interval = ('n' | '-n' | modifier* integer 'n') modifier integer;

To match every 2nd element you'd use the following:

:nth-child(2n)

To match every 2nd element starting at element 2 you'd instead use this:

:nth-child(2n+2)

As mentioned the +2 in the above example is the initial offset. This is however only the case if the second number is positive. That means that for :nth-child(2n-2) the offset is not -2. When using a negative offset the actual offset has to first be calculated. When using an argument in the form of An-B we can calculate the actual offset as following:

offset = A - (B % A)

For example, this would effectively turn :nth-child(2n-2) into :nth-child(2n+2) and :nth-child(2n-5) into :nth-child(2n+1). Note that if the minus sign is part of the number you can simply use the following formula instead:

offset = B % A

For :nth-child(2n-5) this translates to:

offset = -5 % 2

Which would result in :nth-child(2n+1).

To ease the process of selecting even and uneven elements you can also use even and odd as an argument. Using :nth-child(even) is the same as :nth-child(2n) while using :nth-child(odd) in turn is the same as :nth-child(2n+1).