3.2.4 Content models

Each element defined in this specification has a content model: a description of the element's expected contents. An HTML element must have contents that match the requirements described in the element's content model. The contents of an element are its children in the DOM, except for template elements, where the children are those in the template contents (a separate DocumentFragment assigned to the element when the element is created).

The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as Text nodes in the DOM. Empty Text nodes and Text nodes consisting of just sequences of those characters are considered inter-element whitespace.

Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element's contents match the element's content model or not, and must be ignored when following algorithms that define document and element semantics.

Thus, an element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or Text nodes (other than inter-element whitespace) between them. Similarly, a node is the only child of an element if that element contains no other nodes other than inter-element whitespace, comment nodes, and processing instruction nodes.

Authors must not use HTML elements anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.

For example, the Atom specification defines a content element. When its type attribute has the value xhtml, the Atom specification requires that it contain a single HTML div element. Thus, a div element is allowed in that context, even though this is not explicitly normatively stated by this specification. [ATOM]

In addition, HTML elements may be orphan nodes (i.e. without a parent node).

For example, creating a td element and storing it in a global variable in a script is conforming, even though td elements are otherwise only supposed to be used inside tr elements.

var data = {
  name: "Banana",
  cell: document.createElement('td'),
};
3.2.4.1 Kinds of content

Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following broad categories are used in this specification:

Some elements also fall into other categories, which are defined in other parts of this specification.

These categories are related as follows:

Sectioning content, heading content, phrasing content, embedded content, and interactive content are all types of flow content. Metadata is sometimes flow content. Metadata and interactive content are sometimes phrasing content. Embedded content is also a type of phrasing content, and sometimes is interactive content.

Other categories are also used for specific purposes, e.g. form controls are specified using a number of categories to define common requirements. Some elements have unique requirements and do not fit into any particular category.

3.2.4.1.1 Metadata content

Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.

Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.

Thus, in the XML serialization, one can use RDF, like this:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <head>
  <title>Hedral's Home Page</title>
  <r:RDF>
   <Person xmlns="http://www.w3.org/2000/10/swap/pim/contact#"
           r:about="http://hedral.example.com/#">
    <fullName>Cat Hedral</fullName>
    <mailbox r:resource="mailto:hedral@damowmow.com"/>
    <personalTitle>Sir</personalTitle>
   </Person>
  </r:RDF>
 </head>
 <body>
  <h1>My home page</h1>
  <p>I like playing with string, I guess. Sister says squirrels are fun
  too so sometimes I follow her to play with them.</p>
 </body>
</html>

This isn't possible in the HTML serialization, however.

3.2.4.1.2 Flow content

Most elements that are used in the body of documents and applications are categorised as flow content.

3.2.4.1.3 Sectioning content

Sectioning content is content that defines the scope of headings and footers.

Each sectioning content element potentially has a heading and an outline. See the section on headings and sections for further details.

There are also certain elements that are sectioning roots. These are distinct from sectioning content, but they can also have an outline.

3.2.4.1.4 Heading content

Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).

3.2.4.1.5 Phrasing content

Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.

Most elements that are categorised as phrasing content can only contain elements that are themselves categorised as phrasing content, not any flow content.

Text, in the context of content models, means either nothing, or Text nodes. Text is sometimes used as a content model on its own, but is also phrasing content, and can be inter-element whitespace (if the Text nodes are empty or contain just space characters).

Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context.

3.2.4.1.6 Embedded content

Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.

Elements that are from namespaces other than the HTML namespace and that convey content but not metadata, are embedded content for the purposes of the content models defined in this specification. (For example, MathML, or SVG.)

Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.

3.2.4.1.7 Interactive content

Interactive content is content that is specifically intended for user interaction.

The tabindex attribute can also make any element into interactive content.

3.2.4.1.8 Palpable content

As a general rule, elements whose content model allows any flow content or phrasing content should have at least one node in its contents that is palpable content and that does not have the hidden attribute specified.

This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.

Conformance checkers are encouraged to provide a mechanism for authors to find elements that fail to fulfill this requirement, as an authoring aid.

The following elements are palpable content:

3.2.4.1.9 Script-supporting elements

Script-supporting elements are those that do not represent anything themselves (i.e. they are not rendered), but are used to support scripts, e.g. to provide functionality for the user.

The following elements are script-supporting elements:

3.2.4.2 Transparent content models

Some elements are described as transparent; they have "transparent" in the description of their content model. The content model of a transparent element is derived from the content model of its parent element: the elements required in the part of the content model that is "transparent" are the same elements as required in the part of the content model of the parent of the transparent element in which the transparent element finds itself.

For instance, an ins element inside a ruby element cannot contain an rt element, because the part of the ruby element's content model that allows ins elements is the part that allows phrasing content, and the rt element is not phrasing content.

In some cases, where transparent elements are nested in each other, the process has to be applied iteratively.

Consider the following markup fragment:

<p><object><param><ins><map><a href="/">Apples</a></map></ins></object></p>

To check whether "Apples" is allowed inside the a element, the content models are examined. The a element's content model is transparent, as is the map element's, as is the ins element's, as is the part of the object element's in which the ins element is found. The object element is found in the p element, whose content model is phrasing content. Thus, "Apples" is allowed, as text is phrasing content.

When a transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.

3.2.4.3 Paragraphs

The term paragraph as defined in this section is used for more than just the definition of the p element. The paragraph concept defined here is used to describe how to interpret documents. The p element is merely one of several ways of marking up a paragraph.

A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.

In the following example, there are two paragraphs in a section. There is also a heading, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.

<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>

Paragraphs in flow content are defined relative to what the document looks like without the a, ins, del, and map elements complicating matters, since those elements, with their hybrid content models, can straddle paragraph boundaries, as shown in the first two examples below.

Generally, having elements straddle paragraph boundaries is best avoided. Maintaining such markup can be difficult.

The following example takes the markup from the earlier example and puts ins and del elements around some of the markup to show that the text was changed (though in this case, the changes admittedly don't make much sense). Notice how this example has exactly the same paragraphs as the previous one, despite the ins and del elements — the ins element straddles the heading and the first paragraph, and the del element straddles the boundary between the two paragraphs.

<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>

A paragraph is also formed explicitly by p elements.

The p element can be used to wrap individual paragraphs when there would otherwise not be any content other than phrasing content to separate the paragraphs from each other.

In the following example, the link spans half of the first paragraph, all of the heading separating the two paragraphs, and half of the second paragraph. It straddles the paragraphs and the heading.

<header>
 Welcome!
 <a href="about.html">
  This is home of...
  <h1>The Falcons!</h1>
  The Lockheed Martin multirole jet fighter aircraft!
 </a>
 This page discusses the F-16 Fighting Falcon's innermost secrets.
</header>

Here is another way of marking this up, this time showing the paragraphs explicitly, and splitting the one link element into three:

<header>
 <p>Welcome! <a href="about.html">This is home of...</a></p>
 <h1><a href="about.html">The Falcons!</a></h1>
 <p><a href="about.html">The Lockheed Martin multirole jet
 fighter aircraft!</a> This page discusses the F-16 Fighting
 Falcon's innermost secrets.</p>
</header>

It is possible for paragraphs to overlap when using certain elements that define fallback content. For example, in the following section:

<section>
 <h1>My Cats</h1>
 You can play with my cat simulator.
 <object data="cats.sim">
  To see the cat simulator, use one of the following links:
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  Alternatively, upgrade to the Mellblom Browser.
 </object>
 I'm quite proud of it.
</section>

There are five paragraphs:

  1. The paragraph that says "You can play with my cat simulator. object I'm quite proud of it.", where object is the object element.
  2. The paragraph that says "To see the cat simulator, use one of the following links:".
  3. The paragraph that says "Download simulator file".
  4. The paragraph that says "Use online simulator".
  5. The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".

The first paragraph is overlapped by the other four. A user agent that supports the "cats.sim" resource will only show the first one, but a user agent that shows the fallback will confusingly show the first sentence of the first paragraph as if it was in the same paragraph as the second one, and will show the last paragraph as if it was at the start of the second sentence of the first paragraph.

To avoid this confusion, explicit p elements can be used. For example:

<section>
 <h1>My Cats</h1>
 <p>You can play with my cat simulator.</p>
 <object data="cats.sim">
  <p>To see the cat simulator, use one of the following links:</p>
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  <p>Alternatively, upgrade to the Mellblom Browser.</p>
 </object>
 <p>I'm quite proud of it.</p>
</section>

3.2.5 Global attributes

The following attributes are common to and may be specified on all HTML elements:


To enable assistive technology products to expose a more fine-grained interface than is otherwise possible with HTML elements and attributes, a set of annotations for assistive technology products can be specified (the ARIA role and aria-* attributes). [ARIA]


The following event handler content attributes may be specified on any HTML element:

The attributes marked with an asterisk have a different meaning when specified on body elements as those elements expose event handlers of the Window object with the same names.

While these attributes apply to all elements, they are not useful on all elements. For example, only media elements will ever receive a volumechange event fired by the user agent.


Custom data attributes (e.g. data-foldername or data-msgid) can be specified on any HTML element, to store custom data specific to the page.


In HTML documents, elements in the HTML namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/1999/xhtml". This does not apply to XML documents.

In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in no namespace, not the "http://www.w3.org/2000/xmlns/" namespace like namespace declaration attributes in XML do.

In XML, an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in no namespace specified.


The XML specification also allows the use of the xml:space attribute in the XML namespace on any element in an XML document. This attribute has no effect on HTML elements, as the default behavior in HTML is to preserve whitespace. [XML]

There is no way to serialise the xml:space attribute on HTML elements in the text/html syntax.

3.2.5.1 The id attribute

The id attribute specifies its element's unique identifier (ID). [DOM]

The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters.

There are no other restrictions on what form an ID can take; in particular, IDs can consist of just digits, start with a digit, start with an underscore, consist of just punctuation, etc.

An element's unique identifier can be used for a variety of purposes, most notably as a way to link to specific parts of a document using fragment identifiers, as a way to target an element when scripting, and as a way to style a specific element from CSS.

3.2.5.2 The title attribute

The title attribute represents advisory information for the element, such as would be appropriate for a tooltip. On a link, this could be the title or a description of the target resource; on an image, it could be the image credit or a description of the image; on a paragraph, it could be a footnote or commentary on the text; on a citation, it could be further information about the source; on interactive content, it could be a label for, or instructions for, use of the element; and so forth. The value is text.

Relying on the title attribute is currently discouraged as many user agents do not expose the attribute in an accessible manner as required by this specification (e.g. requiring a pointing device such as a mouse to cause a tooltip to appear, which excludes keyboard-only users and touch-only users, such as anyone with a modern phone or tablet).

If this attribute is omitted from an element, then it implies that the title attribute of the nearest ancestor HTML element with a title attribute set is also relevant to this element. Setting the attribute overrides this, explicitly stating that the advisory information of any ancestors is not relevant to this element. Setting the attribute to the empty string indicates that the element has no advisory information.

If the title attribute's value contains U+000A LINE FEED (LF) characters, the content is split into multiple lines. Each U+000A LINE FEED (LF) character represents a line break.

Caution is advised with respect to the use of newlines in title attributes.

For instance, the following snippet actually defines an abbreviation's expansion with a line break in it:

<p>My logs show that there was some interest in <abbr title="Hypertext
Transport Protocol">HTTP</abbr> today.</p>

Some elements, such as link, abbr, and input, define additional semantics for the title attribute beyond the semantics described above.

3.2.5.3 The lang and xml:lang attributes

The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language tag, or the empty string. Setting the attribute to the empty string indicates that the primary language is unknown. [BCP47]

The lang attribute in the XML namespace is defined in XML. [XML]

If these attributes are omitted from an element, then the language of this element is the same as the language of its parent element, if any.

The lang attribute in no namespace may be used on any HTML element.

The lang attribute in the XML namespace may be used on HTML elements in XML documents, as well as elements in other namespaces if the relevant specifications allow it (in particular, MathML and SVG allow lang attributes in the XML namespace to be specified on their elements). If both the lang attribute in no namespace and the lang attribute in the XML namespace are specified on the same element, they must have exactly the same value when compared in an ASCII case-insensitive manner.

Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.

The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.

3.2.5.4 The translate attribute

The translate attribute is an enumerated attribute that is used to specify whether an element's attribute values and the values of its Text node children are to be translated when the page is localized, or whether to leave them unchanged.

The attribute's keywords are the empty string, yes, and no. The empty string and the yes keyword map to the yes state. The no keyword maps to the no state. In addition, there is a third state, the inherit state, which is the missing value default (and the invalid value default).

Each element (even non-HTML elements) has a translation mode, which is in either the translate-enabled state or the no-translate state. If an HTML element's translate attribute is in the yes state, then the element's translation mode is in the translate-enabled state; otherwise, if the element's translate attribute is in the no state, then the element's translation mode is in the no-translate state. Otherwise, either the element's translate attribute is in the inherit state, or the element is not an HTML element and thus does not have a translate attribute; in either case, the element's translation mode is in the same state as its parent element's, if any, or in the translate-enabled state, if the element is a root element.

When an element is in the translate-enabled state, the element's translatable attributes and the values of its Text node children are to be translated when the page is localized.

When an element is in the no-translate state, the element's attribute values and the values of its Text node children are to be left as-is when the page is localized, e.g. because the element contains a person's name or a the name of a computer program.

The following attributes are translatable attributes:

Other specifications may define other attributes that are also translatable attributes. For example, ARIA would define the aria-label attribute as translatable.

In this example, everything in the document is to be translated when the page is localized, except the sample keyboard input and sample program output:

<!DOCTYPE HTML>
<html> <!-- default on the root element is translate=yes -->
 <head>
  <title>The Bee Game</title> <!-- implied translate=yes inherited from ancestors -->
 </head>
 <body>
  <p>The Bee Game is a text adventure game in English.</p>
  <p>When the game launches, the first thing you should do is type
  <kbd translate=no>eat honey</kbd>. The game will respond with:</p>
  <pre><samp translate=no>Yum yum! That was some good honey!</samp></pre>
 </body>
</html>
3.2.5.5 The xml:base attribute (XML only)

The xml:base attribute is defined in XML Base. [XMLBASE]

The xml:base attribute may be used on HTML elements of XML documents. Authors must not use the xml:base attribute on HTML elements in HTML documents.

3.2.5.6 The dir attribute

The dir attribute specifies the element's text directionality. The attribute is an enumerated attribute with the following keywords and states:

The ltr keyword, which maps to the ltr state

Indicates that the contents of the element are explicitly directionally isolated left-to-right text.

The rtl keyword, which maps to the rtl state

Indicates that the contents of the element are explicitly directionally isolated right-to-left text.

The auto keyword, which maps to the auto state

Indicates that the contents of the element are explicitly directionally isolated text, but that the direction is to be determined programmatically using the contents of the element (as described below).

The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). Authors are urged to only use this value as a last resort when the direction of the text is truly unknown and no better server-side heuristic can be applied. [BIDI]

For textarea and pre elements, the heuristic is applied on a per-paragraph level.

The attribute has no invalid value default and no missing value default.


The directionality of an element (any element, not just an HTML element) is either 'ltr' or 'rtl', and is determined as per the first appropriate set of steps from the following list:

If the element's dir attribute is in the ltr state
If the element is a root element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)
If the element is an input element whose type attribute is in the Telephone state, and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)

The directionality of the element is 'ltr'.

If the element's dir attribute is in the rtl state

The directionality of the element is 'rtl'.

If the element is an input element whose type attribute is in the Text, Search, Telephone, URL, or E-mail state, and the dir attribute is in the auto state
If the element is a textarea element and the dir attribute is in the auto state

If the element's value contains a character of bidirectional character type AL or R, and there is no character of bidirectional character type L anywhere before it in the element's value, then the directionality of the element is 'rtl'. [BIDI]

Otherwise, if the element's value is not the empty string, or if the element is a root element, the directionality of the element is 'ltr'.

Otherwise, the directionality of the element is the same as the element's parent element's directionality.

If the element's dir attribute is in the auto state
If the element is a bdi element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)

Find the first character in tree order that matches the following criteria:

  • The character is from a Text node that is a descendant of the element whose directionality is being determined.

  • The character is of bidirectional character type L, AL, or R. [BIDI]

  • The character is not in a Text node that has an ancestor element that is a descendant of the element whose directionality is being determined and that is either:

If such a character is found and it is of bidirectional character type AL or R, the directionality of the element is 'rtl'.

If such a character is found and it is of bidirectional character type L, the directionality of the element is 'ltr'.

Otherwise, if the element is empty and is not a root element, the directionality of the element is the same as the element's parent element's directionality.

Otherwise, the directionality of the element is 'ltr'.

If the element has a parent element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)

The directionality of the element is the same as the element's parent element's directionality.

Since the dir attribute is only defined for HTML elements, it cannot be present on elements from other namespaces. Thus, elements from other namespaces always just inherit their directionality from their parent element, or, if they don't have one, default to 'ltr'.


The directionality of an attribute of an HTML element, which is used when the text of that attribute is to be included in the rendering in some manner, is determined as per the first appropriate set of steps from the following list:

If the attribute is a directionality-capable attribute and the element's dir attribute is in the auto state

Find the first character (in logical order) of the attribute's value that is of bidirectional character type L, AL, or R. [BIDI]

If such a character is found and it is of bidirectional character type AL or R, the directionality of the attribute is 'rtl'.

Otherwise, the directionality of the attribute is 'ltr'.

Otherwise
The directionality of the attribute is the same as the element's directionality.

The following attributes are directionality-capable attributes:


document . dir [ = value ]

Returns the html element's dir attribute's value, if any.

Can be set, to either "ltr", "rtl", or "auto" to replace the html element's dir attribute's value.

If there is no html element, returns the empty string and ignores new values.

Authors are strongly encouraged to use the dir attribute to indicate text direction rather than using CSS, since that way their documents will continue to render correctly even in the absence of CSS (e.g. as interpreted by search engines).

This markup fragment is of an IM conversation.

<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> How do you write "What's your name?" in Arabic?</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> ما اسمك؟</p>
<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> Thanks.</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> That's written "شكرًا".</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> Do you know how to write "Please"?</p>
<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> "من فضلك", right?</p>

Given a suitable style sheet and the default alignment styles for the p element, namely to align the text to the start edge of the paragraph, the resulting rendering could be as follows:

Each paragraph rendered as a separate block, with the paragraphs left-aligned except the second paragraph and the last one, which would  be right aligned, with the usernames ('Student' and 'Teacher' in this example) flush right, with a colon to their left, and the text first to the left of that.

As noted earlier, the auto value is not a panacea. The final paragraph in this example is misinterpreted as being right-to-left text, since it begins with an Arabic character, which causes the "right?" to be to the left of the Arabic text.

3.2.5.7 The class attribute

Every HTML element may have a class attribute specified.

The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.

Assigning classes to an element affects class matching in selectors in CSS, the getElementsByClassName() method in the DOM, and other such features.

There are no additional restrictions on the tokens authors can use in the class attribute, but authors are encouraged to use values that describe the nature of the content, rather than values that describe the desired presentation of the content.

3.2.5.8 The style attribute

All HTML elements may have the style content attribute set. This is a CSS styling attribute as defined by the CSS Styling Attribute Syntax specification. [CSSATTR]

Documents that use style attributes on any of their elements must still be comprehensible and usable if those attributes were removed.

In particular, using the style attribute to hide and show content, or to convey meaning that is otherwise not included in the document, is non-conforming. (To hide and show content, use the hidden attribute.)


element . style

Returns a CSSStyleDeclaration object for the element's style attribute.

In the following example, the words that refer to colors are marked up using the span element and the style attribute to make those words show up in the relevant colors in visual media.

<p>My sweat suit is <span style="color: green; background:
transparent">green</span> and my eyes are <span style="color: blue;
background: transparent">blue</span>.</p>
3.2.5.9 Embedding custom non-visible data with the data-* attributes

A custom data attribute is an attribute in no namespace whose name starts with the string "data-", has at least one character after the hyphen, is XML-compatible, and contains no uppercase ASCII letters.

All attribute names on HTML elements in HTML documents get ASCII-lowercased automatically, so the restriction on ASCII uppercase letters doesn't affect such documents.

Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.

These attributes are not intended for use by software that is not known to the administrators of the site that uses the attributes. For generic extensions that are to be used by multiple independent tools, either this specification should be extended to provide the feature explicitly, or a technology like microdata should be used (with a standardised vocabulary).

For instance, a site about music could annotate list items representing tracks in an album with custom data attributes containing the length of each track. This information could then be used by the site itself to allow the user to sort the list by track length, or to filter the list for tracks of certain lengths.

<ol>
 <li data-length="2m11s">Beyond The Sea</li>
 ...
</ol>

It would be inappropriate, however, for the user to use generic software not associated with that music site to search for tracks of a certain length by looking at this data.

This is because these attributes are intended for use by the site's own scripts, and are not a generic extension mechanism for publicly-usable metadata.

Similarly, a page author could write markup that provides information for a translation tool that they are intending to use:

<p>The third <span data-mytrans-de="Anspruch">claim</span> covers the case of <span
translate="no">HTML</span> markup.</p>

In this example, the "data-mytrans-de" attribute gives specific text for the MyTrans product to use when translating the phrase "claim" to German. However, the standard translate attribute is used to tell it that in all languages, "HTML" is to remain unchanged. When a standard attribute is available, there is no need for a custom data attribute to be used.

Every HTML element may have any number of custom data attributes specified, with any value.


element . dataset

Returns a DOMStringMap object for the element's data-* attributes.

Hyphenated names become camel-cased. For example, data-foo-bar="" becomes element.dataset.fooBar.

If a Web page wanted an element to represent a space ship, e.g. as part of a game, it would have to use the class attribute along with data-* attributes:

<div class="spaceship" data-ship-id="92432"
     data-weapons="laser 2" data-shields="50%"
     data-x="30" data-y="10" data-z="90">
 <button class="fire"
         onclick="spaceships[this.parentNode.dataset.shipId].fire()">
  Fire
 </button>
</div>

Notice how the hyphenated attribute name becomes camel-cased in the API.

Authors should carefully design such extensions so that when the attributes are ignored and any associated CSS dropped, the page is still usable.

JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used. Authors of libraries that are reused by many authors are encouraged to include their name in the attribute names, to reduce the risk of clashes. Where it makes sense, library authors are also encouraged to make the exact name used in the attribute names customizable, so that libraries whose authors unknowingly picked the same name can be used on the same page, and so that multiple versions of a particular library can be used on the same page even when those versions are not mutually compatible.

For example, a library called "DoQuery" could use attribute names like data-doquery-range, and a library called "jJo" could use attributes names like data-jjo-range. The jJo library could also provide an API to set which prefix to use (e.g. J.setDataPrefix('j2'), making the attributes have names like data-j2-range).

3.2.6 Requirements relating to the bidirectional algorithm

Text content in HTML elements with Text nodes in their contents, and text in attributes of HTML elements that allow free-form text, may contain characters in the ranges U+202A to U+202E and U+2066 to U+2069 (the bidirectional-algorithm formatting characters). However, the use of these characters is restricted so that any embedding or overrides generated by these characters do not start and end with different parent elements, and so that all such embeddings and overrides are explicitly terminated by a U+202C POP DIRECTIONAL FORMATTING character. This helps reduce incidences of text being reused in a manner that has unforeseen effects on the bidirectional algorithm. [BIDI]

The aforementioned restrictions are defined by specifying that certain parts of documents form bidirectional-algorithm formatting character ranges, and then imposing a requirement on such ranges.

The strings resulting from applying the following algorithm to an HTML element element are bidirectional-algorithm formatting character ranges:

  1. Let output be an empty list of strings.

  2. Let string be an empty string.

  3. Let node be the first child node of element, if any, or null otherwise.

  4. Loop: If node is null, jump to the step labeled end.

  5. Process node according to the first matching step from the following list:

    If node is a Text node

    Append the text data of node to string.

    If node is a br element
    If node is an HTML element that is flow content but that is not also phrasing content

    If string is not the empty string, push string onto output, and let string be empty string.

    Otherwise
    Do nothing.
  6. Let node be node's next sibling, if any, or null otherwise.

  7. Jump to the step labeled loop.

  8. End: If string is not the empty string, push string onto output.

  9. Return output as the bidirectional-algorithm formatting character ranges.

The value of a namespace-less attribute of an HTML element is a bidirectional-algorithm formatting character range.

Any strings that, as described above, are bidirectional-algorithm formatting character ranges must match the string production in the following ABNF, the character set for which is Unicode. [ABNF]

string        = *( plaintext ( embedding / override / isolation ) ) plaintext
embedding     = ( lre / rle ) string pdf
override      = ( lro / rlo ) string pdf
isolation     = ( lri / rli / fsi ) string pdi
lre           = %x202A ; U+202A LEFT-TO-RIGHT EMBEDDING
rle           = %x202B ; U+202B RIGHT-TO-LEFT EMBEDDING
lro           = %x202D ; U+202D LEFT-TO-RIGHT OVERRIDE
rlo           = %x202E ; U+202E RIGHT-TO-LEFT OVERRIDE
pdf           = %x202C ; U+202C POP DIRECTIONAL FORMATTING
lri           = %x2066 ; U+2066 LEFT-TO-RIGHT ISOLATE
rli           = %x2067 ; U+2067 RIGHT-TO-LEFT ISOLATE
fsi           = %x2068 ; U+2068 FIRST STRONG ISOLATE
pdi           = %x2069 ; U+2069 POP DIRECTIONAL ISOLATE
plaintext     = *( %x0000-2029 / %x202F-2065 / %x206A-10FFFF )
                ; any string with no bidirectional-algorithm formatting characters

While the U+2069 POP DIRECTIONAL ISOLATE character implicitly also ends open embeddings and overrides, text that relies on this implicit scope closure is not conforming to this specification. All strings of embeddings, overrides, and isolations need to be explicitly terminated to conform to this section's requirements.

Authors are encouraged to use the dir attribute, the bdo element, and the bdi element, rather than maintaining the bidirectional-algorithm formatting characters manually. The bidirectional-algorithm formatting characters interact poorly with CSS.

3.2.7 WAI-ARIA

Authors may use the ARIA role and aria-* attributes on HTML elements, in accordance with the requirements described in the ARIA specifications, except where these conflict with the strong native semantics or are equal to the default implicit ARIA semantics described below. These exceptions are intended to prevent authors from making assistive technology products report nonsensical states that do not represent the actual state of the document. [ARIA]

Authors must not set the ARIA role and aria-* attributes in a manner that conflicts with the semantics described in the following table, except that the presentation role may always be used. Authors must not set the ARIA role and aria-* attributes to values that match the default implicit ARIA semantics defined in the following two tables.

The following table defines the strong native semantics and corresponding default implicit ARIA semantics that apply to HTML elements. Each language feature (element or attribute) in a cell in the first column implies the ARIA semantics (any role, states, and properties) given in the cell in the second column of the same row.

Language feature Strong native semantics and default implicit ARIA semantics
area element that creates a hyperlink link role
base element No role
datalist element listbox role, with the aria-multiselectable property set to "false"
details element aria-expanded state set to "true" if the element's open attribute is present, and set to "false" otherwise
dialog element without an open attribute The aria-hidden state set to "true"
fieldset element group role
head element No role, with the aria-hidden state set to "true"
hgroup element heading role, with the aria-level property set to the element's outline depth
hr element separator role
html element No role
img element whose alt attribute's value is empty presentation role
input element with a type attribute in the Checkbox state aria-checked state set to "mixed" if the element's indeterminate IDL attribute is true, or "true" if the element's checkedness is true, or "false" otherwise
input element with a type attribute in the Color state No role
input element with a type attribute in the Date state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Date and Time state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Local Date and Time state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the E-mail state with no suggestions source element textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the File Upload state No role
input element with a type attribute in the Hidden state No role
input element with a type attribute in the Month state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Number state spinbutton role, with the aria-readonly property set to "true" if the element has a readonly attribute, the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and, if the result of applying the rules for parsing floating-point number values to the element's value is a number, with the aria-valuenow property set to that number
input element with a type attribute in the Password state textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Radio Button state aria-checked state set to "true" if the element's checkedness is true, or "false" otherwise
input element with a type attribute in the Range state and the multiple attribute not specified slider role, with the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and the aria-valuenow property set to the result of applying the rules for parsing floating-point number values to the element's value, if that results in a number, or the default value otherwise
input element with a type attribute in the Range state and the multiple attribute not specified aria-valuemax property set to the element's maximum, and the aria-valuemin property set to the element's minimum
input element with a type attribute in the Reset Button state button role
input element with a type attribute in the Search state with no suggestions source element textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Submit Button state button role
input element with a type attribute in the Telephone state with no suggestions source element textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Text state with no suggestions source element textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Text, Search, Telephone, URL, or E-mail states with a suggestions source element combobox role, with the aria-owns property set to the same value as the list attribute, and the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Time state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the URL state with no suggestions source element textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element with a type attribute in the Week state No role, with the aria-readonly property set to "true" if the element has a readonly attribute
input element that is required The aria-required state set to "true"
keygen element No role
label element No role
link element that creates a hyperlink link role
menu element with a type attribute in the popup menu state No role
meta element No role
meter element No role
nav element navigation role
noscript element No role, with the aria-hidden state set to "true"
optgroup element No role
option element that is in a list of options aria-selected and aria-checked states set to "true" if the element's selectedness is true, and "false" otherwise
option element that represents a suggestion in a datalist element or that is in a list of options of a select element with a multiple attribute or a display size greater than 1 option role
param element No role
progress element progressbar role, with, if the progress bar is determinate, the aria-valuemax property set to the maximum value of the progress bar, the aria-valuemin property set to zero, and the aria-valuenow property set to the current value of the progress bar
script element No role, with the aria-hidden state set to "true"
select element with a multiple attribute listbox role, with the aria-multiselectable property set to "true"
select element with no multiple attribute and with a display size equal to 1 aria-multiselectable property set to "false"
select element with no multiple attribute and with a display size greater than 1 listbox role, with the aria-multiselectable property set to "false"
select element with a required attribute The aria-required state set to "true"
source element No role
style element No role, with the aria-hidden state set to "true"
summary element No role
table element grid role with the aria-readonly property set to "true"
tbody element rowgroup role
td element gridcell role
template element No role, with the aria-hidden state set to "true"
textarea element textbox role, with the aria-multiline property set to "true", and the aria-readonly property set to "true" if the element has a readonly attribute
textarea element with a required attribute The aria-required state set to "true"
tfoot element rowgroup role
thead element rowgroup role
th element that is a sorting-capable th element whose column key ordinality is 1 columnheader role, with the aria-sort state set to "ascending" if the element's column sort direction is normal, and "descending" otherwise.
title element No role
tr element row role
Element that is disabled The aria-disabled state set to "true"
Element that is inert The aria-hidden state set to "true"
Element with a hidden attribute The aria-hidden state set to "true"
Element that is a candidate for constraint validation but that does not satisfy its constraints The aria-invalid state set to "true"

Some HTML elements have native semantics that can be overridden. The following table lists these elements and their default implicit ARIA semantics, along with the restrictions that apply to those elements. Each language feature (element or attribute) in a cell in the first column implies, unless otherwise overridden, the ARIA semantic (role, state, or property) given in the cell in the second column of the same row, but this semantic may be overridden under the conditions listed in the cell in the third column of that row. In addition, any element may be given the presentation role, regardless of the restrictions below.

Language feature Default implicit ARIA semantic Restrictions
a element that creates a hyperlink link role Role must be either link, menuitem, tab, or treeitem
address element No role If specified, role must be contentinfo
article element article role Role must be either article, document, application, or main
aside element complementary role Role must be either complementary, note, or search
audio element No role If specified, role must be application
button element button role Role must be either button, menuitem
details element group role Role must be a role that supports aria-expanded
dialog element dialog role Role must be either alert, alertdialog, application, contentinfo, dialog, document, log, main, marquee, region, search, or status
embed element No role If specified, role must be either application, document, or img
footer element No role If specified, role must be contentinfo
h1 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
h2 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
h3 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
h4 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
h5 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
h6 element that does not have an hgroup ancestor heading role, with the aria-level property set to the element's outline depth Role must be either heading or tab
header element No role If specified, role must be banner
iframe element No role If specified, role must be either application, document, or img
img element whose alt attribute's value is absent img role No restrictions
img element whose alt attribute's value is present and not empty img role No restrictions
input element with a type attribute in the Button state button role Role must be either button, menuitem
input element with a type attribute in the Checkbox state checkbox role Role must be either checkbox or menuitemcheckbox
input element with a type attribute in the Image Button state button role Role must be either button, menuitem
input element with a type attribute in the Radio Button state radio role Role must be either radio or menuitemradio
li element whose parent is an ol or ul element listitem role Role must be either listitem, menuitemcheckbox, menuitemradio, option, tab, or treeitem
main element main role Role must be either document, application, or main
menu element with a type attribute in the toolbar state toolbar role Role must be either directory, list, listbox, menu, menubar, tablist, toolbar, or tree
object element No role If specified, role must be either application, document, or img
ol element list role Role must be either directory, group, list, listbox, menu, menubar, tablist, toolbar, or tree
option element that is in a list of options of a select element with no multiple attribute and with a display size equal to 1 option role Role must be either option, menuitem, menuitemradio, or separator
output element status role No restrictions
section element region role Role must be either alert, alertdialog, application, contentinfo, dialog, document, log, main, marquee, region, search, or status
select element with no multiple attribute and with a display size equal to 1 listbox role Role must be either listbox or menu
th element that is not a sorting-capable th element or whose column key ordinality is not 1 gridcell role Role must be either columnheader, rowheader, or gridcell
ul element list role Role must be either directory, group, list, listbox, menu, menubar, tablist, toolbar, or tree
video element No role If specified, role must be application
The body element document role Role must be either document or application

The entry "no role", when used as a strong native semantic, means that no role other than presentation can be used. When used as a default implicit ARIA semantic, it means the user agent has no default mapping to ARIA roles. (However, it probably will have its own mappings to the accessibility layer.)

These features can be used to make accessibility tools render content to their users in more useful ways. For example, ASCII art, which is really an image, appears to be text, and in the absence of appropriate annotations would end up being rendered by screen readers as a very painful reading of lots of punctuation. Using the features described in this section, one can instead make the ATs skip the ASCII art and just read the caption:

<figure role="img" aria-labelledby="fish-caption"> 
 <pre>
 o           .'`/
     '      /  (
   O    .-'` ` `'-._      .')
      _/ (o)        '.  .' /
      )       )))     ><  <
      `\  |_\      _.'  '. \
        '-._  _ .-'       '.)
    jgs     `\__\
 </pre>
 <figcaption id="fish-caption">
  Joan G. Stark, "<cite>fish</cite>".
  October 1997. ASCII on electrons. 28×8.
 </figcaption>
</figure>