Accessible PDF files are often referred to as “tagged PDFs,” even though there is more to an accessible PDF than tags. PDF tags provide a structured, textual representation of the PDF content that screen readers use to read the document aloud. They exist for accessibility purposes only and have no visible effect on the PDF file.
HTML tags and PDF tags often use similar tag names (e.g., both have an H1 heading tag) and similar organizational structures, but they are quite different. PDF defines a set of standard tags that give assistive software and devices the semantic and structural elements they need to interpret a document's structure and present its content in a useful manner.
The PDF tag architecture is extensible, so a PDF document can contain any tag set that an authoring application decides to use. For example, a PDF can contain XML tags that came from an XML schema. Custom tags that you define (such as tag names generated from an authoring application's paragraph styles) need a role map, which maps each custom tag to one of the standard tags listed below. When assistive software encounters a custom tag, it can look the tag up in the role map and interpret it correctly. Tagging PDFs by using one of the methods described here generally produces a correct role map for the document.
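The role-map lookup described above can be sketched in Python. This is a minimal illustration, not any PDF library's API: `resolve_tag`, `STANDARD_TAGS`, and custom tag names such as `BodyText` are hypothetical, and `STANDARD_TAGS` covers only the tags listed in this section (the full standard set is larger). The sketch also assumes that a role-map entry may point to another custom tag, so it follows the chain until it reaches a standard tag and rejects cycles.

```python
# A minimal, hypothetical model of role-map lookup (not a real PDF library API).
# STANDARD_TAGS holds only the standard tags listed in this section.
STANDARD_TAGS = {
    "Document", "Part", "Div", "Art", "Sect",
    "H", "H1", "H2", "H3", "H4", "H5", "H6", "P",
    "L", "LI", "Lbl", "LBody",
    "BlockQuote", "Caption", "Index", "TOC", "TOCI",
    "Table", "TR", "TH", "TD",
}


def resolve_tag(tag: str, role_map: dict) -> str:
    """Follow role-map entries from `tag` until a standard tag is reached.

    Assumption: an entry may point at another custom tag (a chain). A cycle,
    or a custom tag with no entry, means the tag cannot be interpreted.
    """
    seen = set()
    current = tag
    while current not in STANDARD_TAGS:
        if current in seen:
            raise ValueError(f"role map cycle involving {current!r}")
        if current not in role_map:
            raise KeyError(f"no role-map entry for custom tag {current!r}")
        seen.add(current)
        current = role_map[current]
    return current


# Hypothetical custom tags generated from an authoring application's paragraph styles.
role_map = {"BodyText": "P", "ChapterTitle": "Heading1", "Heading1": "H1"}
print(resolve_tag("BodyText", role_map))      # P
print(resolve_tag("ChapterTitle", role_map))  # H1
print(resolve_tag("TD", role_map))            # TD (already a standard tag)
```

Checking for standard tags before consulting the map means a standard tag is never remapped, which keeps the lookup predictable.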
The tag structure is usually organized within block-level elements, which are page elements that consist of text laid out in paragraph-like forms. Block-level elements are part of a document’s logical structure. Such elements are further classified as container elements, heading and paragraph elements, label and list elements, special text elements, and table elements.
Recognized PDF Tags
Container elements
Container elements are the highest level of element and provide hierarchical grouping for other block-level elements.
Document: Document element. The root element of a document’s tag tree.
Part: Part element. A large division of a document; may group smaller units of content together, such as division elements, article elements, or section elements.
Div: Division element. A generic block-level element or group of block-level elements.
Art: Article element. A self-contained body of text considered to be a single narrative.
Sect: Section element. A general container element type, comparable to a division (DIV class="Sect") in HTML; usually a component of a part element or an article element.
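To make the container hierarchy concrete, the following Python sketch models a tag tree as nested nodes, with Document at the root and Part, Sect, and Art grouping the block-level content. The `Elem` class, `outline`, and `reading_order` are hypothetical helpers, not part of any PDF library, and `reading_order` assumes content is read in tag-tree order (depth first, left to right).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    """Hypothetical in-memory tag node: tag name, optional text, child elements."""
    tag: str
    text: str = ""
    children: List["Elem"] = field(default_factory=list)


def outline(node: Elem, depth: int = 0) -> List[str]:
    """Indented outline of the tag tree, one line per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def reading_order(node: Elem) -> List[str]:
    """Collect text depth first, left to right, i.e. in tag-tree order."""
    texts = [node.text] if node.text else []
    for child in node.children:
        texts.extend(reading_order(child))
    return texts


# Document is the root; Part, Sect, and Art group the block-level content.
doc = Elem("Document", children=[
    Elem("Part", children=[
        Elem("Sect", children=[
            Elem("H1", "Introduction"),
            Elem("P", "Tags describe structure."),
        ]),
        Elem("Art", children=[Elem("P", "A self-contained story.")]),
    ]),
])
assert doc.tag == "Document"  # the root of the tag tree
print("\n".join(outline(doc)))
print(reading_order(doc))
# prints ['Introduction', 'Tags describe structure.', 'A self-contained story.']
```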
Heading and paragraph elements
Heading and paragraph elements are paragraph-like, block-level elements that include the heading tags (H and H1 to H6) and the generic paragraph (P) tag. A heading (H) element should appear as the first child of any higher-level division. Six levels of headings (H1 to H6) are available for applications that don’t hierarchically nest sections.
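The first-child rule for the generic H element can be checked mechanically. In this hypothetical Python sketch (the `Elem` node model and `misplaced_h` are illustrative names), only the generic H tag is checked, because documents that use flat H1 to H6 headings legitimately place several headings in the same container.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    """Hypothetical in-memory tag node."""
    tag: str
    children: List["Elem"] = field(default_factory=list)


def misplaced_h(node: Elem, path: str = "") -> List[str]:
    """List generic H elements that are not the first child of their parent.

    Only the generic H tag is checked: documents that use flat H1-H6
    headings legitimately place several headings in one container.
    """
    here = f"{path}/{node.tag}"
    problems = []
    for i, child in enumerate(node.children):
        if child.tag == "H" and i != 0:
            problems.append(f"{here}: H is child {i}, expected 0")
        problems.extend(misplaced_h(child, here))
    return problems


nested = Elem("Sect", [Elem("H"), Elem("P"), Elem("Sect", [Elem("H"), Elem("P")])])
flat = Elem("Document", [Elem("H1"), Elem("P"), Elem("H2"), Elem("P")])
bad = Elem("Sect", [Elem("P"), Elem("H")])
print(misplaced_h(nested))  # []
print(misplaced_h(flat))    # []
print(misplaced_h(bad))     # ['/Sect: H is child 1, expected 0']
```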
Label and list elements
Label and list elements are block-level elements used for structuring lists.
L: List element. Any sequence of items of similar meaning or other relevance; immediate child elements should be list item elements.
LI: List item element. Any one member of a list; may have a label element (optional) and must have a list body element (required) as children.
Lbl: Label element. A bullet, name, or number that identifies and distinguishes an element from others in the same list.
LBody: List item body element. The descriptive content of a list item.
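The list rules above (L contains LI elements; each LI has a required LBody and an optional Lbl) translate into a small structural check. The sketch below is hypothetical: `Elem` and `list_problems` are illustrative names, and the check does not enforce any ordering of Lbl and LBody, since the rules above do not state one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    """Hypothetical in-memory tag node."""
    tag: str
    children: List["Elem"] = field(default_factory=list)


def list_problems(node: Elem, path: str = "") -> List[str]:
    """Report violations of the L / LI / Lbl / LBody rules in a subtree."""
    here = f"{path}/{node.tag}"
    problems = []
    child_tags = [c.tag for c in node.children]
    if node.tag == "L":
        # Immediate children of a list should be list items.
        for t in child_tags:
            if t != "LI":
                problems.append(f"{here}: child {t} should be LI")
    elif node.tag == "LI":
        # A list item needs a body; a label is optional.
        if "LBody" not in child_tags:
            problems.append(f"{here}: missing required LBody")
        for t in child_tags:
            if t not in ("Lbl", "LBody"):
                problems.append(f"{here}: unexpected child {t}")
    for child in node.children:
        problems.extend(list_problems(child, here))
    return problems


ok = Elem("L", [Elem("LI", [Elem("Lbl"), Elem("LBody")]), Elem("LI", [Elem("LBody")])])
bad = Elem("L", [Elem("P"), Elem("LI", [Elem("Lbl")])])
print(list_problems(ok))   # []
print(list_problems(bad))  # ['/L: child P should be LI', '/L/LI: missing required LBody']
```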
Special text elements
Special text elements identify text that isn’t used as a generic paragraph (P).
BlockQuote: Block quote element. One or more paragraphs of text attributed to someone other than the author of the immediate surrounding text.
Caption: Caption element. A brief portion of text that describes a table or a figure.
Index: Index element. A sequence of entries that contain identifying text and reference elements that point to occurrences of that text in the main body of the document.
TOC: Table of contents element. An element that contains a structured list of items and labels identifying those items; has its own discrete hierarchy.
TOCI: Table of contents item element. An item contained in a list associated with a table of contents element.
Table elements
Table elements are special elements for structuring tables.
Table: Table element. A two-dimensional arrangement of data or text cells that contains table row elements as child elements and may have a caption element as its first or last child element.
TR: Table row element. One row of headings or data in a table; may contain table header cell elements and table data cell elements.
TD: Table data cell element. A table cell that contains nonheader data.
TH: Table header cell element. A table cell that contains header text or data describing one or more rows or columns of a table.
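Similarly, the table rules can be expressed as a check over a Table element's children. In this hypothetical sketch (`Elem` and `table_problems` are illustrative names), only the elements listed in this section are allowed: TR rows, at most one Caption as the first or last child, and TH or TD cells. Real tagged tables may also group rows in THead, TBody, and TFoot elements, which this sketch does not handle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    """Hypothetical in-memory tag node."""
    tag: str
    children: List["Elem"] = field(default_factory=list)


def table_problems(table: Elem) -> List[str]:
    """Check a Table's children against the rules listed in this section."""
    problems = []
    kids = table.children
    captions = [i for i, k in enumerate(kids) if k.tag == "Caption"]
    if len(captions) > 1:
        problems.append("more than one Caption")
    for i, child in enumerate(kids):
        if child.tag == "Caption":
            # A caption may only be the first or the last child.
            if i not in (0, len(kids) - 1):
                problems.append(f"Caption at position {i} must be first or last")
        elif child.tag == "TR":
            # Rows hold header cells and data cells only.
            for cell in child.children:
                if cell.tag not in ("TH", "TD"):
                    problems.append(f"TR at position {i}: {cell.tag} is not TH or TD")
        else:
            problems.append(f"{child.tag} at position {i} is not TR or Caption")
    return problems


good = Elem("Table", [
    Elem("Caption"),
    Elem("TR", [Elem("TH"), Elem("TH")]),
    Elem("TR", [Elem("TD"), Elem("TD")]),
])
bad = Elem("Table", [Elem("TR", [Elem("TD"), Elem("P")]), Elem("Caption"), Elem("TR", [Elem("TD")])])
print(table_problems(good))  # []
print(table_problems(bad))
# prints ['TR at position 0: P is not TH or TD', 'Caption at position 1 must be first or last']
```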