Location: http://xmlsoft.org/xmldtd.html
Libxml home page: http://xmlsoft.org/
Mailing-list archive: http://xmlsoft.org/messages/
Version: $Revision: 1.2 $
DTD is the acronym for Document Type Definition. A DTD describes the content of a family of XML files. It is part of the XML 1.0 specification, and makes it possible to describe, and check, that a given document instance conforms to a set of rules detailing its structure and content.
The W3C XML Recommendation (Tim Bray's annotated version of Rev1):
(unfortunately) all this is inherited from the SGML world, the syntax is ancient...
Writing a DTD can be done in multiple ways, and the rules for building one can be radically different depending on whether you need something fixed or something which can evolve over time. Really complex DTDs like the Docbook ones are flexible but considerably harder to design. I will just focus on DTDs for formats with a fixed, simple structure. What follows is just a set of basic rules; it is definitely not exhaustive, nor usable for complex DTD design.
Assuming the top element of the document is spec and the DTD is placed in the file mydtd in the subdirectory dtds of the directory from which the document was loaded:
<!DOCTYPE spec SYSTEM "dtds/mydtd">
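As a rough illustration of what a parser sees in that declaration, here is a minimal sketch using Python's standard xml.parsers.expat module (not libxml). The helper name read_doctype and the PUBLIC identifier string are invented for the example; only the spec/dtds/mydtd declaration comes from the text above. The external DTD file is not fetched here, so the file does not need to exist.

```python
import xml.parsers.expat

# The declaration from the text, followed by a minimal root element.
DOC = '<!DOCTYPE spec SYSTEM "dtds/mydtd">\n<spec/>'

# Same document with a PUBLIC identifier; the identifier string is a
# hypothetical value made up for this sketch.
PUBLIC_DOC = ('<!DOCTYPE spec PUBLIC "-//Example//DTD Spec//EN" "dtds/mydtd">\n'
              '<spec/>')


def read_doctype(xml_text):
    """Return (root name, system id, public id), or None without a DOCTYPE."""
    seen = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = (name, system_id, public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return seen.get("doctype")


print(read_doctype(DOC))
print(read_doctype(PUBLIC_DOC))
```

The first name reported is the expected top element, the second value is the system string, and the third is the PUBLIC identifier when one is given.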
Notes:
A PUBLIC identifier (a magic string) can also be given, so that the DTD is looked up in catalogs on the client side without having to locate it on the web.
The top element of the document is named in the DOCTYPE declaration.

The following declares an element spec:
<!ELEMENT spec (front, body, back?)>
It also expresses that the spec element contains one front, one body and one optional back child element, in this order. The declaration of an element of the structure and the declaration of its content are done in a single declaration. Similarly, the following declares div1 elements:
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
This means div1 contains one head, then any number of p, list and note elements in any order, and then zero or more div2 elements. Last but not least, an element can contain text:
<!ELEMENT b (#PCDATA)>
Here b contains text. An element can also be of mixed content (text and elements in no particular order):
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
p can contain text or a, ul, b, i or em elements in no particular order.
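The element content models for spec and div1 behave much like regular expressions over the names of the child elements. The following is only a sketch of that idea in Python, not how libxml or any real validator is implemented: the helpers model_to_regex and conforms are invented for the example, and they handle only names, ",", "|", "?", "*", "+" and parentheses (no #PCDATA, EMPTY or ANY).

```python
import re

SPEC_MODEL = "(front, body, back?)"
DIV1_MODEL = "(head, (p | list | note)*, div2*)"


def model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex.

    Each child name is encoded as the name followed by one space, so a
    list of children becomes a plain string that the regex can match.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)  # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def conforms(model, children):
    """True if the ordered list of child names satisfies the model."""
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


print(conforms(SPEC_MODEL, ["front", "body"]))
print(conforms(SPEC_MODEL, ["body", "front"]))
print(conforms(DIV1_MODEL, ["head", "p", "note", "list", "div2", "div2"]))
print(conforms(DIV1_MODEL, ["head", "div2", "p"]))
```

The second and fourth calls fail because the order matters: front must come before body, and no p may follow a div2.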
Again, the declaration of an attribute includes the definition of its content:
<!ATTLIST termdef name CDATA #IMPLIED>
This means that the element termdef can have a name attribute containing text (CDATA) and which is optional (#IMPLIED). The attribute value can also be defined within a set:
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
This means that list elements have a type attribute with 3 allowed values, "bullets", "ordered" or "glossary", and which defaults to "ordered" if the attribute is not explicitly specified.
The content type of an attribute can be text (CDATA), anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES) or name(s) (NMTOKEN/NMTOKENS). The following declares that a chapter element can have an optional id attribute of type ID, usable as the target of references from attributes of type IDREF:
<!ATTLIST chapter id ID #IMPLIED>
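To make the ID/IDREF relationship more concrete, here is a small Python sketch of the two checks a validator is expected to perform on such attributes: ID values must be unique, and every IDREF value must match some ID. This is a hand-written approximation, not libxml's code. The xref element, its ref attribute, the book wrapper and the helper check_references are all invented for the example; only chapter with its id attribute of type ID comes from the declaration above.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: "book", "xref" and "ref" are assumptions of this
# sketch; "ref" is treated as if it had been declared with type IDREF.
DOC = """<book>
  <chapter id="intro"/>
  <chapter id="usage"/>
  <xref ref="intro"/>
  <xref ref="missing"/>
</book>"""


def check_references(root, id_attr="id", idref_attr="ref"):
    """Return (dangling IDREF values, duplicated ID values)."""
    ids = [el.get(id_attr) for el in root.iter()
           if el.get(id_attr) is not None]
    duplicates = sorted({value for value in ids if ids.count(value) > 1})
    known = set(ids)
    dangling = [el.get(idref_attr) for el in root.iter()
                if el.get(idref_attr) is not None
                and el.get(idref_attr) not in known]
    return dangling, duplicates


print(check_references(ET.fromstring(DOC)))
```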
The last value of an attribute definition can be #REQUIRED, meaning that the attribute has to be given, #IMPLIED, meaning that it is optional, or the default value (possibly prefixed by #FIXED if it is the only value allowed).
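The effect of a default value can be observed even with a non-validating parser, as long as the declaration sits in the internal subset of the document. The sketch below uses Python's standard xml.etree.ElementTree (built on expat, not libxml) with the list declaration used earlier. The surrounding document and the helper list_type are assumptions made for the example. Since this parser does not validate, it would not reject a value outside the allowed set.

```python
import xml.etree.ElementTree as ET

# Internal subset carrying the ATTLIST declaration with a default value.
DTD = """<!DOCTYPE list [
  <!ELEMENT list (#PCDATA)>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
"""


def list_type(element_text):
    """Parse one <list> element under the DTD and return its type value."""
    return ET.fromstring(DTD + element_text).get("type")


print(list_type("<list>item</list>"))                 # attribute omitted
print(list_type('<list type="bullets">item</list>'))  # attribute given
```

When the attribute is omitted the parser supplies the declared default; when it is given, the explicit value wins.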
Notes:
<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
The previous construct defines both the id and name attributes for the element termdef.
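One way to check how such a combined declaration is read is to ask a parser to report each attribute it declares. The following sketch relies on the AttlistDeclHandler callback of Python's standard xml.parsers.expat module. The wrapping document and the helper attribute_declarations are invented for the example.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE termdef [
  <!ELEMENT termdef (#PCDATA)>
  <!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED>
]>
<termdef id="t1">term</termdef>"""


def attribute_declarations(xml_text):
    """Return one (element, attribute, type, default, required) per attribute."""
    declarations = []

    def on_attlist(element, attribute, attr_type, default, required):
        declarations.append(
            (element, attribute, attr_type, default, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return declarations


for declaration in attribute_declarations(DOC):
    print(declaration)
```

The callback fires once per attribute, so the single ATTLIST yields two entries: id (type ID, required) and name (type CDATA, optional, no default).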
The directory test/valid/dtds/ in the libxml distribution contains some complex DTD examples. The test/valid/dia.xml example shows an XML file where the simple DTD is directly included within the document.
The simplest way is to use the xmllint program that comes with libxml. The --valid option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:
xmllint --valid --noout test/valid/REC-xml-19980210.xml
The --noout option is used to avoid printing the resulting tree.
The --dtdvalid dtd option allows validating the document(s) against a given DTD.
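When validation has to be driven from a script, the two invocations above can be wrapped in a few lines. This is only a sketch: the helpers xmllint_command and validate are invented for the example, and it assumes that xmllint signals a validation failure through a non-zero exit status. If xmllint is not installed, validate returns None instead of guessing.

```python
import shutil
import subprocess


def xmllint_command(document, dtd=None):
    """Build the xmllint argument list for validating one document.

    Without dtd, the DTD referenced by the document itself is used
    (--valid); with dtd, the document is checked against that DTD
    (--dtdvalid).
    """
    command = ["xmllint", "--noout"]
    if dtd is None:
        command.append("--valid")
    else:
        command += ["--dtdvalid", dtd]
    command.append(document)
    return command


def validate(document, dtd=None):
    """Return True or False, or None when xmllint is not available."""
    if shutil.which("xmllint") is None:
        return None
    return subprocess.run(xmllint_command(document, dtd)).returncode == 0


print(xmllint_command("test/valid/REC-xml-19980210.xml"))
print(xmllint_command("test/valid/REC-xml-19980210.xml", dtd="dtds/mydtd"))
```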
Libxml exports an API to handle DTDs and validation; check the associated description.
DTDs are as old as SGML, so there may be a number of examples online. I will just list one for now; other pointers are welcome:
$Id: xmldtd.html,v 1.2 2000/11/24 13:28:38 veillard Exp $