3.2. What is XML validation?
Last updated: 7 February 2013.

If you follow the rules detailed in the previous tutorial when creating an XML document, the document will be well formed. However, a well-formed XML document is not necessarily valid. Typically, when two applications exchange XML documents, both parties agree on the XML structure beforehand. When producing an XML document for another application, you may forget to include a node or an attribute. Worse, you may inadvertently include a node or an attribute that is not supposed to be in the XML document.

If you want to make sure that your XML document complies with a given structure, you can use a DTD (Document Type Definition) or an XSD (XML Schema Definition). DTDs and XSDs are optional, which means you can process XML documents in Java without using DTDs or XSDs. So in case you are not interested in using DTDs or XSDs at all, you can skip the rest of this page and proceed to the next tutorial.


3.2.1. DTD (Document Type Definition)

A DTD specifies which elements and attributes an XML document may contain and how the elements are nested. You can either create your own private DTD or use a public DTD if you find one that meets your needs (a public DTD is identified by a public identifier, usually accompanied by a URI). This section will show you how to create and use your own private DTD.

A DTD can be either internal or external to the XML document it describes: you can either declare the DTD inside the XML document or outside (in a separate file with the extension .dtd). The latter allows multiple applications to use the same DTD file to validate the XML documents they exchange with each other. As an example, the external file hotel.dtd shown below defines the list of elements and attributes allowed in the file hotel.xml:

<!ELEMENT HOTEL (ROOM+)>
<!ELEMENT ROOM (NUMBER, CATEGORY, BED+, FLOOR, LIVING_ROOM*, KITCHENETTE*, JACUZZI?, AIR_CONDITIONING)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT CATEGORY (#PCDATA)>
<!ELEMENT BED (#PCDATA)>
<!ELEMENT FLOOR (#PCDATA)>
<!ELEMENT LIVING_ROOM (#PCDATA)>
<!ELEMENT KITCHENETTE (#PCDATA)>
<!ELEMENT JACUZZI (#PCDATA)>
<!ELEMENT AIR_CONDITIONING (TABLE_FANS | AIR_CONDITIONERS)>
<!ELEMENT TABLE_FANS (#PCDATA)>
<!ELEMENT AIR_CONDITIONERS (#PCDATA)>

<!ATTLIST HOTEL name CDATA #IMPLIED>
<!ATTLIST HOTEL starRating CDATA #IMPLIED>

As you can see, there are two kinds of declarations in the DTD: element declarations (<!ELEMENT...) and attribute declarations (<!ATTLIST...). The root element HOTEL is declared as follows:

<!ELEMENT HOTEL (ROOM+)>

The above declaration says that the children of the element HOTEL are ROOM elements. The character + indicates that there can be one or more ROOM occurrences within the element HOTEL. The possible occurrences are:

   +  : one or more occurrences
   *  : zero or more occurrences
   ?  : zero or one occurrence

If no occurrence is specified, then the element must occur exactly once.

The second declaration in the file hotel.dtd defines the ROOM element:

<!ELEMENT ROOM (NUMBER, CATEGORY, BED+, FLOOR, LIVING_ROOM*, KITCHENETTE*, JACUZZI?, AIR_CONDITIONING)>

The ROOM element's children are simply declared between parentheses, with their occurrences and in the order in which they must appear in the XML document.

Next, the third declaration says that the element NUMBER does not have any children but rather contains character data:

<!ELEMENT NUMBER (#PCDATA)>

Similarly, the subsequent element declarations specify that the nodes CATEGORY, BED, FLOOR, LIVING_ROOM, KITCHENETTE and JACUZZI contain character data.

You may have noticed the use of the character | in the declaration:

<!ELEMENT AIR_CONDITIONING (TABLE_FANS | AIR_CONDITIONERS)>

It indicates that AIR_CONDITIONING can have either a TABLE_FANS child or an AIR_CONDITIONERS child (not both).

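Although it is not illustrated in the XML example used earlier, an element can mix character data with child elements. In a DTD, such mixed content must be declared as a repeatable choice that starts with #PCDATA:

<!ELEMENT AIR_CONDITIONING (#PCDATA | TABLE_FANS | AIR_CONDITIONERS)*>

With a mixed-content declaration, the DTD cannot constrain the order or the number of the child elements. A valid AIR_CONDITIONING element in the XML document would look something like this: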

<AIR_CONDITIONING>
         Air conditioners with 24 hour timer
         <AIR_CONDITIONERS>5</AIR_CONDITIONERS>
</AIR_CONDITIONING>

To finish, the attributes of each node are declared. An attribute is declared as follows:

<!ATTLIST node_name  attribute_name  attribute_type  default_value>

Take a look at the very last declaration in the file hotel.dtd:

<!ATTLIST HOTEL starRating CDATA #IMPLIED>

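It says that the node HOTEL has an attribute named starRating whose type is character data (CDATA) and whose value is not required (#IMPLIED). Here is a list of the most commonly used attribute types:

   CDATA                    : the value is character data
   (value1 | value2 | ...)  : the value must be one of the enumerated values
   ID                       : the value is an identifier that is unique within the document
   IDREF                    : the value is the ID of another element
   NMTOKEN                  : the value is a valid XML name token

In the file hotel.dtd, the default value of the attributes name and starRating is #IMPLIED, which means that neither of them is required. The possible options are:

   "value"         : the default value used when the attribute is omitted
   #REQUIRED       : the attribute must be present
   #IMPLIED        : the attribute is optional
   #FIXED "value"  : the attribute value is fixed

If you declare a fixed attribute, the keyword #FIXED must be followed by a quoted value like this: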

<!ATTLIST HOTEL freeOfCharge CDATA #FIXED "no way">

Hence, the following start tag would be invalid:

<HOTEL freeOfCharge="of course">
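
As a rough illustration of how a fixed attribute behaves at parse time, here is a minimal sketch that uses the JAXP parser bundled with the JDK (not JDOM, which is introduced in a later tutorial). The class name FixedAttributeSketch and the reduced DTD, in which HOTEL only contains character data, are assumptions made to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class FixedAttributeSketch {

    // Hypothetical internal DTD: HOTEL is reduced to character data for brevity.
    static final String DOCTYPE =
        "<!DOCTYPE HOTEL [\n"
      + "<!ELEMENT HOTEL (#PCDATA)>\n"
      + "<!ATTLIST HOTEL freeOfCharge CDATA #FIXED \"no way\">\n"
      + "]>\n";

    /** Parses the XML with DTD validation enabled and collects validity errors. */
    static Document parse(String xml, List<String> errors) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the value of freeOfCharge as seen by the application after parsing. */
    static String freeOfCharge(String body) {
        return parse(DOCTYPE + body, new ArrayList<>())
                .getDocumentElement().getAttribute("freeOfCharge");
    }

    /** Returns the number of validity errors reported for the given root element. */
    static int errorCount(String body) {
        List<String> errors = new ArrayList<>();
        parse(DOCTYPE + body, errors);
        return errors.size();
    }

    public static void main(String[] args) {
        // Attribute omitted: the parser supplies the fixed value from the DTD.
        System.out.println(freeOfCharge("<HOTEL/>"));
        // Attribute present with a different value: a validity error is reported.
        System.out.println(errorCount("<HOTEL freeOfCharge=\"of course\"/>"));
    }
}
```

In this sketch, omitting the attribute is valid and the application still sees the fixed value, whereas supplying any other value is reported as a validity error.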


How to declare an external DTD?

Once you have created an external DTD file, you must declare it in the XML document with the following syntax:

<!DOCTYPE root_element SYSTEM "filename.dtd">

Here is an example:

   <?xml version="1.0"?>
   <!DOCTYPE HOTEL SYSTEM "hotel.dtd">
   <HOTEL>
            <ROOM>
                     <NUMBER>17</NUMBER>
                     <CATEGORY>STANDARD</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <FLOOR>5</FLOOR>
                     <LIVING_ROOM/>
                     <KITCHENETTE/>
                     <AIR_CONDITIONING>
                              <TABLE_FANS>2</TABLE_FANS>
                     </AIR_CONDITIONING>
            </ROOM>
            <ROOM>
                     <NUMBER>33</NUMBER>
                     <CATEGORY>SUITE</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <BED>KING SIZE</BED>
                     <FLOOR>9</FLOOR>
                     <LIVING_ROOM>17 FEET * 13 FEET</LIVING_ROOM>
                     <KITCHENETTE>10 FEET * 8 FEET</KITCHENETTE>
                     <JACUZZI>3 SEATS</JACUZZI>
                     <AIR_CONDITIONING>
                              <AIR_CONDITIONERS>5</AIR_CONDITIONERS>
                     </AIR_CONDITIONING>
            </ROOM>
   </HOTEL>


How to declare an internal DTD?

The DTD is declared directly within the XML file. Here is the syntax:

<!DOCTYPE root_element [elements_and_attributes_declarations]>

and here is an example:

   <?xml version="1.0"?>

   <!DOCTYPE HOTEL [
   <!ELEMENT HOTEL (ROOM+)>
   <!ELEMENT ROOM (NUMBER, CATEGORY, BED+, FLOOR, LIVING_ROOM*, KITCHENETTE*, JACUZZI?, AIR_CONDITIONING)>
   <!ELEMENT NUMBER (#PCDATA)>
   <!ELEMENT CATEGORY (#PCDATA)>
   <!ELEMENT BED (#PCDATA)>
   <!ELEMENT FLOOR (#PCDATA)>
   <!ELEMENT LIVING_ROOM (#PCDATA)>
   <!ELEMENT KITCHENETTE (#PCDATA)>
   <!ELEMENT JACUZZI (#PCDATA)>
   <!ELEMENT AIR_CONDITIONING (TABLE_FANS | AIR_CONDITIONERS)>
   <!ELEMENT TABLE_FANS (#PCDATA)>
   <!ELEMENT AIR_CONDITIONERS (#PCDATA)>
   <!ATTLIST HOTEL name CDATA #IMPLIED>
   <!ATTLIST HOTEL starRating CDATA #IMPLIED>
   ]>

   <HOTEL>
            <ROOM>
                     <NUMBER>17</NUMBER>
                     <CATEGORY>STANDARD</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <FLOOR>5</FLOOR>
                     <LIVING_ROOM/>
                     <KITCHENETTE/>
                     <AIR_CONDITIONING>
                              <TABLE_FANS>2</TABLE_FANS>
                     </AIR_CONDITIONING>
            </ROOM>
            <ROOM>
                     <NUMBER>33</NUMBER>
                     <CATEGORY>SUITE</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <BED>KING SIZE</BED>
                     <FLOOR>9</FLOOR>
                     <LIVING_ROOM>17 FEET * 13 FEET</LIVING_ROOM>
                     <KITCHENETTE>10 FEET * 8 FEET</KITCHENETTE>
                     <JACUZZI>3 SEATS</JACUZZI>
                     <AIR_CONDITIONING>
                              <AIR_CONDITIONERS>5</AIR_CONDITIONERS>
                     </AIR_CONDITIONING>
            </ROOM>
   </HOTEL>

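When a DTD is declared in an XML document (either internally or externally), a validating XML parser can check the XML document against the DTD. If validation is enabled and the XML document is not valid, the parser reports an error. A parser that does not validate only checks that the document is well formed.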
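
To give an idea of what DTD validation looks like from Java, here is a minimal sketch based on the JAXP API shipped with the JDK (JDOM itself is covered in the following tutorials). The class name DtdValidationSketch, the helper room() and the sample documents are assumptions made for illustration. The DTD is declared internally so that the sketch does not depend on any file on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {

    // Internal DTD with the same declarations as hotel.dtd.
    static final String DOCTYPE =
        "<!DOCTYPE HOTEL [\n"
      + "<!ELEMENT HOTEL (ROOM+)>\n"
      + "<!ELEMENT ROOM (NUMBER, CATEGORY, BED+, FLOOR, LIVING_ROOM*, KITCHENETTE*, JACUZZI?, AIR_CONDITIONING)>\n"
      + "<!ELEMENT NUMBER (#PCDATA)>\n"
      + "<!ELEMENT CATEGORY (#PCDATA)>\n"
      + "<!ELEMENT BED (#PCDATA)>\n"
      + "<!ELEMENT FLOOR (#PCDATA)>\n"
      + "<!ELEMENT LIVING_ROOM (#PCDATA)>\n"
      + "<!ELEMENT KITCHENETTE (#PCDATA)>\n"
      + "<!ELEMENT JACUZZI (#PCDATA)>\n"
      + "<!ELEMENT AIR_CONDITIONING (TABLE_FANS | AIR_CONDITIONERS)>\n"
      + "<!ELEMENT TABLE_FANS (#PCDATA)>\n"
      + "<!ELEMENT AIR_CONDITIONERS (#PCDATA)>\n"
      + "<!ATTLIST HOTEL name CDATA #IMPLIED>\n"
      + "<!ATTLIST HOTEL starRating CDATA #IMPLIED>\n"
      + "]>\n";

    /** Hypothetical helper: wraps the given ROOM children in a one-room HOTEL document. */
    static String room(String children) {
        return DOCTYPE + "<HOTEL><ROOM>" + children + "</ROOM></HOTEL>";
    }

    static final String VALID = room(
        "<NUMBER>17</NUMBER><CATEGORY>STANDARD</CATEGORY><BED>DOUBLE SIZE</BED><FLOOR>5</FLOOR>"
      + "<AIR_CONDITIONING><TABLE_FANS>2</TABLE_FANS></AIR_CONDITIONING>");

    // Well formed, but not valid: the DTD requires at least one BED element.
    static final String MISSING_BED = room(
        "<NUMBER>17</NUMBER><CATEGORY>STANDARD</CATEGORY><FLOOR>5</FLOOR>"
      + "<AIR_CONDITIONING><TABLE_FANS>2</TABLE_FANS></AIR_CONDITIONING>");

    /** Returns the error messages reported by the parser (empty if the document is valid). */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // without this call, the DTD is not enforced
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // the document is not even well formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("valid document errors: " + validate(VALID));
        System.out.println("missing BED errors: " + validate(MISSING_BED));
    }
}
```

Validity errors are delivered to the error handler rather than thrown, which is why the sketch collects them in a list. For a document that references an external DTD, parsing the file itself (for example with builder.parse(new File("hotel.xml"))) should let the parser resolve hotel.dtd relative to the location of the XML file.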


3.2.2. XSD (XML Schema Definition)

Just like a DTD, an XSD specifies the exact list of elements and attributes within an XML document. However, XSDs provide more features. For example, they support data types and namespaces.

Explaining in detail how to set up XSDs would take too long. To avoid overloading this page, I will just show you a basic XSD example so that you can see what it looks like. You can easily find more extensive explanations about XSDs online. Here is an XSD file named hotel.xsd that describes the XML file hotel.xml shown in the previous tutorial:

   <?xml version="1.0" encoding="ISO-8859-1" ?>
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="HOTEL">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="ROOM" maxOccurs="unbounded" minOccurs="0">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element name="NUMBER" type="xs:integer"/>
                     <xs:element name="CATEGORY" type="xs:string"/>
                     <xs:element name="BED" type="xs:string" maxOccurs="unbounded" minOccurs="0"/>
                     <xs:element name="FLOOR" type="xs:integer"/>
                     <xs:element name="LIVING_ROOM" type="xs:string" maxOccurs="unbounded" minOccurs="0"/>
                     <xs:element name="KITCHENETTE" type="xs:string" maxOccurs="unbounded" minOccurs="0"/>
                     <xs:element name="JACUZZI" type="xs:string" maxOccurs="2" minOccurs="0"/>
                     <xs:element name="AIR_CONDITIONING">
                        <xs:complexType>
                           <xs:choice>
                              <xs:element name="TABLE_FANS" type="xs:integer"/>
                              <xs:element name="AIR_CONDITIONERS" type="xs:integer"/>
                           </xs:choice>
                        </xs:complexType>
                     </xs:element>
                  </xs:sequence>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   </xs:schema>

Here is how you reference the XSD file hotel.xsd shown above in the file hotel.xml:

   <?xml version="1.0"?>
   <HOTEL xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="hotel.xsd">
            <ROOM>
                     <NUMBER>17</NUMBER>
                     <CATEGORY>STANDARD</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <FLOOR>5</FLOOR>
                     <LIVING_ROOM/>
                     <KITCHENETTE/>
                     <AIR_CONDITIONING>
                              <TABLE_FANS>2</TABLE_FANS>
                     </AIR_CONDITIONING>
            </ROOM>
            <ROOM>
                     <NUMBER>33</NUMBER>
                     <CATEGORY>SUITE</CATEGORY>
                     <BED>DOUBLE SIZE</BED>
                     <BED>KING SIZE</BED>
                     <FLOOR>9</FLOOR>
                     <LIVING_ROOM>17 FEET * 13 FEET</LIVING_ROOM>
                     <KITCHENETTE>10 FEET * 8 FEET</KITCHENETTE>
                     <JACUZZI>3 SEATS</JACUZZI>
                     <AIR_CONDITIONING>
                              <AIR_CONDITIONERS>5</AIR_CONDITIONERS>
                     </AIR_CONDITIONING>
            </ROOM>
   </HOTEL>
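
For readers who want to see an XSD check in action, here is a minimal sketch that relies on the javax.xml.validation package of the JDK (again, not JDOM). The class name XsdValidationSketch and the trimmed schema, which only keeps the NUMBER and FLOOR children of ROOM, are assumptions made to keep the example short. The schema is passed to the validator directly instead of being looked up through the xsi:noNamespaceSchemaLocation attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdValidationSketch {

    // Hypothetical, trimmed-down version of hotel.xsd (only NUMBER and FLOOR are kept).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
      + " <xs:element name=\"HOTEL\">\n"
      + "  <xs:complexType><xs:sequence>\n"
      + "   <xs:element name=\"ROOM\" maxOccurs=\"unbounded\" minOccurs=\"0\">\n"
      + "    <xs:complexType><xs:sequence>\n"
      + "     <xs:element name=\"NUMBER\" type=\"xs:integer\"/>\n"
      + "     <xs:element name=\"FLOOR\" type=\"xs:integer\"/>\n"
      + "    </xs:sequence></xs:complexType>\n"
      + "   </xs:element>\n"
      + "  </xs:sequence></xs:complexType>\n"
      + " </xs:element>\n"
      + "</xs:schema>\n";

    /** Compiles the schema held in the XSD constant. */
    static Schema loadSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    /** Returns true if the XML document is well formed and valid against the schema. */
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            // With no error handler set, the validator throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<HOTEL><ROOM><NUMBER>17</NUMBER><FLOOR>5</FLOOR></ROOM></HOTEL>";
        // NUMBER is declared as xs:integer, so a textual value violates the data type.
        String wrongType = "<HOTEL><ROOM><NUMBER>seventeen</NUMBER><FLOOR>5</FLOOR></ROOM></HOTEL>";
        System.out.println(isValid(valid));
        System.out.println(isValid(wrongType));
    }
}
```

The second document illustrates the data type support mentioned earlier: a DTD could only say that NUMBER contains character data, while the schema can require an integer.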



Copyright © 2013. JavaPerspective.com. All rights reserved.