Date: Tue, 27 Jun 2000 20:44:18 GMT
From: JDCTechTips@sun.com
Subject: JDC Tech Tips  June 27, 2000
To: JDCMember@sun.com
Reply-To: JDCTechTips@sun.com


 J  D  C    T  E  C  H    T  I  P  S

                      TIPS, TECHNIQUES, AND SAMPLE CODE


WELCOME to the Java Developer Connection(sm) (JDC) Tech Tips, 
June 27, 2000. This issue covers some aspects of using the 
Java(tm) programming language with XML. First there's a short 
introduction to XML, followed by tips on how to use two APIs 
designed for use with XML. The tips are:

         * Using the SAX API
         * Using the DOM API
                  
These tips were developed using Java(tm) 2 SDK, Standard Edition, 
v 1.3.

This issue of the JDC Tech Tips is written by Stuart Halloway,
a Java specialist at DevelopMentor (http://www.develop.com/java).

You can view this issue of the Tech Tips on the Web at
http://developer.java.sun.com/developer/TechTips/2000/tt0627.html

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
XML INTRODUCTION

The Extensible Markup Language (XML) is a way of describing the 
content of a document, such as a page sent to a Web browser. XML is syntactically 
similar to HTML. In fact, XML can be used in many of the places 
in which HTML is used today. Here's an example. Imagine that the 
JDC Tech Tip index was stored in XML instead of HTML. Instead of 
HTML coding such as this:

<html>
<body>
<h1>JDC Tech Tip Index</h1>
<ol><li>
<a
href="http://developer.java.sun.com/developer/TechTips/2000/tt0509.html#tip1">
Random Access for Files
</a>
</li></ol>
</body>
</html>

It might look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<tips>
<author id="glen" fullName="Glen McCluskey"/>
<tip title="Random Access for Files"
     author="glen"
     htmlURL="http://developer.java.sun.com/developer/TechTips/2000/tt0509.html#tip1"
     textURL="http://developer.java.sun.com/developer/TechTips/txtarchive/May00_GlenM.txt">
</tip>
</tips>

Notice the coding similarities between XML and HTML. In each case,
the document is organized as a hierarchy of elements, and each tag
is delimited by angle brackets. As is true for most HTML elements,
each XML element consists of a start tag, followed by some data,
followed by an end tag:

<element>element data</element>
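A parser sees each element as exactly that sequence: start tag, data, end tag. As a minimal sketch (assuming a current JDK, where the JAXP and SAX classes ship with the platform; the class name TagEvents is made up for illustration), the following records what a SAX parser reports for the one-line example above:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records the start tag, element data, and end tag reported for each element. */
public class TagEvents extends DefaultHandler {
    private final List<String> events = new ArrayList<String>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split element data across several calls, so accumulate it.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    // Emits any data collected since the last tag as a single event.
    private void flushText() {
        if (text.length() > 0) {
            events.add("data:" + text);
            text.setLength(0);
        }
    }

    public static List<String> trace(String xml) throws Exception {
        TagEvents handler = new TagEvents();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<element>element data</element>"));
    }
}
```

Running main should print the start event, the data, and the end event, in that order. The handler methods used here belong to the SAX API described in the tips that follow.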

Also as in HTML, XML elements can be annotated with attributes. 
In the XML example above, each <tip> element has several 
attributes. The 'title' attribute is the name of the tip, the 
'author' attribute gives the id of the matching <author> element 
(a short form of the author's name), and the 'htmlURL' and 'textURL' 
attributes contain links to different archived formats of the tip. 
An element with no data, such as <author> in the example, can be 
written as a single tag that ends in '/>'.  

The similarities between the two markup languages are an important 
advantage as the world moves to XML, because hard-earned HTML 
skills continue to be useful. However, they also raise the question 
"Why bother to switch to XML at all?" To answer this question, 
look again at the XML example above, and this time consider the 
semantics instead of the syntax. Where HTML tells you how to format 
a document, XML tells you about the content of the document. This 
capability is very powerful. In an XML world, clients can 
reorganize data in a way most useful to them. They are not 
restricted to the presentation format delivered by the server. 
Importantly, the XML format has been designed for the convenience 
of parsers, without sacrificing readability. XML imposes strong 
guarantees about the structure of documents. To name a few: start 
tags must have end tags, elements must nest properly, and all 
attributes must have quoted values. This strictness makes parsing and 
transforming XML much more reliable than attempting to manipulate 
HTML.
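The strictness is easy to observe: a conforming XML parser refuses a document that breaks these rules instead of guessing what was meant. Here is a minimal sketch, assuming a current JDK (the class name WellFormedCheck is hypothetical), that reports whether a string is well-formed:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {
    /** Returns true if xml parses as well-formed, false if the parser rejects it. */
    public static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default error handler, which may print fatal errors to stderr.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<tips><tip title=\"a\"></tip></tips>"));  // true
        System.out.println(isWellFormed("<tips><tip title=\"a\"></tips>"));        // false: <tip> never ends
        System.out.println(isWellFormed("<tips><tip></tips></tip>"));              // false: bad nesting
        System.out.println(isWellFormed("<tips><tip checked></tip></tips>"));      // false: no value
    }
}
```

Each rejected string breaks exactly one of the rules listed above. An HTML browser would typically render all four without complaint.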

The similarities between XML and HTML stem from a shared history.  
HTML is a simplified vocabulary of a powerful markup language 
called SGML. SGML is the "kitchen sink" of markup, allowing you 
to do almost anything, including the ability to define your own 
domain-specific vocabularies. HTML is a dim shadow of SGML, with 
a predefined vocabulary. Thus HTML is basically a static snapshot 
of some presentation features that seemed useful circa 1992. Both 
SGML and HTML are problematic: SGML does everything, but is too 
complex. HTML is simple, but its parsing rules are loose, and its 
vocabulary does not provide a standard mechanism for extension. 
XML, by comparison, is a streamlined version of SGML. It aims to 
meet the most important objectives of SGML without too much 
complexity. If SGML is the "kitchen sink," XML is a "Swiss Army 
knife."  

Given its advantages, XML does far more than simply displace HTML 
in some applications. It can also displace SGML, and open new 
opportunities where the complexity of SGML had been a barrier.  
Regardless of how you plan to use XML, the programming language of 
choice is likely to be the Java programming language. Although you 
could write your own code to parse XML directly, the Java platform 
provides higher-level tools for parsing XML documents through the 
Simple API for XML (SAX) and the Document Object Model (DOM) 
interfaces. SAX and DOM are standard APIs that are implemented in 
several different languages. In the Java programming language, you 
can instantiate SAX and DOM parsers by using the Java(tm) API for 
XML Parsing (JAXP). 

To execute the code in these tips, you will need to download JAXP 
and a reference implementation of the SAX and DOM parsers from 
http://java.sun.com/xml/download.html. You will also need to 
download SAX 2.0 from http://www.megginson.com/SAX/Java. Remember
to update your class path to include the jaxp, parser, and sax2 
JAR files.
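Once the classes are available (on a current JDK, JAXP is part of the platform, so no extra JARs are needed), a quick way to confirm the setup is to ask each JAXP factory for a parser. This is a minimal sketch; the class and method names are hypothetical:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

/** Obtains a SAX parser and a DOM builder through the JAXP factories. */
public class JaxpFactories {
    /** Returns the class names of the SAX and DOM implementations JAXP selected. */
    public static String[] implementationNames() throws Exception {
        // Neither parser is created with "new"; each factory locates an
        // implementation at run time, so this code names no vendor classes.
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        DocumentBuilder domBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return new String[] {
            saxParser.getClass().getName(),
            domBuilder.getClass().getName()
        };
    }

    public static void main(String[] args) throws Exception {
        String[] names = implementationNames();
        System.out.println("SAX parser:  " + names[0]);
        System.out.println("DOM builder: " + names[1]);
    }
}
```

The printed names vary with the JDK and with any parser JARs on the class path. The lookup can also be redirected with the javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory system properties.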

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
USING THE SAX API

The SAX API provides a serial mechanism for accessing XML 
documents. It was developed by members of the XML-DEV mailing list 
as a standard set of interfaces to allow different vendor 
implementations. The SAX model keeps parsers simple: a parser reads 
through a document in a linear way and calls an event handler every 
time a markup event occurs, such as a start tag or an end tag. 
The original SAX implementation was released in May 1998. It was 
superseded by SAX 2.0 in May 2000. (The code in this tip is SAX2 
compliant.)  

All you have to do to use SAX2 for notification of markup events 
is implement a few methods and interfaces. The ContentHandler 
interface is the most important of these interfaces. It declares 
a number of methods for different steps in parsing an XML document. 
In many cases, you will only be interested in a few of these methods.  
For example, the code below handles only a single ContentHandler 
method (startElement), and uses it to build an HTML page from the 
XML Tech Tip Index: 

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
/**
 * Builds a simple HTML page which lists tip titles 
 * and provides links to HTML and text versions
 */
public class UseSAX2 extends DefaultHandler {
    // The HTML page, built up as raw text
    StringBuffer htmlOut;

    public String toString() {
        if (htmlOut != null)
            return htmlOut.toString();
        return super.toString();    
    }
    
    // Called for every start tag; only <tip> elements are used
    public void startElement(String namespace, 
                            String localName, 
                            String qName,
                            Attributes atts) {
        if (localName.equals("tip")) 
        {
            String title = atts.getValue("title");
            String html = atts.getValue("htmlURL");
            String text = atts.getValue("textURL");
            // One line per tip: link to HTML version, link to text version, title
            htmlOut.append("<br>");
            htmlOut.append("<A HREF=\"");
            htmlOut.append(html);
            htmlOut.append("\">HTML</A> <A HREF=\"");
            htmlOut.append(text);
            htmlOut.append("\">TEXT</A> ");           
            htmlOut.append(title);
        }
    }
    
    public void processWithSAX(String urlString) throws Exception {
        System.out.println("Processing URL " + urlString);
        htmlOut = new StringBuffer("<HTML><BODY><H1>JDC Tech Tips Archive</H1>");
        // JAXP creates the parser through a factory
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        // Wrap the SAX1 parser so that it reports SAX2 events to this handler
        ParserAdapter pa = new ParserAdapter(sp.getParser());
        pa.setContentHandler(this);
        pa.parse(urlString);
        htmlOut.append("</BODY></HTML>");
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java UseSAX2 <input URL> <output file>");
            return;
        }
        try {
            UseSAX2 us = new UseSAX2();
            us.processWithSAX(args[0]);
            String output = us.toString();
            System.out.println("Saving result to " + args[1]);
            FileWriter fw = new FileWriter(args[1]);
            try {
                fw.write(output, 0, output.length());
            } finally {
                fw.close();   // flushes and releases the file
            }
        }
        catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

To test the program, you can use the XML fragment in the XML
Introduction that precedes this tip, or download a longer version 
from http://staff.develop.com/halloway/TechTips/TechTipArchive.xml.  
Save the XML fragment or the longer XML version in your local 
directory as TechTipArchive.xml. You can then produce an HTML 
version with the command: 

java UseSAX2 file:TechTipArchive.xml SimpleList.html

Then use your browser of choice to view SimpleList.html, and 
follow links to either text or HTML versions of recent Tech Tips.  
(In a production scenario you would probably merge this code into 
a client browser or into a servlet or JSP page on the server.) 

There are several interesting points about the code above. Notice
the steps in creating the parser. 

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();

In JAXP, the SAXParser class is not created directly, but instead 
through the factory method newSAXParser(). This allows different 
implementations to be plug-compatible without source code changes.  
The factory also provides control over more advanced parsing 
features such as namespace support and validation. Even after you 
have the JAXP parser instance, you still aren't ready to parse.  
The JAXP 1.0 parser supports only SAX 1.0; to get SAX 2.0 
support, you must wrap the parser in a ParserAdapter.  

ParserAdapter pa = new ParserAdapter(sp.getParser());

The ParserAdapter class adds SAX2 functionality to an existing 
SAX1 parser and is part of the SAX2 download.  
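The ParserAdapter step reflects JAXP 1.0. JAXP 1.1 and later (included in J2SE 1.4 and every JDK since) add SAXParser.getXMLReader(), which returns a SAX2 XMLReader directly. A minimal sketch under that assumption (the class name ModernSAX2 is hypothetical), which also uses the factory's namespace and validation switches mentioned above:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Reads SAX2 events straight from a JAXP 1.1+ parser, without ParserAdapter. */
public class ModernSAX2 {
    public static List<String> localNames(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // Namespace processing is off by default; with it off, a parser
        // may leave localName empty, so turn it on here.
        spf.setNamespaceAware(true);
        spf.setValidating(false);

        // getXMLReader() hands back a SAX2 XMLReader directly.
        XMLReader reader = spf.newSAXParser().getXMLReader();
        final List<String> names = new ArrayList<String>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(localNames(
                "<tips><author id=\"glen\"/><tip title=\"Random Access for Files\"/></tips>"));
    }
}
```

With the SAX2 reader in hand, the rest of UseSAX2 (setContentHandler and parse) would work the same way.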

Notice that instead of implementing the ContentHandler interface, 
UseSAX2 extends the DefaultHandler class. DefaultHandler is an 
adapter class that provides an empty implementation of all the 
ContentHandler methods, so only the methods that are of interest 
need to be overridden.  
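Implementing ContentHandler directly would mean writing every one of its methods, including the ones you ignore. With DefaultHandler, a handler can be a single override. A minimal sketch (assuming a current JDK; TipCounter is a hypothetical name) that counts the <tip> elements in a document:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Counts <tip> elements by overriding a single DefaultHandler method. */
public class TipCounter extends DefaultHandler {
    private int tipCount;

    // The only override. DefaultHandler's versions of endElement,
    // characters, and the other ContentHandler methods do nothing.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // The factory is not namespace-aware by default, so match on qName.
        if (qName.equals("tip")) {
            tipCount++;
        }
    }

    public static int countTips(String xml) throws Exception {
        TipCounter counter = new TipCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), counter);
        return counter.tipCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countTips(
                "<tips><tip title=\"A\"/><author id=\"glen\"/><tip title=\"B\"/></tips>"));
    }
}
```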

The startElement() method does the real work. Because the program 
only wants to list the tips by title, the <tip> element is 
all-important, and the <tips> and <author> elements are ignored.  
The startElement method checks the element name and continues 
only if the current element is <tip>. The method also provides 
access to an element's attributes via an Attributes reference, so 
it is easy to extract the tip title, htmlURL, and textURL. 
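UseSAX2 asks for attributes by name. An Attributes object can also be walked by index, which helps when you don't know the attribute names in advance. A minimal sketch, assuming a current JDK (TipAttributes is a hypothetical name), that collects every attribute of the first <tip>:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects every attribute of the first <tip> element, whatever its names. */
public class TipAttributes extends DefaultHandler {
    private final Map<String, String> attributes = new LinkedHashMap<String, String>();
    private boolean seenTip;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (qName.equals("tip") && !seenTip) {
            seenTip = true;
            // Walk the attributes by index instead of asking for known names.
            for (int i = 0; i < atts.getLength(); i++) {
                attributes.put(atts.getQName(i), atts.getValue(i));
            }
        }
    }

    public static Map<String, String> firstTip(String xml) throws Exception {
        TipAttributes handler = new TipAttributes();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.attributes;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> atts = firstTip(
                "<tips><tip title=\"Random Access for Files\" author=\"glen\"/></tips>");
        System.out.println(atts.get("title") + " by " + atts.get("author"));
    }
}
```

Note that getValue(name) returns null for an attribute that is not present. If a <tip> lacked textURL, for example, UseSAX2 would append the string "null" to its output.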

The end result of this exercise is an HTML document that allows you 
to browse the list of recent Tech Tips. You could have done this
directly by coding in HTML. But storing the index in XML and 
writing SAX code to process it provides additional flexibility. 
If another person wanted 
to view the Tech Tips sorted by date, or by author, or filtered by 
some constraint, then various views could be generated from a 
single XML file, with different parsing code for each view.  
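For instance, a view filtered by author needs only a different startElement method. This is a minimal sketch (assuming a current JDK; TipsByAuthor and the sample "stuart" author id are illustrative, not part of the real archive):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** One "view" of the index: titles of the tips by a single author. */
public class TipsByAuthor extends DefaultHandler {
    private final String authorId;
    private final List<String> titles = new ArrayList<String>();

    public TipsByAuthor(String authorId) {
        this.authorId = authorId;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Keep a tip only if its author attribute matches the requested id.
        if (qName.equals("tip") && authorId.equals(atts.getValue("author"))) {
            titles.add(atts.getValue("title"));
        }
    }

    public static List<String> titlesBy(String xml, String authorId) throws Exception {
        TipsByAuthor view = new TipsByAuthor(authorId);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), view);
        return view.titles;
    }

    public static void main(String[] args) throws Exception {
        // A small sample index; the "stuart" id is illustrative only.
        String xml = "<tips>"
                + "<author id=\"glen\" fullName=\"Glen McCluskey\"/>"
                + "<author id=\"stuart\" fullName=\"Stuart Halloway\"/>"
                + "<tip title=\"Random Access for Files\" author=\"glen\"/>"
                + "<tip title=\"Using the SAX API\" author=\"stuart\"/>"
                + "<tip title=\"Using the DOM API\" author=\"stuart\"/>"
                + "</tips>";
        System.out.println(titlesBy(xml, "stuart"));
    }
}
```

The XML file stays the same; only the handler changes from one view to the next.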

Unfortunately, as the XML data gets more complicated, the sample 
above becomes more difficult to code and maintain. The example 
suffers from two problems. First, the code to generate the HTML 
output is just raw string manipulation, which makes it easy to 
lose a '>' or a '/' somewhere. Second, the SAX API doesn't remember 
much; if you need to refer back to some earlier element, then you 
have to build your own state machine to remember the elements that 
have already been parsed. 
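The simplest piece of that hand-built state is a stack of the currently open elements, so that each start tag knows its ancestors. A minimal sketch, assuming a current JDK (PathTracker is a hypothetical name):

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Remembers which elements are open so each start tag knows its ancestors. */
public class PathTracker extends DefaultHandler {
    // The hand-built state: names of the currently open elements, outermost first.
    private final Deque<String> open = new ArrayDeque<String>();
    private final List<String> paths = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.addLast(qName);
        paths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Well-formedness guarantees end tags arrive in reverse order of start tags.
        open.removeLast();
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (String name : open) {
            sb.append('/').append(name);
        }
        return sb.toString();
    }

    public static List<String> pathsOf(String xml) throws Exception {
        PathTracker tracker = new PathTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), tracker);
        return tracker.paths;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathsOf(
                "<tips><author id=\"glen\"/><tip title=\"Random Access for Files\"/></tips>"));
    }
}
```

Anything more than this, such as looking up an <author> element from a later <tip>, means storing still more data yourself.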

The Document Object Model (DOM) API solves both of these problems.  

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
USING THE DOM API

The DOM API is based on an entirely different model of document 
processing than the SAX API. Instead of handing a document to your 
code one piece at a time (as with SAX), a DOM parser reads the 
entire document into memory. It then makes the tree for the entire 
document available 
to program code for reading and updating. Simply put, the 
difference between SAX and DOM is the difference between 
sequential, read-only access, and random, read-write access.  
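To make the contrast concrete, here is a minimal sketch (assuming a current JDK; DomReadWrite is a hypothetical name) that jumps straight to the last tip, changes it, and then goes back over the earlier tips:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Shows random, read-write access to a parsed document. */
public class DomReadWrite {
    public static List<String> retitleLastTip(String xml, String newTitle) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList tips = doc.getElementsByTagName("tip");

        // Random access: go straight to the last tip without visiting the others.
        Element last = (Element) tips.item(tips.getLength() - 1);
        // Read-write access: change the in-memory tree (the source is untouched).
        last.setAttribute("title", newTitle);

        // Earlier elements are still available; a SAX parser would have moved past them.
        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < tips.getLength(); i++) {
            titles.add(((Element) tips.item(i)).getAttribute("title"));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(retitleLastTip(
                "<tips><tip title=\"A\"/><tip title=\"B\"/></tips>", "B (revised)"));
    }
}
```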

At the core of the DOM API are the Document and Node interfaces.
A Document is a top level object that represents an XML document. 
The Document holds the data as a tree of Nodes, where a Node is 
a base type that can be an element, an attribute, or some other  
type of content. The Document also acts as a factory for new 
Nodes. Each Node represents a single piece of data in the tree, and 
provides the common tree operations. You can query nodes 
for their parent, their siblings, or their children. You can also 
modify the document by adding or removing Nodes.  
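Those operations map directly onto Node methods. A minimal sketch, assuming a current JDK (DomNavigate is a hypothetical name), that moves between siblings, checks a parent, removes one node, and appends another:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomNavigate {
    /** Removes the first child, appends a new tip, and returns the remaining tip titles. */
    public static List<String> edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Children and siblings. The sample input has no whitespace between tags;
        // indented XML would also produce text nodes between the elements.
        Node author = root.getFirstChild();
        Node firstTip = author.getNextSibling();
        if (firstTip.getParentNode() != root) {
            throw new IllegalStateException("unexpected tree shape");
        }

        // Modify the tree: drop the <author> node and add a new <tip> at the end.
        root.removeChild(author);
        Element added = doc.createElement("tip");
        added.setAttribute("title", "C");
        root.appendChild(added);

        // Walk the remaining element children and report their titles.
        List<String> titles = new ArrayList<String>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                titles.add(((Element) n).getAttribute("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit(
                "<tips><author id=\"glen\"/><tip title=\"A\"/><tip title=\"B\"/></tips>"));
    }
}
```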
 
To demonstrate the DOM API, let's process the same XML document 
that got "SAXed" above. This time, let's group the output by 
author. This will take a little more work. Here's the code: 

//UseDOM.java
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class UseDOM {
    private Document outputDoc;   // the HTML page being built
    private Element body;
    private Element html;

    // Maps each author id to the <OL> element that lists that author's tips
    private HashMap authors = new HashMap();
    
    public String toString() {
        if (html != null) {
            // What Element.toString() returns depends on the DOM implementation,
            // so write the tree out with a JAXP 1.1 (or later) Transformer instead
            try {
                Transformer t = TransformerFactory.newInstance().newTransformer();
                t.setOutputProperty(OutputKeys.METHOD, "html");
                StringWriter sw = new StringWriter();
                t.transform(new DOMSource(html), new StreamResult(sw));
                return sw.toString();
            } catch (TransformerException e) {
                throw new RuntimeException(e);
            }
        }
        return super.toString();
    }
    
    public void processWithDOM(String urlString) throws Exception {
        System.out.println("Processing URL " + urlString);
        // As with SAX, the parser comes from a JAXP factory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(urlString);
        Element elem = doc.getDocumentElement();
        // Pass 1: an <H2> heading and an empty <OL> list for each author
        NodeList nl = elem.getElementsByTagName("author");
        for (int n=0; n<nl.getLength(); n++) 
        {
            Element author = (Element)nl.item(n);
            String id = author.getAttribute("id");
            String fullName = author.getAttribute("fullName");          
            Element h2 = outputDoc.createElement("H2");
            body.appendChild(h2);
            h2.appendChild(outputDoc.createTextNode("by " + fullName));
            Element list = outputDoc.createElement("OL");
            body.appendChild(list);
            authors.put(id, list);
        }
        // Pass 2: add each tip as a link in its author's list
        NodeList nlTips = elem.getElementsByTagName("tip");
        for (int i=0; i<nlTips.getLength(); i++) 
        {
            Element tip = (Element)nlTips.item(i);
            String title = tip.getAttribute("title");           
            String htmlURL = tip.getAttribute("htmlURL");
            String author = tip.getAttribute("author");
            Node list = (Node) authors.get(author);
            if (list == null) {
                // No <author> element has this id, so there is no list to add to
                System.err.println("Skipping tip with unknown author: " + title);
                continue;
            }
            Node item = list.appendChild(outputDoc.createElement("LI"));
            Element a = outputDoc.createElement("A");
            item.appendChild(a);
            a.appendChild(outputDoc.createTextNode(title));                     
            a.setAttribute("HREF", htmlURL);            
        }
    }
    
    // Creates the output Document and its <HTML>, <BODY>, and <H1> skeleton
    public void createHTMLDoc(String heading) 
                    throws ParserConfigurationException  
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        outputDoc = db.newDocument();
        html = outputDoc.createElement("HTML");
        outputDoc.appendChild(html);
        body = outputDoc.createElement("BODY"); 
        html.appendChild(body);
        Element h1 = outputDoc.createElement("H1");
        body.appendChild(h1);
        h1.appendChild(outputDoc.createTextNode(heading));
    }
    
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java UseDOM <input URL> <output file>");
            return;
        }
        try {
            UseDOM ud = new UseDOM();
            ud.createHTMLDoc("JDC Tech Tips Archive");
            ud.processWithDOM(args[0]);
            String htmlOut = ud.toString();
            System.out.println("Saving result to " + args[1]);
            FileWriter fw = new FileWriter(args[1]);
            try {
                fw.write(htmlOut, 0, htmlOut.length());
            } finally {
                fw.close();   // flushes and releases the file
            }
        }
        catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

Assuming you save the XML as TechTipArchive.xml, you can run the 
code with this command line:
 
java UseDOM file:TechTipArchive.xml ListByAuthor.html

Then point your browser to ListByAuthor.html to see a list of tips 
organized by author.

To see how the code works, start by looking at the createHTMLDoc 
method. This method creates the outputDoc Document, which will be 
used to build the HTML output. Notice that just as with SAX, the 
parser is created using factory methods. However, here the factory 
method is in the DocumentBuilderFactory class. The second half of 
createHTMLDoc builds the basic elements of an HTML page. 

html = outputDoc.createElement("HTML");
outputDoc.appendChild(html);
body = outputDoc.createElement("BODY"); 
html.appendChild(body);
Element h1 = outputDoc.createElement("H1");
body.appendChild(h1);
h1.appendChild(outputDoc.createTextNode(heading));

Compare that code with the code in the SAX example that builds 
the elements of an HTML page: 

//direct string manipulation from SAX example
htmlOut = new StringBuffer("<HTML><BODY><H1>JDC Tech Tips Archive</H1>");

Using the DOM API to build documents isn't as terse or as fast as 
direct String manipulation, but it is much less error-prone, 
especially in larger documents.  
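Part of what makes DOM building less error-prone is that markup characters in text are escaped for you when the tree is written out, and every end tag is produced automatically. Here is a minimal sketch, assuming JAXP 1.1 or later (J2SE 1.4+), which added the javax.xml.transform API used here to serialize a tree; DomToText is a hypothetical name:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds a tiny page with DOM calls and serializes it to text. */
public class DomToText {
    public static String render(String heading) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element html = doc.createElement("HTML");
        doc.appendChild(html);
        Element h1 = doc.createElement("H1");
        html.appendChild(h1);
        // The heading is stored as plain text; no escaping is done here.
        h1.appendChild(doc.createTextNode(heading));

        // An identity Transformer writes the tree out, escaping '<' and '&'
        // in text and emitting matching end tags.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Tips & Tricks for x < y"));
    }
}
```

In the printed result, the '&' and '<' in the heading come out as &amp; and &lt;. With raw string building, forgetting that step would produce a broken page.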

The important part of the UseDOM example is the processWithDOM 
method. This method does two things: (1) it finds the author 
elements and writes a heading for each one to the output, and (2) it 
finds the tips and adds them to the output, organized by author. 
Each of these steps requires access to the top level element of 
the document. This is done via the getDocumentElement() method. 
The author information is in <author> elements. These elements 
are found by calling getElementsByTagName("author") on the 
top-level element. The getElementsByTagName method returns 
a NodeList; this is a simple collection of Nodes. Each Node is 
then cast to an Element in order to use the convenience method 
getAttribute(). The getAttribute method gets the author's id and 
fullName. Each author is listed as a second-level heading; to do 
this, the output document is used to create an <H2> element 
containing the author's fullName. Adding a Node requires 
two steps. First the output document is used to create the Node 
with a factory method such as createElement(). Then the node is
added with appendChild(). Nodes can only be added to the document 
that created them.  
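The restriction is enforced at run time with a DOMException whose code is WRONG_DOCUMENT_ERR. DOM Level 2 (later than the Level 1 specification cited below) added Document.importNode, which copies a node into another document. A minimal sketch, assuming a current JDK (CrossDocument is a hypothetical name):

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class CrossDocument {
    /** Returns the outcome of a direct cross-document append, then the imported child's name. */
    public static String[] demo() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = db.newDocument();
        Document target = db.newDocument();
        Element body = target.createElement("BODY");
        target.appendChild(body);
        Element heading = source.createElement("H2");   // owned by "source"

        String outcome;
        try {
            body.appendChild(heading);   // a node created by another Document
            outcome = "appended";
        } catch (DOMException e) {
            outcome = (e.code == DOMException.WRONG_DOCUMENT_ERR)
                    ? "WRONG_DOCUMENT_ERR" : "DOMException " + e.code;
        }

        // importNode makes a copy owned by "target"; the copy can be appended.
        Node copy = target.importNode(heading, true);
        body.appendChild(copy);
        return new String[] { outcome, body.getFirstChild().getNodeName() };
    }

    public static void main(String[] args) throws Exception {
        String[] result = demo();
        System.out.println(result[0] + ", then imported " + result[1]);
    }
}
```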

After the author headings are in place, it is time to create the 
links for individual tips. The <tip> elements are found in the 
same way as the <author> elements, that is, via 
getElementsByTagName(). The logic for extracting the tip attributes 
is also similar. The only difference is deciding where to add the 
Nodes. Tips by different authors belong in different lists. The 
groundwork for this was laid back when the author elements were 
processed by adding an <OL> node and storing it in a HashMap 
indexed by author id. Now, the author id attribute of the tip can 
be used to look up the appropriate <OL> node for adding the tip.  

For more in-depth coverage of XML, see The XML Companion, by Neil 
Bradley, Addison-Wesley 2000. For more information about JAXP,
see the Java(tm) Technology and XML page at 
http://java.sun.com/xml/index.html. For more information about 
SAX2, see http://www.megginson.com/SAX/index.html. The DOM 
standard is available at http://www.w3.org/TR/REC-DOM-Level-1. 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

- NOTE
The names on the JDC mailing list are used for internal Sun
Microsystems(tm) purposes only. To remove your name from the list,
see Subscribe/Unsubscribe below.


- FEEDBACK
Comments? Send your feedback on the JDC Tech Tips to:

jdc-webmaster@sun.com


- SUBSCRIBE/UNSUBSCRIBE
The JDC Tech Tips are sent to you because you elected to subscribe
when you registered as a JDC member. To unsubscribe from JDC email,
go to the following address and enter the email address you wish to
remove from the mailing list:

http://developer.java.sun.com/unsubscribe.html


To become a JDC member and subscribe to this newsletter go to:

http://java.sun.com/jdc/


- ARCHIVES
You'll find the JDC Tech Tips archives at:

http://developer.java.sun.com/developer/TechTips/index.html


- COPYRIGHT
Copyright 2000 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright. For more information, see:

http://developer.java.sun.com/developer/copyright.html


JDC Tech Tips 
June 27, 2000














