TclDOM

Name

::dom::DOMImplementation, ::dom::create, ::dom::parse, ::dom::serialize, ::dom::document, ::dom::node, ::dom::element, ::dom::event, ::dom::selectNode — Tcl language binding for the W3C Document Object Model

Synopsis

package require dom

package require dom ?2.5?
::dom::DOMImplementation method ?args...?
::dom::create element
::dom::parse xml ?option value...?
::dom::serialize token ?option value...?
::dom::document method token ?args...?
::dom::node method token ?args...?
::dom::element method token ?args...?
::dom::event method token ?args...?
::dom::selectNode token xpath ?option value...?

Tcl Namespace Usage


::dom
::dom::tcl
::dom::libxml2

Description

TclDOM is a Tcl language binding for the W3C Document Object Model (DOM). DOM provides a view of an XML (or HTML) document as a tree structure. Currently, TclDOM only supports XML documents.

The package implements most of the DOM Level 1 interfaces and also some Level 2 and Level 3 interfaces. There are also a number of non-standard commands and methods provided for the convenience of application developers (these are documented).

The DOM specification should be read in conjunction with this reference manual, as it explains the meaning and purpose of the various interfaces. This manual is not a tutorial on how to use the DOM.

TclDOM also provides several implementations of the API, with a layered architecture. A generic layer provides a stable API to the application, and specific implementations may register themselves. Currently, three implementations exist: a pure-Tcl implementation, a C implementation (based on TclDOMPro) and another C implementation based on the Gnome libxml2 and gdome2 libraries.

Packages and Namespaces

The TclDOM generic layer defines the dom package and also a Tcl namespace using that name. The generic layer also uses the package name dom::generic.

Implementations define their own package name and Tcl namespace within the generic layer:

Tcl implementation

Package dom::tcl, Tcl namespace ::dom::tcl.

TclDOMPro

Package dom::c, Tcl namespace ::dom::c.

libxml2

Package dom::libxml2, Tcl namespace ::dom::libxml2.

Tokens

The TclDOM API uses tokens as identifiers for nodes within the document tree. This technique has been used to allow alternate implementations of TclDOM to be efficient, while retaining compatibility with the pure-Tcl implementation.

The format of the token itself as well as the data structure referred to by the token are not public and an application should not rely on these. Instead, an application should use the accessor methods provided by the API.
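
As a minimal sketch of treating tokens as opaque handles, the following assumes the dom package is installed and uses an illustrative element type, Root. It relies only on commands documented later in this manual.

```tcl
package require dom

# Create a document and a Root element as its child.
set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]

# Do not inspect or parse the token strings themselves;
# ask the API about the node instead.
puts [::dom::node cget $root -nodeName]
puts [::dom::node cget $root -nodeType]
```

The token values held in doc and root are only ever passed back to TclDOM commands, so the script should not depend on which implementation produced them.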

DOM Interfaces

Each interface in the DOM specification is implemented with a Tcl command in the dom namespace. A few interfaces have not been mapped to Tcl commands because Tcl already provides the required functionality, for example the CharacterData interface.

Methods of an interface are implemented as methods (subcommands) of the corresponding Tcl command.

Each attribute of an interface is a configuration option for an object in the document tree.

Convenience Commands and Methods

DOM doesn't always provide an interface, method or attribute for every function required. For example, until DOM Level 3 there was no standard for creating, parsing and serializing a document. Sometimes using the standard DOM interface is awkward. TclDOM provides a number of non-standard features to overcome these problems.

A major convenience is that each method of the DOMImplementation interface is also defined as a command. For example, rather than using dom::DOMImplementation create to create a new document, the shorter command dom::create may be used.
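
The two forms can be compared side by side. This is a sketch that assumes the dom package is installed and that create may be called without a document element type, as in the serialize example later in this manual.

```tcl
package require dom

# Long form: invoke the method through the DOMImplementation command.
set doc1 [::dom::DOMImplementation create]

# Short form: the same method exposed directly as a command.
set doc2 [::dom::create]

# Both results are tokens for document nodes.
puts [::dom::DOMImplementation isNode $doc1]
puts [::dom::DOMImplementation isNode $doc2]
```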

Implementations may also provide direct access to specific features. Refer to the documentation for a DOM implementation.

Commands

::dom::DOMImplementation

The ::dom::DOMImplementation command implements the DOMImplementation DOM interface. It is used to provide implementation-specific features not explicitly defined in the DOM specification.

Command Options

The following command options may be used. These are also available as commands.

hasFeature
hasFeature feature

Provides a test for existence of a feature. Returns 1 if a feature is implemented, 0 otherwise. Uses the default DOM implementation.

create
create type

Creates the root node of a new DOM document, using the default DOM implementation. The document element type may be specified as an argument, in which case that element is created. The return value is a token referring to the root node of the newly created document.

Note

Non-standard method. DOM Level 2 introduced the createDocument method.

createDocument
createDocument nsURI type doctype

Creates the root node of a new DOM document, using the default DOM implementation. The document element namespace URI and local-name (element type) may be specified as an argument, in which case that element is created. If the document type is given then the newly created document is configured to use that document type. The return value is a token referring to the root node of the newly created document.

createDocumentType
createDocumentType token name publicid systemid internaldtd

Creates a Document Type Declaration, using the default DOM implementation. The return value is a token for the newly created document type declaration.

createNode
createNode token xpath

May create a node in the document. token specifies a context for the XPath expression given by xpath. The expression must resolve to a node. If the node exists then no further action is taken. Otherwise the node is created. The token of the matched or newly created node is returned.

Note

Non-standard method.

destroy
destroy token

This method frees all data structures associated with a DOM node. The token argument must refer to a valid token for any node in the document tree. The node is removed from the tree before it is destroyed. If the node has children, they will also be destroyed.

isNode
isNode token

Tests whether the given token is a valid token for some DOM node in the default DOM implementation.

Note

Non-standard method.

parse
parse xml ?option value...?

This method parses XML formatted text given by the xml argument and constructs a DOM tree for the document. The return result is the token of the root node of the newly created document.

This method requires an event-based XML parser to be loaded to perform the parsing operation. The dom package itself does not include an XML parser. Support for the use of TclXML is provided. Any other Tcl event-based XML parser implementing the TclXML API may also be used. The -parser option may be used to specify which XML parser to use.

In some circumstances, a DOM implementation may parse the XML document directly, for example libxml2. In this case, it may not be possible to interpose on the parsing operation.

Valid configuration options are:

[-parser] [{} | expat | tcl]

This option specifies which XML parser to use to parse the XML data. If an empty string is given then the default behaviour described above is used. The value expat specifies that the Expat parser must be used. The value tcl specifies that the Tcl-only parser must be used. If an explicit value is given and that parser cannot be loaded then the command will fail, despite the fact that another parser may be available.

[-progresscommand] [script]

This option specifies a Tcl command to be invoked from time to time while the DOM tree is being constructed. The script will be invoked after a certain number of element start tags have been processed, given by the -chunksize option.

[-chunksize] [integer]

This option specifies how many element start tags to process before invoking the script given by the -progresscommand option.
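
The following sketch shows how the progress options might be combined. It assumes the dom package and a suitable XML parser are installed; the callback name reportProgress is hypothetical, and it accepts any arguments because this manual does not state what is appended to the script. An implementation that parses directly (such as libxml2) may not honour these options.

```tcl
package require dom

# Hypothetical progress callback; the name is illustrative only.
proc reportProgress {args} {
    puts "still parsing..."
}

set xml {<Root><First><Second/></First><First/></Root>}

# Ask for the callback after every 2 element start tags.
set doc [::dom::DOMImplementation parse $xml \
    -progresscommand reportProgress -chunksize 2]

puts [::dom::DOMImplementation serialize $doc]
```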

selectNode
selectNode token xpath ?option value...?

Resolves the XPath location path given by xpath. token is the initial context for the location path. Returns the resulting nodeset as a Tcl list.

The following options may be specified:

-namespaces

The value for this option is a list of prefix-URI pairs. Each of these pairs defines an XML Namespace and its prefix for the purposes of evaluating the XPath expression. The document itself may use a different prefix for the same XML Namespace.

This option may be repeated, in which case the lists of namespace pairs are merged and all of the XML Namespaces registered.

Note

Non-standard method.
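
As an illustration of the -namespaces option, the sketch below binds a prefix for the query that the document itself does not use. It assumes the dom package and an XML parser are installed and that the active implementation supports namespace-aware XPath; the namespace URI and element names are made up for the example.

```tcl
package require dom

set xml {<doc xmlns="http://example.org/ns"><item>one</item><item>two</item></doc>}
set doc [::dom::DOMImplementation parse $xml]

# Bind the prefix "ex" to the namespace URI for this query only;
# the document itself uses the default namespace (no prefix).
set items [::dom::selectNode $doc /ex:doc/ex:item \
    -namespaces {ex http://example.org/ns}]

foreach item $items {
    puts [::dom::node cget $item -nodeName]
}
```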

serialize
serialize token ?option value...?

This method returns the XML formatted text corresponding to the node given by token. The text is guaranteed to be a well-formed XML document, unless the -method option specifies a non-XML output method.

Valid configuration options are:

[-newline] [elementlist]

This option specifies a list of element types for which newline characters will be added before and after the start and end tags for those elements upon serialization.

White space is significant in XML, so the dom package never adds extra white space for purposes of "pretty-printing" the XML source document. The resulting long lines can cause serious problems on some platforms, such as VMS, that limit line length. This option provides the convenience of adding newlines to certain nominated element types for formatting the source into lines.

Examples:

Suppose the following DOM document is constructed:

set doc [::dom::DOMImplementation create]
set top [::dom::document createElement $doc Root]
set node [::dom::document createElement $top First]
::dom::document createElement $node Second
::dom::document createElement $top First

Without the -newline option the serialized document would be:

::dom::DOMImplementation serialize $doc
<?xml version="1.0"?>
<!DOCTYPE Root>
<Root><First><Second/></First><First/></Root>


With the -newline option the serialized document would be:

::dom::DOMImplementation serialize $doc -newline First
<?xml version="1.0"?>
<!DOCTYPE Root>
<Root>
<First>
<Second/>
</First>
<First/>
</Root>


trim
trim token

This method removes any node containing only white space from the document tree of the node given by token.
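
A typical use is to discard the white space introduced by indentation in a parsed document. The sketch below assumes the dom package and an XML parser are installed; the document content is illustrative.

```tcl
package require dom

# An indented document: the newlines and spaces become text nodes.
set xml "<Root>\n  <First/>\n  <First/>\n</Root>"
set doc [::dom::DOMImplementation parse $xml]

# Drop the nodes that contain only white space.
::dom::DOMImplementation trim $doc

puts [::dom::DOMImplementation serialize $doc]
```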

::dom::document

This command implements the Document interface in the DOM specification. The most important aspect of this command is its set of factory methods for creating nodes.

The methods accepted by this command are as follows:

cget
cget token option

This method returns the value of the given configuration option.

configure
configure token ?option value...?

This method sets the value of the given configuration options.

Valid configuration options are:

[-doctype]

Specifies the token of the Document Type Declaration node.

This is a read-only option. Use the factory method to create a Document Type Declaration node.

[-implementation]

Specifies the token of the document's implementation.

This is a read-only option.

[-documentElement]

Specifies the token of the document's document element node. A document node may only have one document element, but may have other types of children (such as comments).

This is a read-only option. Use the factory method to create the document element node.
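
Because these options are read-only, they are queried with cget rather than set with configure. A minimal sketch, assuming the dom package is installed and using an illustrative element type:

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]

# Look up the document element through the document node.
set docel [::dom::document cget $doc -documentElement]
puts [::dom::node cget $docel -nodeName]
```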

createElement
createElement token type

This method creates an element node as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child element is added as the last child of token's list of children. The new element's type is given by the type argument. The new element is created with an empty attribute list.

createElementNS
createElementNS token nsuri qualname

This method creates an element node in an XML Namespace as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child element is added as the last child of token's list of children. The new element is created in the XML Namespace given by the namespace URI nsuri. The new element's qualified name (QName) is given by the qualname argument. Qualified names have the form prefix:local-part. The new element is created with an empty attribute list.

createDocumentFragment
createDocumentFragment token

This method creates a documentFragment node as a child of the given node specified by token. token must be a node of type element, document or documentFragment.

createTextNode
createTextNode token text

This method creates a textNode node as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child textNode is added as the last child of token's list of children. The new textNode is created with its value set to text.

createComment
createComment token data

This method creates a comment node as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child comment is added as the last child of token's list of children. The new comment is created with its value set to data.

createCDATASection
createCDATASection token text

TclDOM does not distinguish between textNodes and CDATASection nodes. Accordingly, this method creates a textNode node as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child textNode is added as the last child of token's list of children. The new node is created with its value set to text and has the attribute -cdatasection set to the value 1.

createProcessingInstruction
createProcessingInstruction token target data

This method creates a processingInstruction node as a child of the given node specified by token. token must be a node of type element, document or documentFragment. The new, child processingInstruction is added as the last child of token's list of children. The new node is created with its name set to target and its value set to data.
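
The factory methods described so far can be combined to build a small tree. This sketch assumes the dom package is installed and that each method appends its new node to the given parent, as described above; all element types and content are illustrative.

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]

# Each factory call appends the new node as the last child of its parent.
::dom::document createComment $root " generated example "
set para [::dom::document createElement $root Para]
::dom::document createTextNode $para "Hello, world"
::dom::document createProcessingInstruction $root target "some data"

puts [::dom::DOMImplementation serialize $doc]
```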

createAttribute
createAttribute token name

This method creates an attribute node for the given element specified by token. token must be a node of type element. The new attribute is created with its name set to name and an empty value.

Note

This method is included for completeness with respect to the DOM specification. The preferred method for setting element attributes is to use the ::dom::element command.

createEntity
createEntity token

Not currently implemented.

createEntityReference
createEntityReference token name

Not currently implemented.

createDocTypeDecl
createDocTypeDecl token name extid dtd entities notations

This method creates a Document Type Declaration node as a child of the given node specified by token. token must be a node of type document. name is the element type of the document element. If the document already has a document element then this name must match with that element type. extid is an external identifier to include in the document type declaration. dtd is an internal DTD subset to include in the document type declaration. This is specified as XML text. entities and notations are included for completeness with the DOM specification, but are not currently implemented.

Non-standard: this method is not specified by the DOM Recommendation. See ::dom::DOMImplementation createDocumentType.

createEvent
createEvent token name

This method creates an event node in the document specified by token. token must be a node of type document. The event type is specified by name.

getElementsByTagName
getElementsByTagName token name

This method searches all descendants of the node given by the argument token for elements with a type matching the argument name. The name * matches all elements. This method returns a "live-list": the return result is the name of a Tcl variable, the content of which is a Tcl list containing tokens for all elements that match.
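
Because the result is a variable name rather than the list itself, one more step is needed to read the matches. The sketch below assumes the dom package is installed and that the returned name can be read from the calling scope; whether the list is updated immediately may depend on the implementation.

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]
::dom::document createElement $root First

# The result is the NAME of a variable, not the list of tokens.
set listVar [::dom::document getElementsByTagName $doc First]
puts [llength [set $listVar]]

# Reading the variable again after a change should reflect the new element.
::dom::document createElement $root First
puts [llength [set $listVar]]
```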

::dom::node

This command implements generic functions for DOM nodes.

The methods accepted by this command are as follows:

cget
cget token option

This method returns the value of the given configuration option for the node given by token.

configure
configure token ?option value...?

This method sets the value of the given configuration option for the node given by token.

Valid configuration options are as follows:

[-nodeName]

Returns the node name. This is a read-only option.

The DOM specification gives the meaning of names for different types of nodes. For example, the -nodeName option of an element node is the element type.

[-nodeType]

Returns the type of the node given by token. This is a read-only option.

-parentNode

Returns the parent node of the node given by token. This is a read-only option.

-childNodes

Returns the name of a Tcl variable which contains a list of the children of the node given by token. The variable contains the "live" list of children. This is a read-only option.

-firstChild

Returns the first child node of the node given by token. This is a read-only option.

-lastChild

Returns the last child node of the node given by token. This is a read-only option.

-previousSibling

Returns the parent's child node which appears before this node. If this child is the first child of its parent then returns an empty string. This is a read-only option.

-nextSibling

Returns the parent's child node which appears after this node. If this child is the last child of its parent then returns an empty string. This is a read-only option.

-attributes

Returns the name of a Tcl array variable which contains the attribute list for an element node. If the node is not an element type node then returns an empty string. The indices of the array are attribute names, and the values of the array elements are their corresponding attribute values. This is a read-only option.

-nodeValue [data]

Specifies the value of a node. The DOM specification gives the meaning of values for different types of nodes. For example, the -nodeValue option of a textNode node is the node's text.
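
The options above can be exercised as follows. This is a sketch assuming the dom package is installed; the element type and text are illustrative.

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]
set text [::dom::document createTextNode $root "draft"]

# -nodeValue is writable: replace the text of the textNode.
::dom::node configure $text -nodeValue "final"
puts [::dom::node cget $text -nodeValue]

# Navigation options return tokens, or an empty string at the edges.
set parent [::dom::node cget $text -parentNode]
puts [::dom::node cget $parent -nodeName]
puts [expr {[::dom::node cget $text -nextSibling] eq ""}]
```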

::dom::element

This command provides functions for element type nodes.

Valid methods for this command are as follows:

cget
cget token option

This method returns the current setting of configuration options for an element. See the configure method for the list of valid configuration options.

configure
configure token ?option value...?

This method sets configuration options for an element. Note that element type nodes only have read-only options.

Valid configuration options are as follows:

-tagName [name]

The tag name, or element type, of this element.

-empty [boolean]

Sets whether this element was specified as an empty element when the document was parsed. That is, XML empty element syntax such as <Empty/> was used.

This option has no effect upon output (serialization) of the XML document. Empty element syntax is automatically used where appropriate.

getAttribute
getAttribute token name

This method returns the attribute value of the attribute given by name. If the attribute does not exist, then an empty string is returned.

setAttribute
setAttribute token name value

This method sets the attribute value of the attribute given by name. If the attribute already exists then its value is replaced, otherwise the attribute is created.

removeAttribute
removeAttribute token name

This method deletes the attribute given by name. If the attribute does not exist then the method has no effect.
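
The three attribute methods are usually used together. A minimal sketch, assuming the dom package is installed; the attribute name and values are illustrative.

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]

::dom::element setAttribute $root id "r1"
::dom::element setAttribute $root id "r2"   ;# replaces the earlier value
set before [::dom::element getAttribute $root id]
puts $before

::dom::element removeAttribute $root id
# A missing attribute yields an empty string rather than an error.
set after [::dom::element getAttribute $root id]
puts [string length $after]
```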

getAttributeNode
getAttributeNode token name

Not implemented.

setAttributeNode
setAttributeNode token name

Not implemented.

removeAttributeNode
removeAttributeNode token name

Not implemented.

getElementsByTagName
getElementsByTagName token name

This method searches the node given by the argument token for descendant elements with a type matching the argument name. The wildcard character * matches any element type. The return result is a "live-list" which is represented by a Tcl variable. This method returns the name of the Tcl variable that contains the list of tokens that match.

normalize
normalize token

This method recursively coalesces textNodes within the children of the given node. textNodes which are adjacent in the DOM tree cannot be distinguished in the serialized XML document.
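
For example, two text nodes created one after the other can be merged into one. This sketch assumes the dom package is installed and follows the behaviour described above; the element type and text are illustrative.

```tcl
package require dom

set doc  [::dom::DOMImplementation create]
set root [::dom::document createElement $doc Root]

# Two adjacent textNodes under the same parent.
::dom::document createTextNode $root "Hello, "
::dom::document createTextNode $root "world"

# Coalesce them, then read back the first (now only) text child.
::dom::element normalize $root
set first [::dom::node cget $root -firstChild]
puts [::dom::node cget $first -nodeValue]
```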

::dom::processinginstruction

This command provides functions for processingInstruction type nodes.

Valid methods for this command are as follows:

cget
cget token option

This method returns the current setting of configuration options for a processing instruction. See the configure method for the list of valid configuration options.

configure
configure token ?option value...?

This method sets configuration options for a processing instruction.

Valid configuration options are as follows:

-target [name]

This option gives the target of the processing instruction. This is a read-only configuration option.

-data [data]

This option sets the data of the processing instruction.

::dom::event

This command provides functions for event type nodes.

Valid methods for this command are as follows:

cget
cget token option

This method retrieves configuration options for an event.

Valid configuration options are as follows:

-altKey

This option determines whether the ALT modifier key has been specified for this event.

-timeStamp

This option gives the time at which the event was posted. The value is the number of milliseconds since the epoch, which is compatible with the Tcl clock command.

Note

The implementation of this method depends on the Tcl_GetTime function. This function only became publicly available in Tcl 8.4. If a version of Tcl prior to 8.4 is being used, then this option will have the value 0.

configure
configure token ?option value...?

This method sets the configuration options for an event. See the cget method for the list of valid configuration options.

Implementations

This section documents the various implementations of the TclDOM API.

Tcl Implementation

The Tcl implementation is provided by the dom::tcl package.

It is a reference implementation, and implements the TclDOM API as described above.

A DOM tree using this implementation may be created using the dom::tcl::create command.
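
A short sketch of selecting this implementation explicitly follows. It assumes that both the dom and dom::tcl packages are installed, that dom::tcl::create can be called without arguments, and that the generic commands accept the resulting tokens.

```tcl
package require dom
package require dom::tcl

# Ask specifically for the pure-Tcl implementation rather than the default.
set doc [::dom::tcl::create]

# The generic commands are then used on the resulting tokens as usual.
set root [::dom::document createElement $doc Root]
puts [::dom::node cget $root -nodeName]
```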

libxml2 Implementation

The TclDOM/libxml2 implementation is a wrapper for the Gnome libxml2 library. It is provided by the dom::libxml2 package. It is a high-performance library, making use of Tcl objects for fast access to tree nodes.

A DOM tree using this implementation may be created using the dom::libxml2::create command.

Notes

  • The dom::libxml2::parse command is linked directly to the libxml2 parser. It is not possible to specify TclXML configuration options (i.e. callbacks). The following configuration options are valid:

    -- fill this in --

    -- fill this in --

  • The implementation of most of the commands is incomplete. Missing methods and options will be documented here...