Dive Into Python: Python from novice to pro
Traversing XML documents by stepping through each node can be tedious. If you're looking for something in particular, buried deep within your XML document, there is a shortcut you can use to find it quickly: getElementsByTagName.
For this section, you'll be using the binary.xml grammar file, which looks like this:
Example 9.20. binary.xml
<?xml version="1.0"?>
<!DOCTYPE grammar PUBLIC "-//diveintopython.org//DTD Kant Generator Pro v1.0//EN" "kgp.dtd">
<grammar>
<ref id="bit">
<p>0</p>
<p>1</p>
</ref>
<ref id="byte">
<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>
It has two refs, 'bit' and 'byte'. A bit is either a '0' or '1', and a byte is 8 bits.
Example 9.21. Introducing getElementsByTagName
>>> from xml.dom import minidom
>>> xmldoc = minidom.parse('binary.xml')
>>> reflist = xmldoc.getElementsByTagName('ref')
>>> reflist
[<DOM Element: ref at 136138108>, <DOM Element: ref at 136144292>]
>>> print reflist[0].toxml()
<ref id="bit">
<p>0</p>
<p>1</p>
</ref>
>>> print reflist[1].toxml()
<ref id="byte">
<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
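If you don't have binary.xml on disk, the same search can be tried against an inline copy of the grammar. The following is a minimal sketch, not part of the original example: it assumes Python 3 (so print is a function), uses minidom.parseString in place of minidom.parse, omits the DOCTYPE line, and introduces the names BINARY_XML and ref_ids purely for illustration.

```python
from xml.dom import minidom

# Inline copy of the binary.xml grammar, DOCTYPE omitted.
# BINARY_XML is a name chosen for this sketch only.
BINARY_XML = """<?xml version="1.0"?>
<grammar>
<ref id="bit">
<p>0</p>
<p>1</p>
</ref>
<ref id="byte">
<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>
</ref>
</grammar>"""

xmldoc = minidom.parseString(BINARY_XML)

# getElementsByTagName returns every matching element in document order.
reflist = xmldoc.getElementsByTagName("ref")

# Read each ref's id attribute to tell the two results apart.
ref_ids = [ref.getAttribute("id") for ref in reflist]

print(len(reflist))  # 2
print(ref_ids)       # ['bit', 'byte']
```

The result behaves like an ordinary list, so you can index it, take its length, or loop over it.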
Example 9.22. Every element is searchable
>>> firstref = reflist[0]
>>> print firstref.toxml()
<ref id="bit">
<p>0</p>
<p>1</p>
</ref>
>>> plist = firstref.getElementsByTagName("p")
>>> plist
[<DOM Element: p at 136140116>, <DOM Element: p at 136142172>]
>>> print plist[0].toxml()
<p>0</p>
>>> print plist[1].toxml()
<p>1</p>
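Because the search starts from whichever element you call it on, it only sees that element's contents. The sketch below illustrates this scoping under a few assumptions: it targets Python 3, it rebuilds a compact copy of the grammar inline, and the names GRAMMAR, bit_choices, xrefs_in_bit, and xrefs_in_byte are hypothetical, chosen for this illustration.

```python
from xml.dom import minidom

# Compact inline copy of the grammar (assumed equivalent to binary.xml
# for the purposes of this sketch; the DOCTYPE is omitted).
GRAMMAR = (
    "<grammar>"
    '<ref id="bit"><p>0</p><p>1</p></ref>'
    '<ref id="byte"><p>' + '<xref id="bit"/>' * 8 + "</p></ref>"
    "</grammar>"
)

xmldoc = minidom.parseString(GRAMMAR)
firstref, secondref = xmldoc.getElementsByTagName("ref")

# Searching from firstref only looks inside the 'bit' ref.
bit_choices = [p.firstChild.data for p in firstref.getElementsByTagName("p")]

# The xref elements live in the 'byte' ref, so the 'bit' ref has none.
xrefs_in_bit = firstref.getElementsByTagName("xref")
xrefs_in_byte = secondref.getElementsByTagName("xref")

print(bit_choices)         # ['0', '1']
print(len(xrefs_in_bit))   # 0
print(len(xrefs_in_byte))  # 8
```

A search that finds nothing returns an empty list rather than raising an exception, so checking the length is enough to detect a miss.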
Example 9.23. Searching is actually recursive
>>> plist = xmldoc.getElementsByTagName("p")
>>> plist
[<DOM Element: p at 136140116>, <DOM Element: p at 136142172>, <DOM Element: p at 136146124>]
>>> plist[0].toxml()
'<p>0</p>'
>>> plist[1].toxml()
'<p>1</p>'
>>> plist[2].toxml()
'<p><xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/>\
<xref id="bit"/><xref id="bit"/><xref id="bit"/><xref id="bit"/></p>'
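To see the recursion more concretely, you can compare the direct children of the root element with what getElementsByTagName finds. This is a rough sketch under stated assumptions: Python 3, a compact inline copy of the grammar, and illustrative names (GRAMMAR, direct_children, all_p, p_parents) that do not appear in the original example.

```python
from xml.dom import minidom

# Compact inline copy of the grammar (an assumption of this sketch;
# the DOCTYPE is omitted).
GRAMMAR = (
    "<grammar>"
    '<ref id="bit"><p>0</p><p>1</p></ref>'
    '<ref id="byte"><p>' + '<xref id="bit"/>' * 8 + "</p></ref>"
    "</grammar>"
)

xmldoc = minidom.parseString(GRAMMAR)
grammar = xmldoc.documentElement

# childNodes lists direct children only: the two ref elements.
direct_children = [
    node.tagName
    for node in grammar.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

# getElementsByTagName descends through every level below grammar,
# so it finds the p elements nested inside each ref.
all_p = grammar.getElementsByTagName("p")
p_parents = [p.parentNode.getAttribute("id") for p in all_p]

print(direct_children)  # ['ref', 'ref']
print(len(all_p))       # 3
print(p_parents)        # ['bit', 'bit', 'byte']
```

No p element is a direct child of grammar, yet all three are found: the first two belong to the 'bit' ref and the last to the 'byte' ref.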