How to Parse an XML Document in Python

In this article, we show how to parse an XML document in Python.

By parsing, we mean extracting the data we want from an XML document using Python.

Python's standard library includes modules that make parsing XML documents easy.

We can use the xml.etree.ElementTree module to extract data from simple XML documents.
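Here is a minimal sketch of what the module does. It parses a small XML string with ElementTree.fromstring() instead of a file, then walks the elements. The document and its tag names (library, book) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up XML document held in a string
xml_text = """<library>
    <book>Python Basics</book>
    <book>XML in Practice</book>
</library>"""

# fromstring() parses the string and returns the root Element
root = ET.fromstring(xml_text)
print(root.tag)  # library

# Iterating over an Element yields its direct child elements
titles = [book.text for book in root]
print(titles)  # ['Python Basics', 'XML in Practice']
```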

For this example, I created an XML document that contains contact information for several people.

The XML document we will parse is contacts.xml.

Each person entry in contacts.xml includes the person's name, email, phone number, city, and state.
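The article doesn't reproduce the contents of contacts.xml. Judging from the tag names in the code and the output it prints, each person is probably stored as a person element with five child elements. The sketch below is a guessed reconstruction. The root tag name contacts is an assumption, and only the first person is included.

```python
import xml.etree.ElementTree as ET

# A guess at the layout of contacts.xml, reconstructed from the tag names
# used in the code and the printed output. The root tag "contacts" is an
# assumption; only the first person is shown.
contacts_xml = """<contacts>
    <person>
        <name>John Smith</name>
        <email>Johnsmith@gmail.com</email>
        <phonenumber>(111)111-1111</phonenumber>
        <city>Miami</city>
        <state>Florida</state>
    </person>
</contacts>"""

root = ET.fromstring(contacts_xml)
first_person = root.find("person")

# List the child tags of the first person element
fields = [child.tag for child in first_person]
print(fields)  # ['name', 'email', 'phonenumber', 'city', 'state']
```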

We will use Python to extract data from this XML document and display this data.

The code to extract data from all fields of the person tag is shown below.

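```python
# Tools for opening a URL and for parsing XML
from urllib.request import urlopen
from xml.etree.ElementTree import parse

# Open the XML document over HTTP and parse it into an ElementTree
opendoc = urlopen('http://www.learningaboutelectronics.com/Articles/contacts.xml')
doc = parse(opendoc)

# Loop over every <person> element directly under the root element
for person in doc.iterfind('person'):
    # Text content of each child element of this <person>
    name = person.findtext('name')
    email = person.findtext('email')
    phonenumber = person.findtext('phonenumber')
    city = person.findtext('city')
    state = person.findtext('state')

    # Print one field per line, then a blank line between people
    print(name)
    print(email)
    print(phonenumber)
    print(city)
    print(state)
    print()
```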
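The URL may not always be reachable, so here is a sketch that runs the same loop against an in-memory copy of part of the data. It assumes the layout implied by the output (the root tag contacts is hypothetical). The bytes are wrapped in io.BytesIO so that parse() receives a file-like object, just as it does from urlopen().

```python
import io
from xml.etree.ElementTree import parse

# In-memory stand-in for contacts.xml; the root tag "contacts" is assumed,
# and the values come from the output shown in this article
xml_bytes = b"""<contacts>
    <person>
        <name>John Smith</name>
        <email>Johnsmith@gmail.com</email>
        <phonenumber>(111)111-1111</phonenumber>
        <city>Miami</city>
        <state>Florida</state>
    </person>
    <person>
        <name>Peter Phils</name>
        <email>Peterphils@gmail.com</email>
        <phonenumber>(222)222-2222</phonenumber>
        <city>New York City</city>
        <state>New York</state>
    </person>
</contacts>"""

# BytesIO gives parse() a file-like object, like the one urlopen() returns
doc = parse(io.BytesIO(xml_bytes))

names = []  # collected only so the result can be checked
for person in doc.iterfind('person'):
    name = person.findtext('name')
    names.append(name)
    print(name)
    print(person.findtext('email'))
    print(person.findtext('phonenumber'))
    print(person.findtext('city'))
    print(person.findtext('state'))
    print()
```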

Now let's go over this code.

First, we import urlopen from the urllib.request module.

We then import parse from the xml.etree.ElementTree module.

We then call urlopen() with the URL of the XML document we want to parse (read) and assign the result to a variable called opendoc. Since contacts.xml is located in the Articles directory of this site, the URL is http://www.learningaboutelectronics.com/Articles/contacts.xml

We then pass opendoc, the file-like object returned by urlopen(), to the parse() function and store the result, an ElementTree, in the doc variable.
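parse() is not limited to the objects urlopen() returns. It accepts either a filename or a file object. As a sketch, the code below writes a small XML file to a temporary directory and parses it by path. The file name contacts_local.xml is made up.

```python
import os
import tempfile
from xml.etree.ElementTree import parse

# Write a small XML file to a temporary directory (file name is made up)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "contacts_local.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write("<contacts><person><name>John Smith</name></person></contacts>")

# parse() accepts a path as well as a file-like object
doc = parse(path)
root = doc.getroot()
print(root.tag)                     # contacts
print(doc.findtext("person/name"))  # John Smith
```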

We want to extract data from the person tags in the XML document.

Therefore, we use a for loop to iterate over all the person tags in the contacts.xml document.

The for loop is for person in doc.iterfind('person'). The iterfind() method yields each person tag found directly under the document's root element, one at a time.
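A plain tag name such as 'person' in iterfind() matches only direct children of the root. Suppose person elements were nested deeper. The group wrapper below is invented for this example. In that case the path expression './/person' would be needed to find person elements at any depth.

```python
import xml.etree.ElementTree as ET

# Made-up document: one person directly under the root, one inside a group
xml_text = """<contacts>
    <person><name>John Smith</name></person>
    <group>
        <person><name>Peter Phils</name></person>
    </group>
</contacts>"""

doc = ET.ElementTree(ET.fromstring(xml_text))

# 'person' matches only children of the root element
direct = [p.findtext('name') for p in doc.iterfind('person')]
# './/person' matches person elements at any depth below the root
everywhere = [p.findtext('name') for p in doc.iterfind('.//person')]

print(direct)      # ['John Smith']
print(everywhere)  # ['John Smith', 'Peter Phils']
```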

Inside the loop, we create a variable for each field in the person tag: name, email, phonenumber, city, and state.

We create the variable name and set it equal to person.findtext('name'). This returns the text between the opening and closing name tags inside the current person tag.
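Two edge cases of findtext() are worth knowing. It returns None when no matching child exists. It returns an empty string when the element exists but has no text. Its optional default argument substitutes a placeholder for a missing element. The person record below, with no email and an empty city, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up record: no <email> element, and an empty <city> element
person = ET.fromstring(
    "<person><name>John Smith</name><city></city></person>"
)

name = person.findtext('name')    # 'John Smith'
email = person.findtext('email')  # None: there is no <email> child
city = person.findtext('city')    # '': the element exists but has no text
email_or_placeholder = person.findtext('email', default='(none)')

print(name, email, repr(city), email_or_placeholder)
```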

We then do the same thing for email, phonenumber, city and state.

Once the value of each field is stored in its variable, we print the variables, which together make up that person's data.

After these print statements, we call print() with no arguments to output a blank line, separating one person's data from the next.
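An alternative design choice: the five print() calls plus the bare print() can be collapsed into a single call using print()'s sep and end parameters. The sketch below captures both versions and checks that they produce identical text. The values come from the first record in the output.

```python
import io
from contextlib import redirect_stdout

name, email, phonenumber = "John Smith", "Johnsmith@gmail.com", "(111)111-1111"
city, state = "Miami", "Florida"

# The article's approach: one print() per field, then a bare print()
separate = io.StringIO()
with redirect_stdout(separate):
    print(name)
    print(email)
    print(phonenumber)
    print(city)
    print(state)
    print()

# One call: a newline between fields, then a blank line after the last one
single = io.StringIO()
with redirect_stdout(single):
    print(name, email, phonenumber, city, state, sep="\n", end="\n\n")

print(separate.getvalue() == single.getvalue())  # True
```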

Running this code produces the following output.

```
John Smith
Johnsmith@gmail.com
(111)111-1111
Miami
Florida

Peter Phils
Peterphils@gmail.com
(222)222-2222
New York City
New York

Michael Thoms
Michaelthoms@gmail.com
(333)333-3333
Daytona Beach
Florida
```

As you can see, parsing an XML document in Python is straightforward.

This matters because many data sources, such as some web APIs and RSS feeds, deliver their data as XML.

You could customize the above code so that it extracts only certain elements. For example, if you're only interested in each person's name and email, you would keep just the name and email variables (and their print statements).
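Following that suggestion, a trimmed version of the loop might look like the sketch below. To keep it runnable without a network connection, it parses an in-memory sample instead of the URL. The root tag contacts is assumed. With the real file, you would keep the urlopen() and parse() lines from the original code.

```python
import io
from xml.etree.ElementTree import parse

# In-memory sample standing in for contacts.xml (root tag "contacts" assumed)
xml_bytes = b"""<contacts>
    <person>
        <name>John Smith</name>
        <email>Johnsmith@gmail.com</email>
        <phonenumber>(111)111-1111</phonenumber>
    </person>
    <person>
        <name>Michael Thoms</name>
        <email>Michaelthoms@gmail.com</email>
        <phonenumber>(333)333-3333</phonenumber>
    </person>
</contacts>"""

doc = parse(io.BytesIO(xml_bytes))

# Only name and email are extracted; the other fields are simply ignored
pairs = []
for person in doc.iterfind('person'):
    name = person.findtext('name')
    email = person.findtext('email')
    pairs.append((name, email))
    print(name)
    print(email)
    print()
```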

This is just a quick guide to parsing XML documents in Python.

Related Resources

How to Parse JSON Data in Python

How to Write JSON Data in Python


Learning about Electronics
