Introduction: Using XML on the Raspberry Pi With Python

About: SwitchDoc Labs, LLC is a software and hardware engineering company producing specialized products and designs for the small computer industry maker movement (Raspberry Pi, Arduinos and others). The Chief Tech…

This Instructable discusses the use of XML in applications for the Raspberry Pi. Step one covers what XML is and the format of its data structures. Step two covers building and parsing XML in Python, and step three shows how XML is used as a communications protocol for a client/server application, RasPiConnect (www.milocreek.com). RasPiConnect is an iPad/iPhone app that connects to any number of Raspberry Pis and displays information from them via a defined XML interface.

In this instructable you will learn:

  • What is XML?
  • How is XML Structured?
  • What is XML Used For?
  • How to Parse and Send XML using Python
  • An Example Application for XML

More XML information, cool projects and blogs on www.switchdoc.com

Step 1: What Is XML? Why Is It Useful?

What is XML?

XML stands for eXtensible Markup Language. It is a language for structuring, storing and transporting information in a hardware- and software-independent way. It looks somewhat like HTML, but it is used to transport information, not to display it. Both languages trace back to SGML (Standard Generalized Markup Language): XML is a simplified subset of SGML, and HTML was originally defined as an SGML application.

What do you use XML for?

It is a little difficult to understand, but XML does not "do" anything. XML is designed to transport information, unlike HTML, which is used to display information. You use XML to structure data (usually in a human-readable format) and to send this data to other pieces of software on your own machine or across the Internet. User preferences and user data are also often stored as XML and written to files. If you need to send structured data, XML is an excellent choice. It is easy to parse, easy to modify and, most importantly, easy to debug. One very useful characteristic of XML files is that they can be extended (more elements, attributes, etc.) without breaking applications, provided, of course, that those applications are well written (see Step 2).

Here is a complete XML message:
 <?xml version="1.0" encoding="ISO-8859-1"?>
 <XMLCOMMAND>
     <OBJECTID>12</OBJECTID>
     <OBJECTSERVERID>BL-1</OBJECTSERVERID>
     <OBJECTTYPE>2048</OBJECTTYPE>
     <OBJECTFLAGS>0</OBJECTFLAGS>
     <RASPICONNECTSERVERVERSIONNUMBER>2.4
     </RASPICONNECTSERVERVERSIONNUMBER>
     <RESPONSE>
         <![CDATA[100.00, 0.00, CPU Load]]>
     </RESPONSE>
 </XMLCOMMAND>

Structure of an XML message

Unlike HTML, in XML you define your own tags. A well-formed XML message has a "root" and then "branches" and "leaves". The first line is the XML declaration. It rarely changes. The second line is the opening tag of the root element of the XML document:
<XMLCOMMAND>

Note that the end of the XML root has a closing tag:

 </XMLCOMMAND>
All elements in XML must have an opening and closing tag. This, together with having a single root element, is the core of what makes a "well-formed XML document". By the way, all tags in XML are case sensitive. A good XML coding practice is to make all of the tags uppercase. Doing this also makes the structure of the XML stand out when you read it.

Add child elements

Child elements are used to provide additional data and information about the enclosing XML element (i.e. <XMLCOMMAND> in the example above). Note that XML does not require the same set of child elements for each enclosing XML element, making upgrading or changing your elements easy. However, your parser does have to handle this situation! Child elements are XML elements underneath the root (OBJECTID, OBJECTSERVERID, OBJECTTYPE, OBJECTFLAGS, RASPICONNECTSERVERVERSIONNUMBER, RESPONSE). All of these elements must have an opening and closing tag, just like the root. In addition, all elements can have child elements nested inside.
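As a small illustration of the root-and-children structure, the sketch below parses a trimmed copy of the message shown earlier and lists the child elements of the root. It assumes Python 3 and the standard library xml.etree.ElementTree module (covered in Step 2); the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example message (version number element left out).
message = """<XMLCOMMAND>
    <OBJECTID>12</OBJECTID>
    <OBJECTSERVERID>BL-1</OBJECTSERVERID>
    <OBJECTTYPE>2048</OBJECTTYPE>
    <OBJECTFLAGS>0</OBJECTFLAGS>
    <RESPONSE><![CDATA[100.00, 0.00, CPU Load]]></RESPONSE>
</XMLCOMMAND>"""

root = ET.fromstring(message)

# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in root]

print(root.tag)
print(child_tags)
```

Each child could in turn be iterated in the same way to reach elements nested further down.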

XML attributes

XML elements can have attributes, just like HTML. Attributes provide additional information about an element. By convention, attributes are usually given in lower case. It is good practice to use attributes in XML sparingly and in a consistent manner. For example, the following element carries an id attribute:
 <PICTURE id="1">
     <TYPE>gif</TYPE>
     <FILE>BPNSCFA.gif</FILE>
 </PICTURE>
The same information could instead be carried in a child element. Not having attributes makes the parsing of the XML easier in many ways.
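To make the parsing difference concrete, here is a minimal sketch (Python 3, standard library ElementTree) that reads the same value once from an attribute and once from a child element. The attribute-free variant with an <ID> child is a hypothetical rewrite for illustration, not taken from RasPiConnect.

```python
import xml.etree.ElementTree as ET

with_attribute = ET.fromstring(
    '<PICTURE id="1"><TYPE>gif</TYPE><FILE>BPNSCFA.gif</FILE></PICTURE>'
)
# Hypothetical attribute-free variant: the id moves into an <ID> child.
with_child = ET.fromstring(
    "<PICTURE><ID>1</ID><TYPE>gif</TYPE><FILE>BPNSCFA.gif</FILE></PICTURE>"
)

# Attributes are read with get(); it returns None if the attribute is absent.
id_from_attribute = with_attribute.get("id")
# Child elements are read with findtext() (or find()), like any other value.
id_from_child = with_child.findtext("ID")

print(id_from_attribute, id_from_child)
```

With child elements only, every value is read through the same call, which is one reason the attribute-free form can be simpler to parse.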

XML special character secrets

There are two characters that are not allowed inside the text of an XML element: "<" and "&". The ">" character is allowed, but it is good practice to replace it as well. The pre-defined entity references in XML for these characters are "&lt;", "&amp;" and "&gt;".
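In Python you do not have to do these replacements by hand. The sketch below uses escape and unescape from the standard library module xml.sax.saxutils (Python 3 assumed); the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = "5 < 6 & 7 > 3"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)  # 5 &lt; 6 &amp; 7 &gt; 3

# unescape() reverses the replacement.
restored = unescape(escaped)
```

The escaped string is safe to place between an opening and closing tag.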

Sending special data in XML

Sometimes you want to send general data in your XML element without replacing special characters. For example, you might want to send an HTML page inside an XML element (the RasPiConnect application does this) and you don't want to change all the characters. By default, an XML parser interprets markup characters in all text inside elements, but there is a way to change that: CDATA. Inside a CDATA section the parser does not interpret the data as markup, so it is passed through an XML message unchanged. CDATA looks like this:
 <![CDATA[<XML & DOES & NOT <LIKETHIS>]]>
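A short sketch of how this looks from the receiving side, assuming Python 3 and the standard library ElementTree parser: the CDATA content arrives as the ordinary text of the enclosing element, with the special characters intact. The <RESPONSE> wrapper is borrowed from the earlier example message.

```python
import xml.etree.ElementTree as ET

xml_text = "<RESPONSE><![CDATA[<XML & DOES & NOT <LIKETHIS>]]></RESPONSE>"

# The parser hands back the CDATA content unchanged as element text.
payload = ET.fromstring(xml_text).text
print(payload)
```

The one sequence a CDATA section cannot contain is "]]>", because that is what ends it.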

Validate your XML

There are many sites on the web that will check that your XML is well formed. http://www.xmlvalidation.com is one such site. Copy and paste the complete XML message shown above to try it out.
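You can also run a basic well-formedness check locally. The following is a minimal sketch, assuming Python 3; is_well_formed is a hypothetical helper name, and it only checks that the text parses, not that it matches any particular schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = is_well_formed("<XMLCOMMAND><OBJECTID>12</OBJECTID></XMLCOMMAND>")
# The closing </OBJECTID> tag is missing here, so parsing fails.
bad = is_well_formed("<XMLCOMMAND><OBJECTID>12</XMLCOMMAND>")

print(good, bad)
```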

Conclusion

XML is a simple, easily understood method for sending information in a hardware and software independent manner. The main advantages of XML are readability and portability between systems. It provides an easily extensible framework for information interchange. To learn more about XML try the following websites: http://www.w3schools.com/xml/ and http://www.quackit.com/xml/tutorial/

Step 2: Parsing XML With Python

What do we mean by parsing?

Parsing refers to the syntactic analysis of the XML input into its component parts in order to facilitate executing code based on the result of the analysis. In other words, the program is "reading" the XML to find values that it is looking for, paying attention to proper syntax and form. XML syntax includes a nested hierarchy of elements. This means that each level of the hierarchy is included as a fully enclosed subset of the previous level. In our example below, each <XMLCOMMAND> object is fully enclosed ("nested") in the <XMLObjectXMLRequests> element. You can extend this nesting as far down as you like. When you write parsing code, this nesting usually results in Python for loops that iterate through all the objects at one level of the hierarchy.

Options for Parsing XML in Python

There are many different packages for parsing XML in Python. We will be using xml.etree.ElementTree. ElementTree is a simple to use, fast XML tree library built into Python. It is somewhat limited in features, but for straightforward XML message parsing it is hard to beat.

What do you need to know about ElementTree? Very few commands are needed to parse simple XML. These few will be illustrated below.

Python Example Code

 import xml.etree.ElementTree as ET
 incomingXML = """
 <XMLObjectXMLRequests>
  <XMLCOMMAND>
   <OBJECTSERVERID>W-1</OBJECTSERVERID>
   <OBJECTNAME>StatusWebView</OBJECTNAME>
   <OBJECTTYPE>1</OBJECTTYPE>
   <OBJECTID>7</OBJECTID>
  </XMLCOMMAND>
  <XMLCOMMAND>
   <OBJECTSERVERID>M-2</OBJECTSERVERID>
   <OBJECTNAME>Processes</OBJECTNAME>
   <OBJECTTYPE>64</OBJECTTYPE>
   <OBJECTID>0</OBJECTID>
  </XMLCOMMAND>
 </XMLObjectXMLRequests>"""
 # parse the string; root is the <XMLObjectXMLRequests> element
 root = ET.fromstring(incomingXML)
 # single-argument print() calls run under both Python 2 and Python 3
 print(incomingXML)
 # iterate through all the XMLCOMMAND elements under the root
 for element in root.findall('XMLCOMMAND'):
     print('XMLCOMMAND')
     print('OBJECTNAME: %s' % element.find('OBJECTNAME').text)
     print('OBJECTTYPE: %s' % element.find('OBJECTTYPE').text)
     print('OBJECTSERVERID: %s' % element.find('OBJECTSERVERID').text)
     print('OBJECTID: %s' % element.find('OBJECTID').text)

Set up the ElementTree data

After importing ElementTree and writing the XML to a string (note: you could instead be reading it from a file or a web request), we first set up the root of the XML hierarchy. The root of this XML is <XMLObjectXMLRequests>.
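If the XML lives in a file rather than in a string, ET.parse does the same job. A minimal sketch, assuming Python 3; an in-memory io.StringIO object stands in for an open file so the example is self-contained, and the file name mentioned in the comment is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# io.StringIO stands in for an open file here; ET.parse() also accepts a
# file name such as "requests.xml" (a hypothetical path).
xml_file = io.StringIO(
    "<XMLObjectXMLRequests>"
    "<XMLCOMMAND><OBJECTID>7</OBJECTID></XMLCOMMAND>"
    "</XMLObjectXMLRequests>"
)

tree = ET.parse(xml_file)  # returns an ElementTree object
root = tree.getroot()      # the root Element, as fromstring() would return

print(root.tag)
```

From here on, root is used exactly as in the string-based example.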

Iterate through the list

We know from looking at the XML that <XMLObjectXMLRequests> consists of a number of <XMLCOMMAND> objects. We use a for loop to step through them (each element inside the root is an <XMLCOMMAND> object) using the ElementTree command findall (finding all XMLCOMMAND objects in this case).

Parse the individual items

In the interior of the for loop, we now parse the individual elements of the <XMLCOMMAND> object. Here we use the ElementTree find command with the text attribute. Note that the code reads the elements in a different order than they appear in the XML! Because find looks up each element by name, the order of the child elements within an <XMLCOMMAND> does not matter to this code. The findall command itself returns the matching elements in document order, but it is better not to write code that depends on position.

Expected elements can be missing from <XMLCOMMAND> objects. In the case of missing elements in Python (using ElementTree) you absolutely must use an if statement to deal with the missing element. If you do not, you risk causing a Python exception when operating on the returned value, because find returns None rather than an element. If you are using strings as values, you will probably want to set your string variable to "" (empty string) rather than allowing it to be set to a Python None. This is a very common mistake in writing ElementTree code.

 node = element.find('XXXX')
 if node is None:
     # element is missing: do something, e.g. use "" as the value
     value = ""
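ElementTree also offers a shortcut for this situation: findtext takes a default that is returned when the element is missing. A minimal sketch, assuming Python 3; the sample XML is cut down from the example above.

```python
import xml.etree.ElementTree as ET

command = ET.fromstring(
    "<XMLCOMMAND><OBJECTNAME>Processes</OBJECTNAME></XMLCOMMAND>"
)

# OBJECTNAME is present, so its text is returned.
name = command.findtext("OBJECTNAME", default="")
# OBJECTFLAGS is missing, so the default is returned instead of None.
flags = command.findtext("OBJECTFLAGS", default="")

print(repr(name), repr(flags))
```

This keeps string variables as strings without a separate if statement for every element.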

Uses for XML in Python programs

XML is used extensively in the software industry, from HL7 messages in healthcare and the Simple Object Access Protocol (SOAP) for client-server information exchange to Microsoft Word files. The key advantages of using XML are cross-system use, readability, expandability and the ability to edit the XML in a text editor.

Programmers often use XML to read and write configuration files on disk, speeding debugging and development. This makes it easier to set up test suites for programs, as you can read the same XML structures from the disk as you would send across the Internet in a web request. The expandability of XML allows you to add new parameters and structures in your Python programs while maintaining backwards compatibility. Step 3 shows XML at work in a Python application.
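The backwards-compatibility point can be sketched as follows (Python 3 assumed). The CONFIG, SERVERNAME and REFRESHSECONDS names are invented for this illustration and are not part of RasPiConnect: an "old" configuration file is written without the newer parameter, and the reading code supplies a default for it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical configuration written by an older version of a program.
config = ET.Element("CONFIG")
ET.SubElement(config, "SERVERNAME").text = "raspberrypi"

path = os.path.join(tempfile.mkdtemp(), "config.xml")
ET.ElementTree(config).write(path)

# A newer version reads the same file and supplies a default for the
# parameter (REFRESHSECONDS, an invented name) that old files lack.
loaded = ET.parse(path).getroot()
server_name = loaded.findtext("SERVERNAME", default="")
refresh_seconds = int(loaded.findtext("REFRESHSECONDS", default="10"))

print(server_name, refresh_seconds)
```

Old files keep working, and new files can simply include the extra element.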

Conclusion

XML is wordy and as a result uses a fair bit of disk space to store and memory to process. In the Raspberry Pi world and across the Internet this generally does not matter. However, on microcontrollers such as the Arduino, RAM is at a premium, so a denser and simpler format such as JSON is more appropriate. If disk space is at a premium, XML compresses extremely well because of all the repetition of tag names and descriptions.
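The compression claim is easy to try out with the standard library zlib module. This is only a rough sketch (Python 3 assumed): the document below repeats one command 50 times, which is artificially repetitive, so real traffic will compress less dramatically.

```python
import zlib

command = (
    "<XMLCOMMAND><OBJECTSERVERID>M-2</OBJECTSERVERID>"
    "<OBJECTNAME>Processes</OBJECTNAME><OBJECTTYPE>64</OBJECTTYPE>"
    "<OBJECTID>0</OBJECTID></XMLCOMMAND>"
)
# 50 identical commands wrapped in the root element.
document = "<XMLObjectXMLRequests>" + command * 50 + "</XMLObjectXMLRequests>"

raw = document.encode("utf-8")
compressed = zlib.compress(raw)

# Compare the sizes in bytes before and after compression.
print(len(raw), len(compressed))
```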

XML is easy to read, parse and debug for beginners and seasoned programmers alike.

Step 3: An XML Application in Python

The Application - RasPiConnect (www.milocreek.com)

What is XML being used for in this program?

XML is being used for three purposes in this program: 1) for the communications channel (over HTTP) from the App to the Raspberry Pi, 2) for the communications channel from the Raspberry Pi to the App (over HTTP), and 3) for persistent program and screen configuration storage in the App.

The Communications Channels

The Client communicates with the Server by sending and receiving XML over HTTP. The Objective-C code on the Apple iOS device for sending and receiving is beyond the scope of this article. The Python code on the Server for receiving and sending XML is very straightforward. On the Server we use the web.py library from webpy.org. This is a lightweight web server library, readily available by running the following command on your Raspberry Pi:
  sudo apt-get install python-webpy
There is a bi-directional communication channel between the Server and Client. Both directions are handled by HTTP requests. The Client sends requests for data and action with all the requests grouped together. The XML sent from the Client to the Server looks like this:
  <XMLObjectXMLRequests>
   <XMLCOMMAND>
    <OBJECTSERVERID>LT-1</OBJECTSERVERID>
    <OBJECTNAME>CPU Text and Label</OBJECTNAME>
    <OBJECTTYPE>1</OBJECTTYPE>
    ...
   </XMLCOMMAND>
   <XMLCOMMAND>
    ...
   </XMLCOMMAND>
  </XMLObjectXMLRequests>

This XML contains multiple requests to the Server for retrieving information and sending action requests to the Server. Note the multiple <XMLCOMMAND> elements in the structure.

The XML returned from the Server to the Client looks very similar.
  <XMLRESPONSES>
   <XMLCOMMAND>
    ...
   </XMLCOMMAND>
   <XMLCOMMAND>
    <OBJECTSERVERID>LT-1</OBJECTSERVERID>
    <RESPONSE>
     <![CDATA[43.31, 43.31, CPU Temp (deg C)]]>
    </RESPONSE>
    ...
   </XMLCOMMAND>
  </XMLRESPONSES>

Parsing the XML

Parsing this XML into the individual entities (<XMLCOMMAND> above) is a simple use of the ElementTree Python library, as shown previously in Step 2. Once the requests have been parsed and validated, the Server executes the requests one at a time, while building a new XML structure containing the responses to the commands. The structure is then sent to the Client using one HTTP connection rather than multiple connections.

 ...
 class RasPi:
     def POST(self):
         web.header('Content-Type', 'text/html')
         # the body of the HTTP POST is the XML sent by the Client
         incomingXML = web.data()
         root = ET.fromstring(incomingXML)
         # iterate through all the values
         for element in root.findall('XMLCOMMAND'):
 ...
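The parsing step inside the handler can be kept in a plain function, which makes it testable without a running web server. The sketch below is an assumption about how that could look, not the RasPiConnect source: parse_requests is a hypothetical helper, the field list is taken from the example requests above, and Python 3 is assumed.

```python
import xml.etree.ElementTree as ET

def parse_requests(incoming_xml):
    """Return one dict per <XMLCOMMAND>, using "" for missing elements."""
    root = ET.fromstring(incoming_xml)
    fields = ("OBJECTSERVERID", "OBJECTNAME", "OBJECTTYPE", "OBJECTID")
    requests = []
    for element in root.findall("XMLCOMMAND"):
        requests.append(
            {name: element.findtext(name, default="") for name in fields}
        )
    return requests

sample = """<XMLObjectXMLRequests>
 <XMLCOMMAND>
  <OBJECTSERVERID>LT-1</OBJECTSERVERID>
  <OBJECTNAME>CPU Text and Label</OBJECTNAME>
  <OBJECTTYPE>1</OBJECTTYPE>
 </XMLCOMMAND>
</XMLObjectXMLRequests>"""

requests = parse_requests(sample)
print(requests)
```

In a web.py handler, the string passed in would be the value returned by web.data().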

Building XML to send to the Client

Building the XML to be sent back to the Client in response to its HTTP request is done by constructing a string of concatenated XML elements and then returning the string from the web.py POST handler.
   # start of building the XML responses
   outgoingData = "<XMLRESPONSES>"
   ...
       outgoingData += "<XMLCOMMAND>"
       ...
       outgoingData += "</OBJECTTYPE>"
       outgoingData += "<OBJECTID>"
       outgoingData += "%i" % objectID
       outgoingData += "</OBJECTID>"
   # done with FOR loop
   outgoingData += "</XMLRESPONSES>"
   return outgoingData
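String concatenation works, but ElementTree can also build the response, which takes care of closing tags and of escaping special characters. The following is an alternative sketch, not the method used in RasPiConnect: build_responses is a hypothetical helper, only two of the child elements are shown, and Python 3 is assumed. Note that ElementTree writes escaped text rather than CDATA sections.

```python
import xml.etree.ElementTree as ET

def build_responses(results):
    """Build an <XMLRESPONSES> document from (object_id, text) pairs."""
    root = ET.Element("XMLRESPONSES")
    for object_id, response_text in results:
        command = ET.SubElement(root, "XMLCOMMAND")
        ET.SubElement(command, "OBJECTID").text = "%i" % object_id
        # Special characters in the text are escaped on serialization.
        ET.SubElement(command, "RESPONSE").text = response_text
    return ET.tostring(root, encoding="unicode")

outgoing = build_responses(
    [(7, "43.31, 43.31, CPU Temp (deg C)"), (0, "load < 1 & ok")]
)
print(outgoing)
```

The returned string can be handed back from the POST handler in the same way as outgoingData above.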

Receiving and sending the XML

In web.py, the incoming XML is placed in a string as above and then parsed. The responses are sent back to the client from the POST function by returning a string.

Conclusion

XML is a very useful means for storing and transmitting data across disparate computer systems. It is usable by large and small computers alike. To learn more about using XML on a Python based platform try the following websites:

http://docs.python.org/2/library/xml.etree.elementtree.html

http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree

and a tutorial video on elementtree: http://www.youtube.com/watch?v=LNYoFo1sdwg
