Minnesota State Archives

Preserving State Government Digital Information:
XML Basics

General Information

XML stands for Extensible Markup Language. It was developed as a machine-readable format that allows structured information to be exchanged over the Internet in a simple, text-based form. XML is defined by the World Wide Web Consortium's (W3C) XML 1.0 Recommendation (5th edition, November 2008).

XML is a way to organize data that uses user-defined tags to describe the data. For example, if you want to create a list of albums in a collection, you might put them in a spreadsheet with columns for the title, artist, date of release, and so on, with each row containing the data for one album. If you were to use XML to describe the same information, the column headers would become XML tags (such as <artist>), and the information in the rows would remain as the data between those tags. This built-in structure makes XML easy to work with.

  <CD>
    <artist>John Mellencamp</artist>
    <title>The Best That I Could Do</title>
  </CD>
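As a minimal sketch of the spreadsheet-to-XML idea, the Python below uses the standard-library xml.etree.ElementTree module to turn each spreadsheet row into a <CD> element, using the column headers as tag names. The CSV text, the function name rows_to_xml, and the <collection> wrapper element are assumptions for illustration: an XML document needs a single root element, so a list of several CDs needs some enclosing tag.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical spreadsheet export: the header row supplies the tag names.
SPREADSHEET = """artist,title
John Mellencamp,The Best That I Could Do
"""

def rows_to_xml(csv_text: str, record_tag: str = "CD", root_tag: str = "collection") -> ET.Element:
    """Turn each spreadsheet row into one record element whose children
    are named after the column headers (hypothetical helper)."""
    root = ET.Element(root_tag)  # assumed wrapper so the document has one root
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, record_tag)
        for column, value in row.items():
            # Column header becomes the tag; the cell value becomes the data.
            ET.SubElement(record, column).text = value
    return root

collection = rows_to_xml(SPREADSHEET)
print(ET.tostring(collection, encoding="unicode"))
# <collection><CD><artist>John Mellencamp</artist><title>The Best That I Could Do</title></CD></collection>
```

One design consequence of reusing headers as tags: each header must be a valid XML name, so a column called "date of release" would need renaming (for example to release_date) before conversion.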

XML is used for data storage and transmission, helps keep data consistent, and can be processed by many programming languages. XQuery is used to query and extract XML data, XPath is used to select parts of an XML document, and XSLT transforms XML data into other formats (XSLT uses XPath to locate the data it transforms). Schemas define which tags may be used and how they fit together, which helps keep documents consistent.
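A minimal sketch of selecting data from the CD example, assuming a hypothetical <collection> wrapper that also holds a second, placeholder CD. Python's standard xml.etree.ElementTree module supports only a limited subset of XPath, which is enough for simple selections like these; full XPath 1.0 and XSLT 1.0 are available in third-party libraries such as lxml, and XQuery needs a dedicated XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection; the second CD is placeholder data for illustration.
DOC = """<collection>
  <CD><artist>John Mellencamp</artist><title>The Best That I Could Do</title></CD>
  <CD><artist>Another Artist</artist><title>Another Album</title></CD>
</collection>"""

root = ET.fromstring(DOC)

# XPath-style predicate: every CD anywhere below the root whose
# <artist> child has exactly this text.
for cd in root.findall(".//CD[artist='John Mellencamp']"):
    print(cd.findtext("title"))  # prints: The Best That I Could Do

# A simple path expression collects every title in document order.
titles = [t.text for t in root.findall("./CD/title")]
print(titles)  # ['The Best That I Could Do', 'Another Album']
```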

XML also lets you separate content from the way information is displayed. Most web pages are written in HTML (a specific markup language, similar to XML); style sheets are then used to display the pages in the desired arrangement.

In the XML example above, style sheets (CSS or XSLT) could be used to present the information about the CD without the XML tags: for instance, as a two-column list on a page, or as a picture of the album cover with the artist's name and the title below the image.
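The page describes doing this with CSS or XSLT. Python's standard library has no XSLT processor, so the sketch below substitutes two ordinary Python functions for style sheets to illustrate the principle: the same <CD> data is rendered once as a two-column text list and once as an HTML fragment with a cover image. The function names and the cover.jpg image path are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

CD_XML = """<CD>
  <artist>John Mellencamp</artist>
  <title>The Best That I Could Do</title>
</CD>"""

def as_two_column_list(cd: ET.Element) -> str:
    """Plain-text view: tag name padded into a left column, value on the right."""
    return "\n".join(f"{child.tag.capitalize():<8}{child.text}" for child in cd)

def as_album_card(cd: ET.Element, cover_url: str) -> str:
    """HTML view: cover image with the artist and title underneath."""
    return (
        f'<figure><img src="{escape(cover_url)}" alt="Album cover">'
        f"<figcaption>{escape(cd.findtext('artist'))}<br>"
        f"{escape(cd.findtext('title'))}</figcaption></figure>"
    )

cd = ET.fromstring(CD_XML)
print(as_two_column_list(cd))
print(as_album_card(cd, "cover.jpg"))  # cover.jpg is a hypothetical image path
```

Because neither function changes the XML itself, a third presentation could be added later without touching the stored data, which is the flexibility the next paragraph describes.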

Separating content information from display information allows data to be used in multiple ways and in multiple instances, increasing flexibility of use. 

Who Uses XML

The development and use of XML has grown over the past decade. Schemas are used to define XML vocabularies that can be shared by multiple groups. Application Programming Interfaces (APIs) allow developers to process XML data. RSS, Atom, SOAP, and XHTML are all XML-based languages. XML-based file formats are now the default in office software suites such as Microsoft Office, OpenOffice, and iWork.
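Because RSS is itself XML, a general-purpose XML API can read it. The sketch below uses Python's xml.etree.ElementTree to list the items in a minimal RSS 2.0 feed; the feed contents, the example.org URLs, and the helper name feed_items are hypothetical placeholders, and a production feed reader would also need to handle namespaces, extra elements, and missing fields.

```python
import xml.etree.ElementTree as ET

# Minimal RSS 2.0 feed; all channel and item contents are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Agency News</title>
    <link>https://example.org/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>New records available</title>
      <link>https://example.org/records</link>
    </item>
  </channel>
</rss>"""

def feed_items(xml_text: str):
    """Return (title, link) pairs for each <item> in an RSS 2.0 channel."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iterfind("./channel/item")
    ]

print(ET.fromstring(FEED).findtext("./channel/title"))  # Example Agency News
print(feed_items(FEED))  # [('New records available', 'https://example.org/records')]
```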

Based on a recommendation by the National Conference of State Legislatures (NCSL), many states have adopted XML-based bill drafting systems.  A survey of XML usage by state agencies was completed in early 2009. 

Minnesota converted to an XML-based bill drafting system in 2005 and has found that the format allows greater flexibility. One advantage is that there are more ways to put records online. In 2009, these XML records allowed Minnesota to take part in a pilot project testing a native XML database as a means of providing access.

In addition, the National Digital Information Infrastructure and Preservation Program (NDIIPP) pilot project created an XML schema for legislative data, which Minnesota and California both tested with state-specific bill data.

December 22, 2011; links updated December 22, 2011