Minnesota State Archives

Electronic records management guidelines

Metadata

Summary

Metadata, usually defined as "data about data," is used to describe an object (digital or otherwise), its relationships with other objects, and how the object has been and should be treated over time. Metadata allows users to locate and evaluate data without each person having to discover it anew with every use. A structured format and a controlled vocabulary, which together allow for a precise and comprehensible description of content, location, and value, are its basic elements.

Anyone who has suffered the exercise in irrelevance offered by an Internet search engine will appreciate the value of precise metadata. Because information in a digital format is only legible through the use of intermediary hardware and software, the role of metadata in information technology is fundamentally important.

Whatever you want to do with the information contained within a record (e.g., protect its confidentiality, present it as evidence, provide citizens access to it, broadcast it, share it, preserve it, destroy it) will be feasible only if you and your users can understand and utilize the metadata associated with it. To use metadata effectively you must understand and apply standards that are appropriate to your needs.

 

Key Concepts

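To understand, create, and use metadata effectively, you will need to know more about:

  • Legal Needs and Statutory Mandates
  • Metadata Categories and Functions
  • Metadata Standards
  • Metadata and Information Technology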

 

Legal Needs and Statutory Mandates


As part of a government entity, you need to pay particular attention to metadata in order to help you meet basic legal needs and statutory mandates.

For example, Minnesota’s Records Management Act prohibits government agencies from disposing of records without the approval of the state’s Records Disposition Panel. Before approval is granted, the Records Disposition Panel must understand what the records are, what their significance is, and the proposed method of disposal. In the records management process, metadata is usually structured as a records retention schedule or an Application for Authority to Dispose of Records form (PR-1).

Similarly, the Minnesota Government Data Practices Act classifies data under nine different categories that specify how, when, or whether the public may gain access to government data. You cannot guess what level of access or security to provide just by looking at the data itself. You need some additional information – some metadata – in order to follow the law.

The Official Records Act, the Records Management Act, and the Minnesota Government Data Practices Act (MGDPA) are some of the laws most pertinent to the use of metadata. These laws require government agencies to create and keep records in order to be accountable for their actions and decisions, and require that these records be accessible to the public unless categorized as not-public by the state legislature. These laws also help establish the records management process for government records.

The metadata requirements of all of these statutes are encompassed in the state’s Recordkeeping Metadata Standard, discussed below. For more information on the specifics of these laws and the legal framework you must consider when dealing with government records, refer to the Legal Framework chapter of these guidelines and the Minnesota State Archives’ Preserving and Disposing of Government Records.

 

Metadata Categories and Functions

Metadata is generally grouped into four or five categories, based on the information it captures:

  • Descriptive Metadata: Metadata that describes the intellectual content of a resource and is used for the indexing, discovery, and identification of a digital resource.
  • Administrative Metadata: Metadata that includes management information about the digital resource, such as ownership and rights management.
  • Structural Metadata: Metadata that is used to display and navigate digital resources and describes relationships between multiple digital files, such as page order in a digitized book.
  • Technical Metadata: Metadata that describes the features of the digital file, such as resolution, pixel dimensions, and hardware. This information is critical for migration and long-term sustainability of the digital resource.
  • Preservation Metadata: Metadata that specifically captures information that facilitates management of and access to digital files over time. This inherently includes descriptive, administrative, structural, and technical metadata elements that focus on the provenance, authenticity, preservation activity, technical environment, and rights management of an object.

Recording these various types of metadata can support a variety of functions for government agencies. The primary uses are:

  • Legal and statutory reasons (e.g., to satisfy records management laws and the rules of evidence)
  • Technological reasons (e.g., to design and document systems)
  • Operational or administrative reasons (e.g., to document decisions and establish accountability)
  • Service to citizens, agency staff, and others (e.g., to locate and share information)

In all of these cases, metadata will be most effective if it uses a structured format and, when appropriate, a controlled vocabulary. “Structured format” means the metadata is defined in terms of specific, standardized elements or fields, based on document type. For example, metadata elements for a library catalog entry for a book include author, title, subject(s), and location, among other things. Unless all the elements are there, users will not be able to evaluate the metadata; they won’t be able to answer the question “Is this the book I want?” This structure can be defined in-house through institutional practice or, more formally, by following a specific metadata standard.

“Controlled vocabulary” means that there is an approved or standard set of terms that can be used as content for each metadata element. Using a controlled vocabulary ensures consistency across a collection and makes items easier to find and to compare. For example, using a controlled vocabulary in the metadata element ‘subject’ in a library catalog entry may restrict entries to the Library of Congress’s list of Subject Headings rather than allowing any keyword to be entered in that field. Controlled vocabulary also assists with record comparison across various collections or objects.
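The two ideas above can be illustrated with a minimal Python sketch. It is not drawn from any Minnesota standard: the element names follow the library catalog example, and the three subject terms are invented stand-ins for an approved list, not actual Library of Congress Subject Headings.

```python
# Hypothetical element names taken from the library catalog example above.
REQUIRED_ELEMENTS = ("author", "title", "subject", "location")

# Invented stand-in for a controlled vocabulary (an approved list of terms).
SUBJECT_VOCABULARY = {"Public records", "Water quality", "Land use"}


def validate_entry(entry):
    """Return a list of problems found in one catalog entry (empty if none)."""
    problems = []
    # Structured format: every required element must be present and non-empty.
    for element in REQUIRED_ELEMENTS:
        if not entry.get(element):
            problems.append(f"missing element: {element}")
    # Controlled vocabulary: each subject must come from the approved list.
    for subject in entry.get("subject") or []:
        if subject not in SUBJECT_VOCABULARY:
            problems.append(f"subject not in vocabulary: {subject}")
    return problems
```

An entry that supplies all four elements and uses only approved subject terms yields an empty list. An entry that omits the author or uses a free-text keyword such as "lakes" is reported, which is the kind of consistency check a controlled vocabulary makes possible.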

 

Metadata Standards

To work effectively, metadata has to be precise and comprehensible. The entire community of creators and users has to understand what it means. There are a variety of metadata standards in use around the world, with three principal standards in general use in Minnesota government today. Minnesota's Office of Enterprise Technology recommends the following standards in its enterprise architecture:

 

Minnesota Recordkeeping Metadata Standard

The Minnesota Recordkeeping Metadata Standard is designed to support the accountability of government and the proper use of government records as mandated by law. It is based on Dublin Core, but includes additional metadata elements that help support legal mandates over time.  [Dublin Core is an official international standard (NISO Standard Z39.85; ISO Standard 15836) with fifteen metadata elements.]

The standard consists of twenty elements, ten of which are mandatory and ten optional. In addition, many of these elements contain a number of sub-elements, some mandatory and some optional. To ensure compatibility across metadata sets, six of the ten mandatory elements have direct counterparts in both the Dublin Core and the geographic metadata standards. Overall, the recordkeeping metadata elements are:

  • Agent. An agency or organizational unit that is responsible for some action on or usage of a record, or an individual who performs some action on a record, or who uses a record in some way.  (mandatory)
  • Rights Management. Legislation, policies, and caveats that govern or restrict access to or use of records.  (mandatory)
  • Title. The name given to the record.  (mandatory)
  • Subject. The subject matter or topic of a record. (mandatory)
  • Description. An account, in free text prose, of the content and/or purpose of the record.  (optional)
  • Language. The language of the content of the record.  (optional)
  • Relation. A link between one record item and another, between various aggregations of records, or a link between a record and another information resource.  (optional)
  • Coverage. The jurisdictional, spatial, and/or temporal characteristics of the content of the record.  (optional)
  • Function. The general or agency-specific business function(s) and activities that are documented by the record.  (optional)
  • Date. The dates and times at which fundamental recordkeeping actions, such as the creation and transaction of the record or records series, occur.  (mandatory)
  • Type. The recognized form or genre a record takes, which governs its internal structure.  (optional)
  • Aggregation Level. The level at which the record(s) is/are being described and controlled or the level of aggregation of the unit of description.  (mandatory)
  • Format. The logical form (content medium and data format) and physical form (storage medium and extent) of the record.  (optional)
  • Record Identifier. A unique code for the record.  (mandatory)
  • Management History. The dates and descriptions of all records management actions performed on a record from its registration into a recordkeeping system until its disposal.  (mandatory)
  • Use History. The dates and descriptions of both legal and illegal attempts to access and use a record, from the time of its registration into a recordkeeping system until its disposal. (optional)
  • Preservation History. The dates and descriptions of all actions performed on a record after its registration into a recordkeeping system that ensure that the record remains readable and accessible for as long as it has value to the agency and to the community at large.  (optional)
  • Location. The current (physical or system) location of the record or details about where the record usually resides.   (mandatory)
  • Disposal. Information about policies and conditions that pertain to or control the authorized disposal of records or information about the current retention schedule and disposal actions to which the record is subject.  (mandatory)
  • Mandate. A source of recordkeeping requirements, such as a piece of legislation, formal directive, policy, standard, guideline, set of procedures, or community expectation that (explicitly or implicitly) imposes a requirement to create, keep, dispose of, or control access to and use of a record.  (optional)
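As a rough illustration of how the mandatory and optional designations above might be checked in software, here is a minimal Python sketch. The element names come from the list; representing a record as a dictionary with lowercase keys is an assumption of this sketch, and sub-elements are not modeled.

```python
# Element names follow the list above. The lowercase dictionary keys are an
# assumption of this sketch, not something the standard prescribes.
MANDATORY = (
    "agent", "rights management", "title", "subject", "date",
    "aggregation level", "record identifier", "management history",
    "location", "disposal",
)
OPTIONAL = (
    "description", "language", "relation", "coverage", "function",
    "type", "format", "use history", "preservation history", "mandate",
)


def missing_mandatory(record):
    """Return the mandatory elements that are absent or empty in a record."""
    return [name for name in MANDATORY if not record.get(name)]


def unknown_elements(record):
    """Return keys that are not among the twenty elements, sorted by name."""
    return sorted(set(record) - set(MANDATORY) - set(OPTIONAL))
```

A record is complete at the element level when missing_mandatory returns an empty list. A real implementation would also need to check the mandatory sub-elements, which this chapter does not enumerate.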

 

Minnesota Web Metadata Standard

The Web Metadata Standard also uses the Dublin Core metadata standard. For example, when you use the search engine on Minnesota’s North Star site, the standard Dublin Core metadata elements used in the Minnesota Web Metadata Standard help you find what you’re looking for by organizing the contents for easy access and retrieval.

The Web Metadata Standard set includes these elements:

  • Title. The name of the resource given by the creator or publisher.
  • Creator. The name of the person who created the resource.
  • Subject. The topic of the resource.
  • Description. A short, text description of the resource’s contents.
  • Publisher. The name of the entity that published the resource. Note that the publisher is not the person who posted the resource to the web site, but the entity responsible for the publication of the resource, such as your agency.
  • Contributor. Someone aside from the creator who made a significant contribution to the resource.
  • Date. Either the creation date or the publication date. Your agency will need to determine which date to use.
  • Resource Type. The category the resource belongs to, such as committee minutes, press release, or report.
  • Format. The file format of the resource. For more information on file formats, refer to the File Formats guidelines.
  • Identifier. A text string or number unique to the resource, such as a URL or other formal name. See the File Naming chapter in these guidelines for more information on naming web site files for longevity and ease of use.
  • Source. Information about the source from which the current resource is derived (e.g., the full report from which an abstract was derived).
  • Language. The language used in the resource (e.g., English, Spanish).
  • Relation. An element that refers to related resources.
  • Coverage. Either geographic (e.g., Minnesota) or temporal (e.g., the years 2000–2001).
  • Rights Management. A text statement regarding copyright and use permission.
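One common way to attach Dublin Core elements like those above to a web page is through HTML meta tags whose names begin with "DC.". The Python sketch below generates such tags. The function name is hypothetical, and this chapter does not state whether the Minnesota Web Metadata Standard prescribes this exact syntax, so treat the output format as an assumption.

```python
from html import escape


def dublin_core_meta_tags(metadata):
    """Render element/value pairs as HTML <meta> tags, one per line.

    The "DC." name prefix is a widely used convention; it is assumed here
    rather than taken from the Minnesota standard.
    """
    lines = []
    for element, value in metadata.items():
        name = escape("DC." + element, quote=True)
        content = escape(value, quote=True)  # protect &, <, > and quotes
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)
```

Given a title of "Annual Report" and a language of "en", the function returns one tag for each element, in the order supplied. Escaping the values keeps characters such as ampersands and quotation marks from breaking the page markup.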

 

Minnesota Geographic Metadata Guidelines

The Minnesota Geographic Metadata Guidelines (MGMG) provide a common approach for documenting all types of geographic data. They have been designed to be straightforward, intuitive, and complete. The guidelines are based on a standard developed by the Federal Geographic Data Committee in 1993: the Content Standard for Digital Geospatial Metadata. In developing the MGMG, the Standards Committee of the Minnesota Governor’s Council on Geographic Information created a streamlined implementation of the federal standard, while retaining the essence of its original content. Information about the guidelines is available on their website.

The Minnesota Geographic Metadata Guidelines include a number of metadata elements, arranged in seven sections:

  • Identification Information
  • Data Quality Information
  • Spatial Data Organization Information
  • Spatial Reference Information
  • Entity and Attribute Information
  • Distribution Information
  • Metadata Reference Information

 

Metadata and Information Technology

Metadata is useful for the management of information in any storage format, paper or digital, but it is critically important for information in a digital format because the information is only discoverable through the use of intermediary hardware and software. We can open up a book or hold microfilm up to a light to determine what it says; but we can’t just look at a CD and say what’s on it. We cannot possibly hope to locate, evaluate, or use all the files on a single computer or network, let alone the Internet, without metadata.

Databases often store and provide access to metadata separately from the digital files. Metadata can also be stored with or embedded in a digital file. Most software applications automatically create metadata and associate it with files, which generally makes the standardization of metadata simpler. One example of automatic and standardized metadata is the header and routing information that accompanies an e-mail message. Another is the set of properties created with every Microsoft Word document: certain elements, such as the title, author, and file size, are created automatically, while other elements can be customized and created manually. However, the more metadata is entered manually, the harder it becomes to enforce standards. If a large amount of manually entered metadata is necessary, policies and procedures for metadata entry are essential. Standardizing the process makes it easier to manage, access, and preserve the files over the long term. Normally, some combination of automatically and manually created information is best for precise and practical metadata.
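The e-mail example can be made concrete with a short Python sketch that reads the automatically created header fields of a message using the standard library's email package. The sample message, its addresses, and the choice of four header fields are invented for illustration.

```python
from email import message_from_string

# Invented sample message; the addresses use the reserved example.org domain.
RAW_MESSAGE = """\
From: clerk@example.org
To: archives@example.org
Subject: Meeting minutes
Date: Mon, 12 Mar 2012 09:30:00 -0500

Minutes attached.
"""


def header_metadata(raw, wanted=("From", "To", "Subject", "Date")):
    """Return the requested header fields of a raw message as a dictionary."""
    message = message_from_string(raw)
    # Header lookup returns None when a field is absent, so skip those.
    return {name: message[name] for name in wanted if message[name] is not None}
```

Calling header_metadata(RAW_MESSAGE) returns the four header values without anyone having typed them into a form, which is why automatically generated metadata tends to be more consistent than manually entered metadata.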

If information technology makes metadata necessary, it’s also information technology that makes metadata useful. Useful metadata can inform business rules and software code that transform it into “executable knowledge.” For example, metadata can be used for batch processing of files. A date element is critical to records management, as most records retention schedules are keyed to a record’s date of creation. Metadata in more sophisticated data formats, such as eXtensible Markup Language (XML), allows for extraction, use, and calculation based on specific components of a metadata record.
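A minimal Python sketch of such a calculation follows. It reads a date element from an XML metadata record and flags records whose retention period has elapsed. The XML layout, the identifiers, and the retention period passed in are all hypothetical. The result is only a list of candidates for review, since actual disposal requires the approvals described earlier in this chapter.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical XML layout and sample records, invented for this sketch.
SAMPLE_XML = """
<records>
  <record><identifier>A-001</identifier><date>2004-05-17</date></record>
  <record><identifier>A-002</identifier><date>2011-11-02</date></record>
</records>
"""


def eligible_for_review(xml_text, retention_years, today):
    """Return identifiers of records whose retention period has elapsed."""
    eligible = []
    for record in ET.fromstring(xml_text).iter("record"):
        created = date.fromisoformat(record.findtext("date"))
        end_year = created.year + retention_years
        try:
            cutoff = created.replace(year=end_year)
        except ValueError:
            # Created on Feb 29 and the end year is not a leap year.
            cutoff = created.replace(year=end_year, day=28)
        if cutoff <= today:
            eligible.append(record.findtext("identifier"))
    return eligible
```

With a seven-year retention period and a reference date of March 12, 2012, only record A-001 is returned, because seven years from its creation date had passed by then. Because the date is a distinct, machine-readable element, the same rule can be applied to an entire batch of records at once.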

 

Key Issues to Consider

Now that you are familiar with some of the basic concepts and types of metadata, you can consider some of the issues that have to be addressed in order to use metadata effectively. The most important are:

  • Audiences: Most people who rely on metadata are unaware they’re using it or even that it exists. Nevertheless, when you create metadata, you have to be aware of the audiences for your information in order to determine the appropriate standards and approaches. To make your decisions, you should know which information resources your audiences use, which questions they ask, and what their level of expertise is.
  • Partnerships: To increase the value of both metadata and the information it describes, you need to work with other creators, custodians, and users of information. If you agree on metadata standards, tools, and practices in collaboration with others, you will create a much more beneficial information management program for your whole organization.
  • Implementation: Selecting a standard is a good first step. Putting it into practice is a more useful and more difficult one. Creating and maintaining metadata over time will demand attention, resources, and staff. You will get a good return on that investment if you keep in mind your legal mandates, your business processes, and your customers as you choose which standards and practices are most appropriate for you.
  • Education: One critical element of a practical metadata program is education. You will need to know what others are doing with the standards, the tools, and the uses of metadata. Over time these may change, and you will need to keep up with recent developments.
  • Promotion: To promote the understanding, use, and creation of metadata, as well as to ensure that there are enough resources to support a metadata program, it is important to draw people’s attention to metadata and its importance.

 

Discussion Questions

  • What metadata do we need to manage the collection? 
  • Who is the intended audience or user group for our collection? What do they expect or need?
  • Are there any institutional, technological or legal demands we need to consider?  What are our legal needs? Does our agency have a records management or data practices office?
  • What business functions is the metadata supposed to fulfill?
  • What metadata already exists?
  • Does our agency have an information and/or technical architecture? What metadata standards does it recommend?
  • Are our software applications creating metadata?
  • What are the metadata standards pertinent to our profession or business functions?
  • Are the offices or departments of our agency already creating metadata? Are they using different standards? 
  • How much time/money can be spent on creating metadata?
  • Do the managers and resource allocators in our agency support a metadata program? Have we made a business case to them? 

 

 


 

Electronic Records Management Guidelines, March 2012, Version 7.

Links verified March 12, 2012.