Minnesota State Archives

Electronic Records Management Guidelines

Long-Term Preservation

Summary

During the course of routine business, your agency generates thousands upon thousands of electronic records, from e-mail to web pages to complex e-government transactions.  Most are useful for only a short period of time, but some you may need to keep permanently.  For those records, you will need to implement a well-considered, well-documented plan for their preservation in order to ensure that they remain trustworthy and useful over time.  Tools such as migration, conversion, metadata, and eXtensible Markup Language (XML) will help you not only preserve your records, but also realize their full value.

Legal Framework
For more information on the legal framework to consider when developing a preservation plan for your records, refer to the Legal Framework chapter of these guidelines and the Minnesota State Archives’ Preserving and Disposing of Government Records. Also consider the Information and Communications Technology Policy, which mandates state agency compliance with Minnesota’s enterprise technical architecture “to ensure that individual agency information systems complement and do not needlessly duplicate or conflict with the systems of other agencies. . . . [and to] promote the most efficient and cost-effective method of producing and storing data for or sharing data between those agencies.”  Section 16E.07 of this same statute establishes the North Star portal as the state’s official online government information service with the idea that “the greatest possible access to certain government information and data is essential to allow citizens to participate fully in a democratic system of government.”

 

Key Concepts

The value of your information justifies your investment in information technology.  There is no point to an agency investing large sums in hardware and software if it cannot preserve the use-value of the information it creates, exchanges, and stores.  In the short term, this is often not a problem.  But, over time, it will be.  As technology changes, current hardware and software will become obsolete, and then you might face some hard choices.  The challenge is to preserve the usefulness and trustworthiness of your information in an efficient and cost-effective manner.

Any preservation plan for electronic records must take into account the changes in hardware and software, the limitations of storage media, and the potential use-value of your information.  As you begin exploring your options, you will need to be familiar with the following:

 

Needs Assessment

As a first step in developing your preservation plan, you should do a needs assessment to help guide your decisions.  While the complexity of such an analysis will vary from situation to situation, these basic components should always be included.

First, you need to understand the value of your information.  The value of your information will justify your investment in technology, over the short and the long term.  Minnesota’s enterprise technical architecture notes that information is the state’s most important asset.  But not all information is created equal; some is far more valuable than the rest.  Some of your information, as records, will have legal and evidentiary significance and may well demand special attention.  Most of the information you want to preserve will be important to your agency’s mission or, increasingly, to the business of other agencies as well.  As e-government develops in complexity and sophistication, more and more agencies will be expected to work within the framework of a common technological architecture and to share the information they create.

The practical side of understanding the value of your records is determining their retention requirements.  How long do you really need to keep them?  Why are you keeping them?  Do they have to be kept in electronic format or is there another, more cost-effective option for long-term storage?  For instance, a word processing document might be printed and kept as a paper record without losing any of its value.  In contrast, printing a web page means a significant loss of information and functionality.

It is also important to ascertain if access to certain data in your records is restricted by statute.  The state’s Government Data Practices Act and some federal laws, such as the Health Insurance Portability and Accountability Act (HIPAA), will determine if data needs to be protected as confidential or non-public.  If it does, then you will need to ensure that your long-term storage and access policies account for those obligations.

In the broadest sense, the demands governing the access and use of your records will determine what preservation options are most appropriate and will dictate the metadata you should create and store along with the records.  Metadata is the “data about the data” that allows you to manage, find, and evaluate your information over time.  Minnesota’s enterprise technical architecture includes metadata standards for GIS data, web content management, and recordkeeping.  There are a number of international standards that are pertinent as well.  While all-important for the long-term preservation of data, metadata takes on additional significance when you share your information because others must understand the information’s structure and content in order to put it to fullest use.  For more information about metadata, refer to the Metadata chapter of these guidelines and the Needs Assessment document created by the Minnesota-led National Digital Information Infrastructure and Preservation Program (NDIIPP) project.

 

Physical Storage Options

As mentioned, choosing the most appropriate storage option for your situation will depend upon your records’ access requirements.  There are basically three options available to you:

  • Online storage.  Records are kept on a server or hard drive and are immediately available for use over a network.  This option is best for records that must be accessed frequently.
  • Near-line storage.  Records are stored on media such as optical disks in jukeboxes or tapes in automated libraries which are attached to a network.  Because retrieval is slower than with online storage, this option is most appropriate for records that are accessed occasionally.
  • Offline storage.  Records are stored on removable media and must be manually retrieved.  This option provides the slowest access and should be used for records that are only rarely needed.
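
The choice among these three options can be written down as a simple rule.  The following Python sketch is illustrative only: the function name and the access-rate thresholds are hypothetical and do not come from these guidelines, so an agency would substitute figures from its own needs assessment.

```python
from enum import Enum


class StorageTier(Enum):
    ONLINE = "online"
    NEAR_LINE = "near-line"
    OFFLINE = "offline"


def suggest_tier(accesses_per_year: int,
                 frequent_threshold: int = 52,
                 occasional_threshold: int = 4) -> StorageTier:
    """Map an expected access rate to one of the three storage options.

    Both thresholds are hypothetical defaults, not values from the guidelines.
    """
    if accesses_per_year < 0:
        raise ValueError("accesses_per_year must be non-negative")
    if accesses_per_year >= frequent_threshold:
        return StorageTier.ONLINE      # accessed frequently
    if accesses_per_year >= occasional_threshold:
        return StorageTier.NEAR_LINE   # accessed occasionally
    return StorageTier.OFFLINE         # only rarely needed
```

A rule like this is only a starting point; security requirements and media costs may justify a different tier than access frequency alone would suggest.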

If you choose near-line or offline storage, you will need to consider what media will best suit your needs.  To do this, you should start by analyzing your current and projected volume of stored records, along with the size of the files themselves and any associated metadata.  Also take into account any security requirements, such as viewing, use, and modification restrictions. 

Different media have different storage characteristics.  For more information on media types, storage options and how this may affect your preservation strategy, please review the information in the Digital Media and Digital Media Storage chapters of these guidelines.  

 

File Format Options

Most records are created using specific, proprietary software applications.  Over time, these applications will be upgraded or be phased out altogether.  Because upgraded applications may or may not be able to read files created with previous versions, backward compatibility is not a given and cannot be counted on as a preservation tool.  Maintaining the software on your own is an option, but over and above the question of costs, that carries the risk that the software will fail in time, leaving you with no way to access your records.  One common alternative is to convert your files continually from version to version and format to format as your software environment changes.

While non-proprietary formats are the ideal for the long-term preservation of files, they are few in number and each has its limitations.  ASCII or plain text will capture data in the lowest common denominator of formats, losing structure and functions in the process.  Rich Text Format (RTF) is a Microsoft format, although it is supported by a variety of vendors and software applications.  Portable Document Format (PDF), a popular choice for file sharing and storage, is an Adobe product.  Because Adobe makes PDF’s specifications publicly available, many believe that it is an open standard when, in fact, the company is under no obligation to continue this practice into the future.  Furthermore, PDF has a problem with backward compatibility, with newer versions often incorrectly rendering files created with older ones.  To address these problems, an archival version, PDF/A, was developed and became an ISO standard in 2005.

For long-term preservation and use, eXtensible Markup Language (XML) is currently a good choice of formats.  An international standard since 1998, XML is both a file format and a text-based, self-describing, human-readable markup language that is independent of hardware and operating systems.  Because it is infrastructure-independent, XML is one of the best solutions for re-purposing the content of your records and/or sharing them with others.  Proper use of XML requires a certain amount of planning and up-front commitment of money and time, but its structured nature makes it suitable for automation and will allow you to more easily take advantage of whatever new open formats will follow in the future. 
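
As a rough illustration of what "text-based, self-describing, human-readable" means in practice, the sketch below writes a simple record to XML and reads it back using Python's standard library.  The element names (record, title, created, body) are invented for this example and do not represent any schema endorsed by these guidelines.

```python
import xml.etree.ElementTree as ET


def record_to_xml(title: str, created: str, body: str) -> str:
    """Serialize a simple record as an XML string (hypothetical element names)."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "created").text = created
    ET.SubElement(root, "body").text = body
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict:
    """Read the record back into a dictionary keyed by element name."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


xml_text = record_to_xml("Meeting minutes", "2012-03-13", "Budget approved & filed.")
print(xml_text)
print(xml_to_record(xml_text))
```

Because the output is plain text with labeled elements, it can be opened in any text editor and parsed on any platform.  In a real project, the up-front planning mentioned above would go largely into agreeing on the element names and structure.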

For more information on this and additional file types and their associated formats, refer to the File Formats chapter in this series. 

 

Digital Preservation Techniques

There are several approaches, some more practical than others, to ensure that electronic records remain useful over time.  One is to save all of the hardware, software, and documentation needed to support the records.  Known as the “computer museum” approach, it is not very realistic on a large scale because, given how rapidly hardware and software environments change, it means storing and maintaining huge quantities of outdated equipment with no assurance that any of it will work when needed.

Emulation has a similarly antiquarian flavor.  Emulator programs simulate the behavior, look, and feel of other programs, thus preserving the functionality of the records in their original format without the necessity of saving the original equipment and software.  However, emulation has so far proven more attractive in theory than in practice. There are few examples of success using this approach, and costs have proven high.  It has a further limitation in that, at best, emulation simply reproduces earlier, less sophisticated versions of an application.  Given all the expenses of technology, it seems problematic to limit the value of information by preserving it in a static framework.

Encapsulation is a third approach to preservation.  It involves combining the object to be preserved with all of the necessary details of how to interpret it within a wrapper or package, all possibly formatted in XML.  While appealing in its comprehensiveness, encapsulation has several drawbacks: file sizes are large because of all of the included information; format specifications must be determined; the encapsulated records must somehow be generated, usually separate from the act of record creation; and the encapsulated records must still be migrated over time.
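
A minimal sketch of the encapsulation idea follows, assuming an XML wrapper that carries the object together with a short description of how to interpret it.  The wrapper layout, element names, and the use of base64 and SHA-256 are assumptions made for illustration, not a packaging standard.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(payload: bytes, filename: str, format_name: str,
                how_to_read: str) -> str:
    """Wrap a file's bytes and its interpretation details in one XML package."""
    package = ET.Element("package")
    info = ET.SubElement(package, "interpretation")
    ET.SubElement(info, "filename").text = filename
    ET.SubElement(info, "format").text = format_name
    ET.SubElement(info, "instructions").text = how_to_read
    ET.SubElement(info, "sha256").text = hashlib.sha256(payload).hexdigest()
    # The object itself travels inside the wrapper as base64 text.
    content = ET.SubElement(package, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(package, encoding="unicode")


def unwrap(package_xml: str) -> bytes:
    """Extract the original bytes and confirm they match the recorded checksum."""
    package = ET.fromstring(package_xml)
    payload = base64.b64decode(package.findtext("content"))
    expected = package.findtext("interpretation/sha256")
    if hashlib.sha256(payload).hexdigest() != expected:
        raise ValueError("payload does not match the recorded checksum")
    return payload
```

The sketch also shows the file-size drawback noted above: base64 encoding alone adds roughly one third to the size of the object, before any descriptive information is counted.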

The most common approach to preserving electronic records involves a combination of two other techniques: migration and conversion.  Migration is the process of moving files to new media (also known as “refreshing”) or computer platforms in order to maintain their value.  Conversion entails changing files from one format to another and may involve moving from a proprietary format, such as Microsoft Word, to a non-proprietary one such as a plain text file or XML.  To avoid losing data in the process, you should perform initial tests and analysis to determine exactly what changes will occur and whether they are acceptable.  With both migration and conversion, special attention must be paid to also maintaining the accessibility of any associated metadata.  When properly planned and executed, the migration and conversion approach probably represents the easiest and most cost-effective preservation method available today.
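
One way to test that a migration to new media has not altered a file is to compare checksums before and after the move.  The sketch below assumes SHA-256 checksums and ordinary file-system paths; the function names are hypothetical, and the technique is offered as one possible check rather than a method prescribed by these guidelines.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, destination_dir: Path) -> Path:
    """Copy a file to new storage and verify the copy is bit-for-bit identical."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    before = sha256_of(source)
    shutil.copy2(source, target)  # copy2 also carries over file timestamps
    if sha256_of(target) != before:
        target.unlink()  # discard the bad copy; the source is left untouched
        raise OSError(f"checksum mismatch after copying {source}")
    return target
```

This check applies to migration only.  A conversion changes the bytes of a file by design, so its results have to be judged by comparing content and appearance, as the testing and analysis described above suggest.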

 

Preservation Planning

A preservation plan should address an institution’s overall preservation goals and provide a framework that defines the methods used to reach those goals.  At a minimum, the plan should define the collections it covers, list the requirements of the records, identify the practices and standards that are being followed, document the policies and procedures related to preservation activities, and assign staff responsibility for each preservation action.  It is important to remember that preservation activities are not static and that the preservation plan will need to be reviewed and readdressed on a regular basis to remain viable and useful.

The costs of preservation will be a major factor in the development of your plan.  To some degree, costs will help determine the level of your preservation efforts.  Often there is not enough money available to preserve all electronic records for the long term.  Understanding financial resources allows you to make informed decisions on what to preserve.  Without this understanding, even the best-laid preservation plans will fail.

The Electronic Resource Preservation and Access Network (ERPANet) divides costs into four major categories: technical infrastructure, financial plan, staffing infrastructure, and outsourcing costs.  Technical infrastructure costs include equipment purchase, maintenance, and upgrades necessary to keep networks online and adjust to software and hardware obsolescence.  A solid financial plan must be backed up with a commitment to long-term funding.  Staffing costs include the costs of hiring and training employees.  Any services that are outsourced will have a direct effect on your preservation costs.  Costs also depend on the record format, level of security required, and the length of time the materials need to be preserved.

A cost-benefit analysis should be done to analyze each aspect of your workflow to determine the most cost-effective method of preserving identified records.  If funding is limited or changes over time, you may need to reanalyze and possibly scale back the amount of materials you are able to preserve.  Choices will have to be made.  Your retention schedules, needs and risk assessments will be able to help you make these decisions.  “Although the costs of preserving digital materials might be high, the cost, consequences and implications of not having a digital preservation policy may be higher and in some cases they could affect the feasibility of the preservation.” (ERPANet)

When developing a preservation plan, there are many models that can be used as a guideline.  A few of these models include:

In addition, specific examples of completed preservation plans can be found online and cover a wide range of topics from how to preserve historical sites and buildings to preserving digital objects.

The following outlines the basic structure of a digital preservation plan and highlights the common points from the resources above. 

Purpose Statement: Why are you writing the preservation plan?  Why is digital preservation important to your institution? 

Relation Statement: How does the document relate to others across the institution?  Does it complement current records management policies? 

Objective Statement: What are the goals of digital preservation?  Are the goals based on a specific project or do they reach the institution level?  What are the overarching goals of your preservation program?  You may want to include both short- and long-term goals. 

Periodic Review Statement:  How often will the preservation plan be reviewed?  Is it based on a schedule or on an event that triggers review?  Record what changes are made, who made them and when.  This will help ensure your preservation plan is sustainable over time.

Descriptive Statement: What materials will be preserved?  What will not be preserved?  What formats are supported?  How long will each category of records be preserved?  How will access be provided?  Who has access to the records?  Where are files stored?  What hardware and software are being utilized?

Implementation Plans:  Documentation that explains the overall details of the preservation plan such as:

  • Staff responsibilities: What staff position is responsible for ensuring the preservation plan is carried out? 
  • Financial responsibilities: Who is responsible for the long-term monetary sustainability of preservation activities, including changing technologies, necessary staff, storage space, physical media, and whatever else is required to follow the chosen preservation strategies?  How will these goals be achieved and sustained?
  • Record Requirements: What metadata is required?  What metadata standard is being used?  What file formats are accepted?  What file sizes are accepted? What is the file structure?  Include descriptive information about content and context.
  • Access and Use Restrictions: Are the records public or non-public?  Are there any restrictions based on intellectual property rights?  Include specific restrictions in the plan.  Provide attribution statements if necessary.
  • Best Practices, Standards: What best practices are being followed?  For what records?  What standards are being followed?   For what records?  What metadata standard is being followed?  What file format standards are being followed?  Naming conventions?
  • Risk Management: Understand the risks of your digitization plan, including what risks your chosen file formats might pose, what risks the acts of migration might pose, and what risks there are to your IT infrastructure.  Have a disaster plan that includes recovery procedures.
  • Stakeholders: Who uses your digital files?  Who is dependent on them?  Do you work with others?  Do others deposit files?
  • Preservation strategies: Define what is being preserved; this should include describing the content, structure of the content, and the relationships between documents.  Define the type/s of strategies being used to preserve the files, such as migration or conversion, and how often these strategies are to be carried out.  Is there a time schedule or do triggers determine when files are migrated?  Do you use outside services?  For what?  What are their procedures and processes?
  • Storage Requirements: What type of storage environment is necessary?  What media type/s will be used for storage?  Are files backed up?  How often?  With what process?  Where are files stored?  If offsite storage is being used, include information on the contracts and the vendor’s requirements for file formats, etc.
  • Quality Control/Security Measures: What methods will be used for quality control?  How often will integrity checks be made?  How?  What processes are you using to ensure the trustworthiness of the files?  How are their authenticity and integrity preserved?  How will provenance be tracked?  How are access and use restrictions controlled?  Are audit trails and logging activity processes necessary to ensure trustworthiness?  What other security measures are being taken?  Who is responsible for maintaining such processes?

Glossary: A glossary of terms may be useful if people who are unfamiliar with digital preservation will be accessing the preservation plan. 
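
The integrity checks raised under Quality Control/Security Measures could take the form of a periodic audit against a stored list of checksums.  The sketch below is one hypothetical way to do this with SHA-256; the function names and the three report categories are assumptions for illustration, and a production process would also record who ran the audit and when.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Record a SHA-256 checksum for every file under a storage directory."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare current files against an earlier manifest and report differences."""
    current = build_manifest(root)
    return {
        # Files listed in the manifest that can no longer be found.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose contents no longer match the recorded checksum.
        "changed": sorted(name for name in manifest
                          if name in current and current[name] != manifest[name]),
        # Files present now that were never recorded in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Storing the manifest separately from the records it describes makes it harder for a single failure to damage both.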

 

E-Government and Collaboration

The State of Minnesota’s e-government framework should influence your preservation plans.  The long-term preservation of records will demand a variety of investments and decisions that will involve time, staff, technology, and specialized expertise.  Practically speaking, the state probably cannot afford to have every agency make all those investments independently. Similarly, no agency, even with the best of intentions, can consistently make all those decisions correctly.   Finding effective and economic solutions means working together.

The state’s enterprise technical architecture reflects the idea of collaboration.  In order to facilitate the development of e-government, the architecture identifies a series of issues, approaches, and standards that will make your agency’s investments in information technology more likely to succeed.  At the same time, these also will facilitate the long-term preservation of digital resources through sharing services and solutions.  In developing your preservation strategy, start by looking at what other agencies are doing and what you can learn from their experiences.

 

Key Issues to Consider

Long-term retention requires long-term preservation.  To ensure long-term preservation, a preservation plan and associated policies should be developed.  The foundation of your preservation plan should be your needs assessment, as well as an analysis of the costs, benefits, and risks involved with each of the options you are studying.  Your records management, information technology, and legal staff should all be involved in the process to make sure your plan meets your business requirements and fits in with your general electronic records management strategy.  Be sure to document your decision-making process in addition to your choices and plans for implementation.

At the minimum, your preservation plan should include the following items:

  • Rationale and requirements for your preservation program.
  • List of relevant records series and their retention and access requirements.
  • Explanation of the selected preservation technique(s), including schedules for preservation actions, quality assurance testing, backups, etc. and instructions for documentation.
  • Pointer to a business continuity or disaster recovery plan.

Once completed, your preservation plan should not gather dust on a shelf.  Rather, it should be a reference document for all preservation activities, and it should be kept up to date as your situation changes (e.g., changes in use needs, hardware, software, media, security/access requirements, retention periods, legal mandates).

 

Discussion Questions

When beginning to develop a preservation plan, you will be faced with many choices.  These are just a few of the questions you should ask during the process.

  • How long do we need to keep these records?  What will be the costs associated with such preservation tasks as migration and conversion over time?
  • What best practices can we identify and apply to our situation?  Can we cooperate with other agencies or organizations to share expertise or save money?
  • Do we need to keep the records in electronic format or is another format, such as paper or microfilm, more appropriate?  How much functionality do we need to retain over time?
  • How often are these records accessed?  What is the best storage solution (e.g., online, near-line, offline)?
  • What is the most appropriate storage media for the records?  How will we ensure that we retain the hardware necessary to handle the media?  What documentation should we collect and maintain regarding the media and hardware?
  • How will we ensure that the record content is accessible and readable over time?  Are the format and necessary software proprietary or non-proprietary?  What documentation should we collect and maintain regarding format and software?
  • How will we perform periodic quality assurance checks to ensure accessibility and trustworthiness over time?  How will we document these checks?
  • What indexing and metadata schemes should we employ to ensure that the records can be easily located and evaluated for use?
  • How will these records be used?  Will they be shared with others inside our organization?  Outside?  Would XML enhance the use-value of the records?
  • Have the records been compressed or encrypted?  If so, how does this fit into our management plan?
  • Are there data access issues that require special security measures?
  • What hardware and software configurations are we moving to in the foreseeable future?  How do these records fit in with that plan?
  • What staff training is necessary to ensure compliance with the preservation plan?

 


 

Electronic Records Management Guidelines, March 2012, Version 5.

Links verified March 13, 2012.