Center for Archival Resources On Legislatures (CAROL)
There are many components to digital preservation. Digital preservation actions must address digital media as well as the content and metadata. Different methods exist for preserving digital content over time. The method you choose should support the goal of providing access to the files for as long as necessary. Files that must remain accessible for less than five years need less preservation effort, while files that must remain accessible for ten years or longer need more attention. Because preservation is an ongoing activity, files that need to be accessible for the long term will most likely need preservation actions taken on them multiple times to keep them accessible and viable. Three main components - media, content, and metadata - are addressed below.
Electronic media is not as stable as its analog (paper) counterpart. Media includes:
- magnetic disks (internal and external hard drives) and tape
- optical media (CDs, DVDs, WORM disks)
- solid state media (flash drives, media cards)
Use care when selecting digital media for preservation purposes. For example, the practical lifespan of CDs and DVDs is closer to three to five years than to the published lifespans of ten, twenty-five, or more years. Because it has no moving parts, solid-state media could be more stable. Available media is constantly evolving; make choices based on needs and requirements.
Additional information on electronic media can be found in the Electronic Records Management Guidelines produced by the Minnesota State Archives.
When people think of digital preservation, most are concerned with long-term access to the digital content. How do you keep digital files accessible over time? The method used will depend on how long the records need to be accessible, what format they are in, and other requirements.
Methods for ensuring access to digital content are provided below.
As a first step, keep multiple copies of the records. In addition, it is important to keep these copies on various types of media in geographically separate locations. Separating copies geographically protects content from local events, catastrophic or not.
- Example: Replication is the basis for LOCKSS (Lots of Copies Keeps Stuff Safe), an international community initiative to provide libraries with digital preservation tools. Content from one institution is replicated and kept at partnering institutions, geographically separating copies of the records. The Arizona NDIIPP state project PeDALS is based on LOCKSS.
Another small-scale or short-term solution is to refresh the media the files are stored on, particularly if the files need to last longer than the digital media they are stored on.
- Example: Transferring files from a CD created 3 years ago to another CD or to an external hard drive.
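Both replication and refreshing come down to copying files to new storage and confirming that each copy matches the original. The following is a minimal Python sketch of that idea, not a substitute for a tool such as LOCKSS. The function name `copy_and_verify` and the destination directories are hypothetical; in practice the destinations would be mount points for different media or remote sites.

```python
import filecmp
import shutil
from pathlib import Path


def copy_and_verify(source: Path, destination_dirs: list[Path]) -> dict[str, bool]:
    """Copy one file into each destination directory and verify every copy.

    Returns a mapping of each copy's path to True if it is byte-for-byte
    identical to the source, or False if it is not.
    """
    results = {}
    for directory in destination_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        # copy2 copies the content and, where possible, the file timestamps.
        shutil.copy2(source, target)
        # shallow=False forces a comparison of the actual file contents
        # rather than only size and modification time.
        results[str(target)] = filecmp.cmp(source, target, shallow=False)
    return results
```

A copy that fails verification should be discarded and made again before the old media is retired.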
Over time, files may change or become corrupted; at its most basic level, preservation means making sure the files themselves do not change over time. Change is measured at the bit level (bits are the 1s and 0s the computer uses to read and interpret files). If any of these bits change, the file has been corrupted.
- Example: Use a fixity algorithm to run a checksum on a file to create a baseline fixity value. Rerun the same algorithm at a later date to see if the same fixity value is returned. If the values are the same, the file has not changed. If the values differ, the file has been modified or corrupted in some way. Checksum algorithms include: MD5, SHA-1, and SHA-2.
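The fixity check described above can be sketched in a few lines of Python using the standard `hashlib` module. This is an illustrative sketch under simple assumptions: the function names `fixity_value` and `is_unchanged` are hypothetical, SHA-256 (a member of the SHA-2 family) is chosen as the default, and the baseline value is assumed to be stored somewhere safe and separate from the file itself.

```python
import hashlib
from pathlib import Path


def fixity_value(path: Path, algorithm: str = "sha256") -> str:
    """Compute a checksum for a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, baseline: str, algorithm: str = "sha256") -> bool:
    """Rerun the same algorithm and compare the result with the stored baseline."""
    return fixity_value(path, algorithm) == baseline
```

Record the baseline when the file is first taken into custody, then call `is_unchanged` on a regular schedule. The same algorithm must be used for the baseline and for every later check.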
Conversion / Migration
Over time, file formats are often replaced by newer ones. The software programs of choice today will likely not be supported or available in the future. The longer you need to keep a file, the greater the chance that you will need to move it to an updated file format to maintain accessibility.
Conversion can be used to move the files from one format to another, from one operating system to another, or from one programming language to another.
There are risks involved with conversion; there is a chance of a loss of functionality. Functionality loss depends on formats being converted to and from and the tools being used for the conversion process. If there is a loss of functionality, agencies must be comfortable with these tradeoffs; what are you willing to give up? Is it acceptable?
If possible, keep the original files on which a conversion was performed. New tools for these transformations are developed as needed by interested parties, and a better migration tool may come along that you would prefer to use.
- Example: Using ImageMagick to convert from one image format to another.
- ANSI/ARMA 16-2007 Standard: The Digital Records Conversion Process: Program Planning, Requirements, Procedures
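As an illustration of the advice above to keep the originals, the following Python sketch wraps an ImageMagick conversion so that it never overwrites an existing file. It assumes ImageMagick 7 is installed and available on the PATH as `magick` (older installations typically expose the `convert` command instead), and it relies on ImageMagick inferring the output format from the target file's extension. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, target: Path, tool: str = "magick") -> list[str]:
    """Build the command line: the tool, the input file, then the output file."""
    return [tool, str(source), str(target)]


def convert_keeping_original(source: Path, target: Path, tool: str = "magick") -> Path:
    """Convert source to target without touching the source file.

    Refuses to run if the target already exists, so neither the original
    nor an earlier conversion can be overwritten by accident.
    """
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite {target}")
    # check=True raises CalledProcessError if the conversion tool reports failure.
    subprocess.run(build_convert_command(source, target, tool), check=True)
    return target
```

For example, `convert_keeping_original(Path("scan.tif"), Path("scan.png"))` would leave `scan.tif` in place and write a new `scan.png` beside it. Any loss of functionality still needs to be assessed by comparing the two files.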
If converting files from one format to another creates an unacceptable loss, emulation might be an option. Emulation is the process of replicating the functionality of an obsolete system. Its goal is to eliminate the loss of functionality that may occur with conversion. Hardware and software emulators recreate the computer environment and maintain the look and feel of the emulated system. Emulators have been used most often to recreate older video game environments on newer systems (e.g., you can play Atari games on today's computers and feel like you are using an Atari game system).
Developing these systems takes a large investment of time. The initial costs for emulation may be higher than the benefits of the results. Keep in mind that the newer the system, the harder it is to emulate because it is more complex. The continued use of proprietary systems and formats is a further complication.
- Example: The Multiple Arcade Machine Emulator (MAME) recreates the playing environments for several thousand classic arcade games from the 1970s through the modern era.
For more information, the British Library's Research and Innovation Report 106 (1998), Comparison of Methods and Costs of Digital Preservation, goes into detail about each of these methods.
Metadata associated with a digital file or group of files must also be preserved. Metadata helps with object discovery, with understanding the object's structure and technical environment, and with understanding rights and change management. The various types of metadata are further described elsewhere in CAROL.
Preservation metadata records information about the technical environment of a digital object, provenance over time, preservation actions, and rights management. This information is necessary to establish the authenticity of a file and to understand how it might have changed over time.
As described on Wikipedia, preservation metadata often includes the following information:
- Provenance: Who has had custody/ownership of the digital object?
- Authenticity: Is the digital object what it purports to be?
- Preservation activity: What has been done to preserve the digital object?
- Technical environment: What is needed to render and use the digital object?
- Rights management: What intellectual property rights must be observed?
Preservation metadata is valuable information that must remain accessible throughout an object's lifespan.
- Example: PREMIS (Preservation Metadata: Implementation Strategies) is a data dictionary and supporting XML schemas for the core preservation metadata needed to support the long-term preservation of digital materials. These include information on the digital object, associated rights, actions taken on the object, and the people, organizations, or software environment as required.
- Example: Minnesota Recordkeeping Metadata Standard (IRM 20) was developed to facilitate records management by government entities at any level of government. It addresses the issues of access restrictions, data practices, and records retention and disposition, thereby enabling the practical implementation of statutory mandates for records management.
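To make the categories of preservation metadata listed above more concrete, the following Python sketch models them as a simple record that can be saved as JSON alongside a file. This is a simplified illustration only: the class and field names are hypothetical, and the structure does not conform to PREMIS or to the Minnesota Recordkeeping Metadata Standard.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    """One preservation action taken on the object (preservation activity)."""
    action: str
    agent: str
    timestamp: str


@dataclass
class PreservationRecord:
    """Illustrative record covering the five categories of preservation metadata."""
    identifier: str
    provenance: list[str]        # who has had custody or ownership, in order
    fixity_algorithm: str        # supports authenticity checks
    fixity_value: str
    technical_environment: str   # what is needed to render and use the object
    rights: str                  # intellectual property rights to observe
    events: list[PreservationEvent] = field(default_factory=list)

    def log_event(self, action: str, agent: str) -> None:
        """Append a timestamped preservation action to the record."""
        now = datetime.now(timezone.utc).isoformat()
        self.events.append(PreservationEvent(action, agent, now))

    def to_json(self) -> str:
        """Serialize the whole record, including events, as JSON text."""
        return json.dumps(asdict(self), indent=2)
```

Each refresh, fixity check, or conversion would be logged with `log_event`, so the record accumulates a history of what has been done to the object and by whom.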
February 15, 2012; links verified April 1, 2013.