Friday, October 3, 2008

Reading Notes - Week 6

Hedstrom: Research Challenges in Digital Archiving and Long-term Preservation

Future research capabilities will be seriously compromised without significant investments in research and the development of digital archives.

Digital collections are vast, heterogeneous, and growing at a rate that outpaces our ability to manage and preserve them.

Human labor is the greatest cost factor in digital preservation.

need systems that are: self-sustaining, self-monitoring, self-repairing.

redundancy, replication, security against intentional attacks & technological failures, issues of forward migration: critical

Economic and policy research needs span a wide range of issues such as incentives for organizations to invest in digital archives and incentives for depositors to place content in repositories.

questions of intellectual property rights, privacy, and trust.

digital preservation will not scale without tools and technologies that automate many aspects of the preservation process and that support human decision-making.

models needed to support: selection, choice of preservation strategies, costs/benefits of various levels of description/metadata.

it is important to recognize that metadata, schemas, and ontologies are dynamic

managing schema evolution is a major research issue.

Research issues in the area of naming and authorization include development of methods for unique and persistent naming of archived digital objects, tools for certification and authentication of preserved digital objects, methods for version control, and interoperability among naming mechanisms

research is needed on the requirements for a shared and scalable infrastructure to support digital archiving

a metadata schema registry is also needed


Littman: Actualized Preservation Threats

Chronicling America: the program's three goals are to support the digitization of historically significant newspapers, facilitate public access via a web site, and provide for the long-term preservation of these materials by constructing a digital repository.

made the explicit decision not to "trust" the repository until some later point; data stored and backed up in a completely separate environment

four preservation threat categories: media failure, hardware failure, software failure, operator error

a number of hard drive failures; in one case a second problem occurred while storage system was rebuilding; resulted in the loss of a small amount of data from the system. fortunately, file system diagnostics were able to identify & restore corrupted files
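
Spotting corrupted files like this usually rests on some form of fixity checking: record a checksum for each file when it is stored, then recompute and compare later. The article does not say which diagnostics were used, so the following is only a minimal sketch, assuming SHA-256 checksums and an in-memory manifest; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "corrupted": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A check like this only identifies damaged files; restoring them still depends on having a good copy somewhere else.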

first software failure was failure to successfully validate digital objects created by awardees; gaps remained in validation that allowed awardees to submit METS records that passed validation and were ingested into the repository, but did not conform to the appropriate NDNP profile.
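
The gap described here is the difference between a record being acceptable as METS in general and conforming to one specific profile. A rough sketch of that distinction follows, using Python's standard XML parser. The profile rules shown are invented for illustration and are not the actual NDNP profile; the function names and the profile identifier in the usage note are likewise assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"  # the METS namespace URI


def is_basic_mets(xml_text: str) -> bool:
    """Loose check: well-formed XML whose root element is mets:mets."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == f"{{{METS_NS}}}mets"


def profile_violations(xml_text: str, expected_profile: str) -> list[str]:
    """Stricter, profile-level checks (hypothetical rules, for illustration)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.get("PROFILE") != expected_profile:
        problems.append("PROFILE attribute does not name the expected profile")
    if root.find(f"{{{METS_NS}}}fileSec") is None:
        problems.append("no fileSec element")
    if root.find(f"{{{METS_NS}}}structMap") is None:
        problems.append("no structMap element")
    return problems
```

A record such as `<mets xmlns="http://www.loc.gov/METS/"><structMap/></mets>` passes `is_basic_mets` yet still yields two violations from `profile_violations(record, "urn:example:profile")`, which is the kind of record that slips through when only the looser check runs at ingest.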

transformation failure: transformation of the METS record has proven to be complex and error prone; the transformation that put the original METS record inline was stripping the XML markup.
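
The article does not describe how the transformation was implemented, so the following is only one plausible way such a bug can arise, plus a round-trip check that would catch it. It assumes Python's `xml.etree.ElementTree`; the `wrapper` and `original` element names and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def embed_lossy(original_xml: str) -> str:
    """Buggy variant: keeps only the text nodes, so all markup is lost."""
    text_only = "".join(ET.fromstring(original_xml).itertext())
    wrapper = ET.Element("wrapper")
    ET.SubElement(wrapper, "original").text = text_only
    return ET.tostring(wrapper, encoding="unicode")


def embed_faithful(original_xml: str) -> str:
    """Stores the full serialized record as escaped text inside the wrapper."""
    wrapper = ET.Element("wrapper")
    ET.SubElement(wrapper, "original").text = original_xml
    return ET.tostring(wrapper, encoding="unicode")


def survives_round_trip(original_xml: str, wrapped_xml: str) -> bool:
    """True if the embedded record parses back to the same canonical XML."""
    recovered = ET.fromstring(wrapped_xml).findtext("original") or ""
    try:
        return ET.canonicalize(recovered) == ET.canonicalize(original_xml)
    except ET.ParseError:
        # The recovered text is no longer XML at all: the markup was stripped.
        return False
```

Running a check like `survives_round_trip` on every transformed record before the original is discarded turns a silent loss into an ingest-time error.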

XMS file system was corrupted, resulting in the loss of some data

most significant threats to preservation occurred as a result of operator errors. deletion of a large number of files from a section of a file system; lack of auditing capabilities contributed to this problem.
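
As an illustration of the kind of auditing that was missing, a deletion can be routed through a helper that writes an append-only log entry before anything is removed. This is a minimal sketch and not the repository's actual mechanism; the log format, field names, and function names are all assumptions.

```python
import json
import time
from pathlib import Path


def audited_delete(target: Path, operator: str, log_path: Path) -> None:
    """Append an audit record for the deletion, then delete the file."""
    entry = {
        "action": "delete",
        "path": str(target),
        "size_bytes": target.stat().st_size,
        "operator": operator,
        "timestamp": time.time(),
    }
    # Write the log entry first so a record exists even if the delete fails.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    target.unlink()


def read_audit_log(log_path: Path) -> list[dict]:
    """Return all audit entries, oldest first."""
    with log_path.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

With a log like this, the question "which files were removed, when, and by whom" can be answered after the fact, which makes recovery from backup far more targeted.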

mistakes performed during ingest

already implemented some significant architectural changes to address.


Lavoie: Technology Watch Report

digital preservation – securing the long-term persistence of information in digital form

cultural heritage institutions, businesses, government agencies, etc. – with the need to take steps to secure the long-term viability of the digital materials in their custody. Many of these entities do not perceive an archival function within the scope of their organizational mission.

no perceived consensus on the needs and requirements for maintaining digital information over the long-term. A unifying framework that could fill this gap would be invaluable in terms of encouraging dialog and collaboration among participants in standards-building activities, as well as identifying areas most likely to benefit from standards development.

two primary functions for an archival repository: first, to preserve information – i.e., to secure its long-term persistence – and second, to provide access to the archived information

obtain sufficient intellectual property rights, along with custody of the items, to authorize the procedures necessary to meet preservation objectives. For example, if the OAIS must create a new version of the archived item so that it can be rendered by current technologies, it must have the explicit right to do so.

must not only preserve information, but also a sufficient portion of its associated context to ensure that the information is understandable, and ultimately, useable by future generations. "Contextual information" that might be preserved includes, but is not limited to, a description of the structure or format in which the information is stored, explanations of how and why the information was created, and even its appropriate interpretation.

first functional component is Ingest, the set of processes responsible for accepting information submitted by Producers and preparing it for inclusion in the archival store.

Archival Storage. This is the portion of the archival system that manages the long-term storage and maintenance of digital materials entrusted to the OAIS.

Data Management is the third functional component of an OAIS. The Data Management function maintains databases of descriptive metadata identifying and describing the archived information in support of the OAIS’s finding aids; it also manages the administrative data supporting the OAIS’s internal system operations, such as system performance data or access statistics

Preservation Planning. This service is responsible for mapping out the OAIS’s preservation strategy, as well as recommending appropriate revisions to this strategy in response to evolving conditions in the OAIS environment.

Access is the fifth functional component of an OAIS-type archive. As its name suggests, the Access function manages the processes and services by which Consumers – and especially the Designated Community – locate, request, and receive delivery of items residing in the OAIS’s archival store.

Administration. The Administration function is responsible for managing the day-to-day operations of the OAIS, as well as coordinating the activities of the other five high-level OAIS services

OAIS information model is built around the concept of an information package: a conceptualization of the structure of information as it moves into, through, and out of the archival system. An information package consists of the digital object that is the focus of preservation, along with metadata necessary to support its long-term preservation and access, bound into a single logical package

Submission Information Package, or SIP, is the version of the information package that is transferred from the Producer to the OAIS when information is ingested into the archive.

Archival Information Package, or AIP, is the version of the information package that is stored and preserved by the OAIS.

Dissemination Information Package, or DIP, is the version of the information package delivered to the Consumer in response to an access request.

Taken together, the Content Information and Preservation Description Information represent the archived digital content, the metadata necessary to render and understand it, and the metadata necessary to support its preservation.
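
To make the SIP, AIP, and DIP distinction concrete for myself, here is a small sketch of one information package moving through an archive. It is a loose illustration and not part of the OAIS model itself: the class, the field names, and the choice of a SHA-256 fixity value are my own assumptions.

```python
import hashlib
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class InformationPackage:
    kind: str                 # "SIP", "AIP", or "DIP"
    content: bytes            # the digital object being preserved
    representation_info: str  # what is needed to render and understand it
    pdi: dict = field(default_factory=dict)  # Preservation Description Information


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a SIP into an AIP, recording a fixity value in the PDI."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    pdi = {**sip.pdi, "fixity_sha256": hashlib.sha256(sip.content).hexdigest()}
    return replace(sip, kind="AIP", pdi=pdi)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Derive a DIP from an AIP after re-checking the stored fixity value."""
    if aip.kind != "AIP":
        raise ValueError("disseminate expects an AIP")
    if hashlib.sha256(aip.content).hexdigest() != aip.pdi.get("fixity_sha256"):
        raise ValueError("content no longer matches its recorded checksum")
    return replace(aip, kind="DIP")
```

In this sketch the content and its representation information travel together through every stage, while the preservation metadata grows at ingest; a real archive would also reshape or subset the package for each Consumer.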

Jones/Beagrie: Introduction & Digital Preservation

growing awareness of the significant challenges associated with ensuring continued access to these materials, even in the short term.

The need to create and have widespread access to digital materials has raced ahead of the level of general awareness and understanding of what it takes to manage them effectively.

institutions that have not played a role in preserving traditional collections do not have a strong sense of playing a role in preserving digital materials. Individual researchers were keen to "do the right thing" but frequently lacked the clear guidance and institutional backing to enable them to feel confident of what they should be doing

Digital preservation has many parallels with traditional preservation in matters of broad principle but differs markedly at the operational level and never more so than in the wide range of decision makers who play a crucial role at various stages in the lifecycle of a digital resource

While there is as yet only largely anecdotal evidence, it is certain that many potentially valuable digital materials have already been lost.

Machine Dependency. Digital materials all require specific hardware and software in order to access them

The speed of changes in technology means that the timeframe during which action must be taken is very much shorter than for paper

Fragility of the media. The media digital materials are stored on is inherently unstable and without suitable storage conditions and management can deteriorate very quickly

The ease with which changes can be made and the need to make some changes in order to manage the material means that there are challenges associated with ensuring the continued integrity, authenticity, and history of digital materials.

The implications of allocating priorities are much more severe than for paper.

The nature of the technology requires a life-cycle management approach to be taken to its maintenance

widely acknowledged that the most cost-effective means of ensuring continued access to important digital materials is to consider the preservation implications as early as possible, preferably at creation, and actively to plan for their management throughout their lifecycle.

All public institutions such as archives, libraries, and museums need to be involved in applying their professional skills and expertise to the long-term preservation of digital materials, just as they have taken a role in the preservation of traditional materials.

Preservation costs are expected to be greater in the digital environment than for traditional paper collections

need actively to manage inevitable changes in technology at regular intervals and over a (potentially) infinite timeframe.

lack of standardisation in both the resources themselves and the licensing agreements

as yet unresolved means of reliably and accurately rendering certain digital objects so that they do not lose essential information after technology changes

for some time to come digital preservation may be an additional cost on top of the costs for traditional collections unless cost savings can be realised

Because digital material is machine dependent, it is not possible to access the information unless there is appropriate hardware, and associated software which will make it intelligible.

While it is technically feasible to alter records in a paper environment, the relative ease with which this can be achieved in the digital environment, either deliberately or inadvertently, has given this issue more pressing urgency

Although computer storage is increasing in scale and its relative cost is decreasing constantly, the quantity of data and our ability to capture it with relative ease still matches or exceeds it in a number of areas.

approaches to digital preservation:
-Preserve the original software (and possibly hardware) that was used to create and access the information. This is the technology preservation strategy.
-Program future powerful computer systems to emulate older, obsolete computer platforms and operating systems as required. This is the technology emulation strategy.
-Ensure that the digital information is re-encoded in new formats before the old format becomes obsolete. This is the digital information migration strategy.

The dramatic speed of technological change means that few organisations have been able even fully to articulate what their needs are in this area, much less employ or develop staff with appropriate skills

Roles are also changing within as well as between institutions. Assigning responsibility for preservation of digital materials acquired and/or created by an organisation will inevitably require involvement with personnel from different parts of the organisation working together

Some consideration also needs to be given in the selection to the level of redundancy needed to ensure digital preservation. A level of redundancy with multiple copies held in different repositories is inherent in traditional print materials and has contributed to their preservation over centuries
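
One way to see why multiple copies help: with three or more replicas, a damaged copy can be detected and repaired by comparing checksums and trusting the majority. The sketch below is my own illustration, not a method from the handbook; the site names in the usage note and the function name are hypothetical.

```python
import hashlib
from collections import Counter


def repair_by_majority(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Replace every copy with the version held by a strict majority of sites."""
    if not copies:
        raise ValueError("no copies to compare")
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no strict majority; cannot tell which copy is good")
    good = next(data for site, data in copies.items() if digests[site] == winner)
    return {site: good for site in copies}
```

For example, given `{"site-a": b"issue 1", "site-b": b"issue 1", "site-c": b"issue X"}` the damaged copy at `site-c` is overwritten with the agreed version. With only two copies that disagree, the function refuses to guess, which is part of the argument for holding more than two.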

The IPR issues in digital materials are arguably more complex and significant than for traditional media and if not addressed can impede or even prevent preservation activities. Consideration may need to be given not only to content but to any associated software
