Digital Preservation Policy

Policy

A. Purpose

It is the mission of the Texas Digital Library (TDL) to enable digital initiatives in support of research, scholarship, and learning in Texas. As a part of this mission, the TDL endeavors to collect, preserve, and disseminate scholarly materials for the benefit of both producers and consumers of academic research and scholarship. The Texas Data Repository, TDL’s instance of the Dataverse software, encompassing all of the Dataverse collections of its member institutions, is the digital resource intended to address a consortium-level need for publishing, managing, and providing access to research-generated data sets. The following Digital Preservation Policy describes the extent to which the TDL will support sustainable access to the digital research data and related content deposited in the Texas Data Repository.

The preservation objectives of the Texas Data Repository are:

  • to collect, preserve, and disseminate the data sets and related information generated by researchers affiliated with any of the TDL’s member institutions who choose to deposit their content therein;
  • to enable researchers affiliated with any of the TDL’s member institutions to comply with the mandates of funding agencies to manage, preserve, and share their research data; and
  • to provide the means for users to discover and access the data sets and metadata generated by academics affiliated with any of the TDL’s member institutions over the long term.

Part of the TDL’s vision in establishing a consortium Dataverse repository installation is to make research materials freely available to anyone, anywhere, and at any time. The TDL is an advocate for open access to scholarly work including research data. The incentives to researchers for publishing and preserving their research data in the Texas Data Repository are:

  • data that might be precariously stored on fragile, random, or unsustainable storage devices can be securely preserved for the long term;
  • data that might otherwise become neglected over time can be preserved and made accessible for other interested researchers to use and cite, potentially providing wider visibility and impact for the research; and
  • many funding agencies and scholarly journals require data management plans that detail how the data will be managed, made accessible, and preserved.

B. Scope

The TDL accepts the responsibility to preserve and provide access to research data, including associated metadata and documentation, that is properly deposited in the Texas Data Repository. This responsibility includes the provision of digital means to preserve and ensure ongoing access to said content for a minimum period of ten years after it is deposited in the TDR Dataverse repository. Long-term preservation of TDR Dataverse repository content, beyond the ten-year retention period, is subject to the TDL’s selection criteria, appraisal of the content, and budgetary and technical support of resources necessary to meet this goal. Metadata for content removed from the TDR Dataverse repository, regardless of reason or retention period, may be preserved for an undetermined period of time after said content’s removal.

The Texas Data Repository content will be selected and appraised according to the following preservation priorities and levels of commitment:

  1. Research data associated with publications – great effort will be made to ensure the long-term preservation of data associated with journal or scholarly publications, so long as the data meets the TDL collection policies and the Texas Data Repository remains the data’s hosted or cited repository.
  2. Stand-alone data publications with high research value – reasonable effort will be made to ensure the long-term preservation of data and metadata of stand-alone publications that library professionals identify as having high research value to the broader academic community.
  3. Other data files and materials – efforts may or may not be made to retain ephemeral materials considered to lack significant or long-term value, although particular files may be preserved on a select basis as appropriate.

Additionally, the Texas Data Repository will accept data submissions of any format, but provides full support (i.e., data exploration, analysis, and meta-analysis via the TwoRavens suite of statistical tools) only to tabular data, preferably in the following formats:

  • SPSS (POR and SAV formats)
  • STATA
  • R data
  • CSV

These files can be in compressed ZIP format at ingest; however, they may not exceed 2 GB in size. Any individual file uploaded to the repository must be under 4 GB, though any upload over 2 GB, and some below that threshold, may be slow or stall due to variables outside of TDL's control. Please email support@tdl.org if you are having trouble uploading files. If you have files over 4 GB, we will consider support options on a case-by-case basis and in consultation with your institutional TDR liaison. Please see http://guides.dataverse.org/en/latest/user/tabulardataingest/index.html and http://guides.dataverse.org/en/latest/user/dataset-management.html for more specific information on data set and metadata formats.
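The size limits above can be sketched as a pre-upload check that a depositor might run locally. This is an illustrative sketch only, not part of the Texas Data Repository or Dataverse software: the function names and status labels are hypothetical, the policy does not state whether "GB" means decimal or binary gigabytes (decimal is assumed here), and the 2 GB ZIP limit is read as applying to any file with a .zip extension.

```python
# Hypothetical pre-upload size check based on the limits stated in this policy.
GB = 10**9  # assumption: decimal gigabytes; the policy does not specify the unit

MAX_FILE_BYTES = 4 * GB             # individual files must be under 4 GB
SLOW_UPLOAD_BYTES = 2 * GB          # uploads over 2 GB may be slow or stall
MAX_ZIP_BYTES = 2 * GB              # ZIP archives may not exceed 2 GB
RECOMMENDED_DATASET_BYTES = 10 * GB # recommended ceiling for a whole dataset


def classify_file(name: str, size_bytes: int) -> str:
    """Return a hypothetical status label for one file."""
    if size_bytes >= MAX_FILE_BYTES:
        return "too_large"        # contact support for case-by-case options
    if name.lower().endswith(".zip") and size_bytes > MAX_ZIP_BYTES:
        return "zip_too_large"
    if size_bytes > SLOW_UPLOAD_BYTES:
        return "may_be_slow"
    return "ok"


def dataset_exceeds_recommendation(file_sizes) -> bool:
    """True if the combined size is above the 10 GB dataset recommendation."""
    return sum(file_sizes) > RECOMMENDED_DATASET_BYTES
```

For example, under these assumptions `classify_file("survey.zip", 3 * GB)` is flagged as `"zip_too_large"`, while an uncompressed 3 GB file is only flagged as `"may_be_slow"`.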

The Texas Data Repository provides basic, bit-level preservation through fixity checks and secure backup of deposited content (see also Information Security). Further and more in-depth digital preservation activities and services must be provided by a digital preservation program at the institution where the research data was originally generated.

C. Strategic Plan

The TDL has an official backup strategy that requires the following copies of all digital content to be maintained:

  • the copy of the data residing on the production server, which is an Amazon Elastic Block Store (EBS) volume;
  • nightly snapshots that can be used to restore the entire service to a particular date within the preceding month; and
  • one snapshot from each month, retained for one year.
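The snapshot retention rules above can be illustrated with a short sketch. It is not the TDL's actual backup tooling. It assumes that "the preceding month" means the last 31 days, that "one year" means 365 days, and that the monthly snapshot kept is the earliest one available in each calendar month. The function name is hypothetical.

```python
from datetime import date, timedelta

NIGHTLY_WINDOW = timedelta(days=31)   # assumption: "preceding month" = 31 days
MONTHLY_WINDOW = timedelta(days=365)  # assumption: "one year" = 365 days


def snapshots_to_keep(snapshot_dates, today):
    """Return the sorted snapshot dates retained under the assumed rules."""
    keep = set()
    monthly = {}  # (year, month) -> earliest snapshot seen within the past year
    for snap in sorted(snapshot_dates):
        age = today - snap
        if age < timedelta(0):
            continue  # ignore snapshots dated in the future
        if age <= NIGHTLY_WINDOW:
            keep.add(snap)  # every nightly snapshot from the preceding month
        if age <= MONTHLY_WINDOW:
            monthly.setdefault((snap.year, snap.month), snap)
    keep.update(monthly.values())  # one snapshot per month for the past year
    return sorted(keep)
```

Given one snapshot per night, this keeps every snapshot from the last 31 days plus a single older snapshot for each of the preceding months within the year; everything else would be eligible for deletion.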

The TDL systems also provide security services key to basic digital preservation, namely access control, network monitoring and protection, encryption, and system updates (see Information Security Policy). There are currently no institutional limitations on the overall quantity of data that can be stored on TDL servers, only limitations on the size of individual files (4 GB) uploaded via the Texas Data Repository application and a recommendation that data sets not exceed 10 GB.

Procedures

The Dataverse software's best practices for data management and preservation include:

  • automatic extraction of metadata from tabular files and FITS;
  • standard descriptive metadata schemas such as OAI DC, DDI (for statistical and social science), ISA-Tab (for biomedical), FITS (for astronomy);
  • re-formatting of tabular data to simple open format text files;
  • data and metadata versioning;
  • database maintenance;
  • checksum generation upon ingest (UNF for tabular data, MD5 for other files);
  • persistent URL – DOI (minted by DataCite);
  • deaccessioning of data, but not citation metadata, if necessary.

The TDL systems infrastructure includes bit-level fixity checking via the Amazon EBS and S3 hosting services. For more information about security, backups, and integrity checking, see also Information Security.
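As an illustration of what a fixity check involves, the sketch below recomputes an MD5 checksum for a file and compares it with a previously recorded value, in the spirit of the MD5 checksums generated at ingest for non-tabular files. It uses only Python's standard hashlib module and is not the Dataverse or Amazon implementation. The function names are hypothetical, and the UNF signature used for tabular data is not covered.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_md5):
    """True if the file's current MD5 matches the checksum recorded earlier."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

A depositor could, for instance, record the checksum before upload and call `fixity_ok` on a later download to confirm that the bits are unchanged.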

References

The Dataverse Project, “Harvard Dataverse Preservation Policy,” http://best-practices.dataverse.org/harvard-policies/harvard-preservation-policy.html

Purdue University Research Repository (PURR), “PURR Digital Preservation Policy,” https://purr.purdue.edu/legal/digitalpreservation

Texas Digital Library, “Our Mission and Vision,” https://www.tdl.org/strategic-plan/vision/

Preserving digital Objects With Restricted Resources, “Tool Grid,” http://digitalpowrr.niu.edu/tool-grid/

Digital Curation Centre, “DataVerse,” http://www.dcc.ac.uk/resources/external/dataverse

Harvard Dataverse, “UCLA Social Science Data Archive Dataverse,” http://dataarchives.ss.ucla.edu/archive%20tutorial/archivingdata.html

Harvard’s Institute for Quantitative Social Science (IQSS), “About TwoRavens,” http://datascience.iq.harvard.edu/about-tworavens

University of North Carolina – The Odum Institute, “Digital Preservation Policies,” http://www.irss.unc.edu/odum/contentSubpage.jsp?nodeid=629

Harvard Dataverse Project, “User Guide: Tabular Data File Ingest,” http://guides.dataverse.org/en/latest/user/tabulardataingest/index.html

Elizabeth Quigley, IQSS-Harvard University, “The Expanding Dataverse,” http://dataverse.org/files/dataverseorg/files/introduction_to_dataverse.pdf?m=1447352697