Digital Preservation of DSpace Content


This service enhancement for TDL-hosted DSpace users is provided only to TDL members who use Digital Preservation Services (DPS).

DSpace Replication Task Suite

DPS members who want to use their digital preservation storage to preserve their DSpace content can work with TDL to configure and implement a modified version of DSpace's Replication Task Suite (RTS). Documentation for the RTS as it has been applied to DuraSpace's DSpace and DuraCloud instances can be found here. The TDL implementation offers limited support for the RTS, as documented on this page.

Members work with TDL to create and name one or more spaces in their DuraCloud dashboard to serve as the upload target for their DSpace content. TDL installs the TDL-modified RTS in the institutional repository and trains the member to use it. By default, AIPs are generated in the DSpace AIP format; alternatively, the RTS can generate BagIt-based AIPs that follow the Library of Congress specification. TDL and the member choose the format during configuration and setup of their TDL-RTS.

Each time the transmit task runs, TDL-RTS compares the AIPs generated locally from DSpace with the content already in DuraCloud. An AIP that already exists in DuraCloud is overwritten only if the local and destination checksums differ, in which case the newly generated AIP is transferred. An AIP that does not yet exist in DuraCloud is always transferred, so every run adds new content.
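The transfer rule above can be sketched as a small decision function. This is only an illustration of the logic, not the TDL-RTS implementation: the function names, the AIP file names, and the use of MD5 as the checksum are assumptions made for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of some bytes (checksum type is assumed)."""
    return hashlib.md5(data).hexdigest()


def plan_transmit(local: dict[str, str], remote: dict[str, str]) -> dict[str, str]:
    """Decide what a transmit run would do with each locally generated AIP.

    `local` and `remote` map a hypothetical AIP name to its checksum.
    The result maps each local AIP name to "add", "overwrite", or "skip".
    """
    plan = {}
    for name, checksum in local.items():
        if name not in remote:
            plan[name] = "add"        # not yet in the destination space
        elif remote[name] != checksum:
            plan[name] = "overwrite"  # checksums differ, so send the new AIP
        else:
            plan[name] = "skip"       # identical content is left alone
    return plan


# Hypothetical AIPs: one unchanged, one modified, one new.
local = {
    "ITEM@1.zip": md5_hex(b"unchanged"),
    "ITEM@2.zip": md5_hex(b"edited"),
    "ITEM@3.zip": md5_hex(b"new"),
}
remote = {
    "ITEM@1.zip": md5_hex(b"unchanged"),
    "ITEM@2.zip": md5_hex(b"original"),
}
print(plan_transmit(local, remote))
```

Content that exists only in the destination is never touched by this rule, which matches the description above: a transmit run adds or refreshes content but does not remove anything.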

Once a member has been set up with the TDL-RTS in their TDL DSpace, the basic steps to preserve their content in their digital preservation store are as follows.

  1. In DuraCloud@TDL, the User must create a space or spaces to which they will upload content from DSpace.
  2. In DSpace, the User must log in as a repository administrator.
  3. Go to the Curate tab in the edit menu for a community, a collection, or an item.
  4. Select the Replication Suite of Tasks from the dropdown menu.
  5. Proceed using any of the three options that TDL has enabled, listed below.

TDL offers three RTS options:

  1. Estimate Storage Space for AIP(s): This allows a user to estimate how much space a generated AIP would occupy in their digital preservation storage before transmitting the content. To use this function, an administrative user logs into DSpace, navigates to their collection, and chooses the "Edit Collection" option in the administrative interface. The Curate tab in that edit screen shows a dropdown list of curation tasks. The user selects the "Estimate Storage Space for AIP(s)" task and clicks "Perform". After a moment, the results display on the page. For example: ID: 123456789/1 (Party Images) estimated AIP size: 5 gigabytes. The size is an estimate and does not include the metadata files, but it should be close to what will be transmitted to storage.
  2. Transmit AIP(s) to Storage: This allows the user to replicate their selected content into their DuraCloud@TDL designated space. 
    1. The RTS will create a METS-based AIP in the default DSpace AIP Format, compressed into a 'zip' archive. It packages every community, collection, item, METS and content file as well as group information about the site.
      1. The alternative supported by the Replication Task Suite is Library of Congress BagIt packaging, which may be compressed into either a 'zip' file or a 'tgz' (gzipped tar) file, a format more common on Unix systems. Most TDL Digital Preservation Services users are moving their content into Chronopolis, where bagging is part of the ingest process, so TDL recommends the default DSpace AIP format.
    2. The User can Perform this task immediately in the DSpace Admin UI, or, if the collection is rather large, they may instead Queue the task for later execution by using the queueing facility available in the curation system.

    3. Please note that the Transmit AIP(s) to Storage task, like all other replication tasks, operates on whatever DSpace object(s) it is given. Thus, if the object is a collection, the task creates and transmits an AIP for the collection object itself (metadata and logo), as well as an AIP for each item in the collection. If the task is given an identifier for a single item, only one AIP is created and transmitted.

    4. DSpace reports the successful transmission in the GUI, so do not navigate away from the page after choosing Perform until the result appears.
  3. Verify Replication: After running the Transmit AIP(s) to Storage task, this verifies that your selected content was successfully replicated into DuraCloud@TDL.
    1. This function only works immediately after the replication if you are using Amazon S3 as your primary digital preservation storage.
    2. TDL Digital Preservation Services members using Chronopolis and Glacier for preservation storage will not be able to use this task once their content has been moved from DuraCloud@TDL to the respective storage option. 
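As a rough illustration of the scope rule described under Transmit AIP(s) to Storage, the sketch below lists which AIPs a task run would produce for a given object. The `DSpaceObject` class and `aips_for` function are hypothetical names invented for this example, the handles are made up, and the assumption that a community is handled the same way as a collection (recursing into everything beneath it) is not confirmed on this page.

```python
from dataclasses import dataclass, field


@dataclass
class DSpaceObject:
    """Minimal stand-in for a community, collection, or item (hypothetical)."""
    handle: str
    kind: str  # "community", "collection", or "item"
    children: list["DSpaceObject"] = field(default_factory=list)


def aips_for(obj: DSpaceObject) -> list[str]:
    """Return the handles that would each get their own AIP.

    The object itself always gets one AIP; a container also yields
    one AIP per object beneath it.
    """
    handles = [obj.handle]
    for child in obj.children:
        handles.extend(aips_for(child))
    return handles


# A collection with two items: three AIPs in total.
item_a = DSpaceObject("123456789/2", "item")
item_b = DSpaceObject("123456789/3", "item")
collection = DSpaceObject("123456789/1", "collection", [item_a, item_b])

print(aips_for(collection))  # the collection AIP plus one AIP per item
print(aips_for(item_a))      # a single item yields a single AIP
```

This is why running a task on a large collection can take much longer than running it on one item, and why queueing the task may be preferable in that case.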


DuraSpace created a video to support Users running RTS tasks. Keep in mind that only the three options described above are available to TDL Digital Preservation Services members.

For support with restoration of content to DSpace from a Digital Preservation Storage location, please contact TDL support (support@tdl.org) so they can schedule the task.