...

  • The Submission Information Package (SIP) will consist of one dataset published in TDR, including all of its files and its descriptive, administrative, and structural metadata at the dataset and file levels.
  • Each version of a dataset will serve as a SIP. 
  • Each SIP will be packaged as a zipped BagIt bag that conforms to Research Data Alliance recommendations. The bag contains an OAI-ORE map file and a datacite.xml file. Additionally, a separate copy of the datacite.xml file is included. The SIP is submitted to Chronopolis via DuraCloud.
  • TDL will submit previously published datasets and prior versions of existing datasets using a manual admin API call. If not already generated by Dataverse, this workflow creates a JSON-LD serialized OAI-ORE map file, which is also available as a metadata export format in the Dataverse web interface.
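
As an illustration of the packaging described above, the following is a minimal sketch of a pre-submission check that a zipped bag contains the expected metadata files. The entry names in REQUIRED_ENTRIES (bagit.txt, metadata/oai-ore.jsonld, metadata/datacite.xml) and the function name are assumptions made for this example, not confirmed details of the bags Dataverse produces; adjust them to the actual bag layout.

```python
import zipfile

# Assumed entry names (hypothetical); a real bag may use different paths.
REQUIRED_ENTRIES = (
    "bagit.txt",
    "metadata/oai-ore.jsonld",
    "metadata/datacite.xml",
)


def missing_bag_entries(zip_path, required=REQUIRED_ENTRIES):
    """Return the required entries that are absent from the zipped bag.

    An entry counts as present if it sits at the zip root or under any
    top-level bag directory (for example "my-bag/bagit.txt").
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    return [
        entry
        for entry in required
        if not any(n == entry or n.endswith("/" + entry) for n in names)
    ]
```

An empty list means every expected file was found; any returned names identify what is missing before the SIP is staged.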

Ingest

From TDR to DuraCloud

  • TDL will establish and maintain one DuraCloud dashboard to move content from the TDR to Chronopolis. This includes all published content in the institutional dataverses within the TDR. 
  • All “published” datasets in the TDR will be automatically included in the ingest. TDL will run the ingest twice a year.
  • TDL will use the command line to automate dataset staging into DuraCloud.
  • Each SIP (the Bag generated by Dataverse) will contain one dataset, including all of its files and the Dataverse metadata at the dataset and file levels. Each version of the dataset will become its own SIP.

From DuraCloud to Chronopolis

  • SIPs in DuraCloud spaces are manually transferred into a staging space in DuraCloud called the snapshot space. 
  • DuraCloud creates a properties file and stores it in the snapshot space.
  • Based on the snapshot properties file, DuraCloud makes the snapshot space read-only.
  • The Bridge Ingest application, part of the Chronopolis platform, facilitates the upload from DuraCloud to Chronopolis. The Bridge Ingest app:
    • Creates a directory for the snapshot on University of California San Diego’s file system (referred to as bridge storage in DuraSpace documentation)
    • Pulls all content from the space to /data directory under the snapshot directory
    • Moves the snapshot properties file from the data directory to the snapshot directory (which is one level above the data directory)
    • Stores the properties for all content items in a json file, also in the snapshot directory
    • Creates an MD5 manifest of all content items, adding each item after the MD5 has been verified to match the DuraCloud checksum. The initial MD5 value was generated upon upload from Dataverse to DuraCloud.
    • Creates a SHA-256 manifest of all content
  • The Chronopolis Intake Service sends requests to the Bridge Ingest App for new snapshot content. Information provided on snapshot call:
    • DuraCloud host and port
    • Store ID
    • Space ID
    • Snapshot ID
  • Chronopolis pulls all content from bridge storage into Chronopolis preservation storage.
    • Chronopolis verifies content against the manifest
    • Chronopolis creates Bags for the content
    • For specific steps, see Chronopolis’s ChronCore Processes document
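
The manifest steps attributed to the Bridge Ingest app above can be sketched roughly as follows. This is an illustrative sketch only, not the Bridge Ingest implementation: the function names, the manifest file names (manifest-md5.txt, manifest-sha256.txt), the "checksum, two spaces, path" line format, and the expected_md5 mapping that stands in for the checksums reported by DuraCloud are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large content items are not read into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifests(snapshot_dir: Path, expected_md5: Mapping[str, str]) -> None:
    """Write MD5 and SHA-256 manifests for everything under snapshot_dir/data.

    expected_md5 maps each content ID (path relative to data/) to the MD5
    recorded upstream. An item is added to the manifests only after its
    locally computed MD5 matches that value; a mismatch raises ValueError.
    """
    data_dir = snapshot_dir / "data"
    md5_lines, sha256_lines = [], []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        content_id = path.relative_to(data_dir).as_posix()
        md5 = file_digest(path, "md5")
        if md5 != expected_md5.get(content_id):
            raise ValueError(f"MD5 mismatch for {content_id}")
        md5_lines.append(f"{md5}  data/{content_id}")
        sha256_lines.append(f"{file_digest(path, 'sha256')}  data/{content_id}")
    # Manifests live in the snapshot directory, one level above data/.
    (snapshot_dir / "manifest-md5.txt").write_text("\n".join(md5_lines) + "\n")
    (snapshot_dir / "manifest-sha256.txt").write_text("\n".join(sha256_lines) + "\n")
```

Verifying the MD5 before writing either manifest means a corrupted transfer stops the process instead of being recorded as valid content.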

...