R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.
Implemented: The guideline has been fully implemented for the needs of the repository.
Dataverse follows the guidance of the OAIS reference model across the whole archival process. For example, the infrastructure supports the separation of the Submission Information Package (SIP), Archival Information Package (AIP), and Dissemination Information Package (DIP).
Dataverse is committed to using standards-compliant metadata so that Dataverse metadata can be easily mapped to standard metadata schemas and exported in JSON format (XML for tabular file metadata) for preservation and interoperability. See https://dataverse.org/presentations/metadata-model-dataverse-project-helping-more-data-become-discoverable. The metadata schemas supported for Citation and Domain Specific Metadata in Dataverse are detailed below:
Citation Metadata: compliant with DDI Lite, DDI 2.5 Codebook, DataCite 3.1, and Dublin Core’s DCMI Metadata Terms (see .tsv version). The Language field uses the ISO 639-1 controlled vocabulary.
Geospatial Metadata: compliant with DDI Lite, DDI 2.5 Codebook, DataCite, and Dublin Core (see .tsv version). The Country / Nation field uses the ISO 3166-1 controlled vocabulary.
Social Science & Humanities Metadata: compliant with DDI Lite, DDI 2.5 Codebook, and Dublin Core (see .tsv version).
Astronomy and Astrophysics Metadata: these metadata elements can be mapped/exported to the International Virtual Observatory Alliance’s (IVOA) VOResource Schema format and are based on Virtual Observatory (VO) Discovery and Provenance Metadata (see .tsv version).
Life Sciences Metadata: based on the ISA-Tab Specification, along with controlled vocabulary from subsets of the OBI Ontology and the NCBI Taxonomy for Organisms (see .tsv version).
See also the Dataverse 4.0 Metadata Crosswalk: DDI, DataCite, DC, DCTerms, VO, ISA-Tab document.
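As an illustration of the export capability described above, the following minimal sketch builds a metadata export request URL for a single dataset. It assumes the dataset export endpoint described in the Dataverse Native API guide (`/api/datasets/export` with `exporter` and `persistentId` query parameters). The server address and DOI used below are hypothetical placeholders, the helper name `export_url` is illustrative only, and the set of available exporter names depends on the Dataverse version installed.

```python
from urllib.parse import urlencode

# Exporter names as listed in the Dataverse Native API guide; the available
# set is version-dependent, so treat this as an assumption, not a contract.
EXPORTERS = {
    "ddi", "oai_ddi", "dcterms", "oai_dc", "Datacite", "oai_datacite",
    "dataverse_json", "schema.org", "OAI_ORE",
}


def export_url(server: str, persistent_id: str, exporter: str = "dataverse_json") -> str:
    """Build a (hypothetical) metadata export URL for one dataset.

    The persistent identifier is percent-encoded so that the ':' and '/'
    characters of a DOI are safe inside the query string.
    """
    if exporter not in EXPORTERS:
        raise ValueError(f"unknown exporter: {exporter}")
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Hypothetical server and test-prefix DOI, for illustration only.
    print(export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE", "ddi"))
```

The resulting URL could then be retrieved with any HTTP client (for example `urllib.request.urlopen`) to obtain the DDI, Dublin Core, DataCite, or native JSON rendering of the same dataset metadata.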
In addition, Dataverse is an open-source web application that uses the following components:
• Java (Dataverse is a Java Enterprise Edition (EE) web application)
• Glassfish
• PostgreSQL
• Solr
• SMTP server
• Persistent identifier service
• R, rApache, and TwoRavens
• Apache
• Shibboleth
There are no significant deviations from the standards.
The Service Level Agreement (SLA) indicates that the TDL will be “responsible of the stewardship, technological oversight, and upgrades of the data repository software infrastructure.” The TDL 2017 Annual Report lists a series of accomplishments and aspirations, including goals with respect to the repository. https://www.tdl.org/wp-content/uploads/2017/10/TDL_AnnualReport_2017.pdf
In late 2018, TDR Dataverse data was moved to Amazon S3. Plans for digital preservation storage may include copying data to the Chronopolis network in 2019–20.
Finally, the TDL updates its operating systems (OS) at least quarterly, and immediately when important security patches become available.
Software Inventory/System Documentation:
The software requirements are listed above and in the Dataverse Installation instructions.
The TDR is hosted by Amazon Web Services (AWS), so bandwidth is likely sufficient for the TDR’s needs. The TDL also adjusts bandwidth as needed and continuously monitors usage with Munin.
R16. The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.
Implemented: The guideline has been fully implemented for the needs of the repository.
The Texas Digital Library “actively addresses the need to ensure the accuracy, integrity, authenticity, and permanence of the digital content that it manages, as well as the security of the services and platforms that it provides.” The technical infrastructure, including the operational servers, is located in a secure data center where only authorized employees, after identification, have access to the equipment.
The TDL systems and services are hosted by Amazon Web Services (AWS), which provides cloud security services and support (https://aws.amazon.com/security/), including:
• Secure Network Architecture – segmentation and firewalls throughout;
• Secure Access Points – API endpoints allowing HTTPS access;
• Encryption – connections encrypted by SSL;
• Network Monitoring and Protection – against DDoS and MITM attacks, IP spoofing, etc.; and
• Identity Management and Authentication – secure log-in via password and SSH key pair.
Additionally, the TDL updates its operating systems (OS) at least quarterly, and immediately when important security patches become available.
The TDL has an official backup strategy under which it retains:
the copy of the data residing on the production server, which is an Amazon S3 volume;
nightly snapshots that can be used to restore the entire service to a particular date within the
and one snapshot from each month, retained for one year.
Backups are stored as Amazon Elastic Block Store (EBS) snapshots, which are kept in replicated storage with regular, systematic data integrity checks.
The AWS cloud spans “55 availability zones.” This geographic distribution reduces the chance that a major catastrophe will lead to total system loss (https://aws.amazon.com/security/).
A white paper produced by Amazon in May of 2017, Amazon Web Services: Overview of Security Processes, discusses, among other topics, both physical and operational security processes as well as business continuity and disaster planning.