Technology

(DRAFT) 15. Technical Infrastructure

R15. The repository functions on well-supported operating systems and other core infrastructural software and is using hardware and software technologies appropriate to the services it provides to its Designated Community.

Compliance Level:

  • The guideline has been fully implemented for the needs of the repository.

Reference Standards:

Dataverse follows the guidance given in the OAIS reference model across the whole of the archival process. For example, the infrastructure supports separation between the Submission Information Package, Archival Information Package and Dissemination Information Package.

Dataverse is committed to using standard-compliant metadata to ensure that Dataverse metadata can be mapped easily to standard metadata schemas and be exported into JSON format (XML for tabular file metadata) for preservation and interoperability. See https://dataverse.org/presentations/metadata-model-dataverse-project-helping-more-data-become-discoverable. The metadata schemas supported for Citation and Domain Specific Metadata in Dataverse are detailed below:

  • Citation Metadata: compliant with DDI Lite, DDI 2.5 Codebook, DataCite 3.1, and Dublin Core’s DCMI Metadata Terms (see .tsv version). Language field uses ISO 639-1 controlled vocabulary.

  • Geospatial Metadata: compliant with DDI Lite, DDI 2.5 Codebook, DataCite, and Dublin Core (see .tsv version). Country / Nation field uses ISO 3166-1 controlled vocabulary.

  • Social Science & Humanities Metadata: compliant with DDI Lite, DDI 2.5 Codebook, and Dublin Core (see .tsv version).

  • Astronomy and Astrophysics Metadata: these metadata elements can be mapped/exported to the International Virtual Observatory Alliance’s (IVOA) VOResource Schema format and are based on Virtual Observatory (VO) Discovery and Provenance Metadata (see .tsv version).

  • Life Sciences Metadata: based on ISA-Tab Specification, along with controlled vocabulary from subsets of the OBI Ontology and the NCBI Taxonomy for Organisms (see .tsv version).

  • See also the Dataverse 4.0 Metadata Crosswalk: DDI, DataCite, DC, DCTerms, VO, ISA-Tab document.
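As an illustration of the JSON export mentioned above, the following minimal sketch builds a request URL for the Dataverse native API's dataset export endpoint. The server address and the DOI shown are placeholders, not TDR values, and the helper name export_url is hypothetical. The list of exporter names available on a given installation should be confirmed against that installation's API guide.

```python
from urllib.parse import urlencode


def export_url(server, persistent_id, exporter="dataverse_json"):
    """Build a dataset metadata export URL (hypothetical helper).

    Assumes the native API path /api/datasets/export with the
    'exporter' and 'persistentId' query parameters.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


# Placeholder server and DOI, for illustration only.
url = export_url("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE")
print(url)
```

Passing a different exporter name (for example "ddi") would request the same dataset's metadata in another of the supported schemas.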

    In addition, Dataverse is an open source web application that utilizes the following components:

    o Java (Dataverse is a Java Enterprise Edition (EE) web application)
    o Glassfish
    o PostgreSQL
    o Solr
    o SMTP server
    o Persistent identifier service
    o R, rApache, and TwoRavens
    o Apache
    o Shibboleth
    o OAuth2
    o Geoconnect

There are no significant deviations from the standards.

Infrastructure Development:

The Service Level Agreement (SLA) indicates that the TDL will be “responsible of the stewardship, technological oversight, and upgrades of the data repository software infrastructure.”
The TDL 2017 Annual Report lists a series of accomplishments and aspirations, including goals with respect to the repository. https://www.tdl.org/wp-content/uploads/2017/10/TDL_AnnualReport_2017.pdf

In late 2018, TDR Dataverse data was moved to Amazon S3. Plans for digital preservation storage may include copying data to the Chronopolis network in 2019-20.

Finally, TDL updates its Operating Systems (OS) quarterly at a minimum, and immediately when important security patches are made available.

Software Inventory/System Documentation:

The software requirements are listed above and in the Dataverse Installation instructions.

http://guides.dataverse.org/en/4.9.4/installation/index.html

Most (if not all) of the code is available on GitHub.

https://github.com/TexasDigitalLibrary/dataverse

Community Supported Software:

Dataverse is open-source and supported by a growing Global Dataverse Community Consortium (http://dataversecommunity.global/).

Different installation versions can be seen here: https://github.com/IQSS/dataverse/releases
Other relevant GitHub addresses: https://github.com/IQSS/ and https://github.com/IQSS/dataverse

Connectivity:

The TDR is hosted by Amazon Web Services, so the available bandwidth is likely sufficient for the TDR’s needs. TDL adjusts capacity as needed and continuously monitors usage via Munin.

16. Security

R16. The technical infrastructure of the repository provides for protection of the facility and its data, products, services, and users.

Compliance Level:

  • Implemented: The guideline has been fully implemented for the needs of the repository.

Backup Procedures:

The Texas Digital Library “actively addresses the need to ensure the accuracy, integrity, authenticity, and permanence of the digital content that it manages, as well as the security of the services and platforms that it provides.” The technical infrastructure, including the operational servers, is located in a secure data center, where only authorized employees have access to the equipment after identification.

The TDL systems and services are hosted with Amazon Web Services (AWS), which provides cloud security services and support (https://aws.amazon.com/security/), including:

  • Secure Network Architecture – segmentation and firewalls throughout;

  • Secure Access Points – API endpoints allowing HTTPS access;

  • Encryption – connections encrypted by SSL;

  • Network Monitoring and Protection – against DDoS and MITM attacks, IP spoofing, etc.; and

  • Identity Management and Authentication – secure log-in via password and SSH key pair.

Additionally, the TDL updates its Operating Systems (OS) quarterly at a minimum, and immediately when important security patches are made available.

The TDL has an official backup strategy in which the TDL retains:

  • the copy of the data residing on the production server, which is an Amazon S3 volume;

  • nightly snapshots that can be used to restore the entire service to a particular date within the preceding month; and

  • one snapshot from each month, retained for one year.
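The retention rules in the list above can be sketched as a small selection function. This is an illustrative sketch only, not TDL's actual tooling: the function name and parameters are hypothetical, the "preceding month" is approximated as 31 days, and the choice of the earliest snapshot in each calendar month as the monthly copy is an assumption.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, nightly_days=31, monthly_days=365):
    """Return the snapshot dates a nightly-plus-monthly policy would retain.

    Assumptions (not stated in the source): a month is treated as 31 days,
    and the earliest snapshot within each calendar month is the monthly copy.
    """
    keep = set()
    monthly = {}  # (year, month) -> earliest snapshot still inside the one-year window
    for d in sorted(snapshot_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age <= nightly_days:
            keep.add(d)  # nightly snapshot from the preceding month
        if age <= monthly_days:
            monthly.setdefault((d.year, d.month), d)
    keep.update(monthly.values())
    return sorted(keep)


# Example: one snapshot per night for the last 400 days.
today = date(2019, 3, 15)
nightly = [today - timedelta(days=n) for n in range(400)]
kept = snapshots_to_keep(nightly, today)
print(len(kept), "snapshots retained; oldest:", kept[0])
```

Any snapshot not returned by the function would be eligible for deletion under these assumed rules.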

    Backups are stored as Amazon Elastic Block Store (EBS) snapshots, a form of replicated storage with regular, systematic data integrity checks.

    The AWS cloud spans “55 availability zones.” Geographic distribution increases the chances that a major catastrophe will not lead to total system loss.
    https://aws.amazon.com/security/

    A white paper produced by Amazon in May of 2017, Amazon Web Services: Overview of Security Processes, discusses, among other topics, both physical and operational security processes as well as business continuity and disaster planning.

https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Whitepaper.pdf

IT Security System, Disaster Plan, Business Continuity Plan:

As mentioned above, cloud hosting through Amazon Web Services provides coverage with respect to potential disasters. At the same time, the University of Texas System has an Office of Risk Management. This office’s web presence has a section devoted to business continuity and emergency management. https://www.utsystem.edu/offices/risk-management/emergency-management-and-business-continuity

Finally, the University of Texas at Austin also has an Emergency Operations Plan.

https://preparedness.utexas.edu/sites/preparedness.utexas.edu/files/Emergency%20Operations%20Plan%202018.pdf
https://preparedness.utexas.edu/emergency-plans