Integrations and Features

As of November 2022, the Texas Data Repository has added several key integrations and external tools to its Dataverse instance. The three integrations are for GitHub, Dropbox, and the Open Science Framework (OSF). The one new feature is the option to place embargoes on files in datasets. More information on each of these additions is below.

GitHub

Dataverse integration with GitHub is implemented through the Dataverse Uploader GitHub Action, a reusable, composite workflow that uploads a git repository, or a subdirectory of one, into a dataset on a target Dataverse installation. The action is customizable: users can choose whether to replace the dataset's existing files or add to them, and whether to publish the dataset or leave it as a draft version. The action also adds some metadata to the dataset, such as the originating GitHub repository, and it preserves the repository's directory structure.

Dataverse Uploader Actions can be found here: https://github.com/marketplace/actions/dataverse-uploader-action

Dropbox

If your researchers have data on Dropbox, this integration makes it easier for them to get it into the Dataverse installation by uploading files directly from their Dropbox accounts.

Open Science Framework (OSF)

The Center for Open Science’s Open Science Framework (OSF) is an open source software project that facilitates open collaboration in scientific research across the entire lifespan of a research project.

For instructions on depositing from OSF to the Dataverse installation, researchers can visit: https://help.osf.io/hc/en-us/articles/360019737314-Connect-Dataverse-to-a-Project

Embargoes

A Dataverse instance may be configured to support file-level embargoes. An embargo keeps a file's content inaccessible from the time a dataset version is published until the embargo end date: file previews and downloads are blocked during that period. The effect is similar to restricting a file, except that the embargo ends on the specified date without further action, and users cannot request access to an embargoed file while the embargo is in effect. Embargoes on files in a version 1.0 dataset may also affect the date shown in the dataset and file citations. The recommended practice is for the citation to reflect the date on which all embargoes on files in version 1.0 end. (Since Dataverse creates one persistent identifier per dataset and doesn’t create new ones for each version, the publication of later versions, with or without embargoed files, does not affect the citation date.)
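The recommended citation-date practice described above amounts to taking the latest of the version 1.0 publication date and the embargo end dates of the files in version 1.0. The sketch below illustrates that rule only; `citation_date` is a hypothetical helper, not part of Dataverse, and it assumes embargo end dates are whole calendar dates.

```python
from datetime import date
from typing import Optional, Sequence


def citation_date(published: date, v1_embargo_ends: Sequence[Optional[date]]) -> date:
    """Hypothetical helper: the date a dataset citation would show.

    published: publication date of version 1.0.
    v1_embargo_ends: embargo end date for each file in version 1.0,
        or None for a file without an embargo.
    Embargoes on files added in later versions are deliberately not an input,
    because later versions do not change the citation date.
    """
    ends = [d for d in v1_embargo_ends if d is not None]
    # The citation reflects the date on which *all* v1.0 embargoes have ended;
    # with no embargoes, this is simply the publication date.
    return max([published, *ends])


# Version 1.0 published in November 2022 with two embargoed files.
print(citation_date(date(2022, 11, 15), [None, date(2023, 6, 1), date(2024, 1, 1)]))
```

Here the citation date would be 2024-01-01, the end of the longest version 1.0 embargo.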

Embargoes are intended to support use cases where, for example, a journal or project team allows a period after publication of a dataset and/or the associated paper during which the authors still have sole access to the data. Setting an embargo on the relevant files and publishing the dataset in Dataverse publicizes the persistent identifier (e.g. DOI or Handle) for the dataset (and for the files, if the instance is configured to create persistent identifiers for them) and makes the metadata and the content of any un-embargoed files immediately available, while automatically denying access to the embargoed files until their embargoes expire. Once a dataset with embargoed files has been published, no further action is needed for the embargoed files to become accessible as of the specified embargo end date. (Embargoes can also be combined with the ‘restrict’ functionality on files. Once the embargo expires, a restricted file's availability is governed by its restricted status as usual, as described elsewhere.)
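How embargoes and restrictions combine for a published file can be sketched as a small decision function. This is an illustrative model, not Dataverse code: `Access` and `file_access` are hypothetical names, and the sketch assumes day-level granularity, with a file becoming accessible on its embargo end date ("as of" that date).

```python
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    EMBARGOED = "embargoed"    # no preview, no download, no access requests
    RESTRICTED = "restricted"  # normal restricted-file rules apply
    PUBLIC = "public"          # anyone can preview and download


def file_access(today: date, embargo_end: Optional[date], restricted: bool) -> Access:
    """Hypothetical model of access to a file in a *published* dataset version."""
    # While the embargo is in effect, it overrides everything else.
    if embargo_end is not None and today < embargo_end:
        return Access.EMBARGOED
    # From the end date on (or with no embargo), the restricted flag decides.
    return Access.RESTRICTED if restricted else Access.PUBLIC


print(file_access(date(2023, 1, 1), date(2023, 6, 1), restricted=True))  # during embargo
print(file_access(date(2023, 6, 1), date(2023, 6, 1), restricted=True))  # embargo has ended
```

Checking the embargo first mirrors the text: a restricted flag only matters once the embargo has expired.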

  • Setting the same embargo on all files in the dataset effectively provides a dataset-level embargo: the dataset's persistent identifier and metadata become available, but access to all of its files is blocked.

  • “Rolling” embargoes on time-series data can be supported by publishing multiple dataset versions and adding new embargoes on the files added in each version. For example, every year, files containing the prior year’s results can be added to a dataset and given an embargo ending one year later than the embargoes set in the previous dataset version, and the new dataset version can then be published. The data files published in the different versions will then become available at yearly intervals, as their individual embargoes expire.
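The rolling-embargo pattern above can be planned with a short schedule calculation. This is a sketch under stated assumptions: the function names are hypothetical, each version's embargo ends exactly one year after the previous version's, and a February 29 end date is moved to February 28 in non-leap years.

```python
from datetime import date
from typing import List


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # raised only for Feb 29 landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


def rolling_embargo_ends(first_end: date, num_versions: int) -> List[date]:
    """Hypothetical helper: embargo end date for the files added in each version.

    Version i (0-based) gets an embargo ending i years after first_end, i.e.
    one year later than the embargo set in the previous version.
    """
    return [add_years(first_end, i) for i in range(num_versions)]


def released_versions(ends: List[date], today: date) -> List[int]:
    """Indexes of the versions whose newly added files are accessible on `today`."""
    return [i for i, end in enumerate(ends) if today >= end]


ends = rolling_embargo_ends(date(2024, 1, 1), 3)
print(ends)
print(released_versions(ends, date(2025, 3, 1)))
```

In this example the three versions' files open on 2024-01-01, 2025-01-01, and 2026-01-01, so on 2025-03-01 the files from the first two versions (indexes 0 and 1) are available.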

As the primary use case of embargoes is to make the existence of data known now, with a promise (to a journal, project team, etc.) that the data itself will become available at a given future date, users cannot change an embargo once a dataset version is published. Dataverse instance administrators do have the ability to correct mistakes and make changes if/when circumstances warrant.