What is the Texas Data Repository?
- The Texas Data Repository is a platform for publishing and archiving datasets (and other data products) created by faculty, staff, and students at Texas higher education institutions. The repository is built on an open-source application called the Dataverse software, originally developed and used by Harvard University.
What is Dataverse?
- The Texas Data Repository is an installation of open source Dataverse software. Dataverse software is a free and open source software platform for publishing, citing, and preserving research data. The Dataverse software was developed by the Institute for Quantitative Social Science (IQSS) at Harvard University. Harvard’s IQSS operates the largest Dataverse repository, with roughly 60,000 datasets. Each Dataverse software installation is interoperable with other Dataverse installations and systems (like Open Journal Systems), providing opportunities for greater visibility of data.
What’s the difference between a Dataverse collection and a dataset?
- A Dataverse collection is a container for datasets (research data, code, documentation, and metadata) and other Dataverse collections. Collections can be set up for individual researchers, departments, journals, and organizations.
- A researcher who logs in to the Texas Data Repository can create a Dataverse collection as a place to collect all their datasets. (Alternatively, they could just deposit datasets into the repository without creating a Dataverse collection to contain them. There is no requirement to create a Dataverse collection.)
- If a researcher does create a Dataverse collection, they become its administrator, with the ability to customize the logo and description of that Dataverse collection, control access restrictions to it, create templates for datasets deposited there, and create guestbooks to track who downloads their datasets.
Who is behind this project?
- The Texas Data Repository is hosted and maintained by the Texas Digital Library, a consortium of academic libraries in Texas with a proven history of providing shared technology services that support secure, reliable access to digital collections of research and scholarship.
What is the Texas Digital Library?
- The Texas Digital Library (TDL) is a consortium of higher education institutions in Texas that provides shared services in support of research, teaching, and the advancement of scholarship. TDL provides a range of services and engagement opportunities to members including shared software hosting, technical support, professional development opportunities, and collaborative engagement.
How do I know if my institution is participating in TDR?
- Institutional members of the Texas Digital Library may opt in to the Texas Data Repository service and offer it to researchers at their university. Members listed on the Help page of the Texas Data Repository have opted into the service. If your institution is not listed, contact the Texas Digital Library or the library on your campus to inquire about participation.
- For a list of Texas Digital Library member institutions, visit http://tdl.org/members.
How do I get started?
- Log in to https://dataverse.tdl.org using your institutional username and password. You’ll see a drop-down menu where you can select your institution’s name. Then just use your regular institutional credentials to log in.
- Optionally, create a new Dataverse collection for collecting all your datasets in a single location. A Dataverse collection is simply a container for collecting multiple datasets or studies.
- Add a dataset, which can include multiple files such as raw data files and supplementary materials.
- To help others discover and understand the data, provide some information about what you’ve uploaded in the forms provided.
- Hit publish! Or, if you would like to restrict access to the data, share it with only a select few.
- Go to the TDR User Guide for more detail.
Why should I deposit data in the Texas Data Repository?
There are a number of reasons why you might want to deposit data, including:
- To comply with funding requirements. The Texas Data Repository can help you comply with funder mandates for data archiving and sharing, and gives you resources for developing data management plans and grant applications.
- To ensure reliable, managed access for data. The Texas Data Repository gives you a convenient and reliable place to collect and share your data. And by depositing data there, you benefit from the TDL’s focus on long-term access and preservation of your content.
- To increase scholarly impact. Data published in the Texas Data Repository receives a DOI, making it easy for others to cite reliably.
- To collaborate with research teams. Some situations may necessitate restricting access to data, at least for a period of time. The Texas Data Repository allows you to share your data with a select group of colleagues, version your data, and publish it when you’re ready.
- To have access to local support through your institution’s library. Along with robust technical support from the Texas Digital Library, you can rely on trained librarians at your home institution to assist with multiple phases of the research cycle, including data management planning, preparation for data publishing, and long-term curation.
Policies and Data Basics
What kind of data can I deposit?
- Researchers can deposit a wide variety of data and related electronic materials to the Texas Data Repository, including spreadsheets, sensor and instrument data, surveys, GIS data, and imagery, along with associated material such as codebooks or data dictionaries. Any individual file uploaded to the repository must be under 4GB, though any uploads over 2GB, and some below that threshold, may be slow or stall due to variables outside of TDL's control. Please email firstname.lastname@example.org if you are having trouble uploading files. If you have files over 4GB, we will consider support options on a case-by-case basis and in consultation with your Institutional TDR liaison.
- The Texas Data Repository encourages data deposit from all disciplines and can accept any type of data file, though it is advisable to provide data in non-proprietary formats in order to ensure broader use for researchers with access to different analytic software.
- The Texas Data Repository does NOT accept content that contains confidential or sensitive information, and requires that contributors remove, replace, or redact such information from datasets prior to upload. Confidential or sensitive information refers to any identifiable information that, alone or combined with other information available across all of the materials, makes it possible to re-identify subjects. It can include: social security numbers; credit card numbers; medical record numbers; health plan numbers; other account numbers of individuals; or biometric identifiers (fingerprints, retina, voice print, DNA, etc.).
What format should I use to provide access to my data?
- Researchers can upload any type of data file through the Dataverse software. Additional features and support exist for certain types of data files, such as:
- Tabular data (CSV, RData, DTA) - more info here
- GIS Shape data (DBF, PRJ, SBN, SBX, SHP.xml, SHP.EA.ISO.xml, SHP.iso.xml, SHX) - more info here
- Astronomy data (FITS)
- Whenever possible, it is advisable to provide data in non-proprietary formats (e.g., CSV or XML) in order to ensure broader use for researchers with access to different analytic software. Note that data must be provided in an electronic format. Additional information on best practices for file formats is available here.
How large can datasets be?
- The Texas Data Repository is intended for small data. Any individual file uploaded to the repository must be under 4GB, though any uploads over 2GB, and some below that threshold, may be slow or stall due to variables outside of TDL's control. Using a wired connection for uploads over 2GB is recommended. Full datasets (which may include multiple files) should be no more than 10GB. Please email email@example.com if you are having trouble uploading files.
- If you have files over 4GB, we will consider support options on a case-by-case basis and in consultation with your Institutional TDR liaison.
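If you want to check a folder against these limits before you start uploading, a small script can help. The following is a minimal sketch, assuming the files for one dataset sit in a single local folder and that "GB" means 10^9 bytes (the FAQ does not say whether it means decimal or binary gigabytes). The function names `check_sizes` and `sizes_in_folder` are hypothetical helpers, not part of the Texas Data Repository or Dataverse software.

```python
from pathlib import Path

# Limits as stated in this FAQ. Assumption: "GB" is read as 10**9 bytes.
GB = 10**9
MAX_FILE_BYTES = 4 * GB      # each file must be under 4GB
SLOW_UPLOAD_BYTES = 2 * GB   # uploads over 2GB may be slow or stall
MAX_DATASET_BYTES = 10 * GB  # a full dataset should be no more than 10GB


def check_sizes(sizes: dict[str, int]) -> list[str]:
    """Return warnings for a {filename: size_in_bytes} mapping (hypothetical helper)."""
    warnings = []
    for name, size in sorted(sizes.items()):
        if size >= MAX_FILE_BYTES:
            warnings.append(f"{name}: over the 4GB per-file limit")
        elif size > SLOW_UPLOAD_BYTES:
            warnings.append(f"{name}: over 2GB, upload may be slow (use a wired connection)")
    total = sum(sizes.values())
    if total > MAX_DATASET_BYTES:
        warnings.append(f"dataset total {total / GB:.1f}GB exceeds the 10GB guideline")
    return warnings


def sizes_in_folder(folder: str) -> dict[str, int]:
    """Collect the size of every regular file under a local folder, recursively."""
    root = Path(folder)
    return {str(p.relative_to(root)): p.stat().st_size
            for p in root.rglob("*") if p.is_file()}
```

For example, `for w in check_sizes(sizes_in_folder("my_dataset")): print(w)` would list any files that need attention, where `my_dataset` is a placeholder for your own folder. An empty result only means the files fit the stated limits; it does not guarantee that an upload will succeed.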
What are the options for access to my data and licensing?
What will it cost me to deposit data?
- Currently, there is no charge to researchers from participating institutions who deposit their data in the Texas Data Repository, within the volume limits explained above (4GB per file, 10GB total per dataset).
- The Texas Digital Library, through funding from its member institutions, maintains and supports the Texas Data Repository.
I have data that contains identifiable information about human subjects. Can I add it to the repository?
- No. Users of the repository must NOT add data to the repository that contains confidential or sensitive information.
- Confidential or sensitive information refers to any identifiable information that, alone or combined with other information available across all of the materials, makes it possible to re-identify subjects. It can include: social security numbers; credit card numbers; medical record numbers; health plan numbers; other account numbers of individuals; or biometric identifiers (fingerprints, retina, voice print, DNA, etc.).
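Before uploading a spreadsheet, a quick automated scan can flag cells that look like two of the identifier types listed above. The following is a rough sketch only, assuming your data is plain CSV text and that SSNs appear in the common `123-45-6789` form. It catches just SSN-like and card-number-like patterns (card candidates are filtered with the standard Luhn checksum to cut false positives), so it is not a substitute for proper de-identification or review by your institution. The names `scan_csv` and `luhn_ok` are hypothetical, not part of any repository tool.

```python
import csv
import io
import re

# Assumed patterns: SSNs written as 123-45-6789, and card numbers as
# 13 to 16 digits, optionally separated by single spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_csv(text: str) -> list[tuple[int, int, str]]:
    """Return (row, column, kind) for cells that look like SSNs or card numbers.

    Rows and columns are 1-based, counting the header row as row 1.
    """
    hits = []
    for r, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for c, cell in enumerate(row, start=1):
            if SSN_RE.search(cell):
                hits.append((r, c, "possible SSN"))
            for m in CARD_RE.finditer(cell):
                if luhn_ok(re.sub(r"[ -]", "", m.group())):
                    hits.append((r, c, "possible card number"))
                    break
    return hits
```

Any flagged cell should be removed, replaced, or redacted before upload; an empty result does not mean a file is free of identifiable information.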
Sharing and Publishing Data
How can I share my data with colleagues without making the data public?
- There are several ways to share data you deposit in the TDR without publishing it. See Sections 3.2 and 3.4 of the User Guide for more detail.
- You can share restricted files with anyone (whether they have a TDR user account or not) by creating a private URL.
- You can invite others who have TDR user accounts to collaborate in managing datasets, files, and Dataverse collections. More about sharing data.
What if I want to share my data with someone who doesn’t have a user account in TDR?
- If you just want to provide “read” access to restricted files, you can create and share a private URL that anyone (whether they have a user account or not) can use to access the data. See Section 3.4 of the User Guide for instructions on sharing via a private URL.
- Other types of sharing and collaborating require that the people you are sharing with have user accounts in the TDR. See Section 3.5 of the User Guide for instructions on inviting them to create user accounts.