Batch Ingest (DSpace 6)

The batch process can be used to upload anything to DSpace and is particularly useful when you have more than just a handful of items. Though the batch load process saves a lot of time that you would otherwise spend manually uploading items one-by-one using the submission form in DSpace, it still requires time and attention to detail to prep the files correctly.

Prerequisites

The user performing the batch upload must have Administrator permissions in DSpace.

All files for batch loading must use DSpace's Simple Archive Format (SAF). To create the SAF package, install SAF Creator, which is available on GitHub as jcreel/SAFCreator.

Batch Load Process

Prepping Metadata File

Make a CSV with column headers that match the corresponding metadata fields in DSpace. This file MUST be a CSV and not an Excel file. If you try to run the batch upload with an Excel file, it will not work. At minimum the metadata fields should include:

  • dc.title

  • dc.date.issued

Other commonly used metadata fields include:

  • dc.contributor.author

  • dc.contributor

  • dc.description

  • dc.description.abstract

  • dc.type

  • dc.language.iso

  • dc.relation.ispartofseries

  • dc.subject

If a field has multiple entries, separate them by using double pipes ||. For example, if an item has two authors or multiple subject headings, enter each of them separated by double pipes.

Column A in the metadata spreadsheet should always be ‘filename’. Each value in this column should be the exact name of the associated file that you will upload to DSpace. This column is what allows the SAF Creator to match the metadata to a specific file during the upload process.

Example of metadata headings with a sample item. Note the double pipes ( || ) in the dc.contributor.author field to delineate multiple authors and in the dc.subject field for multiple keywords.

  • filename: Coolpic.tif

  • dc.title: Rad Tree

  • dc.contributor.author: Johnson, Emily || Salisbury, Shari

  • dc.type: Image

  • dc.date.issued: 2022

  • dc.language.iso: en_US

  • dc.description: A picture of the coolest tree in the entire world

  • dc.subject: Trees || Oaks || dendrology
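
If your metadata already lives in a script or database export, the CSV can be generated rather than typed by hand. The following is a minimal sketch, not part of SAF Creator: the function names and the sample row are hypothetical, and it assumes the double-pipe separator can be written without surrounding spaces (pass separator=" || " if you prefer to match the sample above exactly).

```python
import csv
import io

# Hypothetical sample row; multi-valued fields are kept as Python lists.
ROWS = [
    {
        "filename": "Coolpic.tif",
        "dc.title": "Rad Tree",
        "dc.contributor.author": ["Johnson, Emily", "Salisbury, Shari"],
        "dc.type": "Image",
        "dc.date.issued": "2022",
        "dc.language.iso": "en_US",
        "dc.description": "A picture of the coolest tree in the entire world",
        "dc.subject": ["Trees", "Oaks", "dendrology"],
    },
]


def to_cell(value, separator="||"):
    """Join a list of values with double pipes; pass single values through."""
    if isinstance(value, (list, tuple)):
        return separator.join(str(v).strip() for v in value)
    return str(value)


def write_metadata_csv(rows, stream, separator="||"):
    """Write rows as CSV with 'filename' as the first column (column A)."""
    fields = ["filename"] + [key for key in rows[0] if key != "filename"]
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: to_cell(row.get(key, ""), separator) for key in fields})


if __name__ == "__main__":
    # Print the CSV text; to save it, open a file with newline="" instead.
    buffer = io.StringIO()
    write_metadata_csv(ROWS, buffer)
    print(buffer.getvalue())
```

The csv module quotes any cell that contains a comma, so author names such as "Johnson, Emily" stay in a single column.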

Prepping and Saving Files

  • To make the batch process run smoothly, I recommend saving copies of all your files on OneDrive as usual while you are prepping the files. Once you are ready to run the SAF Creator, make a copy of the files on your Desktop as outlined below. This will make it easier to find the files when you open SAF Creator.

  • All files need to be in a single folder along with the metadata CSV file

  • Once you create a folder on your desktop with the upload files and the metadata CSV file, create another empty folder within it called “Batch.” You will need this later on once you run the SAF Creator.

  • The file names cannot contain spaces or special characters, as these sometimes cause issues when you are uploading to DSpace. Keep the file names simple.
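
The checks above can be automated before you open SAF Creator. This is a rough sketch under stated assumptions: "simple" file names are taken to mean letters, digits, dots, hyphens, and underscores only; the CSV is assumed to be named metadata.csv with a lowercase filename header; and check_project_folder is a hypothetical helper, not part of any DSpace tool.

```python
import csv
import re
from pathlib import Path

# Assumption: a "simple" name uses only letters, digits, dots, hyphens, underscores.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def check_project_folder(folder, csv_name="metadata.csv"):
    """Return a list of human-readable problems found in the project folder."""
    folder = Path(folder)
    problems = []

    # Only regular files count as upload files; the empty Batch folder is skipped.
    files = {p.name for p in folder.iterdir() if p.is_file() and p.name != csv_name}

    for name in sorted(files):
        if not SAFE_NAME.match(name):
            problems.append(f"unsafe file name: {name!r}")

    # utf-8-sig tolerates the byte-order mark that Excel adds to CSV exports.
    with open(folder / csv_name, newline="", encoding="utf-8-sig") as f:
        listed = [(row.get("filename") or "").strip() for row in csv.DictReader(f)]

    for name in listed:
        if name not in files:
            problems.append(f"listed in CSV but missing from folder: {name!r}")
    for name in sorted(files - set(listed)):
        problems.append(f"in folder but not listed in CSV: {name!r}")
    return problems
```

Calling check_project_folder with the path to your desktop project folder returns an empty list when every file name is simple and every row in the CSV has a matching file.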

Running the SAF Creator and Creating SAF Package

Once the metadata spreadsheet is complete and your files are prepped, it’s time to run the SAF Creator.

  • On the Batch Details tab, select the appropriate files and folders for the items you are uploading

    • Select metadata CSV file = metadata CSV spreadsheet (should be located within the folder of all the items)

    • Select source files directory = “project” folder (the folder on your desktop that includes all of the items and the metadata CSV spreadsheet)

    • Select SAF output directory = empty Batch folder you created and placed in the project folder.

      • Once the program runs successfully, it will write everything in SAF format to this folder, which you will later zip and upload to DSpace.

  • Once all 3 areas have been filled with the appropriate files/folders, select “Load specified batch now!”

  • Tab over to “Batch Verification”

  • Select “Verify Batch” and ensure you have no errors

    • If you get errors while verifying the batch, go back to your spreadsheet and ensure that each entry in the filename column matches the corresponding file name in the folder exactly, including spaces, punctuation, etc.

    • The program will usually give you an indication of where the error is so it should hopefully be easy to find. Once you correct the errors, try running SAF Creator again.

  • Tab back to “Batch Details” and select “Write SAF data now!”

    • Now you should see items in the empty Batch folder that you created previously

  • Zip the Batch folder in order to be able to properly upload it to DSpace

    • On a PC, right click the Batch folder and select “Send to,” and then select “Compressed (zipped) folder.” A new zipped folder with the same name will be created in the same location.
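
If you are not on a PC, or want to script the zipping step, the standard library can produce an equivalent archive. A minimal sketch, assuming you want the same result as the "Send to" option, that is, a zip file next to the Batch folder with the folder itself as the top-level entry; zip_batch_folder is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def zip_batch_folder(batch_dir):
    """Create <batch_dir>.zip beside the folder and return the archive path."""
    batch_dir = Path(batch_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(batch_dir),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(batch_dir.parent),  # paths in the zip are relative to this
        base_dir=batch_dir.name,         # keep the Batch folder as the top level
    )
    return Path(archive)
```

For example, zip_batch_folder("project/Batch") writes project/Batch.zip and leaves the original folder untouched.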

Upload SAF file to DSpace

Prerequisites: Operator must have admin privileges to bulk load records.

  • Go to your DSpace repository and log in

  • Navigate to the collection you wish to bulk load the records into. 

    • If the collection does not exist, create it first

  • Click on the “Batch Import (Zip)” link. 

  • Select the collection you wish to import the records into (pull down window)

  • Click “Choose file” and navigate to the zipped file created in the previous set of instructions

  • Click the “Upload SimpleArchiveFormat.zip” button.

  • If the upload was successful, you should see a confirmation page.

  • If you get an error message, it will usually provide you some clue as to what went wrong and you can make changes and try again.
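
When an upload fails and the message is not clear, it can help to inspect the zip file itself. The sketch below relies on the Simple Archive Format convention that each item folder holds a dublin_core.xml file and a contents file. It makes a simplifying assumption that every folder in the zip that directly contains files is an item folder, and check_saf_zip is a hypothetical helper, not a DSpace command.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Files that each SAF item folder is expected to contain.
REQUIRED = {"dublin_core.xml", "contents"}


def check_saf_zip(zip_path):
    """Return a list of problems found in the item folders of a SAF zip."""
    files_by_dir = defaultdict(set)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            files_by_dir[str(path.parent)].add(path.name)

    problems = []
    if not files_by_dir:
        problems.append("zip contains no files")
    for directory, names in sorted(files_by_dir.items()):
        missing = REQUIRED - names
        if missing:
            problems.append(f"{directory}: missing {', '.join(sorted(missing))}")
    return problems
```

An empty list means every item folder has both required files; it does not guarantee that DSpace will accept the metadata values inside them.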

Sources

Credits

These instructions were contributed by Emily Johnson (UT San Antonio) and adapted from documentation created by Taylor Fairweather Leitch (formerly at West Texas A&M University). The SAF Creator was developed by James Creel, Texas A&M University.