This page describes how to register multiple files. For prerequisites when using Globus with DME, refer to Preparing to Use Globus. If you want to provide metadata for each object (data file) or collection, also refer to Preparing a Metadata File for Bulk Registration.
For a video of this procedure, refer to Video of Registering in Bulk from Globus via the GUI or Video of Registering in Bulk from S3 via the GUI.
To register data files:
- Log in as described in Logging In via the GUI. The Dashboard appears.
- Select a method:
- If you are not familiar with the data destination, browse for it, as described in Browsing for Data via the GUI. Navigate to and right-click the collection where you want to register your data files. Click Add Bulk.
- If you are familiar with the data destination, click the Register tab and then click Bulk.
Specify the data source:
- To register data from Globus:
- Select Globus and click Select Data from Globus Endpoint. A Globus page appears.
- In Globus, select an endpoint.
- Select the files or folders that you want to register into DME.
- Click Submit. The Register Bulk Data page reappears, with information about your Globus selection at the bottom of the Data Source section.
- To register data from S3, select AWS S3:
- Specify the name of the source S3 bucket.
- Specify the path to and the name of the file or folder in the source bucket.
- Specify whether you are registering a file or a folder.
- Specify the AWS access key.
- Specify the AWS secret access key.
- Specify the region.
- To register data from Google Drive:
- Select Google Drive.
- Click Authorize DME to Access Your Google Drive. A Google page appears. Follow the prompts. The DME Register Bulk Data page reappears, indicating the successful generation of an access token.
- Click Select Data from Google Drive. The Select Files or Folder dialog box appears.
- Navigate to and select the folder from which you want to transfer the data. Click Select. The DME Register Bulk Data page reappears, with information about your Google Drive selection at the bottom of the Data Source section.
Scroll down to the remaining data source fields.
Finish specifying the data source:
| Field | Instructions |
| --- | --- |
| Include Criteria | Use patterns to specify the source files to include. If you specify more than one pattern, the system considers a union of all patterns. For details, refer to Specifying Include Criteria. |
| Exclude Criteria | Use patterns to specify the source files to exclude. If you specify more than one pattern, the system considers a union of all patterns. For details, refer to Specifying Include Criteria. |
| Criteria Type | Specify the type of patterns in your criteria: Simple (for details, refer to Specifying Include Criteria) or RegEx (for details, refer to https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). |
| Dry Run | If you want to preview a list of the source file(s)/folder(s) that the system would register based on what you specified in the other Data Source fields, select this option. (This option tests only the Data Source entries. It does not test the Data Destination entries.) |
| Bulk Metadata File | If you want to provide required metadata for each object or collection, click Choose File. Navigate to and select the prepared metadata file. (For details, refer to Preparing a Metadata File for Bulk Registration.) |

Scroll down to the remaining portions of the Register Bulk Data page. If you browsed to the data destination, that portion of the page has only the Collection Path field, with the path already specified.
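As a rough illustration of how the Include Criteria and Exclude Criteria fields described above might combine, the following Python sketch previews a selection locally. It is not DME code. The function names are hypothetical, Simple patterns are approximated with shell-style wildcards from `fnmatch`, and RegEx patterns use Python's `re` module rather than Java's `Pattern`. The sketch also assumes that exclude criteria take precedence over include criteria and that empty include criteria select every file. Use the Dry Run option to confirm what the system actually selects.

```python
import re
from fnmatch import fnmatchcase


def matches_any(path, patterns, criteria_type="simple"):
    """Return True if the path matches at least one pattern (a union)."""
    if criteria_type == "regex":
        # Assumption: a RegEx pattern must match the whole path.
        return any(re.fullmatch(p, path) for p in patterns)
    # Assumption: Simple patterns behave like shell-style wildcards.
    return any(fnmatchcase(path, p) for p in patterns)


def preview_selection(paths, include=(), exclude=(), criteria_type="simple"):
    """Return the paths that would be kept under the assumed rules."""
    selected = []
    for path in paths:
        if include and not matches_any(path, include, criteria_type):
            continue  # include criteria were given, and none matched
        if exclude and matches_any(path, exclude, criteria_type):
            continue  # assumption: exclude wins over include
        selected.append(path)
    return selected


if __name__ == "__main__":
    source_files = [
        "run1/sample.fastq",
        "run1/sample.bam",
        "run1/tmp/scratch.log",
        "run2/notes.txt",
    ]
    # Two include patterns act as a union: either extension qualifies.
    print(preview_selection(source_files, include=["*.fastq", "*.bam"]))
```

Comparing the output of a sketch like this with the Dry Run list is one way to catch a pattern that is broader or narrower than intended before you register anything.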
If necessary, specify the data destination for the parent collection (the collection that will contain all of the new data):
If the Base Path field is available, select the base path specified by your group administrator. An information icon appears next to the Base Path field and the system begins to populate values in the Collection Type field and the Collection Path field.
Consider examining the valid hierarchies for the selected base path. To do so, click the information icon next to the Base Path field. A Valid Hierarchies chart appears indicating the collection type or types allowed within each collection type. In the chart, a check mark indicates each collection in which you can register data files.
To close the Valid Hierarchies chart, click elsewhere on the page.
- If the Collection Type field is available, and if there is more than one collection type, select the one in which you want to register data. For guidance on selecting a collection type, refer to your group administrator. For some collection types, the system displays a list of required metadata attributes.
If necessary, append a forward slash (/) and a collection name to the system-generated collection path. Avoid using invalid characters such as the space character, question mark (?), semicolon (;), backslash (\), or double quote ("). Consider the following example:
| Collection Path | Example |
| --- | --- |
| System-generated | /SAMPLE_Archive |
| User-modified | /SAMPLE_Archive/Sample_Collection_Name |

The last collection in the path can be new or existing.
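The collection path guidance above can be sketched as a small pre-check. This is a hypothetical helper, not part of DME. It flags only the characters named on this page (space, question mark, semicolon, backslash, and double quote); the system may reject other characters as well.

```python
# Characters this page names as invalid; the real list may be longer.
INVALID_CHARS = set(' ?;\\"')


def append_collection(base_path, collection_name):
    """Append a collection name to a system-generated collection path."""
    if not collection_name:
        raise ValueError("collection name is empty")
    bad = sorted(set(collection_name) & INVALID_CHARS)
    if bad:
        raise ValueError(f"invalid characters in collection name: {bad}")
    # Join with exactly one forward slash.
    return base_path.rstrip("/") + "/" + collection_name


if __name__ == "__main__":
    print(append_collection("/SAMPLE_Archive", "Sample_Collection_Name"))
```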
- Specify the metadata for the parent collection. The system applies this metadata to the entire collection, not to individual files:
- To add a metadata attribute:
- Click Add Metadata, visible on the right or left side of the page. A blank attribute row appears.
- Specify a unique attribute name.
If you change your mind about adding an attribute, leave the name and value fields blank for that attribute. The system ignores a blank row. If you proceed to update the collection with a new attribute, the attribute name is permanent.
- In each attribute row, specify a unique value that describes the content you are registering. The character limit for each metadata value is 2700.
| Example Attribute | Example Value |
| --- | --- |
| data_owner | Jane Doe |
| project_id | 1234567890 |
| sample_name | L1 |
| project_start_date | 2020-12-31 |

For some date attributes, such as project_start_date, the system expects the "yyyy-MM-dd" format, as in the above example.
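Before entering many attributes, it can help to check them against the rules stated above. The following Python sketch is a hypothetical local check, not a DME feature. It covers unique attribute names, the 2700-character limit on values, and the "yyyy-MM-dd" date format. The set of date attributes is an assumption based only on the example on this page; the attributes your collection type treats as dates may differ.

```python
import re
from datetime import datetime

MAX_VALUE_LENGTH = 2700  # character limit stated on this page
DATE_ATTRIBUTES = {"project_start_date"}  # assumption: taken from the example
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date(value):
    """Return True for a real calendar date written as yyyy-MM-dd."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(rows):
    """Return a list of problems found in (name, value) attribute rows."""
    problems = []
    seen = set()
    for name, value in rows:
        if not name and not value:
            continue  # a blank row is ignored, as on the page
        if name in seen:
            problems.append(f"duplicate attribute name: {name}")
        seen.add(name)
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for {name}: {len(value)} characters")
        if name in DATE_ATTRIBUTES and not is_valid_date(value):
            problems.append(f"{name} is not in yyyy-MM-dd format: {value}")
    return problems


if __name__ == "__main__":
    example = [
        ("data_owner", "Jane Doe"),
        ("project_id", "1234567890"),
        ("sample_name", "L1"),
        ("project_start_date", "2020-12-31"),
    ]
    print(check_metadata(example))
```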
Click Register. The system checks whether it can access the objects and collections you have specified, using the data you have entered:
- If not, the system displays an error message.
- If so, the system responds based on your selections and displays a message at the top of the Register Bulk Data page:
- If you selected the Dry Run option, the system displays a list of the source file(s)/folder(s) that it would have registered based on what you specified in the other Data Source fields. If necessary, revise your entries and click Register again until you are satisfied with the dry run list. When you are ready to perform the registration, clear the Dry Run option and click Register again.
- Otherwise, the system displays a message with the task ID of the registration request.
- When the system displays the task ID, consider visiting the Data Registration Task Details page to view the progress of the registration. If you provided a metadata file, this page indicates any difficulty processing that metadata. For instructions, refer to Viewing Registration Status.