Automated Archival of Data to DME

The two approaches below support automated archival of data to DME.

Command-Line Utilities (CLU)

The DME command-line utilities (CLU) provide shell commands for programmatic access from bioinformatics pipelines and workflows. Users who register their data via the CLU can integrate the upload step into their workflow. Metadata is supplied in JSON files that are passed to the CLU commands. The CLU package can be obtained from the GitHub repository. The CLU must be installed and run on the server where the data is located or mounted. The available commands are described here. The recommended commands for registration are dm_register_dataobject_multipart (for a single file) and dm_register_directory (for bulk uploads).
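Because the metadata travels in JSON files, a pipeline step can generate one file per data object just before calling the registration command. The sketch below is a minimal illustration in Python, not the CLU's own tooling. The helper name write_metadata_file is hypothetical, and the "metadataEntries" layout of attribute/value pairs is an assumption that should be checked against the CLU documentation before use.

```python
import json
from pathlib import Path


def write_metadata_file(path, attributes):
    """Write attribute/value pairs to a JSON metadata file and return its path.

    Assumption: the registration commands accept a JSON document with a
    "metadataEntries" list of {"attribute": ..., "value": ...} objects.
    Verify the exact schema in the CLU documentation.
    """
    entries = [
        {"attribute": name, "value": str(value)}
        for name, value in attributes.items()
    ]
    path = Path(path)
    path.write_text(json.dumps({"metadataEntries": entries}, indent=2))
    return path
```

A workflow could call this helper once per output file and then pass the resulting JSON file to the registration command it uses. Values are converted to strings so that numeric pipeline outputs serialize uniformly.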

Automated Archival Workflow

The DME Archival Workflow was developed to support users who require recurring bulk uploads. It enables fully automated archival of datasets on a pre-configured schedule. The source directories specified by the user are scanned to locate the files to be archived. Fault tolerance and multi-threading are built in to achieve reliability and high throughput. The metadata is extracted from metadata input files based on the rules configured in a custom user module. Supported input file formats are JSON, XML, and CSV/Excel. The workflow can be customized through flexible configuration options. These options include the source path from which the data is picked up, whether any pre-processing is required (such as tarring the folder), whether any patterns should be applied to include or exclude files and folders, and whether to look for a specific marker file that indicates the data is ready to be picked up. The detailed configuration options are described here.
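The scanning options described above (source path, include and exclude patterns, a marker file that signals readiness, and optional tarring) can be pictured with a short Python sketch. This is an illustration of the idea only, not the workflow's actual implementation or configuration syntax. The function names find_ready_files and tar_folder, their parameters, and the marker file name used in the usage note are all hypothetical.

```python
import fnmatch
import tarfile
from pathlib import Path


def find_ready_files(source_dir, include=("*",), exclude=(), marker_name=None):
    """Return files under each top-level folder of source_dir that pass the filters.

    A folder is skipped entirely when marker_name is given and the folder
    does not contain a file with that name (the data is not ready yet).
    Patterns are shell-style globs matched against the path relative to
    source_dir.
    """
    source = Path(source_dir)
    ready = []
    for folder in sorted(p for p in source.iterdir() if p.is_dir()):
        if marker_name and not (folder / marker_name).exists():
            continue
        for f in sorted(folder.rglob("*")):
            # Skip directories and the marker file itself.
            if not f.is_file() or f.name == marker_name:
                continue
            rel = f.relative_to(source).as_posix()
            if not any(fnmatch.fnmatch(rel, pat) for pat in include):
                continue
            if any(fnmatch.fnmatch(rel, pat) for pat in exclude):
                continue
            ready.append(f)
    return ready


def tar_folder(folder, dest_dir):
    """Pre-processing step: pack a folder into an uncompressed .tar archive."""
    folder = Path(folder)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / (folder.name + ".tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(folder, arcname=folder.name)
    return archive
```

For example, find_ready_files("/data/runs", include=("*.fastq",), marker_name="READY") would return only the FASTQ files in run folders that already contain a READY file. The path and file names here are placeholders.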
