NIH | National Cancer Institute | NCI Wiki  

Comment: For HPCDATAMGM-1940: Changed 32 to 5.

dm_register_directory

Note

This page is a work in progress.

If your user account has the Write or Own permission level on an existing collection in DME, and if that existing collection has been configured to contain another collection, you can recursively register data (files within folders) from your local system into that existing collection. By default, the dm_register_directory command registers all files and directories in the source directory.

The character limit for each metadata value is 2700.


Registering Directory Contents

  1. Prepare to run the command.
    1. Make sure that the destination collection (where you want to register your data) already exists. For instructions, refer to the following pages:
    2. Optional: To register a subset of the files in the source directory, prepare to use one of the following options:
      • You can create a flat list of data files. In each line of this file, specify a path relative to the source path. Plan to use the -l option in the command.
      • You can use regex patterns to exclude files, include files, or both.
        • To include files in registration, create an include file and plan to use the -i option in the command.
        • To exclude files from registration, create an exclude file and plan to use the -e option in the command.

        For information, refer to Specifying Include Criteria. If you proceed with a command that specifies both include and exclude files, the system first identifies files that match any of the include regex patterns. Then the system removes from the registration job the files that match any of the exclude patterns.
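The flat list of data files for the -l option could be generated with a short script rather than typed by hand. The following is a minimal sketch, not part of the DME tooling: the function name write_files_list and the suffix filter are hypothetical, and it only illustrates writing one path per line, relative to the source path.

```python
from pathlib import Path
from typing import List, Sequence


def write_files_list(source_path: str, list_path: str,
                     suffixes: Sequence[str] = (".txt",)) -> List[str]:
    """Write one path per line, each relative to source_path.

    Only files whose suffix is in `suffixes` are listed (hypothetical filter).
    """
    source = Path(source_path)
    relative_paths = sorted(
        p.relative_to(source).as_posix()
        for p in source.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    # One relative path per line, as the -l option expects.
    Path(list_path).write_text("".join(f"{rel}\n" for rel in relative_paths))
    return relative_paths
```

The resulting file would then be passed to the command with the -l option.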

    1. Specify the metadata required for the collections and data files you intend to create in DME. For each directory and each data file that you intend to register, create a metadata file with the name <[directory,filename]>.metadata.json. Create each metadata file in the same path as its corresponding directory or data file. For example, for a file named sample.txt, name the corresponding metadata file sample.txt.metadata.json. (Keep in mind that the system registers the metadata files along with the corresponding data, unless you specify otherwise with a flat list or regex patterns.) In each metadata file, specify the metadata that you want to include, as follows:

      Code Block
      { 
          "metadataEntries": [
            {
              "attribute": "description",
              "value": "my-dataObject-description"
            },
            {
              "attribute": "example_date",
              "value": "20201231",
              "dateFormat": "yyyyMMdd"
            }
          ]
      }
    2. For each date attribute, specify one of the following date formats, and specify the date value in that format:

      • yyyyMMdd
      • yyyy.MM.dd
      • yyyy-MM-dd
      • yyyy/MM/dd
      • MM/dd/yyyy
      • MM-dd-yyyy
      • MM.dd.yyyy

      The system parses your date using the date format you specify. However, if the date attribute has a metadata validation rule in a different format, the system stores the date in the format specified by that rule.
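Before running the command, it can help to confirm locally that each date value actually matches its declared dateFormat. The following sketch is an assumption-laden illustration, not DME's own parser: it maps the patterns listed above to Python strptime directives, and Python's parsing may be more lenient than the system's. The name check_date_entry is hypothetical.

```python
from datetime import date, datetime

# Patterns listed on this page, mapped to Python strptime directives.
DATE_FORMATS = {
    "yyyyMMdd": "%Y%m%d",
    "yyyy.MM.dd": "%Y.%m.%d",
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy/MM/dd": "%Y/%m/%d",
    "MM/dd/yyyy": "%m/%d/%Y",
    "MM-dd-yyyy": "%m-%d-%Y",
    "MM.dd.yyyy": "%m.%d.%Y",
}


def check_date_entry(entry: dict) -> date:
    """Parse entry["value"] using entry["dateFormat"]; raise ValueError on a mismatch."""
    pattern = entry["dateFormat"]
    if pattern not in DATE_FORMATS:
        raise ValueError(f"Unsupported dateFormat: {pattern}")
    return datetime.strptime(entry["value"], DATE_FORMATS[pattern]).date()


entry = {"attribute": "example_date", "value": "20201231", "dateFormat": "yyyyMMdd"}
print(check_date_entry(entry).isoformat())
```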

    3. Optional: If you want to create a parent collection for the directory contents (or update the metadata of an existing parent collection), also specify in your JSON file the metadata for the parent collection, using the following syntax:

      Code Block
      {
          "metadataEntries": [
              {
                  "attribute": "description",
                  "value": "description"
              },
              {
                  "attribute": "example_date",
                  "value": "20201231",
                  "dateFormat": "yyyyMMdd"
              }
          ],
          "createParentCollections": true,
          "parentCollectionsBulkMetadataEntries": {
              "pathsMetadataEntries": [
                  {
                      "path": "/Example_Archive/PI_Lab1/Project_New",
                      "pathMetadataEntries": [
                          {
                              "attribute": "collection_type",
                              "value": "Folder"
                          },
                          {
                              "attribute": "example info",
                              "value": "123456"
                          }
                      ]
                  }
              ]
          }
      }
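When many directories and files need metadata, the <[directory,filename]>.metadata.json files could be generated by a script. The sketch below is only an illustration under stated assumptions: write_metadata_file is a hypothetical helper, it covers plain attribute/value entries only (no dateFormat or parent collection keys), and it applies the 2700-character limit per metadata value mentioned on this page.

```python
import json
from pathlib import Path

MAX_VALUE_LENGTH = 2700  # character limit for each metadata value, per this page


def write_metadata_file(target: str, attributes: dict) -> Path:
    """Write <target>.metadata.json in the same path as the directory or data file."""
    entries = []
    for name, value in attributes.items():
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"Value of {name!r} exceeds {MAX_VALUE_LENGTH} characters")
        entries.append({"attribute": name, "value": value})
    target_path = Path(target)
    # sample.txt -> sample.txt.metadata.json, next to the original file.
    metadata_path = target_path.with_name(target_path.name + ".metadata.json")
    metadata_path.write_text(json.dumps({"metadataEntries": entries}, indent=2))
    return metadata_path
```

For example, calling write_metadata_file("sample.txt", {"description": "my-dataObject-description"}) would create sample.txt.metadata.json alongside sample.txt.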
  5. Run the following command:

    Code Block
    dm_register_directory [OPTIONS] <source-path> <destination-path>


    The following table describes each option:

    Option | Description
    -a <archive-type> | If you want to specify the type of the archive, specify one of the following valid values: S3 or POSIX. The default is POSIX.
    -c | If you want to turn on checksum calculation for validation of data transfer, specify this option. This validation can increase the time required to run the command, depending on the file size.
      • If the collection is 50 MB or larger, the system saves the computed checksum as user metadata (source_checksum). This checksum metadata contains the eTag.
      • If the collection is smaller than that threshold, the system uses the computed checksum to perform checksum verification upon uploading the file. The system also saves the checksum as system metadata (checksum).
    -d | If you want to perform a dry run, specify this option. (See shared info - dry run CLU.)
    -e <path-to-exclude-file> | If you want to exclude files, specify this option. The system excludes the files that match the patterns specified in the file <path-to-exclude-file>.
    -h | If you want to print a usage (help) message for this command, specify this option.
    -i <path-to-include-file> | If you want to include files, specify this option. The system includes the files that match at least one of the patterns specified in <path-to-include-file>.
    -l <files-list> | If you want to register specific files and exclude all others, specify this option. The system registers only the files listed in <files-list>. In each line of the <files-list> file, specify a path relative to the <source-path>.
    -m | If you want to register metadata only (for files that already exist), specify this option. The system does not register files.
    -s | If you want to skip the default confirmation prompt and register directly, use this option.
    -t <num-threads> | If you want to use this parameter, contact NCIDataVault@mail.nih.gov for guidance.
      Visible to Internal Users Only: If you want to process the files more efficiently with a smoother registration, consider specifying the number of threads (from 1 to 5) that you want the system to use while registering files to DME. Your network and the server directly impact the way the system registers files. You can set the number of threads so the system has enough memory while processing massive files.
      • For faster processing of your command, use more threads.
      • For faster processing of the system as a whole, use fewer threads.
      The default is one thread.
    -x | If you want to extract metadata from the header of TIFF or BMP image files, use this option.

    The following table describes each parameter:

    Parameter | Description
    <source-path> | A path in your local system, specifying the data that you want to register. The system registers the contents of the folder you specify.
    <destination-path> | A path within DME (POSIX) or S3, specifying where you want the system to create the data.
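If the command is launched from a script, the argument list could be assembled programmatically. The following sketch uses only options described in the table above (-i, -e, -l, -s); the helper name build_register_command is hypothetical, and the sketch does not check whether a given combination of options is accepted by the command.

```python
import shlex
from typing import List, Optional


def build_register_command(
    source_path: str,
    destination_path: str,
    include_file: Optional[str] = None,
    exclude_file: Optional[str] = None,
    files_list: Optional[str] = None,
    skip_confirmation: bool = False,
) -> List[str]:
    """Assemble the argument list: options first, then source and destination."""
    args = ["dm_register_directory"]
    if include_file:
        args += ["-i", include_file]
    if exclude_file:
        args += ["-e", exclude_file]
    if files_list:
        args += ["-l", files_list]
    if skip_confirmation:
        args.append("-s")
    return args + [source_path, destination_path]


command = build_register_command(
    "/NCI/JaneDoe",
    "/Example_Archive/PI_Lab1/Project_New",
    include_file="include.txt",
    exclude_file="exclude.txt",
)
print(shlex.join(command))
```

On a machine where the command is installed, the resulting list could be passed to subprocess.run(command, check=True).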

    Metadata can be provided for each directory and file in a metadata file named <[directory,filename]>.metadata.json. The metadata file should exist in the same path as its corresponding directory or file. For example, the metadata file for sample.txt should be named sample.txt.metadata.json. In the metadata file, specify the metadata that you want to include, as follows:

    Code Block
    { 
        "metadataEntries": [
          {
            "attribute": "description",
            "value": "my-dataObject-description"
          },
          {
            "attribute": "my-second-attribute-name",
            "value": "my-second-attribute-value"
          }
        ]
    }

...

  1. If the system prompts you to confirm the registration, type Y and press Enter. The command recursively registers files and directories from the directory specified in the <source-path> to the <destination-path>. 

Verifying Uploaded Files

To verify uploaded files, use the dm_get_dataobject command. The system-generated metadata data_transfer_status displays ARCHIVED for files successfully uploaded to DME. For details, refer to Retrieving the Metadata of a Data File via the CLU.

Example

The following command copies the contents (folders and files) of the JaneDoe folder in the local system to the Project_New collection in DME. (The command does not copy the JaneDoe folder itself.)

Code Block
dm_register_directory -i include.txt -e exclude.txt /NCI/JaneDoe /Example_Archive/PI_Lab1/Project_New

For instructions on performing similar tasks in the GUI, refer to the following pages:

The example command specifies the following include file:

Code Block
**/images/**
**/data/**

All of the paths in the include file are relative to the source folder, as specified in the command. This file includes the following files:

  • All files in an images folder in the source folder (and subfolders).
  • All files in a data folder in the source folder (and subfolders). 

The example command specifies the following exclude file:

Code Block
**.metadata.json
**/log/**
*.properties

All of the paths in the exclude file are relative to the source folder, as specified in the command. This file excludes the following files:

  • All files in the source folder (and subfolders) with filenames that end in .metadata.json.
  • All files in any log folder in the source folder (and subfolders). 
  • All files in the source folder with filenames that end in .properties.  
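To preview which files a pair of include and exclude files would select, the include-then-exclude order described on this page can be simulated locally. The sketch below is not DME's matcher. It assumes wildcard semantics inferred from the descriptions above: `**/` spans zero or more folders, `**` matches any characters, and `*` matches any characters except `/`. The names glob_to_regex and select_files and the sample paths are hypothetical.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified wildcard pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # any characters within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def select_files(paths: List[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep paths matching any include pattern, then drop those matching any exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    chosen = [p for p in paths if not inc or any(r.match(p) for r in inc)]
    return [p for p in chosen if not any(r.match(p) for r in exc)]


INCLUDE = ["**/images/**", "**/data/**"]
EXCLUDE = ["**.metadata.json", "**/log/**", "*.properties"]
SAMPLE_PATHS = [
    "images/a.tif",
    "images/a.tif.metadata.json",
    "run1/data/r.csv",
    "run1/data/log/out.txt",
    "data/app.properties",
    "notes.txt",
]
print(select_files(SAMPLE_PATHS, INCLUDE, EXCLUDE))
```

Under these assumed semantics, data/app.properties survives the exclude step because the `*.properties` pattern applies only to files directly in the source folder.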

...