NIH | National Cancer Institute | NCI Wiki  

Why should I use DME?

DME is a metadata-based data management platform that provides secure, virtualized, searchable storage. Because its storage is reliable and large datasets are easy to access and share, users no longer need to maintain copies of those datasets in local or network-attached storage. DME associates user-provided metadata with the datasets, which makes the data easy to search and retrieve.


What type of data can I store in DME?

DME accepts high-value scientific datasets that must be retained over a long period.


How much data can I store in DME?

There is no fixed limit on how much data you can store, provided it is a high-value scientific dataset accompanied by provenance and domain metadata. For storage planning and procurement purposes, we require an estimate of your anticipated data growth and of how long you need to retain the data.


How is my data secured?

DME employs geographical redundancy to protect data against accidental backend deletions. There is no direct access to the backend storage or the metadata database.

Users log in with their NIH Active Directory (AD) account and must also have an active DME account. Every user must be explicitly granted permission before they can view any data or metadata in DME. Only the data owner, or group administrators authorized by the data owner, can grant access to the data.


How will I be billed for the service?

There is no charge for the storage or service.


How is data organized in DME?

We provide domain-specific templates to help you define your data hierarchy. These are fully customizable and can be tailored according to your unique data management needs. To obtain the maximum value out of this platform, we recommend choosing a data hierarchy and metadata structure that aligns with how you need to search and locate your data in DME.


What type of metadata does DME expect with the data?

DME requires a minimal set of provenance metadata to establish the data's traceability. We also recommend adding domain-specific metadata that describes the data's characteristics and the processes, platforms, and instruments used to generate it.


What interfaces does DME provide for managing the data? 

DME provides a web-based graphical user interface for uploading and browsing datasets and for performing metadata-based searches. You can download the data to multiple destinations, such as Globus endpoints, AWS S3 buckets, Google Cloud, and Google Drive. REST APIs and command-line utilities are available for programmatic integration with bioinformatics pipelines and workflows.


How do I upload my data to DME?

You can upload data using the web application, DME REST APIs, or command-line utilities. Additionally, the DME team operates an archival workflow service that can scan your source directories on a defined schedule or as needed and then copy the data and metadata to DME.


How do I supply metadata to DME?

The REST APIs and command-line utilities accept the metadata in JSON files. The web application accepts metadata for single collections and files through forms and for bulk data through spreadsheets. All mandatory metadata must be supplied along with the data. You can add optional metadata at any time.
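As an illustration of what "metadata in JSON files" can look like, the sketch below builds a metadata file from attribute/value pairs and refuses to write it if mandatory attributes are missing. It is a minimal sketch under stated assumptions: the `metadataEntries` layout, the attribute names `collection_type` and `data_owner`, and the helper `build_metadata` are all hypothetical and chosen for illustration only. The actual schema and mandatory attributes depend on your project's data hierarchy template, so confirm them in the DME User Guide.

```python
import json

# Hypothetical mandatory attributes; the real list depends on the
# data hierarchy template configured for your DME project.
REQUIRED_ATTRIBUTES = {"collection_type", "data_owner"}


def build_metadata(entries: dict) -> str:
    """Return a JSON document listing metadata as attribute/value pairs.

    The "metadataEntries" layout is an assumption used for illustration;
    check the DME User Guide for the exact schema your upload path expects.
    """
    # Mandatory metadata must accompany the data, so fail early if any is absent.
    missing = REQUIRED_ATTRIBUTES.difference(entries)
    if missing:
        raise ValueError(f"missing mandatory metadata: {sorted(missing)}")
    payload = {
        "metadataEntries": [
            {"attribute": name, "value": value}
            for name, value in sorted(entries.items())
        ]
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    doc = build_metadata({
        "collection_type": "Project",
        "data_owner": "Jane Doe",
        "instrument": "sequencer",  # optional, domain-specific (hypothetical)
    })
    # Save the document so it can be passed to an upload tool.
    with open("metadata.json", "w", encoding="utf-8") as fh:
        fh.write(doc)
    print(doc)
```

Validating mandatory attributes before upload mirrors the rule that all mandatory metadata must be supplied with the data, while optional attributes can still be added later.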

If you have requested the DME archival workflow service, then the workflow will upload all your data and metadata.


How long will my data remain in DME?

DME provides initial storage for seven years, which can be extended depending on your needs or federal requirements such as FDA mandates. We do not delete any data from DME without the documented consent of the data owner, irrespective of how long the data has been there.


How can I share my data in DME with external collaborators?

DME provides multiple options for sharing data with external collaborators. You can transfer the data to the collaborator's Globus endpoint, AWS S3 bucket, Google Cloud Platform, or Google Drive.


Can I download my data in DME to other servers and platforms in NCI and NIH?

Yes, you can download your data in DME to any server with a Globus endpoint that is accessible to you. The FRCE Batch cluster in Frederick, the Biowulf cluster, and the Helix system provide Globus endpoints for users. You can also download data to your local machine, either directly (for individual files) or by using the free Globus Connect Personal software. You can access your datasets in DME from the NIH Integrated Data Analysis Platform (NIDAP). Using the DME web application, you can also submit datasets residing in DME to dbGaP.


Where can I find more information on DME?

More information on DME can be found in the DME User Guide. You can also reach out to ncidatavault@mail.nih.gov.

