Nanobioinformatics is widely recognized as an essential element of our nation’s competitiveness in nanotechnology and as a rational means of applying weight-of-the-evidence strategies that ensure the safe development of nanotechnology (National Nanotechnology Initiative, 2006). The ability to manipulate matter at the atomic scale will enable a broad range of beneficial applications in the electronics, healthcare (e.g., nanomedicine, imaging, and diagnostics), cosmetics, technology, and engineering industries. A thorough understanding of nanomaterial-biological interactions is pertinent both to the development of promising biomedical nanotechnologies and to the safety of nanoscale materials in general. A rational approach must therefore be employed early in the evolution of nanotechnology to direct the safe development of novel nanotechnologies and to provide accurate predictions of nanomaterial-biological interactions based on the weight of the evidence \[2\]. This will inevitably require data mining and computer simulation to visualize the important parameters in the very large body of data produced by global research efforts in nanoscience and nanotechnology \[3\]. To date, the lack of standardization has been one of the most significant barriers to data sharing.
The nano-TAB specification is intended to facilitate the submission and exchange of nanomaterial descriptions and characterization data (metadata and summary data), along with other files (raw/derived data files, image files, protocol documents, etc.), among individual researchers and to and from nanotechnology resources such as the NCI’s cancer Nanotechnology Laboratory (caNanoLab) portal \[4\] and the Nanomaterial-Biological Interactions (NBI) knowledgebase \[5\]. Nano-TAB also serves to empower organizations to adopt standard methods for representing data in nanotechnology publications and to provide researchers with guidelines for representing nanomaterials and characterizations in a way that enables cross-material comparison.
The nano-TAB project is an effort of the National Cancer Institute (NCI) Cancer Biomedical Informatics Grid (caBIG^®^) Nanotechnology Informatics Working Group (Nano WG). Its proper use as a standard requires familiarity with the other caBIG informatics tools, all of which are designed to support the meaningful exchange of data across the nanotechnology community. The major components of caBIG are described in Section X, and adjustments to the existing elements of ISA-TAB are given in Section X.
The nano-TAB format specification is based on an existing specification developed by the European Bioinformatics Institute (EBI): the investigation/study/assay (ISA-TAB) format specification. The ISA-TAB format is used by the ‘omics’ (proteomics, genomics, metabolomics, and transcriptomics) communities to share data and metadata associated with the different assays and technology types in their experiments. The ISA-TAB file structure relies on three primary files: the investigation, study, and assay (ISA) files. Raw/derived data files and any other files (e.g., image files, protocol documents) specific to each assay are shared along with the three primary ISA-TAB files if they are referenced in those primary files. ISA-TAB does not provide a format specification for files other than the investigation, study, and assay files. The ISA-TAB investigation file is used for three purposes: (1) to record all declarative information referenced in other files; (2) to relate assay files to study files; and (3) to group multiple study files that are part of the same investigation. The ISA-TAB study file is used to record information about the source, sampling methodology, treatment, preparation, and characteristics of the subjects (biospecimens) studied using one or more assays under an investigation.
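The relationships among the three primary ISA-TAB files can be illustrated with a minimal sketch. The class names, attribute names, and file names below are hypothetical and chosen only for illustration; they are not ISA-TAB field names, and the sketch models only the grouping role of the investigation file (studies grouped under an investigation, assays related to studies, data files referenced by assays).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssayFile:
    filename: str
    # Raw/derived data files referenced by this assay file (hypothetical attribute).
    data_files: List[str] = field(default_factory=list)


@dataclass
class StudyFile:
    filename: str
    # Assay files that the investigation file relates to this study.
    assays: List[AssayFile] = field(default_factory=list)


@dataclass
class Investigation:
    filename: str
    # Study files grouped under the same investigation.
    studies: List[StudyFile] = field(default_factory=list)

    def files_to_share(self) -> List[str]:
        """Return the primary files plus every data file they reference."""
        names = [self.filename]
        for study in self.studies:
            names.append(study.filename)
            for assay in study.assays:
                names.append(assay.filename)
                names.extend(assay.data_files)
        return names


# Hypothetical example: one study with one assay that references one data file.
investigation = Investigation(
    "investigation.txt",
    [StudyFile("study.txt", [AssayFile("assay.txt", ["raw_data.csv"])])],
)
print(investigation.files_to_share())
```

Walking the structure from the investigation downward yields the full set of files that would need to accompany the primary files when they are shared.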
The caBIG® Life Sciences Domain Analysis Model (LS DAM) \[6\] provides a shared view of the semantics of the life sciences domains that are represented by the different workspaces in the caBIG infrastructure. It has a nanotechnology subdomain, which was developed from the caNanoLab object model and NanoParticle Ontology (NPO) terms. The LS DAM distinguishes between biospecimens (for example, cell lines, tissue samples, body fluid samples, and organ parts) and materials that are not derived from a cell, tissue, organ, or body (for example, nanoparticle formulations, drug formulations, and solvents). This distinction motivated the use of the term “material sample” in the nano-TAB material file. Weekly Nano WG web conferences were used to keep nano-TAB aligned with the LS DAM.
Like ISA-TAB, nano-TAB provides fields for entering and referencing terms selected from ontologies and standard terminologies. The ontologies are available at BioPortal (http://www.bioontology.org), which is maintained by the National Center for Biomedical Ontology. Although the investigator may use alternative ontology and vocabulary sources, the ability to evaluate and share data requires that all parties have access to the sources being used. All terms and fields used in this standard draw on NCI Enterprise Vocabulary Services (EVS) and NanoParticle Ontology elements.
The NanoParticle Ontology (NPO) \[7\] is an ontology designed and developed within the framework of the Basic Formal Ontology (BFO) \[8\] and implemented in the Web Ontology Language (OWL) \[9\]. It is being developed to represent the knowledge underlying the description, preparation, and characterization of nanomaterials. NPO development began with the representation of knowledge underlying the chemical composition, preparation, and physicochemical and functional/biological characterization of nanoparticles that are formulated and tested for applications in cancer diagnostics and therapeutics. The NPO provided the knowledge framework for developing the nano-TAB material file format, and it provides a subset of the terms and relationships used to describe and characterize nanomaterials in the nano-TAB file format. The NPO is being further developed for the following purposes: (1) to provide terms for annotating nanotechnology research data; (2) to provide the knowledge framework required for developing data-sharing models and standards in nanomedicine; (3) to enable semantic integration of data; (4) to enable unambiguous interpretation of the description and characterization of nanomaterials; and (5) to enable knowledge-based searching and comparison of nanomaterial descriptions and characterization results.
Nano-TAB extension to ISA-TAB: While nano-TAB leverages the three primary ISA-TAB files, it extends ISA-TAB by providing a specification for a fourth file (called the material file) that represents the composition and characteristics of nanoparticle formulations and small molecules. Raw/derived data files and any other files (e.g., image files, protocol documents) specific to each assay have to be shared along with the four primary nano-TAB files. Nano-TAB does not specify how to format files other than the four primary files: the investigation, study, assay, and material files. Although nano-TAB adopts the ISA-TAB field names and their definitions in the investigation, study, and assay files, some of the definitions are modified and additional fields are introduced. These modifications and extensions are required to expand the scope of the information that nano-TAB files can capture from nanotechnology data sets.
In nanotechnology, samples from biological and non-biological sources can be the primary subjects of a study. In nano-TAB, samples derived from biological sources are called biological specimens or biospecimens (e.g., cell lines, body fluids, organs), whereas samples derived from non-biological sources are simply called material samples (e.g., nanomaterials, nanoparticle formulations, small molecules). For physicochemical characterizations of nanomaterials, the sample is the nanomaterial. For in vitro and in vivo characterizations, the sample is the biological specimen (cell line, animal, and so forth). Hence, in nano-TAB, the concept of a sample (as used in the ISA-TAB specification) is redefined to include both biological specimens and material samples. The ISA-TAB study file can only be used to record the source and characteristics of biospecimens studied in an assay, and it cannot support the representation of materials. Therefore, in nano-TAB, the material file is used to describe material samples, while the study file is used to describe biospecimens.
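The sample distinction described above can be sketched as a small lookup. The enum, function names, and the three characterization labels are assumptions made for illustration; nano-TAB itself does not define these identifiers.

```python
from enum import Enum


class SampleKind(Enum):
    BIOSPECIMEN = "biospecimen"
    MATERIAL_SAMPLE = "material sample"


# Which kind of sample each characterization type studies, per the text above.
# The string labels are hypothetical keys, not nano-TAB vocabulary.
SAMPLE_BY_CHARACTERIZATION = {
    "physicochemical": SampleKind.MATERIAL_SAMPLE,
    "in vitro": SampleKind.BIOSPECIMEN,
    "in vivo": SampleKind.BIOSPECIMEN,
}


def sample_kind_for(characterization: str) -> SampleKind:
    """Return the kind of sample studied by a characterization type."""
    try:
        return SAMPLE_BY_CHARACTERIZATION[characterization]
    except KeyError:
        raise ValueError(f"unknown characterization type: {characterization!r}")


def describing_file(kind: SampleKind) -> str:
    """Return which nano-TAB file describes a sample of the given kind."""
    if kind is SampleKind.BIOSPECIMEN:
        return "study"
    return "material"


print(describing_file(sample_kind_for("physicochemical")))
print(describing_file(sample_kind_for("in vitro")))
```

Keeping the two mappings separate mirrors the text: the characterization type determines what the sample is, and the kind of sample determines whether the study file or the material file describes it.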
ISA-TAB specifies that the names of the primary files end with the .txt extension. Nano-TAB file names may end with either the .txt or the .xls extension. The nano-TAB files used as examples in this document were prepared as Excel spreadsheets, so their file names have the .xls extension.
Nano-TAB uses the three primary files of ISA-TAB (the investigation file, study file, and assay file) and introduces a fourth file called the material file (FIG 1). Other files referenced in the nano-TAB files, such as raw/derived data files, image files, and protocol documents, have to be shared along with the nano-TAB files.
When sharing primary nano-TAB files, other files referenced in these files have to be shared along with the primary files. It is anticipated that content management systems will become available to facilitate the sharing and exchange of files. Until then, these files could be bundled together in a folder and shared as a zip file.
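The interim bundling approach described above can be sketched with Python's standard `zipfile` module. The function name and the folder layout are assumptions for illustration; nano-TAB does not prescribe a particular archive layout.

```python
import zipfile
from pathlib import Path


def bundle_nano_tab(folder, archive_path):
    """Zip every file under `folder` and return the archived relative paths."""
    folder = Path(folder)
    archive_path = Path(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in `folder`.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            arcname = path.relative_to(folder).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

Calling `bundle_nano_tab("my_submission", "my_submission.zip")`, where both names are hypothetical, would place the primary files and every referenced data file found in the folder into a single archive, with paths stored relative to the folder so that references between files remain resolvable after extraction.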
FIG 1. Nano-TAB File Structure
FIG 2 describes the nano-TAB file development process. Typically, the investigation file is developed first; it describes the overall investigation and its associated studies and assays. The investigation file is a text file named according to the convention “i_xxx.txt” or “i_xxx.xls”, in which xxx can be any name provided by the investigator. Once the investigation file has been completed, one or more study files (following the convention “s_xxx.txt” or “s_xxx.xls”) can be created. Similarly, one or more material files can be created. The material file describes the nanomaterial (or small molecule) and its components, including structural information, and follows the naming convention “m_xxx.txt” or “m_xxx.xls”. Assay files (following the convention “a_xxx.txt” or “a_xxx.xls”) are created for all assays performed. Each assay is defined by the endpoint measured and the technique used to measure that endpoint. Data files (raw or derived) specific to each type of assay can be associated with the respective assay files by referencing the names of the data files in the assay files.
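The naming conventions above lend themselves to a simple check. The following is a minimal sketch; the function name is hypothetical, and it assumes that prefixes and extensions are lower-case and that the investigator-provided part of the name is non-empty, neither of which the text states explicitly.

```python
import re

# Prefix letter -> primary file type, following the conventions in the text.
FILE_TYPES = {
    "i": "investigation",
    "s": "study",
    "m": "material",
    "a": "assay",
}

# <prefix>_<investigator-chosen name>.<txt or xls>
NAME_PATTERN = re.compile(r"([isma])_(.+)\.(txt|xls)")


def classify_nano_tab_file(filename):
    """Return the primary file type for `filename`, or None if it does not match."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return FILE_TYPES[match.group(1)]


for name in ["i_study1.txt", "m_particle.xls", "raw_data.csv"]:
    print(name, "->", classify_nano_tab_file(name))
```

Files that return `None`, such as raw or derived data files, are not primary nano-TAB files; they would be expected to be referenced by name from an assay file instead.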
FIG 2. Nano-TAB File Development Process
Once the nano-TAB files have been created, they can be validated and submitted to nanotechnology resources that support the nano-TAB specification. It is anticipated that the files may be validated by a validation service that leverages a modified version of the ISA-TAB validator \[10\]. It is also anticipated that nanotechnology resources such as caNanoLab, the Nanomaterial-Biological Interactions (NBI) knowledgebase (http://nbi.oregonstate.edu/), and other resources will provide facilities for importing and exporting nano-TAB files as the nano-TAB specification evolves.