5.2.1.5.2 - Search Sept. 6, 2010

Search Functional Profile

Search using different criteria.

The wealth of data must be accessible, resulting in the need for exploration of available datasets. This includes the ability to view seamlessly across independent data sets, allowing a secondary user to integrate data from multiple sources.

The semantic infrastructure will provide metadata for discovery of these datasets.
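To illustrate metadata-driven discovery, the following Python sketch filters a small in-memory catalog of dataset metadata by more than one criterion (a disease label and a set of required fields). The `DatasetMetadata` record, its field names, and the `search` function are hypothetical illustrations, not an API of the semantic infrastructure. A real catalog would be queried remotely and would more likely match on terminology codes than on free-text labels.

```python
from dataclasses import dataclass

# Hypothetical metadata record: the field names are illustrative assumptions,
# not a schema defined by the semantic infrastructure.
@dataclass(frozen=True)
class DatasetMetadata:
    dataset_id: str
    source: str              # hosting organization or repository
    disease: str             # condition the dataset covers
    fields: frozenset = frozenset()

def search(catalog, disease=None, required_fields=()):
    """Return the datasets that satisfy every criterion that was supplied."""
    wanted = set(required_fields)
    results = []
    for md in catalog:
        # Criterion 1: disease label, compared case-insensitively.
        if disease is not None and md.disease.lower() != disease.lower():
            continue
        # Criterion 2: the dataset must expose all requested fields.
        if not wanted <= md.fields:
            continue
        results.append(md)
    return results

catalog = [
    DatasetMetadata("ds-1", "Site A", "Glioblastoma", frozenset({"age", "treatment", "survival"})),
    DatasetMetadata("ds-2", "Site B", "Glioblastoma", frozenset({"age", "mgmt_status"})),
    DatasetMetadata("ds-3", "Site C", "Melanoma", frozenset({"age", "treatment"})),
]

hits = search(catalog, disease="glioblastoma", required_fields={"age", "treatment"})
print([md.dataset_id for md in hits])  # ['ds-1']
```

Each criterion is optional, so calling `search(catalog)` with no criteria returns the whole catalog.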

Provide an application developer with the ability to define application-specific attributes (for example, defined using ISO 21090 healthcare datatypes) and an information model that defines the relationships between these attributes and other attributes in the broader ecosystem. Relating local attributes to attributes elsewhere in the ecosystem implies linked datasets, where application developers can connect data in disparate repositories as if the repositories were part of a larger federated data ecosystem.
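A minimal sketch of application-specific attributes and the relationships an information model records between them, assuming attributes are tagged with a few ISO 21090 datatype names (CD, II, PQ, ST, TS) purely as labels. `Attribute`, `ApplicationModel`, the `myapp.`/`ecosystem.` naming scheme, and the `equivalentTo` predicate are hypothetical; the sketch does not implement the ISO 21090 datatypes themselves.

```python
from dataclasses import dataclass, field

# A few ISO 21090 datatype names, used here only as labels; this sketch does
# not implement the datatypes themselves.
ISO21090_LABELS = {"CD", "II", "PQ", "ST", "TS"}

@dataclass(frozen=True)
class Attribute:
    qualified_name: str   # hypothetical naming scheme, e.g. "myapp.TumorSize"
    datatype: str

    def __post_init__(self):
        # Reject attributes whose datatype label is not in the small set above.
        if self.datatype not in ISO21090_LABELS:
            raise ValueError(f"unsupported datatype label: {self.datatype}")

@dataclass
class ApplicationModel:
    attributes: dict = field(default_factory=dict)     # name -> Attribute
    relationships: list = field(default_factory=list)  # (subject, predicate, target)

    def add_attribute(self, attr):
        self.attributes[attr.qualified_name] = attr

    def relate(self, subject, predicate, target):
        # The subject must be defined locally; the target may be an attribute
        # published elsewhere in the ecosystem (hypothetical convention).
        if subject not in self.attributes:
            raise KeyError(subject)
        self.relationships.append((subject, predicate, target))

    def related_to(self, subject):
        return [(p, t) for s, p, t in self.relationships if s == subject]

model = ApplicationModel()
model.add_attribute(Attribute("myapp.TumorSize", "PQ"))
model.relate("myapp.TumorSize", "equivalentTo", "ecosystem.LesionDiameter")
print(model.related_to("myapp.TumorSize"))  # [('equivalentTo', 'ecosystem.LesionDiameter')]
```

Storing relationships as subject/predicate/target triples lets a relationship's target name an attribute that lives in another repository, which is the hook linked datasets build on.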

This Functional Profile includes, but is not limited to, the following capability elaborations:

Derived From Requirements

Gap Analysis::CDISC::CDISC-18 - Use CDISC standards to query and aggregate data across organizations, data sets, time, and geographies. The major goal is to be able to query across data sets, get to the instance data, aggregate it, and perform a wide range of operations on this data. By multiple data sets, CDISC means different clinical trials, different EHR systems, and queries that span both clinical trials and EHR systems. The KR should be able to identify those data sets that are constructed using its information models and model elements. This includes any alignments between a local information model and (for example) a specific CDISC standard. Such an alignment would be an information model in its own right, part of the KR, and the basis for subsequent searches (i.e., transforms are based on information models). The KR would point to local data sets that are instances of its approved information models. It would also store any official transforms (alignments) between these local data sets; each transform has its own information model and points to the local data sets to which it applies.
Semantic Infrastructure Requirements::caGRID 2.0 Platform and Terminology Integration::Data Exploration and Query. The wealth of data must be accessible, resulting in the need for exploration of available datasets. This includes the ability to view seamlessly across independent data sets, allowing a secondary user to integrate data from multiple sources. In addition, the query capability must support sophisticated queries such as temporal and spatial queries. The semantic infrastructure will provide metadata for discovery of these datasets. Complex temporal and spatial queries will be informed by the metadata but will be formulated and executed by the platform. Link to use case satisfied from caGRID 2.0 Roadmap: The oncologist must be able to quickly find glioblastoma data sets, indicating the fields that he is interested in comparing from his clinical data in order to find similar disease conditions and associated treatment plans. Temporal queries allow clinicians to identify changes in patient condition and treatment over time.
Semantic Infrastructure Requirements::caGRID 2.0 Platform and Terminology Integration::Data Management. Data management includes linking disparate data sets and updating data across the ecosystem. Data updates may span multiple data sources, which necessitates transactions. Linkages between disparate data sets will be managed by the semantic infrastructure. Data updates that trigger transactions are captured by the platform and propagated upstream to the semantic infrastructure; for example, the platform could monitor events to identify changes to data. Link to use case satisfied from caGRID 2.0 Roadmap: the patient has an electronic medical record that spans multiple institutions. The clinical workup data (for example, genomics and proteomics data) is linked to the clinical care record; similarly, pathology and radiology findings must be attached to the patient's electronic medical record.
Semantic Infrastructure Requirements::caGRID 2.0 Platform and Terminology Integration::Data Representation and Information Models. This set of requirements includes providing an application developer with the ability to define application-specific attributes (for example, defined using ISO 21090 healthcare datatypes) and an information model that defines the relationships between these attributes and other attributes in the broader ecosystem. In particular, the latter requirement suggests linked datasets, where application developers can connect data in disparate repositories as if the repositories were part of a larger federated data ecosystem. Additional requirements include the ability to publish and discover information models. Support is needed for forms data and common clinical document standards, such as HL7 CDA. To support the use of binary data throughout the system, the binary data must be typed and semantically annotated. All information models, their representation, and their binding to datatypes and terminologies will be managed by the semantic infrastructure. The ability to publish and discover information models will be supported by the semantic infrastructure, and the platform will leverage these capabilities. Link to use case satisfied from caGRID 2.0 Roadmap: The pathology, radiology, and other data have various data formats that must be described, and the information model for the patient record must link these various datatypes. The complete information model includes semantic links between datasets to build a comprehensive electronic medical record. Annotations on data are defined and included in the information model.
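The CDISC-18 requirement describes a small set of relationships: data sets are instances of KR information models, and alignments (transforms) are themselves information models that point to the data sets they apply to. The Python sketch below models only those relationships, in memory. The class and method names, identifiers such as `local-1` and `trial-42`, and the choice to associate each data set with a single model are assumptions for illustration; "CDISC SDTM" is used only as an example of a CDISC standard.

```python
from dataclasses import dataclass, field

@dataclass
class InformationModel:
    model_id: str
    name: str

@dataclass
class Alignment:
    """Hypothetical record of an official transform between a local model and a
    standard. As the requirement states, the alignment is itself an information model."""
    model: InformationModel
    source_model_id: str
    target_model_id: str
    applicable_dataset_ids: set = field(default_factory=set)

@dataclass
class KnowledgeRepository:
    models: dict = field(default_factory=dict)          # model_id -> InformationModel
    dataset_models: dict = field(default_factory=dict)  # dataset_id -> model_id
    alignments: list = field(default_factory=list)

    def register_model(self, model):
        self.models[model.model_id] = model

    def register_dataset(self, dataset_id, model_id):
        # A data set can only point at an information model the KR already holds.
        if model_id not in self.models:
            raise KeyError(f"unknown information model: {model_id}")
        self.dataset_models[dataset_id] = model_id

    def register_alignment(self, alignment):
        # Store the alignment's own model alongside the models it connects.
        self.register_model(alignment.model)
        self.alignments.append(alignment)

    def datasets_for_model(self, model_id):
        return sorted(d for d, m in self.dataset_models.items() if m == model_id)

    def alignments_for_dataset(self, dataset_id):
        return [a for a in self.alignments if dataset_id in a.applicable_dataset_ids]

kr = KnowledgeRepository()
kr.register_model(InformationModel("local-1", "Site A local model"))
kr.register_model(InformationModel("cdisc-sdtm", "CDISC SDTM"))
kr.register_dataset("trial-42", "local-1")
kr.register_dataset("ehr-7", "local-1")
kr.register_alignment(Alignment(InformationModel("align-1", "Site A to SDTM"),
                                "local-1", "cdisc-sdtm", {"trial-42"}))
print(kr.datasets_for_model("local-1"))  # ['ehr-7', 'trial-42']
print([a.model.model_id for a in kr.alignments_for_dataset("trial-42")])  # ['align-1']
```

Because the alignment is registered as a model, a later search can treat it like any other information model, which matches the requirement's remark that transforms are based on information models.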

queryDataSetsTimeAndGeography

Query and aggregate data across organizations, data sets, time, and geographies.

Data management includes linking disparate data sets and updating data across the ecosystem. Data updates may span multiple data sources, which necessitates transactions.

Linkages between disparate data sets will be managed by the semantic infrastructure. Data updates that trigger transactions are captured by the platform and propagated upstream to the semantic infrastructure; for example, the platform could monitor events to identify changes to data.
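The propagation path described here (the platform captures a change, the semantic infrastructure reacts) can be sketched as a simple publish/subscribe arrangement. `PlatformMonitor`, `SemanticInfrastructure`, `DataChangeEvent`, and the "queue linked data sets for review" reaction are hypothetical stand-ins. The sketch is synchronous and in-process, and it does not model the multi-source transactions the requirement mentions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataChangeEvent:
    dataset_id: str
    record_id: str
    operation: str  # "insert", "update" or "delete"

class PlatformMonitor:
    """Hypothetical stand-in for the platform component that watches data sources."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Forward each captured change to every upstream subscriber.
        for callback in self._subscribers:
            callback(event)

class SemanticInfrastructure:
    """Hypothetical stand-in for the upstream component that manages linkages."""
    def __init__(self, links):
        self.links = links  # dataset_id -> set of linked dataset_ids
        self.pending_reviews = []

    def on_data_change(self, event):
        # Queue every data set linked to the changed one so its linkage can be rechecked.
        for other in sorted(self.links.get(event.dataset_id, ())):
            self.pending_reviews.append((other, event))

monitor = PlatformMonitor()
si = SemanticInfrastructure({"ehr-7": {"pathology-3", "radiology-5"}})
monitor.subscribe(si.on_data_change)
monitor.publish(DataChangeEvent("ehr-7", "patient-001", "update"))
print([dataset for dataset, _ in si.pending_reviews])  # ['pathology-3', 'radiology-5']
```

Keeping the monitor unaware of what subscribers do with an event mirrors the stated split of responsibilities: the platform captures changes, and linkage decisions stay with the semantic infrastructure.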

Link to use case satisfied from caGRID 2.0 Roadmap: the patient has an electronic medical record that spans multiple institutions. The clinical workup data (for example, genomics and proteomics data) is linked to the clinical care record; similarly, pathology and radiology findings must be attached to the patient's electronic medical record.
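As a rough illustration of querying and aggregating across organizations, data sets, time, and geographies, the Python sketch below filters in-memory observations by a date window and a set of regions, then computes a count and mean per group. The `Observation` record, the region codes, and `query_and_aggregate` are assumptions. Per the requirements, real temporal and spatial queries would be formulated and executed by the platform against federated sources, informed by semantic-infrastructure metadata.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical flattened observation; real data would stay in its source repositories.
@dataclass(frozen=True)
class Observation:
    organization: str
    dataset_id: str
    observed_on: date
    region: str
    value: float

def query_and_aggregate(observations, start, end, regions, group_by=("organization",)):
    """Keep observations inside [start, end] and the given regions,
    then return {group key: (count, mean value)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for obs in observations:
        if not (start <= obs.observed_on <= end):   # temporal criterion
            continue
        if obs.region not in regions:               # geographic criterion
            continue
        key = tuple(getattr(obs, attr) for attr in group_by)
        sums[key] += obs.value
        counts[key] += 1
    return {key: (counts[key], sums[key] / counts[key]) for key in counts}

observations = [
    Observation("Org A", "trial-42", date(2010, 1, 15), "US-MD", 2.0),
    Observation("Org A", "ehr-7", date(2010, 3, 1), "US-MD", 4.0),
    Observation("Org B", "trial-9", date(2010, 2, 10), "US-CA", 5.0),
    Observation("Org B", "trial-9", date(2009, 12, 31), "US-CA", 100.0),
]
result = query_and_aggregate(observations, date(2010, 1, 1), date(2010, 12, 31), {"US-MD", "US-CA"})
print(result)  # {('Org A',): (2, 3.0), ('Org B',): (1, 5.0)}
```

Passing `group_by=("organization", "dataset_id")` aggregates per data set instead of per organization; the 2009 observation is excluded by the time window.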

linkedDataSetManagement

Manage linkages between data in disparate repositories.
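One way to picture linkage management is a registry of links between records held in different repositories, which can then be traversed as if the repositories formed one federated store. The sketch below assumes each record is identified by a `(repository, local_id)` pair and treats links as symmetric; `LinkRegistry` and the repository names are hypothetical.

```python
from collections import defaultdict, deque

class LinkRegistry:
    """Hypothetical registry of links between records in different repositories.
    A record is identified by a (repository, local_id) pair."""
    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        # Links are stored in both directions so either record can reach the other.
        self._links[a].add(b)
        self._links[b].add(a)

    def unlink(self, a, b):
        self._links[a].discard(b)
        self._links[b].discard(a)

    def connected(self, start):
        """Return every record reachable from start via a chain of links (breadth-first)."""
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in self._links.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

registry = LinkRegistry()
registry.link(("ehr-site-a", "p-001"), ("genomics-repo", "s-17"))
registry.link(("ehr-site-a", "p-001"), ("pathology-repo", "path-9"))
registry.link(("pathology-repo", "path-9"), ("radiology-repo", "img-4"))
print(sorted(registry.connected(("ehr-site-a", "p-001"))))
```

In this example the patient's EHR record reaches the radiology image only through the pathology record, and the breadth-first traversal follows that chain; calling `unlink` on the pathology/radiology pair removes the image from the connected set.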
