The following are links to some useful external materials:
- Requirements for caBIG Infrastructure to Support Semantic Workflows
- Recommendations for caBIG to Support Semantic Workflows
- Use Cases for Semantic Workflows in caBIG
The following are high-level use case statements related to these requirements:
- Semantic Metadata
  - Define semantic metadata for analytical service
  - Define semantic metadata for scientific data
  - Define semantic metadata for translation services
- Dynamic Workflows
  - Define workflow constraints
    - Desired output
    - Desired input
    - Data query parameters
    - Analytical parameters
    - Desired operations
    - Computational constraints/requirements
    - Time constraints/requirements
    - Storage constraints/requirements
  - Generate workflow
  - Validate workflow
  - Run workflow
  - Track workflow
  - Share workflow
  - Share dynamic workflow (template/constraints)
  - Version workflow (design, creation, evolution)
- Provenance Tracking
  - Create intermediate data
  - Fetch intermediate data
  - Link data (process)
  - Establish data ownership and security (attribution)
  - Version data (republishing/updates)
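
As an illustration only, the constraint kinds listed under "Define workflow constraints" could be captured in a small record before a workflow is generated. The sketch below is not a caBIG data structure: the class name, the field names, the units, and the rule that a desired output is mandatory are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowConstraints:
    """Hypothetical record for the constraint kinds listed under 'Define workflow constraints'."""
    desired_output: str
    desired_input: Optional[str] = None
    data_query_parameters: dict = field(default_factory=dict)
    analytical_parameters: dict = field(default_factory=dict)
    desired_operations: list = field(default_factory=list)
    max_cpu_hours: Optional[float] = None    # computational constraint (assumed unit)
    max_run_hours: Optional[float] = None    # time constraint (assumed unit)
    max_storage_gb: Optional[float] = None   # storage constraint (assumed unit)


def check_constraints(c: WorkflowConstraints) -> list:
    """Return a list of problems; an empty list means the constraints are usable."""
    problems = []
    # Assumption for this sketch: a workflow cannot be generated without a target output.
    if not c.desired_output:
        problems.append("desired_output is required")
    # Resource limits are optional, but must be positive when given.
    for name in ("max_cpu_hours", "max_run_hours", "max_storage_gb"):
        value = getattr(c, name)
        if value is not None and value <= 0:
            problems.append(f"{name} must be positive")
    return problems
```

A record like this would be the input to the "Generate workflow" and "Validate workflow" steps, and it is also the artifact that "Share dynamic workflow (template/constraints)" would publish.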
The following are non-functional requirements that do not result in actor-oriented use cases:
- Define a semantic workflow standard encoding (e.g. OWL-S, WSMO, SWSL, SWSF)
- Define a provenance standard encoding
The following are some basic discovery-related use cases that pertain to the requirements:
Discover data of interest

| Use Case Number | Init3dbw2.pm21.1 |
|---|---|
| Brief Description | Discover data of interest: A researcher wants to find data that has already been collected for use with caArray. They are able to find the data and to inspect the system to learn what types of cells are in the database, what types of pathology are available for the data, etc. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | Data services exist and are accessible. |
| Post condition | Data of interest is discovered. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | High. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | None. |

Discover related data

| Use Case Number | Init3dbw2.pm21.2 |
|---|---|
| Brief Description | In some cases, two semantically equivalent data elements can be annotated with different semantic concepts that may or may not themselves be related. In these cases, there needs to be a mechanism to define semantic equivalence between the data elements or between the concepts, or to expand or contract the scope of the semantic query in the case of related concepts. For example, there needs to be a way to discover data elements annotated with either StartDate or Begin+Date, e.g. through a semantic equivalence of the two or through a widening/narrowing query. |
| Actor(s) for this particular use case | Metadata Specialist, Cancer Researcher |
| Pre-condition | Two data elements exist and are individually discoverable. |
| Post condition | The two data elements are discovered as semantically equivalent. |
| Steps to take | |
| Alternate Flow | If the two data elements are annotated with related concepts, the following alternate flow is possible: |
| Priority | High. |
| Associated Links | None. |
| Fit criterion/Acceptance Criterion | None. |

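
The two mechanisms named in this use case, declared equivalence and a widened query over related concepts, can be sketched as follows. This is a minimal illustration under stated assumptions: the equivalence groups, the broader-concept map, the `Date` parent concept, and the data element names are all hypothetical and are not taken from caDSR or any caBIG vocabulary.

```python
# Hypothetical annotations; none of these identifiers come from a real registry.
EQUIVALENT = [{"StartDate", "Begin+Date"}]          # concepts declared equivalent
BROADER = {                                          # concept -> assumed parent concept
    "StartDate": "Date",
    "Begin+Date": "Date",
    "EndDate": "Date",
}
DATA_ELEMENTS = {                                    # data element -> annotating concept
    "study.start": "StartDate",
    "trial.begin": "Begin+Date",
    "trial.end": "EndDate",
}


def equivalents(concept):
    """Return the concept together with every concept declared equivalent to it."""
    for group in EQUIVALENT:
        if concept in group:
            return set(group)
    return {concept}


def discover(concept, widen=False):
    """Find data elements annotated with the concept or an equivalent one.

    With widen=True the query also accepts sibling concepts that share a
    broader (parent) concept with the starting concept.
    """
    targets = equivalents(concept)
    if widen:
        parents = {BROADER[c] for c in targets if c in BROADER}
        targets |= {c for c, parent in BROADER.items() if parent in parents}
    return sorted(name for name, c in DATA_ELEMENTS.items() if c in targets)
```

With this data, `discover("StartDate")` finds both `study.start` and `trial.begin` through the declared equivalence, while `discover("StartDate", widen=True)` additionally returns `trial.end` because `EndDate` shares the assumed parent concept.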
Aggregate data

| Use Case Number | Init3dbw2.pm21.3 |
|---|---|
| Brief Description | Aggregate data of interest: A researcher is able to query the system to find data that can be combined with their data. They are able to compare the characteristics of the datasets to ensure that the data are combinable. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A number of datasets have been identified for aggregation. |
| Post condition | Combinable data has been aggregated. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | None. |

The following use cases overlap directly with these requirements but have been captured under Init1dbw6.pm8.U0 - Support caB2B to integrate services on caGrid:
- Expose service workflow metadata
- Discovery of analytical steps
- Storage and access of intermediate data
- Workflow sharing
- Define a metadata category
- Discover services by metadata category
- Perform operations based on metadata category
- Discover Related Data based on metadata
- Query for Data based on permitted valid values
Expose service workflow metadata

| Use Case Number | Init1dbw6.pm8.U1 |
|---|---|
| Brief Description | It is commonplace in bioinformatics to string together a number of data and analytical operations in order to produce the desired output. In order for Cancer Researchers to discover which services can be piped together, it is necessary that the designers of the services expose the appropriate metadata. |
| Actor(s) for this particular use case | Information Modeler |
| Pre-condition | A service exists that needs to be annotated. |
| Post condition | The service is annotated sufficiently to be discovered and integrated into an analytical pipeline. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | High. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | Sufficient metadata needs to be defined so that the service can be discovered and linked into a workflow. |

Discovery of analytical steps

| Use Case Number | Init1dbw6.pm8.U2 |
|---|---|
| Brief Description | Once metadata about a service is defined and exposed, it must be queryable by users of the service. From the metadata alone, they must be able to determine which services can act as consumers of the data the service produces, as well as producers of the data the service consumes. Furthermore, the user must be able to determine that the service is appropriately placed within the workflow. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | Service-level metadata is exposed for a number of services that can be linked via a workflow. |
| Post condition | The Cancer Researcher knows which services can act as inputs to which other services. |
| Steps to take | |
| Alternate Flow | The query could begin with a dataset or data service, and the Cancer Researcher would be identifying all downstream data and analytical services. |
| Priority | High. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | The user must be able to identify services based on input/output types, as well as find the appropriate translation services if needed. |

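
The acceptance criterion above, identifying services by input/output types and finding translation services where the types do not line up, can be illustrated with a small matching routine. This is only a sketch: the `Service` record, the service names, and the type names are hypothetical, and real caGrid service metadata is considerably richer than a single input and output type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service-level metadata: one input type and one output type."""
    name: str
    consumes: str
    produces: str
    is_translator: bool = False


SERVICES = [
    Service("ArrayDataService", "Query", "RawExpression"),
    Service("Normalizer", "RawExpression", "NormalizedExpression"),
    Service("Clusterer", "ExpressionMatrix", "ClusterResult"),
    Service("MatrixTranslator", "NormalizedExpression", "ExpressionMatrix",
            is_translator=True),
]


def consumers_of(service, services):
    """Analytical/data services that can directly consume what `service` produces."""
    return [s.name for s in services
            if not s.is_translator and s.consumes == service.produces]


def producers_for(service, services):
    """Services (translators included) that produce what `service` consumes."""
    return [s.name for s in services if s.produces == service.consumes]


def consumers_via_translation(service, services):
    """(translator, consumer) pairs reachable through exactly one translation service."""
    pairs = []
    for t in services:
        if t.is_translator and t.consumes == service.produces:
            for c in services:
                if not c.is_translator and c.consumes == t.produces:
                    pairs.append((t.name, c.name))
    return pairs
```

In this example the hypothetical `Normalizer` has no direct consumer, but `consumers_via_translation` reports that its output can reach `Clusterer` through `MatrixTranslator`. The alternate flow, starting from a data service and walking downstream, amounts to applying `consumers_of` repeatedly.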
Storage and access of intermediate data

| Use Case Number | Init1dbw6.pm8.U3 |
|---|---|
| Brief Description | When services are chained together into bioinformatic pipelines, it is often desirable to be able to store and then later access intermediate results of queries and analytics. These can be used to modify the pipeline as needed, or to share intermediate results with other investigators. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A service that produces data has been identified and is accessible, as well as the mechanism by which intermediate data will be stored. |
| Post condition | The results of the service are available via the intermediate data service. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | Access to the intermediate data must be as seamless as access to any other service, and the data should be secured based on rules that the Cancer Researcher identifies. |

Workflow sharing

| Use Case Number | Init1dbw6.pm8.U4 |
|---|---|
| Brief Description | Once a user identifies a service workflow of interest, they should be able to store that workflow in a form that makes it easy to encode, share with colleagues, reuse/rerun, modify, and extend. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A set of services of interest has been identified. |
| Post condition | The service workflow is stored and shared. |
| Steps to take | |
| Alternate Flow | The steps listed above can be performed in any order and any number of times, with the exception that the workflow must be encoded and saved first. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | The workflow must be accessible in much the same way as any other service. |

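
To make "encode, share, reuse/rerun, modify, and extend" concrete, the sketch below serializes a workflow to JSON so that it can be stored and passed to a colleague, then reloads and extends it. JSON is used purely for illustration; the page does not prescribe an encoding, and the document structure (`name`, `version`, `steps`) and function names are assumptions.

```python
import json


def encode_workflow(name, steps, version=1):
    """Encode a workflow as a JSON string (illustrative format, not a standard)."""
    return json.dumps({"name": name, "version": version, "steps": list(steps)},
                      sort_keys=True)


def decode_workflow(text):
    """Load a previously encoded workflow back into a dictionary."""
    return json.loads(text)


def extend_workflow(text, extra_step):
    """Return a new encoding with one more step and an incremented version."""
    workflow = decode_workflow(text)
    workflow["steps"].append(extra_step)
    return encode_workflow(workflow["name"], workflow["steps"],
                           workflow["version"] + 1)
```

Because the encoded form is the unit that gets stored, every later operation (share, rerun, modify, extend) starts from it, which matches the constraint in the alternate flow that the workflow must be encoded and saved first.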
Define a metadata category

| Use Case Number | Init1dbw6.pm8.U5 |
|---|---|
| Brief Description | A Metadata Category is a saved view of a particular set of classes and their associations, used to find services that match. For example, a Cancer Researcher may be interested in A->B->C and wants to be able to query services that support those classes and associations. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | None. |
| Post condition | A Metadata Category has been defined. |
| Steps to take | |
| Alternate Flow | The Cancer Researcher may want to load, update, delete, or share an existing Metadata Category. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | None. |

Discover services by metadata category

| Use Case Number | Init1dbw6.pm8.U6 |
|---|---|
| Brief Description | Once a Metadata Category is created, a Cancer Researcher can use it to discover services that support the underlying classes, attributes, and associations. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A Metadata Category has been identified. |
| Post condition | A set of services of interest has been identified. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | None. |

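
A Metadata Category such as A->B->C can be modeled as a set of classes plus a set of directed associations, and a service "supports" the category when its model contains all of them. The sketch below illustrates that subset test only; the `MetadataCategory` record, the helper names, and the three service models are hypothetical, and attributes are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataCategory:
    """Hypothetical saved view: classes plus directed (source, target) associations."""
    classes: frozenset
    associations: frozenset


def path_category(*path):
    """Build a category from a linear path such as ("A", "B", "C") for A->B->C."""
    return MetadataCategory(frozenset(path), frozenset(zip(path, path[1:])))


# Hypothetical service models, expressed in the same form as a category.
SERVICE_MODELS = {
    "ServiceX": path_category("A", "B", "C", "D"),
    "ServiceY": MetadataCategory(frozenset({"A", "B", "C"}),
                                 frozenset({("A", "B")})),
    "ServiceZ": path_category("B", "C"),
}


def supports(model, wanted):
    """True when the model contains every class and association in the category."""
    return (wanted.classes <= model.classes
            and wanted.associations <= model.associations)


def discover_services(wanted, models):
    """Names of the services whose models support the category."""
    return sorted(name for name, model in models.items() if supports(model, wanted))
```

Here only the hypothetical `ServiceX` supports A->B->C: `ServiceY` has all three classes but lacks the B->C association, and `ServiceZ` lacks class A.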
Perform operations based on metadata category

| Use Case Number | Init1dbw6.pm8.U7 |
|---|---|
| Brief Description | Once a user has identified a set of services that support a Metadata Category, they can invoke operations across those services and aggregate the results based upon the classes, attributes, and associations within the Metadata Category. |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A set of services of interest has been identified via the Metadata Category. |
| Post condition | The results from the cross-service operation are aggregated and presented to the user. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | Low. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | None. |

Discover Related Data based on metadata

| Use Case Number | Init1dbw6.U8 |
|---|---|
| Brief Description | caB2B relies on establishing/identifying relationships between concepts (entities) and/or properties (attributes) exposed through a service's implementation models (currently registered in caDSR/advertised on caGrid). For instance, we need to be |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | A particular concept of interest has been identified via the terminology browser or metadata browser. |
| Post condition | The results from the cross-model discovery operation are aggregated and presented to the user. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | High. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | Only models with matching classes, attributes, and associations are returned by the operation. |

Query for Data based on permitted valid values

| Use Case Number | Init1dbw6.U9 |
|---|---|
| Brief Description | caB2B uses value sets (that bind to concepts) to filter/constrain |
| Actor(s) for this particular use case | Cancer Researcher |
| Pre-condition | The enumerations (value sets), allowable ranges, or units of measure for attributes/data elements/variables in a particular database are available to help form queries of the database. |
| Post condition | Data matching the selected enumerations (values) or ranges is retrieved via an SI operation. |
| Steps to take | |
| Alternate Flow | None. |
| Priority | High. |
| Associated Links | |
| Fit criterion/Acceptance Criterion | The researcher is able to discover the possible or allowable values for a data field and to select or enter values of interest; only results that match the entered values are returned. |

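
The acceptance criterion for this use case has two parts: the researcher can list the permitted values for a field, and a query returns only records that match the selected values. A minimal sketch of both parts follows. The value set, the field name, and the records are invented for illustration, and rejecting values outside the value set is an assumption of this sketch rather than a stated requirement.

```python
# Hypothetical value set and records; not taken from caDSR or any real data service.
VALUE_SETS = {"tissue_type": {"tumor", "normal", "metastasis"}}

RECORDS = [
    {"id": 1, "tissue_type": "tumor"},
    {"id": 2, "tissue_type": "normal"},
    {"id": 3, "tissue_type": "tumor"},
]


def permitted_values(field_name):
    """List the allowable values for a field so the researcher can choose among them."""
    return sorted(VALUE_SETS[field_name])


def query_by_values(records, field_name, selected):
    """Return only the records whose field matches one of the selected values."""
    selected = set(selected)
    # Assumption: values outside the value set are rejected rather than ignored.
    invalid = selected - VALUE_SETS[field_name]
    if invalid:
        raise ValueError(f"not permitted for {field_name}: {sorted(invalid)}")
    return [r for r in records if r[field_name] in selected]
```

Validating against the value set before querying means a mistyped value fails loudly instead of silently returning an empty result.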