NIH | National Cancer Institute | NCI Wiki  

Federated discovery, searching, and data aggregation

  • User can query for all pre-cancerous biospecimens from caTissue instances like those at Washington University, Thomas Jefferson University, and Holden Comprehensive Cancer Center.
  • User can identify the sample obtained for Glioblastoma multiforme (GBM) and the corresponding CT image information. This query can be performed by querying across caTissue and NBIA.
  • User can find out if a sample used in an expression profiling experiment is available for a SNP analysis experiment. This query can be performed by querying across caTissue and caArray.
  • User can search for a particular gene based on the Entrez Gene ID and its related information, e.g. messenger RNA and protein information, from GeneConnect.
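
The cross-service queries in this list can be pictured as a union over tissue-bank instances followed by a join against a second service. The sketch below is only an illustration under stated assumptions: the record types, the `specimens_with_ct` function, and the idea that both systems share a participant identifier are all hypothetical, and no real caTissue or NBIA API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    participant_id: str  # assumed to be shared with the imaging archive
    diagnosis: str
    institution: str


@dataclass(frozen=True)
class ImageSeries:
    participant_id: str
    modality: str


def specimens_with_ct(tissue_instances, image_series, diagnosis):
    """Union specimens from every tissue bank, then join them to CT series."""
    # Step 1: federated union over all tissue-bank instances.
    specimens = [s for bank in tissue_instances for s in bank
                 if s.diagnosis == diagnosis]
    # Step 2: index CT series by participant for the cross-service join.
    ct_by_participant = {}
    for series in image_series:
        if series.modality == "CT":
            ct_by_participant.setdefault(series.participant_id, []).append(series)
    # Step 3: pair each specimen with every CT series of the same participant.
    return [(s, ct) for s in specimens
            for ct in ct_by_participant.get(s.participant_id, [])]


# Made-up sample data standing in for two tissue banks and one image archive.
wustl = [Specimen("P1", "Glioblastoma multiforme", "Washington University")]
tju = [Specimen("P2", "Glioblastoma multiforme", "Thomas Jefferson University"),
       Specimen("P3", "Astrocytoma", "Thomas Jefferson University")]
images = [ImageSeries("P1", "CT"), ImageSeries("P2", "MR"), ImageSeries("P3", "CT")]

matches = specimens_with_ct([wustl, tju], images, "Glioblastoma multiforme")
```

Only participant P1 has both a GBM specimen and a CT series, so `matches` holds a single pair. A real deployment would also need an agreed way to link identifiers between services, which this sketch simply assumes.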

Data element equivalence and discovery

  • Find all malignant breast cancer tumors: return all tissues that have site "breast", or whose auxiliary site is a subtype of "breast", across different tissue banking systems, even if these have been coded differently in different systems.
  • Find a standard data element that matches your local data element, and assert that these are the same.
  • Find all prostate cancer specimens: return all specimens with a clinical diagnosis of "prostate cancer" or related terms (query expansion based on ontology).
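
One way to read the first and third items is as query expansion over an is-a hierarchy, combined with a mapping from local codes to standard terms. The following is a minimal sketch under assumptions: the hierarchy, the bank names, and the local codes are all invented for illustration, and a real system would obtain them from a terminology service.

```python
# Hypothetical is-a edges (child -> parent), hard-coded only for illustration.
IS_A = {
    "upper outer quadrant of breast": "breast",
    "nipple": "breast",
    "axillary tail of breast": "breast",
    "lung": "thorax",
}

# Hypothetical local codes used by two tissue banks, mapped to standard terms.
LOCAL_TO_STANDARD = {
    ("bank_a", "BR"): "breast",
    ("bank_a", "NIP"): "nipple",
    ("bank_b", "UOQ-B"): "upper outer quadrant of breast",
    ("bank_b", "LNG"): "lung",
}


def expand(term):
    """Return the term plus all of its transitive subtypes."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def find_tissues(records, site_term):
    """Select records whose local site code maps into the expanded term set."""
    wanted = expand(site_term)
    return [r for r in records
            if LOCAL_TO_STANDARD.get((r["bank"], r["site_code"])) in wanted]


records = [
    {"id": "T1", "bank": "bank_a", "site_code": "BR"},
    {"id": "T2", "bank": "bank_a", "site_code": "NIP"},
    {"id": "T3", "bank": "bank_b", "site_code": "UOQ-B"},
    {"id": "T4", "bank": "bank_b", "site_code": "LNG"},
]
breast_ids = [r["id"] for r in find_tissues(records, "breast")]
```

Here the three breast-related records are returned even though the two banks code the site differently. The mapping table plays the role of the asserted equivalence between local and standard data elements.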

Data identification and searching

A scientist would like to gather the clinical data and associated biospecimens from a particular participant/patient. The scientist would also like to identify any associated microarray experiments performed on the biospecimen and check for availability of additional biospecimens for further analysis.

Workflow authoring

When dragging services onto the authoring tool dashboard, these services should be automatically "piped" together where applicable (i.e. when the output from one service maps to the input of another service). Leveraging metadata capable of mapping outputs to inputs will facilitate this.

In cases where services cannot be directly piped together, the tool should help identify shim services that can be used. This may require extending the metadata around shim services.

If no shims exist to assist in piping services together, the authoring tool should help (automatically) generate shim services based on the semantic requirements.
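
The piping behaviour described above can be sketched as a type match between one service's output and the next service's input, with a fallback search for a shim. This is an illustration only: the `Service` record, the type names, and `plan_pipe` are hypothetical and do not describe an existing authoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str   # assumed to come from registered service metadata
    output_type: str


def plan_pipe(upstream, downstream, shims):
    """Return the chain of services that connects upstream to downstream."""
    if upstream.output_type == downstream.input_type:
        return [upstream, downstream]  # the services can be piped directly
    for shim in shims:
        if (shim.input_type == upstream.output_type
                and shim.output_type == downstream.input_type):
            return [upstream, shim, downstream]  # a known shim bridges the gap
    return None  # no shim is known; one would have to be generated


# Hypothetical services and one hypothetical shim.
query = Service("specimen query", "DiagnosisTerm", "SpecimenId")
arrays = Service("array lookup", "SpecimenId", "ExpressionData")
cluster = Service("clustering", "ExpressionMatrix", "Clusters")
to_matrix = Service("matrix shim", "ExpressionData", "ExpressionMatrix")

direct = plan_pipe(query, arrays, [])
bridged = plan_pipe(arrays, cluster, [to_matrix])
missing = plan_pipe(arrays, cluster, [])
```

The `None` result corresponds to the third case in the text, where the tool would need to generate a shim from the semantic requirements. That generation step is not attempted here.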

The ability to describe a published paper as a "metadata description" and "SOP", and then use that metadata for search, discovery, and authoring of new workflows, capturing any new steps or features of the new workflow plus the original SOP in the metadata registry.

Easy extension of existing systems

In previous caIntegrator projects, a lot of custom development was required for every new study because the data of interest was different for every study. For instance, the Rembrandt study was a brain tumor study, so the clinical data contained some common things like Age, Survival Length, and Gender, but it also included study-specific attributes like Karnofsky Score, Lansky Score, Anti-convulsant status, and Steroid Dose. Each study will likely have different data sets of interest, and as the study progresses new attributes may even be added. Rather than going through a full modeling effort for every study, then generating a new data model and object model and updating them throughout the project, we would like to build a system that allows the user to dynamically define the data sets they want to use and to store them in a generic model. However, we do not want to lose the semantic meaning of each of these attributes, and we also want a computable model that will allow us to query across multiple studies.
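
A generic model that keeps semantic meaning could, for example, store each value together with a concept identifier instead of a study-specific column. The sketch below assumes such identifiers exist. The `C:AGE` and `C:KPS` identifiers, the second study, and all class and function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AttributeDef:
    name: str          # label used inside one study
    concept_id: str    # hypothetical identifier of the shared semantic concept
    datatype: type


@dataclass
class Study:
    name: str
    attributes: dict = field(default_factory=dict)  # label -> AttributeDef
    rows: list = field(default_factory=list)        # (subject, concept_id, value)

    def define(self, attr):
        """Add an attribute to the study without changing any schema."""
        self.attributes[attr.name] = attr

    def record(self, subject_id, attr_name, value):
        attr = self.attributes[attr_name]
        self.rows.append((subject_id, attr.concept_id, attr.datatype(value)))


def query_across(studies, concept_id, predicate):
    """Query by concept, so differently labelled attributes still match."""
    return [(s.name, subject, value)
            for s in studies
            for subject, cid, value in s.rows
            if cid == concept_id and predicate(value)]


rembrandt = Study("Rembrandt")
rembrandt.define(AttributeDef("Age", "C:AGE", int))
rembrandt.define(AttributeDef("Karnofsky Score", "C:KPS", int))
rembrandt.record("S1", "Age", 62)
rembrandt.record("S1", "Karnofsky Score", 80)

other = Study("Other study")  # hypothetical second study
other.define(AttributeDef("age_years", "C:AGE", int))
other.record("X9", "age_years", "55")
other.record("X10", "age_years", 40)

over_50 = query_across([rembrandt, other], "C:AGE", lambda age: age > 50)
```

Because the query is keyed on the concept rather than the local label, `Age` and `age_years` are treated as the same attribute across studies.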

Life Sciences data is dynamic: data descriptions and annotations are diverse and evolve very rapidly in this domain. Therefore, there is a requirement to be able to easily add data elements to an application at run time (not linked to a software release). These could be discovered in a metadata repository or, if the appropriate data element does not yet exist, it may need to be created. These newly added data elements then need to be immediately discoverable and made available through the application programmatic interface.
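
The find-or-create behaviour described here can be sketched with a small in-memory registry. This is an assumption-laden illustration: `ElementRegistry` and its methods are hypothetical names, and a real metadata repository would add persistence, curation, and versioning.

```python
class ElementRegistry:
    """In-memory stand-in for a metadata repository (hypothetical)."""

    def __init__(self):
        self._elements = {}

    def find(self, name):
        # Lookup is case-insensitive so near-duplicate names are caught.
        return self._elements.get(name.lower())

    def find_or_create(self, name, definition):
        """Reuse an existing data element, or create it at run time."""
        existing = self.find(name)
        if existing is not None:
            return existing, False
        element = {"name": name, "definition": definition}
        self._elements[name.lower()] = element
        return element, True

    def list_elements(self):
        # New elements are visible here as soon as they are created.
        return sorted(e["name"] for e in self._elements.values())


registry = ElementRegistry()
dose, created = registry.find_or_create("Steroid Dose", "Daily steroid dose")
again, created_again = registry.find_or_create("steroid dose", "duplicate entry")
```

The second call returns the element created by the first, which is the discover-before-create order the paragraph asks for.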

Enabling ontological indexing and searching of literature

Interdisciplinary research is characterized by language barriers, with research results distributed over a wide range of journals (some in PubMed, some beyond) and data distributed across numerous community repositories. Semantically, data tends to be described in very different ways by widely varying applications, resulting in search and analysis gaps. For example, caNanoLab provides limited search functionality and offers limited integration with other valuable resources. The new semantics infrastructure must provide for services leveraging domain ontologies as the basis for indexing the published literature and for searching and aggregating from multiple data resources.

caOBR represents one implementation that is somewhat characteristic of this entire class of requirements. OBR exposes indices to caGrid applications and makes caGrid resources available for OBR indexing. It enables OBR-based annotation of grid resources and provides analytical capabilities as well. It utilizes natural language processing-based indexing and annotation of caNanoLab data and a caB2B interface for OBA/OBR analysis of caNanoLab.
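
Ontology-based indexing can be illustrated with a simple dictionary annotator that maps text to concept identifiers and builds an inverted index. This is not how caOBR is implemented. The concept identifiers, synonym lists, and document IDs below are made up, and real annotators use far richer language processing.

```python
import re
from collections import defaultdict

# Hypothetical concept identifiers with the labels that should map to them.
ONTOLOGY = {
    "NP:0001": ["dendrimer", "dendrimers"],
    "NP:0002": ["quantum dot", "quantum dots"],
}


def annotate(text):
    """Return the concept ids whose labels occur as whole words in the text."""
    lowered = text.lower()
    return {
        concept_id
        for concept_id, labels in ONTOLOGY.items()
        if any(re.search(r"\b" + re.escape(label) + r"\b", lowered)
               for label in labels)
    }


def build_index(documents):
    """Build an inverted index from concept id to the documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for concept_id in annotate(text):
            index[concept_id].add(doc_id)
    return index


# Made-up documents from a journal source and a data repository.
documents = {
    "pubmed-1": "Dendrimers improve drug delivery.",
    "repo-7": "A PAMAM dendrimer sample was characterized.",
    "pubmed-2": "Quantum dots for imaging.",
}
index = build_index(documents)
```

Searching by concept then finds both the journal article and the repository record, even though the two sources word the concept differently.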

Reference vocabulary and value set terminologies

The CTRP has an immediate need for NCI-level Vocabulary Services for Diseases and Interventions (Agents, Devices, etc.) that would allow CTRP to leverage the existing terms in EVS for these key lists of values, rather than relying on the existing lists taken from the PDQ Terminology File. This would also avoid the need for CTRP to build curation applications for these lists.

There is a need to develop an ontology for LIMS applications that includes a collaborative platform for improving existing terminologies and for cross-mapping between various terminology sets, to extend the body of knowledge for the caBIG community and improve interoperability between laboratory information and hospital systems.

Searching infrastructure

A user sends out a query for all microarray data associated with subjects with lung cancer at the following institutes: 1) Dana Farber; 2) Mayo; 3) NCI; 4) Wash U St Louis. The query is a union of results, and does not require results to be joined. On the webpage, a status bar appears listing the four microarray services being queried. Next to each service name is a status bar saying that results have not returned yet. There is also a button asking the user if she would like to end the query against this service. After 30 seconds, the status bar changes for the Dana Farber service: it now says "4 results have returned", and the Dana Farber "End Query" button disappears. 22 seconds later, the same thing happens for Mayo, and 15 more results are "returned". Still, no results show up, and the user is looking at a status page. 11 seconds later, WUSTL returns with 57 results, at which point the researcher decides to press the "End Query" button next to NCI. Suddenly all results are returned, along with a message stating that the NCI query was terminated at the user's request.
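
The scenario above amounts to a parallel union query with per-service status and user-driven cancellation. A minimal sketch using Python's standard `concurrent.futures` and `threading` modules follows. The service stand-ins, `make_service`, and `run_union_query` are hypothetical. For determinism, the "End Query" press is applied right after submission rather than after the other services return.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_service(n_results, blocks=False):
    """Build a stand-in for a remote microarray service (hypothetical)."""
    def query(cancel):
        if blocks:
            # Simulate a slow service: wait for a cancel signal, at most 5 s.
            cancel.wait(timeout=5)
        if cancel.is_set():
            return None  # terminated at the user's request
        return [f"result-{i}" for i in range(n_results)]
    return query


SERVICES = {
    "Dana Farber": make_service(4),
    "Mayo": make_service(15),
    "WUSTL": make_service(57),
    "NCI": make_service(0, blocks=True),
}


def run_union_query(services, end_query_for=()):
    """Query all services in parallel and union whatever comes back."""
    cancels = {name: threading.Event() for name in services}
    status, results = {}, []
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {name: pool.submit(fn, cancels[name])
                   for name, fn in services.items()}
        # The user presses "End Query" for the named services.
        for name in end_query_for:
            cancels[name].set()
        for name, future in futures.items():
            rows = future.result()
            if rows is None:
                status[name] = "terminated at user's request"
            else:
                status[name] = f"{len(rows)} results have returned"
                results.extend(rows)  # union only, no join across services
    return status, results


status, results = run_union_query(SERVICES, end_query_for=["NCI"])
```

The result counts mirror the scenario: 4, 15, and 57 rows are unioned, and the status entry for NCI records that the query was ended by the user.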

A user wants to query for breast cancer tissue samples. The application shows her a list of 7 caTissue services available. Next to each service is a number that says how many public <Specimens> (could be a different object) are available at this service. Four of these services have zero specimens, so the user elects NOT to search against these services and selects the other 3 as candidate services to query.
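
The pre-selection step can be sketched as a filter over advertised counts. The service names and counts below are invented to match the scenario's numbers, and `candidate_services` is a hypothetical helper.

```python
# Hypothetical public specimen counts advertised by seven caTissue services.
PUBLIC_SPECIMEN_COUNTS = {
    "caTissue A": 120,
    "caTissue B": 0,
    "caTissue C": 0,
    "caTissue D": 35,
    "caTissue E": 0,
    "caTissue F": 8,
    "caTissue G": 0,
}


def candidate_services(counts):
    """Keep only the services that advertise at least one public specimen."""
    return [name for name, count in counts.items() if count > 0]


candidates = candidate_services(PUBLIC_SPECIMEN_COUNTS)
```

In the scenario the user makes this choice herself. The sketch only shows the default selection an application might propose.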

A user is interested in seeing any microarrays performed against lung samples obtained from non-smokers with stage 3 lung cancer. She queries Mayo and OSU because both hospital systems have been running independent lung cancer trials. Realizing that Mayo and OSU are working independently, she puts a flag into her search criteria indicating that cross-institute joins are not required. The web application creates the necessary queries, one joining across OSU and one joining across Mayo. For the Mayo join, data returns from caTissue followed by data from the Mayo caArray service. The application recognizes that these two datasets can be joined together based on the specimen ID and returns results to the user. These results are displayed to the user along with a message stating that [1] institute has not returned data to date. A minute or two later, the data from the missing institute returns and is appended to the results from Mayo that the user is currently browsing through.
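
The per-institute join with incremental display can be sketched as a generator that joins each institute's data on specimen ID as it arrives. The record layout, specimen IDs, array names, and function names are assumptions made for this illustration.

```python
def join_on_specimen(specimens, arrays):
    """Join tissue and array records from one institute on specimen_id."""
    by_id = {s["specimen_id"]: s for s in specimens}
    return [{**by_id[a["specimen_id"]], **a}
            for a in arrays if a["specimen_id"] in by_id]


def incremental_results(arrivals, institutes):
    """Yield the growing result list each time an institute's data arrives."""
    shown = []
    pending = set(institutes)
    for institute, specimens, arrays in arrivals:
        # Joins stay within one institute; no cross-institute join is made.
        shown.extend(join_on_specimen(specimens, arrays))
        pending.discard(institute)
        message = f"{len(pending)} institute(s) have not returned data to date"
        yield list(shown), message


# Made-up data: Mayo answers first, OSU a little later.
mayo_specimens = [{"specimen_id": "M1", "institute": "Mayo"},
                  {"specimen_id": "M2", "institute": "Mayo"}]
mayo_arrays = [{"specimen_id": "M1", "array": "A-100"}]
osu_specimens = [{"specimen_id": "O1", "institute": "OSU"}]
osu_arrays = [{"specimen_id": "O1", "array": "A-200"},
              {"specimen_id": "O9", "array": "A-300"}]

arrivals = [("Mayo", mayo_specimens, mayo_arrays),
            ("OSU", osu_specimens, osu_arrays)]
snapshots = list(incremental_results(arrivals, ["Mayo", "OSU"]))
```

The first snapshot holds only the Mayo join plus the pending message. The second appends the OSU rows to the results the user is already browsing.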
