NIH | National Cancer Institute | NCI Wiki  

...

With small-animal (co-clinical) data meeting the DICOM standard, researchers could find a mouse with the same kind of tumor and compare its response to various therapies, information that could help generate sophisticated diagnoses and treatment plans.

Pilot Challenges

The Pilot Challenges sub-project is unique within CTIIP because, rather than focusing on data standards and integration, it demonstrates that integration

Challenges are increasingly viewed as a mechanism for fostering advances in a number of domains, including healthcare and medicine. The US federal government, as part of the Open Government Initiative, has underscored the role of challenges as a way to "promote innovation through collaboration and (to) harness the ingenuity of the American Public." Large quantities of publicly available data and a cultural shift toward openness in science now make it possible to use challenges and crowdsourcing (enlisting the services of people via the Internet) to propel the field forward.

Sites such as Kaggle, InnoCentive, and TopCoder are increasingly used in the computer science and data science communities in a range of creative ways. Commercial entities such as Walmart have leveraged them to find qualified employees, while participants are rewarded with monetary prizes as well as less tangible rewards, such as public acknowledgement of their contributions to the field.

In the biomedical domain, challenges have been used effectively in bioinformatics, as seen in recent crowdsourced efforts such as the Critical Assessment of Protein Structure Prediction (CASP), the CLARITY Challenge for standardizing clinical genome sequencing analysis and reporting, and The Cancer Genome Atlas (TCGA) Pan-Cancer Analysis Working Group. The DREAM Challenges (Dialogue for Reverse Engineering Assessments and Methods), including the prostate cancer challenge currently underway, are being used to assess predictive models of disease.

Key advantages of challenges over conventional methods include 1) scientific rigor (sequestering the test data), 2) comparing methods on the same datasets with the same agreed-upon metrics, 3) allowing computer scientists without access to medical data to test their methods on large clinical datasets, 4) making resources such as source code available, and 5) bringing together diverse communities that may not traditionally work together: imaging and computer scientists, machine learning algorithm developers, software developers, clinicians, and biologists.
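To make advantages 1) and 2) concrete, the following minimal Python sketch shows how an organizer might keep the test labels sequestered and score every submission with one agreed-upon metric. It is a hypothetical illustration, not a description of any particular challenge platform: the names `SequesteredScorer`, `dice`, and `leaderboard` are invented for this example, segmentation masks are modeled as sets of voxel coordinates, and the Dice coefficient is assumed as the shared metric.

```python
"""Hypothetical sketch: scoring challenge submissions against a sequestered test set.

Assumptions (illustrative only): each test case is a binary segmentation mask
represented as a set of voxel coordinates, and the agreed-upon metric is the
Dice coefficient averaged over all test cases.
"""
from typing import Dict, FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]
Mask = FrozenSet[Voxel]


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 1.0 when both masks are empty."""
    total = len(pred) + len(truth)
    if total == 0:
        return 1.0
    return 2.0 * len(pred & truth) / total


class SequesteredScorer:
    """Keeps ground truth on the organizer side; participants only see scores."""

    def __init__(self, ground_truth: Dict[str, Mask]) -> None:
        if not ground_truth:
            raise ValueError("test set must contain at least one case")
        self._truth = dict(ground_truth)  # never exposed to participants

    def score(self, submission: Dict[str, Mask]) -> float:
        """Mean Dice over all test cases; a missing case counts as an empty mask."""
        per_case = [dice(submission.get(case, frozenset()), truth)
                    for case, truth in self._truth.items()]
        return sum(per_case) / len(per_case)


def leaderboard(scorer: SequesteredScorer,
                submissions: Dict[str, Dict[str, Mask]]) -> List[Tuple[str, float]]:
    """Rank all teams with the same scorer and metric, best score first."""
    results = [(team, scorer.score(sub)) for team, sub in submissions.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    truth = {"case1": frozenset({(0, 0, 0), (0, 0, 1)})}
    scorer = SequesteredScorer(truth)
    teams = {
        "team_b": {"case1": frozenset({(0, 0, 0)})},
        "team_a": {"case1": frozenset({(0, 0, 0), (0, 0, 1)})},
    }
    for team, value in leaderboard(scorer, teams):
        print(f"{team}: {value:.3f}")
```

Running the example should list `team_a` first with 1.000 and `team_b` second with 0.667. Because participants only ever receive the aggregate score, the ground truth never leaves the organizer's side and every team is ranked by identical code. A real platform would also need to limit how often teams can submit, since unlimited scoring queries can leak information about the test set.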

Despite this potential, however, there are a number of obstacles. Medical data are usually governed by privacy and security regulations, such as HIPAA, that make it difficult to share patient data. Patient health records can be very difficult to de-identify completely. Medical imaging data, especially brain MRIs, can be particularly problematic, as one could easily reconstruct a recognizable 3D model of the subject.
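For DICOM imaging data, one part of de-identification is scrubbing identifying header attributes. The minimal Python sketch below illustrates the idea under loudly stated assumptions: the header is taken to be already parsed into a dictionary keyed by (group, element) tags, the function names `deidentify` and `pseudonym` and the salt value are hypothetical, and the tag list is a small illustrative subset, not the full attribute confidentiality profile defined in DICOM PS3.15 Annex E. It also does nothing about identifying content in the pixel data itself, such as burned-in text or the anatomy that makes brain MRIs reconstructable, which is one reason header scrubbing alone is not sufficient.

```python
"""Hypothetical sketch of header-level DICOM de-identification.

Assumptions (illustrative only): the header is already parsed into a dict
keyed by (group, element) tag tuples; the tag sets below are a small subset,
NOT the full DICOM PS3.15 Annex E profile; pixel data is not examined.
"""
import hashlib
from typing import Dict, Tuple

Tag = Tuple[int, int]

# Attributes removed outright (illustrative subset only).
REMOVE = {
    (0x0008, 0x0050),  # AccessionNumber
    (0x0008, 0x0080),  # InstitutionName
    (0x0008, 0x0090),  # ReferringPhysicianName
    (0x0010, 0x0030),  # PatientBirthDate
    (0x0010, 0x1040),  # PatientAddress
}

# Attributes replaced by a consistent pseudonym so a patient's series stay linked.
PSEUDONYMIZE = {
    (0x0010, 0x0010),  # PatientName
    (0x0010, 0x0020),  # PatientID
}


def pseudonym(value: str, salt: str) -> str:
    """Deterministic salted SHA-256 hash: same input and salt give the same ID."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "ANON-" + digest[:12]


def deidentify(header: Dict[Tag, str], salt: str) -> Dict[Tag, str]:
    """Return a new header without private tags or listed PHI; input is unchanged."""
    clean: Dict[Tag, str] = {}
    for (group, element), value in header.items():
        if group % 2 == 1:  # odd group numbers denote private (vendor) tags
            continue
        if (group, element) in REMOVE:
            continue
        if (group, element) in PSEUDONYMIZE:
            clean[(group, element)] = pseudonym(value, salt)
        else:
            clean[(group, element)] = value
    return clean
```

Replacing PatientName and PatientID with a salted hash, rather than deleting them, keeps images from the same patient linked across series while hiding the original identifiers; in this sketch the salt would have to stay private to the organizers.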

Crowdsourcing can blur the lines of intellectual property (IP) ownership and can make it difficult to translate algorithms developed in the context of a challenge into a commercial product. As a hypothetical example, consider a university researcher who develops an algorithm for a contest held by a commercial entity with the express purpose of implementing it in a product. Although the researcher who won the contest may have been compensated monetarily, the IP was developed during her time at the university, so it is owned by the university, which may not release the rights to the company without further licensing fees.

The infrastructure requirements to both host and participate in some of the "big data" efforts can be monumental. Medical imaging data can be large, historically requiring disks to be shipped to participants. The computing resources needed to process these large datasets may be beyond what is available to individual participants. For organizers, creating infrastructure that is secure, robust, and scalable can require resources beyond the reach of many researchers, including IT staff support, compute capability, and domain knowledge.
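A back-of-the-envelope calculation shows why shipping disks was historically attractive. Assuming, purely for illustration, a 1 TB dataset (taken as 10^12 bytes) and a sustained 100 Mbit/s network link with no protocol overhead, the transfer time is:

```latex
t = \frac{1\,\text{TB} \times 8\,\text{bit/byte}}{100\,\text{Mbit/s}}
  = \frac{8 \times 10^{12}\,\text{bit}}{10^{8}\,\text{bit/s}}
  = 8 \times 10^{4}\,\text{s} \approx 22\,\text{h}
```

Real-world throughput is usually below the nominal link rate, so an actual transfer would take longer, and the time grows linearly with dataset size.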

The medical imaging community has conducted a host of challenges at conferences such as MICCAI and SPIE. However, these have typically been modest in scope, both in data size and in number of participants. Medical imaging data pose additional difficulties for both participants and organizers. For organizers, ensuring that the data are free of protected health information (PHI) is both critical and non-trivial. Medical images are typically acquired in DICOM format, and ensuring that a DICOM file is free of PHI requires domain knowledge and specialized software tools. Multimodal imaging data can be extremely large. Pathology image formats can be proprietary, and interoperability between formats can require additional software development. Encouraging non-imaging researchers (e.g., machine learning scientists) to participate in imaging challenges can be difficult because of the domain knowledge required to convert medical images into a set of feature vectors. For participants, gaining access to compute clusters with sufficient computing power, storage space, and bandwidth can prove difficult. In short, medical imaging data are challenging for non-imaging researchers.
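To illustrate what converting medical images into a set of feature vectors can mean in its very simplest form, the Python sketch below reduces a region of interest to a fixed-length vector of first-order intensity statistics. It is a hypothetical toy, not a recommended feature set: the image is assumed to be a 2-D grid of intensities already decoded from DICOM, the region of interest is assumed to be a same-shaped binary mask, and the function name `first_order_features` is invented for this example. Real pipelines must also handle DICOM decoding, voxel spacing, intensity calibration, and much richer shape and texture features, which is exactly the domain knowledge non-imaging researchers may lack.

```python
"""Hypothetical sketch: reducing an image region to a fixed-length feature vector.

Assumptions (illustrative only): the image is a 2-D grid of already-decoded
intensities, the region of interest (ROI) is a same-shaped binary mask, and
only a few first-order intensity statistics are computed.
"""
import math
from typing import List, Sequence

FEATURE_NAMES = ["voxel_count", "mean", "std", "min", "max"]


def first_order_features(image: Sequence[Sequence[float]],
                         mask: Sequence[Sequence[int]]) -> List[float]:
    """Return [voxel_count, mean, std, min, max] of intensities inside the mask."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    # Collect intensities of every pixel the mask marks as inside the ROI.
    values = [float(px) for img_row, mask_row in zip(image, mask)
              for px, inside in zip(img_row, mask_row) if inside]
    if not values:
        raise ValueError("mask selects no voxels")
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
    return [float(n), mean, std, min(values), max(values)]


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    mask = [[1, 0], [1, 1]]
    vector = first_order_features(image, mask)
    print(dict(zip(FEATURE_NAMES, vector)))
```

For the toy input above, the ROI contains three pixels with intensities 1, 3, and 4, so the vector has a voxel count of 3, a minimum of 1, and a maximum of 4. Each case then becomes one row of a table that a machine learning participant could use without opening a DICOM file.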

However, it is imperative that the imaging community develop the tools and infrastructure necessary to host these challenges and potentially enlarge the pool of methods by making it more feasible for non-imaging researchers to participate. Resources such as The Cancer Imaging Archive (TCIA) have greatly reduced the burden of sharing medical imaging data within the cancer community and of making these data available for use in challenges. Although a number of challenge platforms currently exist, we are not aware of any system that meets all the requirements for hosting medical imaging challenges.

In this article, we review a few historical imaging challenges. We then list the requirements we believe to be necessary (and nice to have) to support large-scale multimodal imaging challenges. Next, we review existing systems and develop a matrix of features and tools. Finally, we make recommendations for developing the Medical Imaging Challenge Infrastructure (MedICI), a system to support medical imaging challenges.

...