Cross-NCI - Clinical and Translational Imaging Informatics Project

...

Challenges are being increasingly viewed as a mechanism to foster advances in a number of domains, including healthcare and medicine. The United States Federal Government, as part of the open-government initiative, has underscored the role of challenges as a way to "promote innovation through collaboration and (to) harness the ingenuity of the American Public." Large quantities of publicly available data and cultural changes in the openness of science have now made it possible to use these challenges and crowdsourcing efforts to propel the field forward.

...

Some of the key advantages of challenges over conventional methods include 1) scientific rigor (sequestering the test data), 2) comparing methods on the same datasets with the same, agreed-upon metrics, 3) allowing computer scientists without access to medical data to test their methods on large clinical datasets, 4) making resources available, such as source code, and 5) bringing together diverse communities (that may traditionally not work together) of imaging and computer scientists, machine learning algorithm developers, software developers, clinicians, and biologists.

However, despite this potential, organizing challenges around medical data faces a number of obstacles. Medical data is usually governed by privacy and security policies such as HIPAA that make it difficult to share patient data. Patient health records can be very difficult to de-identify completely. Medical imaging data, especially brain MRIs, can be particularly challenging, as one could easily reconstruct a recognizable 3D model of the subject.

...

The medical imaging community has conducted a host of challenges at conferences such as MICCAI and SPIE. However, these have typically been modest in scope (both in terms of data size and number of participants). Medical imaging data poses additional challenges to both participants and organizers. For organizers, ensuring that the data are free of PHI is both critical and non-trivial. Medical data is typically acquired in DICOM format, and ensuring that a DICOM file is free of PHI requires domain knowledge and specialized software tools. Multimodal imaging data can be extremely large. Imaging formats for pathology images can be proprietary, and interoperability between formats can require additional software development effort. Encouraging non-imaging researchers (e.g. machine-learning scientists) to participate in imaging challenges can be difficult because of the domain knowledge required to convert medical images into a set of feature vectors. For participants, access to large compute clusters with sufficient computing power, storage space, and bandwidth can prove difficult.
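To make the PHI concern concrete, the following is a minimal sketch of the kind of header check an organizer might run before releasing data. It is an illustration under stated assumptions, not a de-identification tool: the header is modelled as a plain dictionary rather than read from a DICOM file, the keyword list is a small illustrative subset of standard DICOM attribute keywords, and the function names `find_phi` and `scrub` are hypothetical. PHI can also appear in private tags or be burned into the pixel data, neither of which this sketch addresses.

```python
# Hypothetical sketch: flag header fields that commonly carry PHI.
# The header is modelled as a plain dict; a real pipeline would read it
# with a DICOM toolkit and follow a full de-identification profile.

# DICOM attribute keywords that commonly hold identifying values
# (illustrative subset, not exhaustive).
PHI_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
    "AccessionNumber",
}


def find_phi(header):
    """Return the sorted keywords in `header` that are on the list and non-empty."""
    return sorted(
        key for key, value in header.items()
        if key in PHI_KEYWORDS and str(value).strip()
    )


def scrub(header, placeholder="ANONYMIZED"):
    """Return a copy of `header` with flagged values replaced by `placeholder`."""
    flagged = set(find_phi(header))
    return {
        key: (placeholder if key in flagged else value)
        for key, value in header.items()
    }


# Made-up example header; the values are not from any real dataset.
example = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "MR",
    "InstitutionName": "",
}

print(find_phi(example))
print(scrub(example))
```

A check like this only catches the obvious fields, which is part of why the text above describes de-identification as requiring domain knowledge and specialized tools.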

However, it is imperative that the imaging community develop the tools and infrastructure necessary to host these challenges and potentially enlarge the pool of methods by making it more feasible for non-imaging researchers to participate. Resources such as The Cancer Imaging Archive (TCIA) have greatly reduced the burden of sharing medical imaging data within the cancer community and of making these data available for use in challenges. Although a number of challenge platforms currently exist, we are not aware of any system that meets all the requirements necessary to host medical imaging challenges.

In this article, we review a few historical imaging challenges. We then list the requirements we believe to be necessary (and nice to have) to support large-scale multimodal imaging challenges. We then review existing systems and develop a matrix of features and tools. Finally, we make some recommendations for developing Medical Imaging Challenge Infrastructure (MedICI), a system to support medical imaging challenges.

...

Challenge Post has been used to organize hackathons, online challenges, and other collaborative software activities. In-person hackathons are free, while online challenges cost $1500/month (plus other optional charges).

Open Source

Synapse is both an open source platform and a hosted solution for challenges and collaborative activities, created by Sage Bionetworks. It has been used for a number of challenges, including the DREAM challenge. Synapse allows the sharing of code as well as data. However, the code is typically in R, Python, and similar languages. Synapse also provides a programmatic interface, with methods to upload and download data, submit results, and create annotations and provenance, through R, Python, the command line, and Java. These options can be configured for the different challenges. Content in Synapse is referenced by unique Synapse IDs. The three basic types of Synapse objects are projects, folders, and files. These can be accessed through the web interface or through programmatic APIs. Experience and support for running image analysis code within Synapse is limited.

...

CodaLab is an open-source project that originated at Microsoft Research and was expressly created for hosting challenges and supporting reproducible research. The OuterCurve Foundation currently maintains it. Challenge organizers can easily set up challenges by creating a competition bundle that consists of data as well as evaluation tools. As part of the configuration files, the organizer can set the number of phases (e.g. training, leaderboard, test) and their duration. The evaluation program can be written in any language. Participants can upload results and get immediate feedback. The currently available version of CodaLab comes with scoring algorithms for image segmentation evaluation. Organizers can extend the presentation of results to allow drilling down into the results with tables and charts. CodaLab currently uses the Azure platform although, in theory, it should be possible to deploy on other servers without a great deal of effort. CodaLab is also developing support for worksheets, which are resources to support reproducible research and collaboration. Using these, researchers have compared a number of open source NLP tools on different public datasets. As this technology continues to be developed, researchers will be able to quickly compare the performance of different algorithms on a range of datasets in the "cloud" by leveraging Azure technology.
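As an illustration of what an organizer-supplied evaluation program for a segmentation challenge might compute, the sketch below scores submitted binary masks against a reference mask using the Dice coefficient and ranks them for a leaderboard. The choice of Dice is an assumption: the text above does not say which metrics CodaLab ships. The code does not follow CodaLab's bundle layout or any CodaLab API, and the team names and masks are made up.

```python
# Hypothetical scoring sketch for a segmentation challenge.
# Masks are flat sequences of 0/1 values, one entry per voxel.

def dice(pred, truth):
    """Dice coefficient between two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


def score_submissions(submissions, truth):
    """Rank submissions (name -> mask) by Dice score, best first."""
    scores = {name: dice(mask, truth) for name, mask in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up reference mask and submissions.
truth = [0, 1, 1, 1, 0, 0]
submissions = {
    "team_a": [0, 1, 1, 0, 0, 0],
    "team_b": [1, 1, 1, 1, 1, 0],
}

for name, score in score_submissions(submissions, truth):
    print(name, round(score, 3))
```

Keeping the reference mask on the organizer's side and returning only the score is what lets participants get immediate feedback without the test data being released.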

...
