The success of an AI system depends on the amount and quality of data used to train it. The database that was key to the latest AI revolution (ImageNet) contains millions of real-life images labeled into thousands of categories. No data collections of comparable extent and quality exist for radiology data. Many consider this the biggest challenge for AI in radiology. Training AI models requires medical images accompanied by metadata and expert annotations (e.g., spatial location of the finding, its clinical characteristics), ideally linked with the non-imaging part of the patient record (e.g., biopsy results, genomic and blood serum tests). Large volumes of clinical images are routinely collected, interpreted visually and analyzed quantitatively, both in clinical and research studies.
Nevertheless, the result is often optimized for reuse by a human, not an algorithm. Tremendous effort is often needed to prepare datasets for AI training, combine datasets across sites or collections, or aggregate diverse datasets, as is often required to develop robust models. With the recent advances in automated imaging-based tissue phenotyping (radiomics) and other relevant AI technologies, there is a new realization of the value of large, structured, AI-ready datasets.
There are many obstacles and few incentives for engineering datasets to optimize machine-level reusability. Non-technical issues aside, the major challenges include choosing a data format, defining a data model, deciding which attributes of the data may be valuable for future, unforeseen use cases and how those can be captured in a structured and self-documenting manner, and identifying practical tools to help with those tasks. Over the past five years, we have directed our efforts to incrementally and collaboratively advance data engineering practices as applied to medical imaging research. We are extending the existing, broadly adopted DICOM standard to support the needs of medical imaging research applications and their subsequent implementation in clinical systems. We develop open-source tools that enable standardization of common outputs of image analysis. We have established collaborations with a number of academic and industry groups to encourage, support and evaluate adoption of the standard. We have been leading efforts in training and outreach, aiming to educate the community about the capabilities of the standard and the supporting tools. In parallel with developing support for the generic data types commonly encountered in imaging research, we are also working on targeted solutions for the specific research workflows of interest in several cancer types.
In this talk, I will discuss our progress to date in developing the ecosystem of standards, tools, use cases, datasets, publications, and outreach activities that have the overarching goal of improving data engineering practices. I will also present some of our ongoing work developing integrated technology solutions that are used to support clinical research at our site, and the role of data as the backbone of downstream innovation.
During this presentation, Dr. Simonyan will discuss WHISE, a technology for creating incentives and promoting the liberation of health data through patient ownership, exchange of proprietary data, and added value from intellectual and analytic insights. The WHISE technology provides a service-based architecture in which the exchange between the consumer and the owner of information can involve either the data itself or derived and computed information. It allows assetization of data and commoditization of data access.
Predicting treatment response and the course of a patient’s disease is critical in selecting therapy and in helping patients to plan their lives. Despite the rich data produced by genomic and imaging platforms, the accuracy of prognostication for patients diagnosed with cancer can be highly variable, often relying on classification by only a handful of molecular biomarkers or on subjective interpretation of histology. While deep learning has emerged as a powerful technology for learning from unstructured images or other high-dimensional data, its application has largely focused on classification, and predicting the timing of disease progression, overall survival, or other time-to-event clinical outcomes has not been widely explored. In this talk, Dr. Cooper will discuss recent advances in developing deep-learning-based survival models for predicting cancer outcomes from genomic and digital pathology imaging data. He will show how conventional survival models can be combined with convolutional networks or other neural networks to learn patterns associated with patient outcomes in digital pathology images or genomic signatures. Using gliomas as a driving use case, he will describe how these models can combine histology and genomics to provide unified and highly accurate predictions of overall survival, and illustrate how these models can be deconstructed to improve validation and reveal biological insights.
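One common way to pair a conventional survival model with a neural network is to train the network to output a scalar risk score per patient and to use the negative Cox partial log-likelihood as the loss. The abstract does not state which survival model the talk uses, so the sketch below is an illustrative assumption, not the speaker's method. The function name `cox_neg_partial_log_likelihood` and its arguments are hypothetical, and tied event times are not given special treatment.

```python
import numpy as np


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk score per patient (e.g. a network output)
    time  : follow-up time per patient
    event : 1 if the outcome was observed, 0 if the patient was censored

    Assumption: event times are distinct (no tie correction is applied).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Sort by descending follow-up time so that, for the patient at
    # position i, the risk set (everyone still under observation at that
    # time) is exactly positions 0..i.
    order = np.argsort(-time, kind="stable")
    risk, event = risk[order], event[order]

    # Running log-sum-exp of the risk scores over each risk set.
    log_risk_set = np.logaddexp.accumulate(risk)

    # Only patients with an observed event contribute a term.
    n_events = max(int(event.sum()), 1)
    return -np.sum((risk - log_risk_set)[event]) / n_events


# Usage: three patients, all with observed events.
times = [3.0, 2.0, 1.0]
events = [1, 1, 1]
concordant = cox_neg_partial_log_likelihood([-1.0, 0.0, 1.0], times, events)
discordant = cox_neg_partial_log_likelihood([1.0, 0.0, -1.0], times, events)
print(concordant, discordant)
```

The loss is lower when patients with shorter survival receive higher risk scores, which is the signal a network would be trained on. Censored patients still appear in the risk sets of earlier events, so their partial follow-up is used rather than discarded.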