Welcome to the CBIIT Speaker Series Wiki
This talk will describe the challenges in the area of scientific workflows, including how they are used to advance science in a number of domains, and how state-of-the-art software systems, such as Pegasus, meet the application and computing infrastructure challenges. Pegasus enables scientists to describe the workflows in an abstract, resource-independent way. That description includes the definition of the workflow steps and the data they take in and generate, but does not include low-level cyber-infrastructure information. Given the abstract workflow description and the information about the execution environment (composed of potentially distributed data sources and systems), a planner can map the computational tasks onto the available resources and plan the movement of data across distributed resources. The planning process also opens up opportunities for performance optimization and fault-tolerance. The talk will describe example applications, including LIGO, the gravitational-wave physics experiment that recently confirmed the existence of gravitational waves. The talk will touch upon the issues the applications face, and how Pegasus can help them execute in a number of different environments: campus clusters, distributed resources, and clouds.
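As a rough illustration of the distinction drawn above between an abstract workflow and a concrete plan, the following minimal Python sketch describes steps only by the data they read and write, then lets a toy planner assign them to sites and list the data transfers that result. It does not use the Pegasus API. The step names, file names, site names, and the round-robin placement rule are all assumptions made for illustration; a real planner would weigh resource availability, data locality, and fault tolerance.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Abstract workflow: each step names only the files it reads and writes.
# Step and file names are hypothetical; no site or resource is mentioned.
WORKFLOW = {
    "preprocess": {"inputs": ["raw.dat"], "outputs": ["clean.dat"]},
    "analyze": {"inputs": ["clean.dat"], "outputs": ["stats.dat"]},
    "plot": {"inputs": ["clean.dat", "stats.dat"], "outputs": ["figure.png"]},
}


def dependencies(workflow):
    """Derive step-to-step dependencies from the data each step consumes."""
    producer = {
        name: step
        for step, spec in workflow.items()
        for name in spec["outputs"]
    }
    return {
        step: {producer[name] for name in spec["inputs"] if name in producer}
        for step, spec in workflow.items()
    }


def plan(workflow, sites, data_location):
    """Toy planner: place steps round-robin and list the transfers needed."""
    location = dict(data_location)  # file name -> site currently holding it
    schedule = []
    order = TopologicalSorter(dependencies(workflow)).static_order()
    for index, step in enumerate(order):
        site = sites[index % len(sites)]  # assumed placement rule
        transfers = [
            (name, location[name], site)
            for name in workflow[step]["inputs"]
            if location[name] != site
        ]
        for name in workflow[step]["outputs"]:
            location[name] = site  # outputs are produced where the step runs
        schedule.append((step, site, transfers))
    return schedule


if __name__ == "__main__":
    sites = ["campus_cluster", "cloud"]  # hypothetical execution sites
    for step, site, transfers in plan(WORKFLOW, sites, {"raw.dat": "archive"}):
        print(step, "->", site, "transfers:", transfers)
```

The same abstract description can be re-planned for a different set of sites by changing only the arguments to `plan`, which is the portability the abstract description is meant to provide.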
As the crystal structures of biological macromolecules were being determined, a new field of structural biology was born. Inspired by these new structures, the scientific community worked to establish a home to archive and share the data emerging from these experiments. The Protein Data Bank (PDB) was established in 1971 with seven structures. The PDB provides a repository for scientists who generate the data, and an access point for researchers and students to find the information needed to drive additional studies. Today, the PDB contains and supports online access to ~117,000 biomacromolecules that help researchers understand aspects of biology, including medicine, agriculture, and biological energy. The ways in which the interrelationships among science, technology, and community have driven the evolution of the PDB resource for more than 40 years will be discussed. The PDB archive is managed by the Worldwide Protein Data Bank (wwpdb.org), whose members are the RCSB PDB, PDBe, PDBj and BMRB.
The imaging report is an essential source of clinical imaging information. It documents critical information about the patient's health and provides a professional interpretation of the images. However, the vast majority of report information remains narrative, a major obstacle to the rapid extraction and re-use of discrete imaging data. Structured reporting facilitates linking of imaging observations to clinical and genomic data, and is increasingly being adopted by clinical imaging practices. However, most imaging reports are used only once by the clinician who ordered the imaging study and are rarely used again for research, clinical care, or analytics. This presentation will describe the likely future of the imaging report, including efforts underway to standardize radiology report information, and the use of machine learning and natural language processing techniques to extract the semantic elements of the radiology report. These novel technologies enable connections between images and the electronic health record, and represent a vital part of the future of medical research.
Bioconductor is a widely used collection of R packages for the statistical analysis and comprehension of high-throughput genomic data. Bioconductor has strengths in sequence (RNA-seq, ChIP-seq, called variants, ...) and microarray (expression, methylation, copy number, ...) analysis, as well as significant facilities for flow cytometry, proteomics, and many other omics domains. The breadth of available facilities, coupled with principles of interoperability and reproducibility, makes Bioconductor an ideal platform for integrative approaches to cancer genomics. This presentation outlines technical aspects of recent and forthcoming facilities to enable integrative cancer genomic analysis in Bioconductor. We discuss our own work to enable routine integration of large-scale consortium annotation (e.g., ENCODE, Ensembl) into analysis workflows, development within Bioconductor of facilities to manage multiple-assay experiments, and approaches to scaling R's in-memory model to large-scale data sets. The presentation concludes with a brief overview of integrative approaches contributed to Bioconductor by our international contributors.