'Inspiring innovation through forums for idea-sharing'
CBIIT TechScouts is a forum for promoting continuous improvement across CBIIT through the cross-fertilization of ideas, experiences, and recommendations. This forum is designed to foster new collaborations, identify opportunities to better serve our customers, engage CBIIT more broadly, and raise awareness of new techniques and technologies that promote innovation. Central to CBIIT TechScouts is gathering insight from the community's ideas and experiences on how emerging information technologies can be used to improve scientific productivity and accelerate cancer research.
Topics Archive
| Topic | Author | Topic ID | Summary |
|---|---|---|---|
| NIH Supercomputers Have Come a Long Way | Carl McCabe | TS-0029 | From the February 9, 2018 issue of the NIH Record: "NIH Supercomputers Have Come a Long Way" by Dana Talesnik. https://nihrecord.nih.gov/newsletters/2018/02_09_2018/story1.htm |
| Free Pro Git 2 book (Kindle edition) | Carl McCabe | TS-0028 | I don't know how long this deal will last, but Amazon currently (Friday, 5 pm) has Pro Git 2, by Scott Chacon and Ben Straub, on sale at the very affordable price of $0.00. It is the Kindle edition, not a paper copy. If you're interested, here's the link: https://www.amazon.com/gp/product/B01ISNIKES Melina Scotto: 'A reminder that the hash used in GIT is SHA1. Deprecated for government use since 2013. Great to play with, though.' Jeff Shilling: 'Since Git is not using SHA1 for cryptographic functions but as a hash of the files to determine if changes to the files have taken place, it doesn’t pose a security concern.' |
| Golem open source, decentralized supercomputer | Carl McCabe | TS-0027 | This is not something we can use right now (and maybe not ever in the Federal government), but it is an interesting peek into the future. Golem describes itself as follows: "Golem is a global, open source, decentralized supercomputer that anyone can access. It is made up of the combined power of users’ machines, from PCs to entire data centers. Golem is capable of computing a wide variety of tasks, from CGI rendering, through machine learning to scientific computing. Golem’s limitations are only defined by software developers’ creativity. Golem creates a decentralized sharing economy of computing power and supplies software developers with a flexible, reliable and cheap source of computing power." Golem does have competitors, e.g. https://iex.ec/ |
| R and an NCI Genomic Data Commons Use Case | Sean Davis | TS-0026 | NCI has established the Genomic Data Commons (GDC, https://gdc.cancer.gov) as a home for NCI cancer genomics datasets. One feature of the GDC is that it uses UUIDs for *everything*. However, most cancer researchers think in terms of legacy "barcodes" when working with these datasets. Given the increasing importance of web-based APIs for NCI data, I wrote a quick blog post on using an R client to access these data and, specifically, to translate UUIDs back to "barcodes": |
| PuTTY Begone! Microsoft Will Ship an OpenSSH Client | Carl McCabe | TS-0025 | https://techcrunch.com/2017/12/13/putty-begone-microsoft-will-ship-an-openssh-client/ |
| NIH Hour of Code | Sean Davis | TS-0024 | The international Hour of Code is an effort to bring computer programming to the masses. We at the NIH DataScience Special Interest Group have put together four sessions: R, Python, natural language processing, and Shiny. Webinar slots are still available. Details are here: https://nihlibrary.nih.gov/about-us/news/hour-code-classes-nih-library Feel free to pass this along. My session on truly introductory R runs from 10 to 12 today. Materials (we won't cover all of them) are here: |
| Bioconductor, Software for Genomic Data Science | Sean Davis | TS-0023 | I am giving a talk at 11am about Bioconductor, a large software project partially funded by NCI and the CBIIT ITCR program. The talk will be in TE406 and is only 20 minutes long; it is part of a morning-long conference, open to anyone, sponsored by the Center of Excellence in Integrative Cancer Biology and Genomics. Slides are posted here: Here is the abstract: Progress in biotechnology is continually leading to new types of data, resulting in data sets that are rapidly increasing in volume, resolution and diversity. The promise of unprecedented advances in our understanding of biological systems and in medicine is challenged by the complexity and volume of these data, which also challenge scientists’ ability to analyze them. Meeting this challenge requires continuous improvements in analytical methods and capable, usable software tools implementing them. Bioconductor is a well-established open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 1473 interoperable packages contributed by a large, diverse community of scientists. These packages undergo formal initial review and continuous automated testing. Each package includes documentation and working example use cases. Bioconductor supports many types of high-throughput sequencing data (including DNA, RNA, chromatin immunoprecipitation, Hi-C, methylomes and ribosome profiling) and associated annotation resources; contains mature facilities for microarray analysis; and covers proteomic, metabolomic, flow cytometry, quantitative imaging, cheminformatic and other high-throughput data. Bioconductor package interoperability enables the rapid creation of workflows combining and integrating multiple data types and tools for statistical inference, regression, network analysis, machine learning and visualization at all stages of a project, from data generation to publication. A large and growing community of researchers and users contributes to ongoing development, online support, and education. The influence of the project is evidenced by more than 250,000 downloads per year and tens of thousands of citations in the literature. I will present an overview of the project for prospective users and contributors. |
| Protect Against Secrets in Git Repositories | Sean Davis | TS-0022 | I wrote a blog post about an embarrassing but educational experience including the solution I implemented for myself to keep it from happening again. Perhaps someone else will find my experience useful. |
| GitHub - Security Alerts and Dependency Graphs | Carl McCabe | TS-0021 | GitHub recently rolled out a new feature that shows a graph of your project's dependencies. Yesterday they took this a step further by alerting you when vulnerabilities are detected in any of those dependencies. Dependency graphs are enabled automatically for public repos but must be opted into for private repos. Currently this works for Ruby and JavaScript, with Python support coming sometime in 2018. Dependency Graphs: https://help.github.com/articles/listing-the-packages-that-a-repository-depends-on/ Security Alerts: https://help.github.com/articles/about-security-alerts-for-vulnerable-dependencies/ |
| Agricultural Data Ecosystem Talk | Sean Davis | TS-0020 | I recently visited the USDA to give a talk with thoughts on an Agricultural Data Ecosystem, inspired in part by pioneering work done by CBIIT and NIH on data commons and cloud pilots. The slides are available on my website and might interest a few folks: https://seandavi.github.io/talk/2017/11/15/thoughts-on-an-agricultural-data-ecosystem/ Thanks to Ishwar Chandramouliswaran, Steve Tsang, and Durga Addepalli for several of the slides. |
| Bored With Your Fitbit? These Cancer Researchers Aren't | Carl McCabe | TS-0019 | Here is an overview of the use of activity-tracking devices (specifically Fitbits) in clinical research. Bored With Your Fitbit? These Cancer Researchers Aren't (Wired.com) https://www.wired.com/story/bored-with-your-fitbit-these-cancer-researchers-arent/ |
| Malware Encoded Into DNA Can Hack the Computer that Reads It | Joel Yobouet | TS-0018 | What if someone stored a malicious program in DNA, much like an infected USB drive, to hijack the computer that reads it? Here is the white paper: |
| DigiCert to Acquire Symantec's Website Security Business | Jesse Bocinski | TS-0017 | DigiCert is acquiring Symantec’s website security business, consolidating the public commercial certificate market. |
| Semiconductor Feature Size Roadmap | Warren Kibbe | TS-0016 | Pretty amazing feats of science and engineering to get us to 2020. https://en.wikipedia.org/wiki/5_nanometer For comparison, a single benzene ring is about 500 picometers, or 0.5 nanometers |
| Monarch Made Twitter | Amy Gentzel | TS-0015 | I wanted to share a session presented by NIAID at the AWS Public Sector Summit that discussed Next-Generation Medical Analysis leveraging the cloud. Our CSRA team at NIAID assisted by developing a PaaS solution named Monarch, discussed in the presentation, which supports this work with a full DevOps pipeline. For background, Nephele, also discussed in the presentation, is a project that a team in BCBB has been working on for several years. Because the infrastructure group (OEB) did not have a defined public cloud service offering at NIAID when Nephele started years ago, the project team went out on their own and architected a solution using AWS. Within the last year, CSRA consulted with them and provided guidance on best practices and improvements, now that we are launching a custom platform as a service built on open source technologies and AWS services. In the future, we’ll be moving Nephele components to Monarch, our custom PaaS, and hosting them under our architecture. https://www.youtube.com/watch?v=rLkBwWv0Hdc&list=PLhr1KZpdzukePsKIUofhgp50b63-5yr1V&index=66 |
| DCEG Linkage: 3D Printing in Radiation Research | Carl McCabe | TS-0014 | Here's a unique example of 3D printing right on site in the NCI Shady Grove building. This is from Choonsik Lee's group in DCEG's Radiation Epidemiology Branch. https://dceg.cancer.gov/news-events/research-news-highlights/2017/3d-printed-phantoms?cid=eb_govdel |
| SciAm: China Shatters 'Spooky Action at a Distance' Record, Preps for Quantum Internet | Carl McCabe | TS-0013 | Here's more on the long-term future (ignore the immediate or mid-term geopolitical implications). https://www.scientificamerican.com/article/china-shatters-ldquo-spooky-action-at-a-distance-rdquo-record-preps-for-quantum-internet/ |
| MaruOS - Your Phone is Your Computer | Carl McCabe | TS-0012 | With Maru, your phone is your PC: you connect it to a keyboard and monitor whenever you need the desktop environment. Obviously this isn't ready for government use yet, but it illustrates the path toward convergence on single-device personal computing. |
| Why Windows Must Die. For the Third Time | Mark Cunningham | TS-0011 | Interesting article about the history, and potential future, of Windows: http://www.zdnet.com/article/why-windows-must-die-for-the-third-time/ Great article as Microsoft tries to stay relevant in an age of minimal OSes, immutable servers, and relentless competition from what some would argue are more mature operating systems, such as those found on Macs. It brought a tear to my eye as I took a walk down memory lane. |
| Biowulf HPC Expansion Details | Sean Davis | TS-0010 | Biowulf now has >180,000 CPUs, including 100 GPU nodes, 24 very large memory nodes (up to 3TB per machine), and a very high-performance, low-latency network (FDR InfiniBand) with an 80 Gb/sec connection to the outside world. Additional dedicated resources (with priority usage) are available to CCR researchers (10,000 CPUs) and, soon, to DCEG (4,000 CPUs). This expansion will likely place Biowulf among the 100 most powerful HPC systems in the world. If you have questions, feel free to contact me or the Biowulf staff. |
| Globus Data Management Solution | Miles Kimbrough | TS-0001 | |
| The Algorithm Will See You Now | Melina Scotto | TS-0009 | Great piece in the April 3 New Yorker on medical diagnostics and machine learning. It uses Sebastian Thrun's Stanford work on detecting melanoma as an example and also discusses Geoffrey Hinton's views. http://www.newyorker.com/magazine/2017/04/03/ai-versus-md For anyone interested in machine learning, FAES offers Python-based machine learning in BIOF 509, taught on campus by Jonathan Street and Burke Squires. |
| TEDxBuffalo - Become a Citizen Data Scientist | Carl McCabe | TS-0008 | Here's a TEDx talk about how non-academic people can be enlisted to support certain classes of scientific research. As the talk suggests, there are many possibilities for leveraging citizen science, and it is useful to think about the available technologies and the ways people interact with technology (such as gaming or idle time on a cellphone) to understand those possibilities. The value of citizen science lies not just in distributed data collection or in analytical tasks that are currently easier for humans than for computers; it also lies in engaging more people in the purpose, goals, and problems of current research efforts. https://www.youtube.com/watch?v=zgijAGkdjZc&feature=youtu.be |
| Oracle Doubling the Cost to Run in AWS | Robert Smallwood | TS-0007 | Found these two articles about Oracle’s new pricing strategy, which effectively doubles the cost of running Oracle products in AWS. For those of you who have worked with Oracle over the years, I’m sure this isn’t a big surprise, but as we move to the cloud it may be a reason to move to other database platforms such as AWS RDS or Redshift on AWS. Of course, I suppose there’s always the Oracle cloud. https://www.cloudtp.com/doppler/oracle-is-now-charging-double-to-run-on-aws/ https://www.theregister.co.uk/2017/01/30/oracle_effectively_doubles_licence_fees_to_run_in_aws/ |
| IBM Watson for Oncology | Joel Yobouet | TS-0006 | Watson provides clinicians with evidence-based treatment options based on expert training by MSK physicians. Whether in a community oncology practice or an international hospital, oncologists, like all clinicians, are struggling to keep up with the large volume of research, medical records, and clinical trials. Watson scales vital knowledge and helps oncologists. Now, with the collaboration between IBM and MSK, Watson for Oncology uses world-renowned MSK expertise to evaluate the specific details of each unique patient against clinical evidence. |
| Data Science Job Report: R, SAS, Python | Carl McCabe | TS-0006 | For anyone paying attention to technology trends in statistical analysis and data science, here's a report from an R-focused data science blog: http://r4stats.com/2017/02/28/r-passes-sas/ The methodology used to track these trends involves data from job postings. Specific trends within cancer research are likely to be somewhat different. |
| DNA - the Future of Data Storage? | Carl McCabe | TS-0005 | You've probably heard about the concept of storing data in DNA by now. Here's a quick what/why/how/when from Yaniv Erlich's lab at Columbia University: https://www.researchgate.net/blog/post/dna-could-be-the-future-of-data-storage |
| NCI Genomic Data Commons R Client | Sean Davis | TS-0004 | The NCI has established the NCI Genomic Data Commons to store and distribute NCI datasets. One powerful feature of this new resource is a RESTful API. The Bioconductor project is a large, open-source software project for the analysis and comprehension of -omics datasets. A couple of us are working on a Bioconductor package for finding and accessing data in the NCI Genomic Data Commons directly from R, exposing a community of about 4,000 developers and 100,000 users to NCI genomics datasets. The GenomicDataCommons package is available on GitHub and will be making its way into the Bioconductor project in the next month or so. |
| Open, Reproducible Neuroscience | Sean Davis | TS-0003 | Towards open and reproducible neuroscience in the age of big data. Abstract: The Stanford Center for Reproducible Neuroscience (CRN) was founded to develop a number of initiatives to help make neuroscience more open and reproducible. These initiatives include influencing the scientific culture (e.g. the OHBM Replication Award), introducing new standards for organizing neuroimaging data (BIDS) and software (BIDS Apps), and building tools to assess MRI quality (MRIQC) and perform robust preprocessing of fMRI data (FMRIPREP). Tying all of those efforts together, and capitalizing on the experience gained from OpenfMRI and NeuroVault, is the upcoming OpenNeuro project. This novel software-as-a-service platform seeks to harness the power of high-performance computers to provide free data analysis in exchange for making data publicly available. There is also a nice technology feature article in Nature on "big data" in neuroscience that touches on many of the same themes: data management when dealing with hundreds of terabytes, attitudes toward data sharing, and the notion that in many cases "big data" is still not enough data. Neuroscience: Big brain, big data http://www.nature.com/nature/journal/v541/n7638/full/541559a.html |
| AWS CloudTrail | Sean Davis | TS-0002 | Monitoring changes to infrastructure and data is a very important part of running a robust and secure infrastructure. This service continuously logs API calls (and therefore changes) to infrastructure and even supports S3 events (changes to data). https://aws.amazon.com/cloudtrail/ Other cloud providers offer similar services, either directly or through third parties. |
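On the SHA-1 exchange under TS-0028: Jeff's point is that Git uses SHA-1 to name content. The Python sketch below reproduces how Git computes the ID of a blob (file contents) object in a SHA-1 repository: it hashes a short `blob <size>` header, a NUL byte, and the raw bytes. The function name `git_blob_sha1` is ours, not part of Git. The sketch shows what the hash is used for; it does not by itself settle the collision-resistance question Melina raised.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID Git assigns to `content` when stored as a blob.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes, so the
    ID depends only on the content, not on the file name or location.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The "test content" example from the Git Internals chapter of Pro Git.
    print(git_blob_sha1(b"test content\n"))  # prints d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Running `echo 'test content' | git hash-object --stdin` in any SHA-1 repository should print the same ID, which is a quick way to see that identical content always maps to the same object.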
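TS-0026 and TS-0004 describe translating GDC UUIDs back to barcodes from R. As a rough Python stand-in (not the code from the blog post), the sketch below asks the GDC `cases` endpoint for the `submitter_id`, which for TCGA cases is the familiar barcode, of each case UUID. The endpoint URL and the `filters`, `fields`, `case_id`, and `submitter_id` names are assumptions about the public GDC API that should be checked against its documentation, and the function names are hypothetical. It only covers case-level UUIDs; sample and aliquot barcodes live on other entities.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public GDC API "cases" endpoint and its field names.
GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_cases_query(uuids):
    """Build a GET URL asking the GDC for the barcode of each case UUID."""
    uuids = list(uuids)
    filters = {"op": "in", "content": {"field": "case_id", "value": uuids}}
    params = {
        "filters": json.dumps(filters),
        "fields": "case_id,submitter_id",
        "format": "JSON",
        "size": str(len(uuids)),  # return every match on one page
    }
    return GDC_CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_barcodes(response_json):
    """Map each case UUID to its submitter_id ("barcode") in a GDC response."""
    hits = response_json.get("data", {}).get("hits", [])
    return {hit["case_id"]: hit["submitter_id"] for hit in hits}


def uuids_to_barcodes(uuids):
    """Query the live API and return {uuid: barcode}; needs network access."""
    with urllib.request.urlopen(build_cases_query(uuids)) as resp:
        return parse_barcodes(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes each piece easy to check offline; `uuids_to_barcodes` is the only function that actually contacts the GDC.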
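TS-0022 does not spell out the solution Sean implemented, so the following is only a generic, hypothetical illustration of one common approach: a Git pre-commit check that scans staged files for text that looks like a secret. The two patterns (an AWS access key ID and a PEM private-key header) are illustrative assumptions; dedicated secret scanners ship far larger rule sets. It reads the staged copy of each file with `git show :<path>` so it checks exactly what is about to be committed.

```python
import re
import subprocess
import sys

# Illustrative rules only; real scanners use many more patterns.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def staged_files():
    """List files added, copied, or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.splitlines() if path]


def run_hook():
    """Return 1 (block the commit) if any staged file matches a rule, else 0."""
    blocked = False
    for path in staged_files():
        # ":path" names the staged (index) version, not the working-tree file.
        staged = subprocess.run(
            ["git", "show", f":{path}"],
            capture_output=True, text=True, errors="replace",
        ).stdout
        for lineno, name in find_secrets(staged):
            print(f"{path}:{lineno}: possible {name}", file=sys.stderr)
            blocked = True
    return 1 if blocked else 0
```

To try it as a hook, save it as `.git/hooks/pre-commit` with `#!/usr/bin/env python3` as the first line, append `sys.exit(run_hook())` at the end, and make the file executable; Git aborts the commit whenever the hook exits with a non-zero status.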
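For TS-0002, here is a small sketch of what "monitoring changes" can look like once CloudTrail logs are delivered. It assumes the JSON log-file layout CloudTrail writes to S3 (a top-level `Records` list whose entries carry `eventTime`, `eventSource`, `eventName`, `userIdentity`, and usually `readOnly`) and lists the calls not marked read-only, which are the ones that may have changed something. The function names are hypothetical, and a production setup would more likely rely on CloudTrail's own event selectors or alerting than on ad hoc scripts.

```python
import gzip
import json


def load_log_file(path):
    """Read one CloudTrail log file; files delivered to S3 are gzipped."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def change_events(log_document):
    """Yield (time, service, action, principal) for calls not marked read-only.

    Records without a "readOnly" field are kept, erring on the side of
    reporting a possible change rather than hiding it.
    """
    for record in log_document.get("Records", []):
        if record.get("readOnly") is True:
            continue
        identity = record.get("userIdentity", {})
        # Prefer the caller's ARN; fall back to the identity type.
        principal = identity.get("arn") or identity.get("type", "unknown")
        yield (
            record.get("eventTime"),
            record.get("eventSource"),
            record.get("eventName"),
            principal,
        )
```

For example, `for event in change_events(load_log_file(path)): print(*event)` prints one line per potentially state-changing call in a downloaded log file.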
Show and Tell Archive
| Date | Topic | Who | Notes | Action Items |
|---|---|---|---|---|
Welcome to CBIIT TechScouts!
We are a forum for promoting continuous improvement across CBIIT through the cross-fertilization of ideas, experiences and recommendations.
The goal of CBIIT TechScouts is to collectively improve scientific productivity through the sharing of knowledge.
What We Do
CBIIT TechScouts is designed to foster new collaborations, identify opportunities to better serve our customers, engage CBIIT more broadly, and raise awareness of new techniques and technologies that promote innovation.
Central to CBIIT TechScouts is gathering insight from the community's ideas and experiences on how emerging information, data-oriented, and software technologies can be used to improve scientific productivity and accelerate cancer research.
Our services include:
- An email distribution list, allowing members to stay updated on the latest topics and trends
  - To join: email Miles Kimbrough, subject 'TechScouts Access Request'
- An annotated archive of topics and presentations, organized for future reference
- A monthly summary found on CBIIT Central, providing a lightweight overview of recent trends
- And finally, monthly Show and Tell sessions, opening the door for new ideas to be presented both in person and virtually
  - To present: email Miles Kimbrough with your topic and availability
Questions?
General Support: Miles Kimbrough | miles.kimbrough@nih.gov | 240.276.5251
Consultation and Guidance: Eric Stahlberg | eric.stahlberg@nih.gov | 240.276.6729
Technical Support: George Zaki | george.zaki@nih.gov | 240.276.5171