General Support: Miles Kimbrough | miles.kimbrough@nih.gov | 240.276.5251
Consultation and Guidance: Eric Stahlberg | eric.stahlberg@nih.gov | 240.276.6729
Technical Support: George Zaki | george.zaki@nih.gov | 240.276.5171
Attendees: Eric Stahlberg, George Zaki, Lynn Borkon, Miles Kimbrough, Randy Johnson, Eckart Bindewald, Jason Levine, Carl McCabe
Notes:
Today’s meeting to serve as a working session rather than a share/update meeting.
Looking to begin capturing use cases involving HPC in a way that generates data rather than just reports: capture what people are doing now, qualify and characterize it, and start capturing future cases.
Use Case Nice-to-Haves:
· Have a list of accessible use cases with key stakeholders, to be captured and refined over time; useful not only for HPC but for other domains as well (data science). Use this list to iterate with different people in the community, validate with others in the NIH community, provide a discussion platform for how we are supporting them, and yield better ways to support them in the future.
o Capture use cases and then synthesize to identify CLASSES of use cases
o Ways to identify this community: possible survey (?)
o Program Directors in the extramural community are doing some kind of data analysis, trying to get a better understanding of what their grantees are doing
o Annotated list – could search for publications that cite Biowulf, or text mine publications for mentions of certain resources (see the sketch after this list)
· Understanding what’s happening in CCR has been a challenge, with 250 investigators using HPC and not always centrally communicating those use cases (pain points, struggles, etc.)
· Use cases should also include what people WANT to do with HPC, not just what they are doing
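As a rough illustration of the annotated-list idea above, a minimal sketch that counts PMC publications mentioning Biowulf. It assumes the public NCBI E-utilities esearch endpoint; the query term is only an example, not a vetted bibliometric search.

```python
# Minimal sketch: count PMC publications mentioning "Biowulf".
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_hits(term: str, db: str = "pmc") -> int:
    # Ask E-utilities for the total hit count for the query.
    query = urllib.parse.urlencode(
        {"db": db, "term": term, "retmode": "json", "retmax": 0}
    )
    with urllib.request.urlopen(f"{EUTILS}?{query}") as resp:
        result = json.load(resp)["esearchresult"]
    return int(result["count"])

if __name__ == "__main__":
    # Illustrative query only; a real survey would refine the term.
    print("PMC articles mentioning Biowulf:", count_hits('"Biowulf" AND "NIH"'))
```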
HPC Needs Database – what data fields should we start with?
· MILES and (if possible) JANELLE – DUE 3/27: Merge Janelle’s assessment document with HPC SIG assessment and circulate to team
o ID data fields and what would be expected to be put into responses
o Survey to serve as a tool to be referenced as we go out and engage the community
Earlier discussions centered on turning those assessments into a database – how would we initially capture information to understand use cases and develop insight? Begin to answer questions around bottlenecks and what's needed; contributors will be able to provide pieces of the larger picture.
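As a starting point for discussion, an illustrative sketch of what one record in such a use-case database might hold; every field name below is an assumption drawn from the notes, not an agreed schema.

```python
# Illustrative use-case record for an HPC needs database (fields are assumptions).
from dataclasses import dataclass, field
from typing import List

@dataclass
class HPCUseCase:
    investigator: str
    division: str                    # e.g. CCR, DCEG, DCTD
    current_resources: List[str]     # e.g. ["Biowulf", "Moab", "AWS"]
    data_volume_tb: float            # approximate working data size
    bottlenecks: List[str] = field(default_factory=list)
    future_needs: List[str] = field(default_factory=list)
    notes: str = ""                  # free text, so new questions can be captured later

# Hypothetical example record:
example = HPCUseCase(
    investigator="Jane Doe",
    division="CCR",
    current_resources=["Biowulf"],
    data_volume_tb=5.0,
    bottlenecks=["GPU queue wait times"],
    future_needs=["cloud bursting for image analysis"],
)
```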
Many scientists have a data analysis problem and mostly resort to Excel. A need exists to identify communities doing data analysis who aren't equipped with the right tools beyond MS Excel.
Provide logistical support to these communities: if they want to host their own seminars and workshops, we can help with advertising, communications, etc. CBIIT already has infrastructure in place for this.
· Concept of Communities of Practice
High-level strategy is identifying HPC needs – Goal is to develop strategy for engaging needs as they emerge
Unsure how to allocate funds. Not a trivial amount of money or uses; tired of hearing 'this is how it works' – need to find another resource in this case.
Biowulf: there is a limit on what it can deliver in a finite period of time. A 'large' job at NCI/NIH is a 'small' job in DOE terms, leaving a gap for 'middle'-sized jobs
· Propose promoting DOE resources.
· Can get a list of Biowulf users ranked by job size and start from the top, consulting with them for use cases. Also a grassroots effort to network organically with users.
Informatics and Data Science Strategic Plan: Potential plan to be developed per Jeff Shilling
(Eckart) Google Colaboratory: useful resource to run Python with GPU nodes. Not sure whether we are able to use it or if authorization is needed.
Upcoming Events:
Nvidia coming week of 3/25 for a seminar. Dr. Tang to provide intro and use cases of DL at NIH.
· Publicized via listservs and the NCI calendar – Randy to forward listservs (Bioinformatics User Forum) - BIOINFORMATICS-USER-FORUM@LIST.NIH.GOV; HPC-SIG@LIST.NIH.GOV; NIH-DATASCIENCE-L@LIST.NIH.GOV
Attendees: Miles Kimbrough, Lynn Borkon, George Zaki, Eric Stahlberg, Janelle Cortner, Eckart Bindewald
Introductions:
Eckart Bindewald – Senior Computational Scientist, FNLCR, RNA Biology Lab (sponsor: Bruce Shapiro)
· Involved with coding, clustering, parallelization, threading, ML
· Writes informatics applications
Janelle Cortner – previously at CCR, now at CBIIT
· Big data comes out of HPC
· Interest with the HPC Thought Leaders group is understanding projections for needs for more resources, use cases for on-prem vs. cloud, and making sure we have resources available and are planning on the services side
· Hoping to discuss Strategic Initiatives related to HPC, want to harness this group as a node for HPC current/future needs
Opening Remarks (Eric)
HPC not just on the computational side but also the data
· Amount of data generated in commonplace experiments is dramatically exceeding what we have historically done.
· Objective of this group to inform anticipated resource requirements
· Need to gather information – both immediate needs/points of opportunity, but also understand how computing is being used – feeding into recommendations
· Want to be data-driven moving forward, what we’re fostering. The information we gather is dynamic, not static, so needs will continue to evolve as we interact with new/expanding groups
o ID profile of use cases so we can address moving forward
What are the big compute resources available to NCI?
· Documentation on available resources is being collected and will be made available
· On-prem computing elements:
o Nathan Cole runs large system out of DCEG using in-house compute environment. Systems being run at a division/lab level, but starting to mature and will need replacement/enhancement
§ Competes for available funds for HPC – need to explore including these into broader needs assessment
§ Propose working with Nathan Cole and Jonas Almeida for initial use cases
o System at DCTD that has a large amount of FPGAs being put into it
o Moab in FNLCR
o Its successor is in operation now
o Biowulf
o NIH-level and Division-level buy-ins
· Cloud computing elements:
o Understanding where data passes through various computing resources. Helps identify/organize use cases by which resources involved with use case. Eric to make this research available
o MS Azure
o Amazon
o Google
o Cloudera
o IBM
o DOE resources – individuals can access them. No cost to get started, but there is an application process. DOE can provide a discretionary allocation (10% of available DOE compute)
§ Few orders of magnitude more compute overall
Need more pages geared toward the intramural community, identifying the resources available to researchers
· Table indicating what resources are available, how large, steps to using, etc.
· Data could inform success rates
· If researchers need to access human genome data, they currently download it manually. It may already have been downloaded, but it isn't clear where it is located. Want to reduce redundant efforts
Janelle spoke with Jeff regarding buying persistent storage attached to Biowulf. Biologists, for example, come in with raw datasets, analyze them, and keep the data parked on Biowulf – much of that data doesn't need to stay parked on Biowulf, but some other use cases do.
· Standard datasets like human genome – lot of researchers need and keep downloading them. Will make for ideal use case to park this information in Biowulf
· If can get data in DME, where people are given appropriate permissions, then no need to transfer data. Commonly needed reference datasets would reduce duplicate efforts
When to use MOAB/Biowulf/others?
· Use cases to inform
· March version of Jack’s report contains guidance on this
Need statements have been produced in the past and contain the first incarnation of HPC needs, but a refresh is constantly needed because the field moves too fast. How do we capture use cases so they can be shared on an ongoing basis?
· Even if we had some type of database to capture use cases, what’s being done, essential elements we want to know about, and make extensible so if we want to ask new questions in the future, data can accommodate evolving inquiries
· Use cases could be what’s being done already and what researchers want to do in the future
· Not too many heavy HPC users - ~10-15 in Biowulf, not many in cloud, handful using DOE resources, ~10-15 using Moab
o Could capture use cases among these groups since not overwhelming amount of users
o As more and more scientists begin adopting HPC, desire to capture their use cases as well
· Another use case – searching the human genome by individual. Not possible directly, but the data can be downloaded. Why doesn't NIH provide a searchable, Google-like equivalent of the human genome sequence?
o From cancer perspective, would be ideal assuming access and permissions for patient level data are met
o DCEG has 300 GWAS cohorts, used internally but not externally shared
o Can we recreate GWAS cohorts, or what does it take to get them shared?
o Significant investment goes into creating data – how do you incentivize making it accessible to others?
· CRDC (Cancer Research Data Commons) has elements available, e.g. the Genomic Data Commons
o Imaging and proteomics commons not yet available but will be
o Goal to be centrally available
Questions to ask about every use case – need to establish a baseline set of questions to understand common needs and trends moving forward
· Janelle volunteering to come up with short list of use cases/compute resources on Biowulf/Moab, AND create draft of questions (to be refined)
· Eric volunteering to work with BIDS to establish database (similar to Miles/Randy development of TRON, using Filemaker) of use cases
When making recommendations, these are recommendations for procurements – which will result in larger HPCTL attendance/engagement
· Want to justify end-of-year procurements because always will be year-end money
Current tools like Galaxy/DNAnexus can't be used off-premise. A solution is needed to bring bioinformatics into every lab
· Janelle's experience with DNAnexus – trained over 200 people on cloud resources centrally funded through CBIIT; these haven't been well received.
· Vishal will present a pipeline from DNAnexus to Palantir for genomic/RNA-seq analysis during the Jan 24 Foundations workshop
· Part of Janelle's job with Jeff is to bring in resources available to CBIIT and make the case to Tony not to use just the 3 cloud resources but also fund DNAnexus (cloud resource). Might be able to support this.
· With Palantir, users who aren't coders use informatics tools written by experts; they can drag/drop/customize, then use APIs to loop out to compute (Seven Bridges, Biowulf, etc.) and bring results back into Palantir for analysis
o Janelle to provide demo to Eckart
Imaging doesn’t fit very well with Biowulf, motivation to move to cloud but problems here as well
· Use cases will continue to evolve
· As database put together, ensure information is presented in a way that captures surrounding environment (HPC needs ‘for the rest of us’)
Smaller group working on this project to meet in 2 wks, to discuss needs assessments updates
· Move broader HPCTL to quarterly basis
If we know what the budgets are across labs and which experiments will be conducted: if experiments aren't doubling, then the available data won't double either
· Get insight on what are planned experiments for the year
· Going to individual data users is the best approach
NEXT STEPS:
· JANELLE – provide draft list of top 20 users/use cases, and questions to be asked
· MILES – send Janelle HPC SIG needs assessments
· ERIC – work with BIDS to justify development of database to cover/organize use cases
· MILES – get HPC Utilization report from Jack Collins (Janelle has)
· MILES – update HPCTL wiki
· JANELLE – Send Eckart link to data commons
· Janelle to provide Eckart access to DNAnexus through Peter Fitzgerald (runs BTEP)
o Includes free access
· Smaller group working on this project to meet in 2 wks, to discuss needs assessments updates
o Move broader HPCTL to quarterly basis
Attendees: Miles Kimbrough, Eric Stahlberg, George Zaki, Randy Johnson, Jack Collins, Dianna Kelly
Intro (Eric):
· FY19 initiatives added as an agenda item
· Add HPC and AI needs to agenda, collect feedback in context of accelerating data science
HPC and Cloud
· What do we anticipate needs to happen for HPC and cloud? What are we doing? What do we need to prepare for?
· George – piloting a use case with HiTIF; want to have image visualization hosted in the cloud using containers. Cloud team shared prototypes, looking to host in NCI Cloud 1. Per Jeff, can use cloud resources that can act as APIs within the firewall. Working demo on a VM: can launch the VM, install software, and pull data from the archive – demonstrates connectivity to the internal firewall.
o Potential use case of using the cloud not only for HPC processing but also data visualization, based on resources using cloud containers (Docker). Using Docker to launch software users are accustomed to (a minimal container-launch sketch appears at the end of this section).
o Don’t have to pay for licenses, allows for scalability
o Investigators can launch one of these instances on demand
· Jack: CCBR and NCBR are working with various groups to put pipelines in the cloud; the request came from Janelle. Asked Sue Pan for direct interactions and connections to spin up VMs directly in the infrastructure itself, as opposed to going through other channels like Seven Bridges. Sue is trying to give access to Cloud 1 sometime (date TBD).
· Randy/Jack have been working on a follow-up on HPC usage – an extension of the Biowulf retrospective report, beginning to look like a manuscript. Currently contains usage from HPC Moab and Biowulf, storage growth and utilization, and compares the costs of doing the computation and storing the data in the cloud. Contains an estimate for AWS under a 3-year contract with guaranteed provisioning, even when not turned on.
o Goal to complete report by 12/21
· Need to be very careful why going to cloud, demonstrate good reason of what we’re trying to accomplish. Cost is sizeable with large HPC jobs
o Are we getting a better service? Are we creating better capabilities? What are the appropriate metrics for comparison? Want to ensure we’re comparing common metrics (to avoid ‘apple and orange’ comparison)
o Per George, OMERO/Columbus (as one example) requires a licensing fee and is not scalable. OMERO containers don't work well under Singularity
o If investing more in cloud, can redirect funds from Biowulf to cloud – potential longer-term strategy
· Want to determine long-term, 3-year cost. Will require communication across this community to validate costs.
o Question of what engineering do we want to outsource vs. conduct in-house. Will determine operations and maintenance
· 16-24 cores will be sufficient for a lot of the computation being conducted in near future – relatively modest amount of computation
· If we had a base system in-house along with resources that you spin up: if there is a need to scale, we can burst into the cloud during peak computation; under normal conditions, work can be contained internally.
o Cost- and resource-effective strategy. Keeping confidential information in-house is one value-add
· Regarding who pays for cloud, per Dianna: need to follow up with Carl as he is in charge of proposal going to NCI leadership on cloud funding
o Using cloud HPC billed to common acct that everyone can charge to vs. per-directorate/per-group model – supposed to be a common account (CAN)
· Based on usage in Frederick – the majority of storage is persistent rather than temporary
o ~ $100K/month
o If considering Biowulf, just the CCR portion of compute and storage is ~$2MM. Assuming bringing back 10% of anything sent to the cloud
· Roughly 6 months since last Biowulf report. Need to maintain 6 month reporting cycle
o These assessments are included in FY 19 needs
· As we move toward cloud, will require very skilled human resources to help move those applications. Will demand larger group of people to help move and support those applications rather than doing work on nights and weekends.
o Governance plan also a factor to consider
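Referring back to the HiTIF container pilot above, a minimal sketch of launching a containerized visualization tool on demand. It assumes the docker Python SDK; the image name and port mapping are hypothetical, not the actual HiTIF / NCI Cloud 1 setup.

```python
# Minimal sketch: launch a containerized visualization tool on demand.
import docker  # docker Python SDK (pip install docker)

def launch_viewer(image: str = "example/image-viewer:latest", host_port: int = 8080):
    client = docker.from_env()
    # Run the container in the background and map its web UI to a host port.
    return client.containers.run(image, detach=True, ports={"80/tcp": host_port})

if __name__ == "__main__":
    container = launch_viewer()
    print("viewer running:", container.short_id)
```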
Positioning HPC and Data to accelerate Data Science
Ask the question – how do we want to answer this challenge? In terms of looking at how we position HPC and data to accelerate data science, is that something we want to do in future context of these meetings? Or something we should be looking at more broadly – NCI/NIH/similar organization level?
· How do we want to develop a strategy to deliver HPC to accelerate data science?
· Cloud is one of the pieces of this consideration – what are ideas on how we can present/prepare a collective document?
· ID how the data commons factors into this. Are we putting a lot of resources into data commons, or doing more movement of data into centralized repository?
o Edge computing another trending consideration
· Interest in adding topics to this – next step is to develop a short whitepaper to capture some of these topics so we have a common vision to work towards
o Aligns with needs with HPC going forward
o Whitepaper could be deliverable under FY 19 plan – people working on these initiatives. Could be summary document, quick read making some of the information broadly available
o Could have some addendums with additional information, presenting just high-level summary with addendums available for reference
Updates Around the Community
· George: Do we have enough HPC resources for AI/DL efforts?
· Jack working with Jay installing GPUs into nodes to assist with ML. Biowulf has a few GPU nodes but becoming increasingly in-demand.
· George has recently been running inference on images; discussed with the lab.
· Training is not currently being done at scale, given available resources on Biowulf
o Not a question of hardware anymore. More about getting/cleaning data, processing, getting into something that makes sense. Once training models, can then go back to PIs to ensure models will work for their research
o Going back and forth multiple times to clean data and retrain has taken more time than running on the machine. Once on the machines, training can be done over a couple of days. The other effort is people-work up front, requiring significant domain expertise and sometimes data that some labs don't have
o 80-90% of the effort is in the up-front coalescing/comingling/aggregating of data
· George, Yanling, and others met with the Nvidia team – several people working with NIH through a CRADA, focused on image segmentation of MRIs. Discussed smaller collaborative projects using GPUs and optimizing/benchmarking pipelines. Nvidia is capable of doing smaller projects if there is value and they are not trivial. Looking at how to share data/software with Nvidia using Biowulf or other avenues
· When working with Nvidia individuals who have access to Biowulf, where do we go for guidance about policies on sharing data with that group? They aren't NIH employees – essentially consultants? OPEN question…
o Is it even possible to share data with Nvidia if they have specific access to Biowulf?
· Randy: Recently had HPC SIG meeting on 11/27. Mark Jensen gave talk on creating tumor-normal classifier. Next meeting scheduled late Feb/early March… topic pending
· George: Coordinating with Tom Misteli to develop online presence NIH.AI on NCIP Hub
o Developing a focused mini-workshop on image segmentation Feb 14 – roughly 3 hours with 3 speakers, technical overview. Review of what worked, pitfalls, pre- and post-processing pipelines; encourage attendees to share pipelines on GitHub and provide an overview of resources used
· Eric: NCI-DOE Governance Review Meeting being held February
· Another proposed workshop has been accepted: HPC Applications in Precision Medicine @ International Supercomputing Conference
o Much broader slate of European participation
· SuperComputing 19: Workshop proposal deadline in ~6 weeks. Workshop topics currently under review
Education – Data Science Oriented Workshops and Training
· Miles: Upcoming Foundations of Cancer Data Science I Workshop to be held at the end of January, providing a high-level opportunity for those who've heard about data science to get a topical overview. Talked with the Cancer Data Science Laboratory
o A broader effort is being defined to fill gaps by mapping the other data science training activities going on and developing an integrated master schedule
· Emily and Carl working on the overall data science training and education effort, partly a response to various reports coming out of the Big Data working group
o Education/training resources important part, but people and data part are the rate-limiting factor
· Randy working on things such as Programmer’s Corner and HPC SIG
· Need arising to determine where there is knowledge and where there are gaps – where do gaps exist and where are outlets to fill those gaps? What gaps exist that need to be filled?
o Randy: haven't thought about this in depth. Recent interest expressed in analyzing RNA-seq data better, putting it in the context of gene pathways and visualization
o Jack (in ABCS context): Human resources are critical to training researchers as they begin learning about new tools and technologies. Desire to have a place for consolidated training resources.
o Outreach part of letting people know what’s possible, and having consolidated place where people can submit request targeted towards those who can offer consultation and guidance. Then need to have the people with broad knowledge and bandwidth
Action Items
· Begin capturing prospective needs – what we’re trying to accomplish, what goals are, how we plan to achieve them. Options to consider – integrating in-house and cloud capabilities
· Develop short whitepaper to capture some of these topics so we have common vision working towards
· Meeting frequency: Propose moving to quarterly and extend as needed
HPC Thought Leaders Meeting
6/21/18 Meeting Notes
Attendees: Eric Stahlberg, Janelle Cortner, Sean Davis, Lynn Borkon, Miles Kimbrough, Paul Fearn
Agenda: Review of data services efforts/initiatives around the room, Open discussion
Opening remarks
Eric:
Janelle:
Review of Data Science Strategic Plan
Sean:
Paul (addressing data prep & aggregation to develop models):
Sean:
RE: using cloud services on current CCR projects
RE: Strat Plan - to meet in Cold Spring Harbor and develop a proposed/updated data science plan
Paul:
Sean:
HPC Thought Leaders Meeting
5.17.18 Meeting Notes
Attendees: Eric Stahlberg, George Zaki, Miles Kimbrough, Paul Fearn, Randy Johnson, Janelle Cortner
Opening Remarks (Eric):
Challenge to figure out how to make sure everyone stays mutually informed
Plans to ID role of HPC, Data
Infrastructure, storage, and implementation updates developing quickly
Plans for today’s call to open floor for discussion of top efforts among the room
Randy:
Next HPC meeting on Tuesday 5/22 in Frederick
Janelle:
New to CBIIT role, want to ensure connection between scientific users and champion key needs
Issues arose from users around how Biowulf is working
Currently in the process of learning and absorbing
Interested in learning how Moab and Biowulf fit together
Idea of having large compute next to big data in Frederick would be worth considering
Paul:
Significant progress on NLP group
Need to understand connections between people and systems, connect with the right people
HPC as related to Internet of Things (IoT)
Would like oversight of upcoming meetings/workshops of interest
With several initiatives outside of DOE collaborations, need to ensure upcoming events are communicated across all
George:
ID how to translate broader impact of CANDLE and related updates to intramural and broader/extramural community
Eric:
End of May: Mini-workshop on predictive model coherence, access, and interoperability
HPC Program to have additional support within CBIIT ESIB Branch to scale upward, develop broader impact for HPC & Data Services
Attendees: Trent Jones, Sean Davis, Paul Fearn, Eric Stahlberg, Randy Johnson, Miles Kimbrough
Agenda:
3/15 Action Items:
Next Meeting Agenda – Key priority areas:
**Topic areas to cover current, 2019, and 2020+ priorities
2019 Prioritization
3/15 – Updates around the room
Paul
Sean
Eric
Trent
Next Meeting: April 19, 2018
Attendees: Dianna Kelly, Randy Johnson, Eric Stahlberg, Trent Jones, Paul Fearn, Lynn Borkon, George Zaki, Miles Kimbrough
Review of HPC Long Range Plan
Next Meeting: March 15, 2018
Attendees: Eric Stahlberg, Randy Johnson, Miles Kimbrough, Lynn Borkon, Paul Fearn, George Zaki, Jack Collins, Trent Jones, Dianna Kelly
Updates
Eric asked to provide updates to the HPC long range plan in February, presenting to Jeff
Request for more resources approved within CBIIT, delivered to NCI Director Ned S. for consideration
Need is for more expertise and personnel, not necessarily compute
HPC plan includes support of data services, providing means for individual labs to deposit data, annotate, aggregated within a common interface
Mid-December, began discussing with Biowulf the physical extent to which Biowulf can be expanded, to leverage computing relative to the data
DCCPS has data sharing working group, have troubles sharing data across regions
Several efforts building out commons of different types (ex. GDC), actions to build out additional commons, where data is centralized but individually coordinated
Issue of moving data vs. moving compute
NCI DOE Collaboration Updates
Trans-NCI DOE Ad Hoc Working Group established
DCCPS Still very underpowered on the Data Science side
Training programs needed for frontier technologies based on massively disruptive nature of DOE capabilities
Pilots developing and delivering capabilities, question arising of how best to translate and transition those
Several pilot efforts approaching computational and infrastructure demands but lack of integration across DOCs
Biowulf Update
Needs and Issues around the room
Jack
CSSI as coordinating center, ongoing effort to build infrastructure and software
GPUs – Lower-priced GPUs performing better across multiple frameworks than high-priced models
CPU – Skylake performance has been great, not many tweaks needed
Frederick been collecting data/info on HPC needs, how many need to stay local vs. move to other locations
Dianna working with Jeff to come up with HPC strategy for NCI, to extend towards NIH level
Diverse computing needs, supporting many workloads simultaneously along with doing individual large runs, complex demands
George
DME
Paul
Storage
Tutorials on using NLP within CANDLE framework to be developed
From hosting to transmission of data, how to keep it protected
IP Management for scalability
Randy
HPC SIG
CANDLE community – ACTION to develop CANDLE-specific user group
Attendees = Eric Stahlberg, Trent Jones, Nathan Cole, Randy Johnson, Miles Kimbrough
Agenda = Scaling HPC strategy and engagement across the NCI
Opportunity to reset with recent personnel transitions
People = limiting factor impacting HPC strategy engagement
- Resources needed to train and engage community
- Computational experience still a gap to be filled
FNL employees now able to use Biowulf (NIH main system), transforming HPC capabilities
- No longer limited to using MOAB system in Frederick
New machines coming online to take images of molecules, generating terabytes of data in short periods of time
- Leading to increasing demand in HPC and data storage
Internal clusters: would be better if more were available – always a desire to scale up, but need to find a balance of capabilities vs. budget
- Recent need for bigger memory nodes
- ~50 nodes running 256 GB of memory, another 50 with 512 GB; still not enough
- Tools being developed by biologists/scientists but not computer scientists, leads to scaling issues as most solutions developed for limited use
- Going to do a tech refresh on older nodes with upcoming move
- Personnel - CGR fully staffed with bioinformaticists, fair amount of scripting occurring but not much software development
- Data storage – fairly comfortable with current resources and ability to offload to local resources to remain in steady usage state
Where to put storage, compute, and what are constraints on the network?
- Most budgets not designed to support types of networks geared towards impending requirements
- Leads to possibly having data centers with compute in close proximity to circumvent limited network capabilities
DOE ECP project – Exascale solutions for microbiome analysis – potential ties to Nathan’s group
- Possible interest in utilizing
- Next step to connect Nathan with DOE POC to discuss potential to leverage
NCAB ad hoc working group looking at opportunities between NCI and DOE
- Computation to be one of the first areas of focus
- Goal to build bridges, provide additional insight on focus areas
Public availability is key consideration with leveraging DOE capabilities
Trent – Emerging storage, metadata tagging were big items at 10/19 data summit
- Upcoming solutions have object data store model with files and annotation associated, in technology-agnostic way for increased vendor compatibility
Next Thought Leaders Meeting date pending
Attendees = Paul Fearn, George Zaki, Miles Kimbrough, Randy Johnson, Nathan Cole, Jack Collins
Agenda = Round of updates and open discussion
Updates
Paul – Projects with SEER program (NCI DOE collaborations), as datasets are assembled they seem to be hosted in different places (subcontractors, etc)
- Similar efforts with VA (NLP, ML) using similar datasets
- How can we take lessons learned from this to be portable across environments
- Want to create linkages between DOE, using 4 different registries, and the VA
- Can environments be configured such that a model developed on one can be portable across others
DCCPS has upcoming workshops on data sharing – 10/18 & 11/8
- ID barriers to move data to other environments
- What are requirements for data sharing, to be explored
Q: (Jack) Possible overlap with resources developed at CSSI along with collaboration with Army
Paul – Internal meetings will identify common language, ID data use/sharing requirements, level-set process
Jack – VA shown interest in almost all branches of military
Paul – Hope to ID key issues/datasets by Jan 2018
Jack – FNL: Jeff Shilling and Dianna Kelly asked to explore workflows for moving data to Biowulf, and a listing of which experimental instruments need to be housed onsite vs. what can be moved out
- Hoping to state which workflows work well (CPUs, GPUs) over the next few weeks
Object Store – Eric/George have been looking at DME and Globus; apparent issues with files being dropped
- Although notifications indicate all files have been transferred, there is an apparent loss of ~5% of files
- Long-term storage strategy for HPC; a solution is needed to resolve dropped files
- CCBR is pushing lots of files separately; Yongmei's group is moving one consolidated tarball
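Given the dropped-file reports above, a minimal, generic sketch (not a DME or Globus feature) of comparing source and destination manifests by checksum to confirm nothing was lost; the paths are placeholders.

```python
# Generic post-transfer check: compare source and destination directories
# by relative path and MD5 checksum to spot dropped or altered files.
import hashlib
from pathlib import Path

def file_md5(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as fh:
        while block := fh.read(chunk):
            h.update(block)
    return h.hexdigest()

def manifest(root: str) -> dict:
    base = Path(root)
    return {str(p.relative_to(base)): file_md5(p) for p in base.rglob("*") if p.is_file()}

def missing_or_changed(src_root: str, dst_root: str) -> list:
    src, dst = manifest(src_root), manifest(dst_root)
    # Files absent at the destination, or present with a different checksum.
    return [name for name, digest in src.items() if dst.get(name) != digest]

if __name__ == "__main__":
    problems = missing_or_changed("/data/source", "/archive/destination")  # hypothetical paths
    print(f"{len(problems)} files missing or changed")
```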
George – CCBR has 200T allocated on Cleversafe, configured for them to use data on API, they just need to put internal process in place to use
- Workflows pending to push data into archive and into processing pipeline
George – CANDLE workflow for hyperparameter exploration; originally used for deep learning but can be applied to any black-box function
- Currently running on Biowulf
- If a user is working on a molecular dynamics problem, as an example, the best solution is reached by tweaking hyperparameters
- You define the function as a black box (e.g., a bash script), define which hyperparameters you'd like to explore and train your system with, and get back a loss value (see the sketch after this list)
- All wrapped together in Biowulf
- Currently being used for image segmentations and Pilot 1 efforts
- Built into CANDLE framework
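A conceptual sketch of the black-box pattern described above. This is not the CANDLE API; the wrapped script and its flags are hypothetical, and the script is assumed to print a single loss value on stdout.

```python
# Conceptual sketch of black-box hyperparameter exploration (not the CANDLE API).
import random
import subprocess

def black_box(params: dict) -> float:
    # Hypothetical wrapper script and flags; expected to print a loss value.
    cmd = ["bash", "run_model.sh",
           "--learning-rate", str(params["lr"]),
           "--batch-size", str(params["batch"])]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

search_space = {"lr": [1e-4, 1e-3, 1e-2], "batch": [16, 32, 64]}

best = None
for _ in range(10):  # simple random search; CANDLE uses more sophisticated strategies
    candidate = {k: random.choice(v) for k, v in search_space.items()}
    loss = black_box(candidate)
    if best is None or loss < best[1]:
        best = (candidate, loss)

print("best hyperparameters:", best)
```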
Data Management – Yongmei’s group ramping up production use
- Up to 8 TB of data since April; 10K files with ~26 metadata fields describing each file
- Details in process for how to incorporate these into pipeline
- Globus has limits on how much data can be transferred simultaneously
- High Throughput Imaging Facility – imaging instruments generating lots of data, want a system to put into production so data can move from Samba into Cleversafe
- SBL labs (Yunxing) – desire to use Cleversafe to store data
Nathan – currently in planning stages, planning to move into building next to Shady Grove
- Everything in ATC storage room (HPC storage) to be colocated in Shady Grove data centers
- Construction to be completed late summer 2019, likely 2020 before relo
Able to piggyback on top of the NCI Globus server to get storage mounted and get users started
- Able to use Globus as transfer system between ATC storage (~5 petabytes) to Helix
Randy – HPC User Group planning training on containers
- Scheduling next training session in October
Open Discussion
Paul – Wondering who is doing large-scale simulation and generation of synthetic data
- For the SEER program / cancer surveillance: is there an alternative to cancer surveillance as a passive process (assembling a big database)? An alternative is to build a simulation that creates a synthetic picture of cancer, with active sampling to calibrate the model over time
Jack – we're generating synthetic data for electron microscopy clusters and testing genomic algorithms – comparing against the real data we're seeing in the intramural research program
- Mostly used for benchmarking and algorithm development, not scalable to HPC yet
- Similar to metadata sets which we’re doing with Army
Next Steps
Jack to have team write up summary, lessons learned from metadata set for Paul
Attendees = Paul Fearn, Miles Kimbrough, Tony Kerlavage, Randy Johnson, Jack Collins, Omar Rawi, Eric Stahlberg, George Zaki
Agenda = update from HPC SIG, Thoughts for FY18, Priorities, Other Items
HPC SIG Update
Thoughts on FY18
Next meeting, July 20
Attendees: Miles Kimbrough, Eric Stahlberg, Dianna Kelly, George Zaki, Omar Rawi, Greg Warth, Tony
Two general categories
Steps or dates associated with that? Not specifically yet. Randy will help with PM work on this and we will get a schedule for it
These are deliverables and milestones we’ve identified:
May 15, 2017 – Defined scope of initial HPC community; Have coordinated and held first NCI intramural HPC User Group meeting.
May 31, 2017 – Have scheduled and held first education opportunity involving HPC programming concepts; Have established HPC User Group communication mechanism and initial intramural HPC User Group communication plan.
July 1, 2017 – Websites updated to provide visible points of access to information on utilization of HPC resources available to the NCI intramural community.
July 30, 2017 – Have held second intramural NCI HPC User Group meeting
August 31, 2017 – Websites targeting the NCI intramural community updated to provide HPC learning resources accessible to the intramural community.
September 15, 2017 – Results of intramural community survey on HPC awareness completed and summarized in a report. An updated 2017 intramural HPC Needs Assessment in the form of a report. Recommendations and priorities for FY18 intramural HPC User Group activities in a written summary.
Attendees: Miles Kimbrough, Eric Stahlberg, Dianna Kelly, George Zaki, Omar Rawi, Greg Warth
Updates on Current Efforts
Attendees: Eric Stahlberg, Miles Kimbrough, George Zaki, Greg Warth, Nathan Cole, Sean Davis, Steve Fellini
Agenda:
Next Meeting – December 15th, 2016
Note – December 15th Meeting Canceled.
Next Meeting – January 19th, 2017
Attendees:
Miles Kimbrough, Eric Stahlberg, Kelly Lawhead, Braulio Cabral, Greg Warth, Nathan Cole, Xin Wu, George Zaki
Not in attendance:
Warren Kibbe, Dianna Kelly, Jack Collins
Agenda Review
Next meeting – October 20th, 2016 (may be rescheduled)
Needs and Updates:
HPC communications update (Miles Kimbrough):
Cloud Discussion
Leading Thoughts (opening up for discussion)
Take this back to your communities and ask and if there is interest share it with Kelly
Tentative Agenda
Upcoming events and activities of note – 10 minutes
Discussion: Ideas for future efforts/priorities in HPC– 15 minutes
Attendees: Eric Stahlberg, George Zaki, Dianna Kelley, Greg Warth, Jack Collins, Sean Davis
Updates & Upcoming Events
Data Services & Anticipated Priorities
Globus
Argonne Training Updates from George Z
Ideas for Future Efforts
- Challenge will be that sequencers will no longer be large capital expenses
-Need to talk with lab chiefs to best understand future data needs
Next Meeting: September 15, 2016
Tentative Agenda
- New faces and introductions
- Needs and Updates Around NCI and CIT
- Frontiers of Predictive Oncology and Computing Meeting Updates
- Review FY17 Candidate Projects
- We were on hiatus for a little while
- Important to have these meetings more regularly and keep each other updated and aware of what is going on
- We have new faces and important to share updates
- Logistics updates and coordination support
- Suggestions on other priorities to pursue
Introductions
- Anastasia
- Miles Kimbrough
- Nathan Cole
- Carl McCabe
- George Zaki
- Warren Kibbe
- Greg Warth (Phone)
Needs and Updates
- CCR (Sean, back from vacation): not much insight, but one item is looking for ways to retain files for longer than 1 year (Xinyu)
- Storage for new instruments
- CIT – Nobody on phone to give details
- DSITP
- CBIIT
- DCEG- (Nathan)
- Other DOCS – No other updates
Logistics updates
- Communications plan being put in place by Miles
- Ramp up in August and run in September
- Open collaborate page to all members of NCI
- Yellow task that Eric is on ends in September – plan for how to continue this being worked on. Summary report on what we were able to do in first two years
- Overall perspective – Braulio yellow task is where we move programmatic support to
Frontiers of predictive oncology meeting
- Well attended – nearly 100 individuals each day
- One room, enthusiasm, good networking time
- Limited range to roam encouraged people to have discussions
- Good insight shared in breakout sessions
- Planning a white paper by end of August to pool all input
- Survey in development – Intel was asking how the meeting went (being iterated on now) – keep the Paperwork Reduction Act in mind and get Intel to do this
- Planning next meeting – get information out earlier and better
- Blog post – makes sense to do with DOE
FY17 Candidate Efforts – HPC and Exploratory
- Data Services Environment
- HPC Support Core
- Cloud Resources
- Predictive Models Explorations and Assessment
*efforts are not distinct. They need to be coordinated and aligned overall
* Describe more about what the purpose of HPC is and less about the infrastructure and the “means to an end”. This is part of future visioning, underpinning of “why” we are doing this and what the purpose and impact is.
- Data call: they are specifically looking for HPC cost in server and mainframe category
- Capital acquisitions
- Pull from DCEG in terms of what they are anticipating (Nathan)/ or give a view and we can reconcile it with DCEG
- Eric to give presentation
- We don’t have a real baseline for new categorization
- Include server and mainframe investment cost to support personnel etc.
- Don't know the profile of what DCEG has, but we're doing HPC support for them. Don't know the size of their plant beyond a few stations in their lab
- Asked for Bioinformatics numbers by Melinda Hicks
- Not a clear line between doing bioinformatics vs. the infrastructure necessary to make that happen – need to draw a line between those two so as not to count dollars that support science
- Important to put cost of infrastructure and people to maintain it – this becomes an IT cost but people taking codes and rewriting them and running them on HPC is scientific computing and we don’t want to put that in
- We want to be careful and consistent on how we report on bioinformatics activities
- Eric: trying to break it out into categories with Jeff and Tony – total bioinformatics investment is fine but helpful to define physical plant and what it takes to operate that
- In terms of physicality (cost for power and cooling) no idea what those numbers are (not part of the report). But how much spent on storage probably is something we want to report
- Server procurements, storage procurements, etc.
- HPC costs in cloud getting pulled from Tony – cloud HPC resources
- Having info available and working with Karen to best sum it up
- Making bioinformatics info possible is good first line to capture and to know total bioinformatics cost is good but not from FITARA stance (don’t want to report that)
- Presentation
Give relative priority to these efforts and identify stakeholders
What amount of resources might be appropriate and when for these resources
What makes sense for FY17? What might we defer for later?
HPC support for FY17
- Eric goes through slide
- Prioritizing activities for next year to know how to allocate resources
- First three are program development
- Cloud storage pilot – what it would take to use the cloud for deep data archive
- Leveraging investment in data service environment
- What happens when you have all the metadata and feature extraction – investing at least the capacity to do high-performance analytics, and getting systems with that capability accessible within the context of the researchers we are working with (supporting work in that space)
- Cloud container environment – containers operated internally and externally and portability between them
- Micro grants – consider as a way to bring external resources into the effort on small projects and supporting intramural investigators in that way
- Prioritizing these: not much time right now / add other ideas / weigh in on what priorities are
- Maturity of being able to do it (having partners ready to do it) rather than priority setting. Are we ready to do it? We have plans in place to do these things. Identifying the future timeline of when these things would be ready to start.
- Not a matter of ranking them in order
- This needs assessment was built off 2015 and various meetings we’ve had
- Developing involvement within community to help NCI participate in consortia
- Response to NCI DOE pilots and looking at NCI exascale cancer working group – pull people together more broadly – what are the applications of exascale? Extend beyond that group and get broader input, looking long term and being frank about level of computing investment we need to make. Who are partners for that? What is need? What is demand?
- Training and outreach – supporting education and development of awareness about what computing and data science can do for cancer research
- Need to do more HPC training, developing more applications, helping with those who have large data service needs and extending support on using cloud more effectively
- Need people to help investigators use cloud more effectively (2-3 individuals to cover that space)
- How do we take the ServiceNow implementation providing request support for HPC and develop it further so it's a better interaction for the individuals who need that support
- Bioinformatics core interface – did users find it reasonably effective? Yes
- Investigators have an idea and communication was done well, so was coordination
- Last two looking at making sure we have project management support in HPC space as we have more request (becoming more project oriented as opposed to task oriented)
- TPM would have more ability to do technical support and know more about problem space rather than a PM. Needs to be depth of awareness about that (it is negotiable) – potentially someone with a lot of HPC knowledge and can translate to technical team
- For FY17 – taking what we've looked at to develop a basic service API at an enterprise level and building it out to have stronger and deeper services. Developing a façade on top of the various object store technologies we might use. The rationale is to provide flexibility so we don't become vendor-locked. As things become more capable and standardized, the façade will get narrower and narrower and potentially disappear. (A minimal façade sketch appears after this list.)
- Helpful to lay out some of initial projects and right size whole activity so not get carried away building things without having accurate picture
- Useful to call out number of FTEs required to build things out from a budgetary perspective
- Supporting data service environment
- Dedicated system administrative support for it
- How to look out to extend storage to different types of storage places like cloud, etc.
- Next steps: how best to get input and refine this with CBIIT budget process to prioritize. Give opinions on what we should do, shouldn’t do and defer or things not on the list to think about.
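A minimal sketch of the façade idea; the interface and the in-memory backend are hypothetical. The point is that callers code against put/get/list, so the vendor behind the façade (Cleversafe, S3, or another store) can change without touching analysis code.

```python
# Illustrative object-store façade; interface and backends are hypothetical.
from abc import ABC, abstractmethod
from typing import List

class ObjectStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def list(self, prefix: str = "") -> List[str]: ...

class InMemoryStore(ObjectStore):
    """Stand-in backend so the sketch is runnable without any vendor SDK."""
    def __init__(self):
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]
    def list(self, prefix: str = "") -> List[str]:
        return [k for k in self._objects if k.startswith(prefix)]

# A Cleversafe- or S3-backed class would implement the same interface.
store: ObjectStore = InMemoryStore()
store.put("lab-a/sample1.bam", b"...")
print(store.list("lab-a/"))
```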
Attendees: Steve, Nathan, Greg, Xinyu, Dianna, Sean, Carl, Kelly, Eric, George, Omar
Agenda
Data services and storage environment
We are going through a PM change. Made some good progress; operational, and working on setting up for select users to do testing
Steve and the Biowulf group are in a position to contribute to these conversations. Original implementation plan for Cleversafe in Bethesda is abeer/aver?? – direct object storage APIs. If this is the case there is room for teaming up. Solving the same problems, so should talk as a group to move forward.
An advantage of an object store system is that it is dispersed geographically – don't need to back it up.
Nathan – would like to try it out; thinks this is the direction to jump on for archiving
Nathan to drop a note to Greg and he will set him up with training.
Other topics of potential interest
- Brainstorming – envision what the future is (Bob Coyne) – if interested in such a session let us know. think about looking ahead
- HPC support efforts (George doing a lot)
Support requests are constant. Fixing and modifying applications to make them run efficiently on Biowulf – from 5 days to run down to 1 day. Scaling would do the same thing. This is just one Biowulf node; the parallelism in the application is significant. One node has 32 threads. The challenge is that it would require major refactoring of the application (not standard). If we go this path we can't support every instance of this application in terms of updates
Not a scalable way of spending time.
Name of application is MATS.
MATS NIH – also checked this version today. Currently testing it to see if it fills Dauoud’s requirement to run things faster.
Eric's strategy: if it's an inefficient application, can we make it more efficient?
George: it's an app written by a postdoc or grad student – not requiring much work to adjust
The way it runs, it always asks for 16 threads but only makes use of 4 or 8, so even when asking for more Biowulf resources, they are allocated for no reason.
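A small diagnostic sketch related to the thread-usage point above; it assumes the psutil package, and the PID and requested-thread count are inputs you would take from the scheduler, not something this script knows.

```python
# Compare the threads a job requested with what the process actually uses.
import psutil  # third-party package (pip install psutil)

def thread_report(pid: int, requested_threads: int) -> str:
    proc = psutil.Process(pid)
    cpu = proc.cpu_percent(interval=5.0)   # percent of one core, summed across cores
    used = proc.num_threads()
    busy_cores = cpu / 100.0               # rough number of cores kept busy
    return (f"requested {requested_threads} threads, "
            f"process has {used} threads, ~{busy_cores:.1f} cores busy")

if __name__ == "__main__":
    # Hypothetical PID; a job that asked for 16 CPUs but keeps ~4 busy is a
    # candidate for a smaller allocation request.
    print(thread_report(12345, requested_threads=16))
```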
- Home-grown and other applications are being run (MATLAB, etc.)
- OpenACC with the PGI compiler (now available on Biowulf). If we have more home-grown applications, we could hold a workshop to learn how to optimize applications in an easy way and make good use of our resources.
- Hackathon at the University of Delaware – George going. Lots of GPUs and interest – maybe hold a GPU hackathon at NIH.
- Eric: other things going on in HPC – support for education and finding avenues where it can be effective
Helping connect those who might have an application with the fact that HPC can make their research more effective or rapid
Computing and predictive oncology meeting in July 2016
Initiated out of the DOE collaboration.
Targeted in downtown DC
Working on logistics to confirm the venue so we can hold the meeting and define it
It's a short time frame
One of the reasons to have it in this area is to maximize the number of people from NCI who can participate without travel
Around 100 invitees
Pull together where computation is, where it's going, and its impact on predictive oncology
As more details come up, we will get this out
Sean: Data management and converged IT – potential conference /Summit
Email thread going out – the idea is that we have a lot of data and storage needs. Two pieces to the puzzle of making it work: first, the storage infrastructure and the network infrastructure connecting to it; second, the metadata. Think about this and, if anyone is interested, follow along in the email conversation. Interest in some kind of conference or summit. Invite a few extramural people to tell us how they would have done things and talk about data management strategy at a higher level.
Reach out to Sean with interest.
Creative ideas for funding? CIT might help, warren’s office, NCBI.
Commercial sponsorship? Maybe open to that. It’s possible but they are not allowed to ask for it. If commercial sponsor came offering to do something we can do it but we can’t ask for it. It needs to come from someplace else. Maybe Intel is interested in something like that?
Funds needed to travel extramural folks in and for physical space. Can use local talent but maybe 4-5 ext. folks or potentially someone from bio team.
Maximum number of people expected in attendance could be around 50-75, or take a different approach and have a set of speakers – more open on the first day, then a more focused group on the second day. Could do it either way. A smaller scale could use Shady Grove (Carl) – the downstairs conference room in Shady Grove.
Timing is right for this now.
Good time to get everybody together
Conversations continue on slack
Next Meeting – April 21
Agenda
Updates
Forward-looking direction for data management, compute, and cloud (thoughts and perspectives): what people would like to do and anticipate they would need to do:
Keep base technology as general as possible so we don't put ourselves in a corner that is a heavy lift to transition out of
Next HPC Thought Leaders Meeting – March 17, 2016
Agenda
Quick Updates
Storage and Data Management
Make sure we start talking through the issues.
HPC Long Range Planning
Eric to send document that was pulled together – this will get informed and updated so people can take a look at what was done.
HPC FY 17 Priorities
Next HPC Thought Leaders Meeting - February 18th, 2016
Pre-Meeting Discussion
Agenda
DOE Collaboration
Questions
Next Steps and Action Items:
Next HPC Thought Leaders Meeting – January 21st, 2016
HPC Needs Review and Refresh
Comments from everyone on priorities and order
FY16 HPC priorities
Other items
Eric action item – get input from NCI: Get transcript from last presentation that Warren did (transcript of questions to see if it can be better developed)