NIH | National Cancer Institute | NCI Wiki  

1) Integrated Azure's Kubernetes, storage, networking, VM pool, and Docker registry components with CodaLab, with help from Patrick Flickinger.

  • UPDATE: CodaLab has been deployed into Kubernetes so that we can minimize the number of VMs running. While this should streamline things, we are having significant trouble getting the ingress-nginx Helm chart to work properly. The cert-manager chart integration is also not quite working. Patrick has determined that this is due to incorrect documentation on the Microsoft side.
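To make the ingress-nginx and cert-manager wiring more concrete, here is a minimal sketch of the kind of Ingress object the two charts are meant to cooperate on. It assumes ingress-nginx registered its default IngressClass `nginx` and that a cert-manager ClusterIssuer already exists. The host, Service name, port, Secret name, and issuer name are hypothetical placeholders, not this project's settings. The script only builds the manifest and prints it as JSON, which `kubectl apply -f -` accepts as readily as YAML.

```python
import json

# Hypothetical placeholders; substitute the real CodaLab host, Service, and issuer.
HOST = "codalab.example.org"
SERVICE_NAME = "codalab-django"
SERVICE_PORT = 8000
CLUSTER_ISSUER = "letsencrypt-prod"


def codalab_ingress(host: str, service: str, port: int, issuer: str) -> dict:
    """Build a networking.k8s.io/v1 Ingress served by ingress-nginx,
    with its TLS certificate requested from a cert-manager ClusterIssuer."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": "codalab",
            # cert-manager's ingress-shim reads this annotation and issues a
            # certificate into the Secret named under spec.tls.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            # Hand this Ingress to the ingress-nginx controller (default class "nginx").
            "ingressClassName": "nginx",
            "tls": [{"hosts": [host], "secretName": "codalab-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # Usage (file name hypothetical): python codalab_ingress.py | kubectl apply -f -
    print(json.dumps(codalab_ingress(HOST, SERVICE_NAME, SERVICE_PORT, CLUSTER_ISSUER), indent=2))
```

Keeping the Ingress separate from the Helm values makes it easier to check whether a failure lies in the controller chart or in the certificate issuance.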
Planned for next month

1) Make the VM resource pool use GPU machines. We are currently deploying into US West 2, but the deployment needs to be in US East for compatibility with Azure ExpressRoute, which Partners uses to transfer data securely. ExpressRoute is only available to us in the US East and US East 2 regions.

2) The test phase ends 9/3, and the finalists' Docker containers will need to be run on the CBIT platform as well as at MGH.

  • UPDATE: All Docker containers were run in an Azure environment (V100 GPU). All six finalists' containers produced results consistent with those reported on the MedICI platform.

3) Upload Carolyn's video to the MedICI website.

  • UPDATE: The video needs to be moved to YouTube hosting.
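The cross-platform check in item 2 amounts to comparing each finalist's re-run score with the score reported on MedICI. A minimal sketch, assuming each platform's results are available as a team-to-score mapping and that an absolute difference of at most 1e-3 counts as "consistent" (GPU computations are not always bit-for-bit reproducible across hardware, hence a tolerance rather than exact equality). The function name, tolerance, and team names are hypothetical.

```python
import math


def consistent(reported: dict[str, float], rerun: dict[str, float],
               abs_tol: float = 1e-3) -> dict[str, bool]:
    """For each finalist, report whether the re-run score matches the reported
    score within abs_tol. A finalist missing from the re-run counts as inconsistent."""
    return {
        team: team in rerun and math.isclose(score, rerun[team], abs_tol=abs_tol)
        for team, score in reported.items()
    }


# Hypothetical scores, for illustration only:
# consistent({"team_a": 0.81, "team_b": 0.70}, {"team_a": 0.8105, "team_b": 0.69})
# returns {"team_a": True, "team_b": False}
```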
Comments

Things that went well:

  • Participant submissions were evaluated successfully
    • However, some participants did not follow the required folder naming convention.
    • Some had not set up their internal code to name their own intermediate results correctly, so their algorithms did not run. I spent time with 2 participants on Zoom calls, and with 3-4 if email correspondence is included. It is not yet clear how to handle this in the future, but automating these submissions will be challenging if they do not simply "work" on their own.
    • Participant docker images are located here for the time being: https://www.dropbox.com/sh/r1nu4ovrjh4j3uf/AADKOeAPknQgEZQFuEOavbDMa?dl=0
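Because automation breaks when a submission does not follow the expected layout, one option is a pre-flight check that rejects a submission before its container is run. A minimal sketch; this report does not state the actual folder naming convention, so `REQUIRED_DIRS` and the function names are hypothetical placeholders.

```python
import os

# Hypothetical: the real folder naming convention is not given in this report.
REQUIRED_DIRS = ("input", "output")


def missing_dirs(present: set[str], required: tuple[str, ...] = REQUIRED_DIRS) -> list[str]:
    """Return the required folder names that are absent, in the order they are required."""
    return [name for name in required if name not in present]


def check_submission(root: str) -> list[str]:
    """List the top-level folders of an unpacked submission and report missing ones."""
    present = {e for e in os.listdir(root) if os.path.isdir(os.path.join(root, e))}
    return missing_dirs(present)


if __name__ == "__main__":
    import sys
    problems = check_submission(sys.argv[1])
    if problems:
        sys.exit(f"Submission rejected; missing folders: {', '.join(problems)}")
    print("Folder layout OK")
```

A check like this only catches layout problems; it cannot detect the intermediate-result naming bugs inside participant code, which would still surface only when the container runs.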
Comments

Things that could be improved:

  • The Azure integration is still fairly basic. We need to test a real algorithm on real GPUs.
    • UPDATE: We tested the first architecture on a GPU machine (while CodaLab was still running on its own VM), and everything worked. As we migrate to the new Kubernetes deployment of CodaLab, we will need to reproduce this functionality.
  • For the new infrastructure, better documentation would really help. Once it is complete, it will be essential to record what "worked".
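Reproducing the GPU run on the Kubernetes deployment means asking the scheduler for a GPU explicitly. A minimal sketch of a Kubernetes Job for one participant container, assuming the AKS GPU node pool runs the NVIDIA device plugin (which advertises the `nvidia.com/gpu` resource). The Job name, container name, and image reference are hypothetical.

```python
import json


def gpu_job(name: str, image: str, gpus: int = 1) -> dict:
    """Build a batch/v1 Job that runs one participant container once on a GPU node."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failing submission automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [{
                        "name": "algorithm",
                        "image": image,
                        # Extended resource advertised by the NVIDIA device plugin;
                        # the pod only schedules onto a node with a free GPU.
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical Job name and image reference.
    job = gpu_job("finalist-eval", "example.azurecr.io/finalist-a:latest")
    print(json.dumps(job, indent=2))
```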
Meeting | Date
Bi-weekly Meeting #1 |
Bi-weekly Meeting #2 |

Milestones

Description | Date

 

Task

Description | Resolution | Status | Creation Date | Close Date
Get sample pneumonia algorithm running on Azure VM | Need to increase quota | Complete | |
Get pools of VMs in Azure integration to be based on GPU machines | Depends on above task | Pending | |

Risks

Description | Mitigation | Rank | Status | Creation Date | Realization | Close Date
If we can't get GPU machines hooked up to Kubernetes in Azure, then we can't run algorithms | Execute the tasks above and get help from Patrick and/or the Azure support team | - | Closed | | We needed to request additional access to GPU machines through the Azure service portal. |
Ingress service in Kubernetes is not behaving like the standard tutorials or documentation | Keep plugging away to find the correct configuration | 1 | Open | | Azure has inconsistent docs on how to set this service up. | -