Date & Connection

13:00-16:30 PT

BlueJeans, Science Platform Team Room: https://ls.st/lsx

Attendees

Goals

  • This is a technical discuss-and-decide meeting. Participants are assumed to be familiar with the necessary background.
  • The goal is to resolve (or set up actions that lead to the resolution of) the identified operational issues surrounding stack and Science Platform container builds
  • Resolution includes:
    • A decision in this meeting (recorded in the table below), with an action for an implementation ticket or doc update
    • Actioning (via ticket) people to write a technote outlining a rationale and proposing a decision
    • Actioning (via ticket) someone to propose an RFC
  • This is an open meeting but non-agenda items will be directed to the walk-on sessions and timeboxed.

Discussion items

Time | Item | Decisions | Discussion prompt
13:00 PT | DM focus
rubinenv background (summary from KT)
15 min | Pip v conda in nublado and user environments

  • Historically we used pip to install packages in containers
  • We have now switched to conda, following the stack upstream
  • What do we do about users (up to now we have told them to `pip install --user`)?
  • Lack of pinning in rubinenv: the Product Owner has previously asked us to pin
20 min | Backward (science build) compatibility in nublado
  • We have to clarify the backward compatibility guarantee in nublado with the Product Owners; can we limit it to, e.g., the command line?
  • We need a technote to explain how that guarantee is going to be met (TJ/KT/FE)
  • We are currently unable to guarantee backward compatibility as the current nublado machinery can’t always run old containers
  • Backward compatibility guarantees are meaningless unless tested
  • We should agree on:
    • whether we really want to offer backward compatibility
    • what its limits are (e.g., all official stack releases since DR1? The last 2 years of official stack releases? etc.)
    • whether to actually add those containers to RoboSimon and fix issues as they arise, and/or
    • whether to routinely rebuild old stack release containers
20 min | Non-stack science packages in nublado
  • SQuaRE will create a process/configuration to cleanly separate service machinery packages from user-requested packages
  • SQuaRE will propose a process to DM-CCB that maintains the provenance of package installation requests (and consider what testing can be added for requested third-party packages)
  • SQuaRE/KT to produce a technote describing how to manage third-party packages technically, e.g., with a stack → third-party → JupyterLab (JL) layer cake
  • For now, JL extensions requested by users have to be owned by SQuaRE. In the future, pre-built extensions will be available in JL3 and may be a partial or complete solution to this problem
  • We are required to offer popular common astronomical packages in the nublado environment
  • Right now nublado machinery and user-requested packages are commingled ← problem
  • Process for dealing with user requests is ad-hoc
15 min | Sims
  • lsst_sims support in nublado will wait until sims builds with conda
  • at which point it will be treated as a third party (see above)
  • meanwhile users who have asked for lsst_sims in nublado have access to lsst-dev and are not blocked
  • for summit use, for now, people can install in their home directory using eups
  • the new build engineer could be made available to help the Sims team transition to their conda model if they can't get there themselves


5 min | SQR-054 review

We've already done most of SQR-054.

  • There's only one Python 3 on the system, and it comes from the stack.
  • We use conda for everything we can.
  • There's only one kernel, and it is based on the stack installation.
  • There are currently no pinned builds.
15 min | Walk-on issues

  • Naming/version convention for stack releases
  • naming convention for multiple lab containers per stack container
  • the naming convention for official releases is to be documented precisely enough that it can be tested (KT)
  • SQuaRE will write a test class on the Python side
  • each time we build a new version of a container, we should bump its Docker tag
  • the above is done for technical reasons and should not be visible to the user - newest is best
  • we should not depend on Docker Hub metadata that is not available through the registry protocol (e.g., via Nexus)
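As an illustration of the kind of test class SQuaRE agreed to write, the following Python sketch validates and sorts official release tags using only the tag string (no Docker Hub metadata). The tag format used here (`r21_0_0`, with an optional `_rc1` suffix) is a hypothetical stand-in, since the real convention is still to be documented; the names `RELEASE_TAG`, `parse_release_tag` and `newest_first` are likewise illustrative. Version components are compared as integers, so `r10` sorts after `r9`, and a final release sorts after its own release candidates, consistent with "newest is best".

```python
import re
from typing import List, NamedTuple, Optional

# HYPOTHETICAL release tag format, e.g. "r21_0_0" or "r21_0_0_rc1".
# The real convention is still to be documented, so treat this as a stand-in.
RELEASE_TAG = re.compile(r"r(\d+)_(\d+)_(\d+)(?:_rc(\d+))?")


class ReleaseVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    rc: Optional[int]  # None for a final release


def parse_release_tag(tag: str) -> ReleaseVersion:
    """Parse a release tag, raising ValueError if it breaks the convention."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"{tag!r} does not follow the release tag convention")
    major, minor, patch, rc = match.groups()
    return ReleaseVersion(int(major), int(minor), int(patch),
                          int(rc) if rc is not None else None)


def sort_key(tag: str) -> tuple:
    """Numeric sort key; a final release sorts after its release candidates."""
    v = parse_release_tag(tag)
    return (v.major, v.minor, v.patch, v.rc is None, v.rc or 0)


def newest_first(tags: List[str]) -> List[str]:
    """Return tags ordered newest first ("newest is best")."""
    return sorted(tags, key=sort_key, reverse=True)


if __name__ == "__main__":
    print(newest_first(["r20_0_0", "r21_0_0_rc1", "r21_0_0", "r21_0_0_rc2"]))
    # ['r21_0_0', 'r21_0_0_rc2', 'r21_0_0_rc1', 'r20_0_0']
```

Because `parse_release_tag` raises on any non-conforming tag, sorting a list that contains a malformed tag fails loudly rather than silently misordering it.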

14:30 PT | Break

15:00 PT | T&S focus

10 min | Separate process for designating "recommended"
  • SQuaRE will implement a way for each telescope deployment to have its own "recommended" container
  • Telescope deployments need different "recommended" containers from science deployments
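A minimal sketch of per-deployment "recommended" designations, assuming a simple in-memory registry. `Deployment`, `RecommendedRegistry`, the deployment names and the approver names are all hypothetical and do not reflect the actual nublado configuration. Each deployment carries its own recommended tag and its own set of approvers, which would also allow non-SQuaRE control over the designation with different approvers per deployment.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Deployment:
    name: str
    approvers: Set[str]            # who may change "recommended" here
    recommended: Optional[str] = None


class RecommendedRegistry:
    """Tracks a separate "recommended" container tag for each deployment."""

    def __init__(self) -> None:
        self._deployments: Dict[str, Deployment] = {}

    def add(self, deployment: Deployment) -> None:
        self._deployments[deployment.name] = deployment

    def set_recommended(self, deployment: str, tag: str, user: str) -> None:
        d = self._deployments[deployment]
        # Approval is checked per deployment, so telescope and science
        # deployments can have different approvers.
        if user not in d.approvers:
            raise PermissionError(f"{user} may not set recommended for {deployment}")
        d.recommended = tag

    def recommended(self, deployment: str) -> Optional[str]:
        return self._deployments[deployment].recommended


if __name__ == "__main__":
    # Hypothetical deployments, approvers and tags, for illustration only.
    registry = RecommendedRegistry()
    registry.add(Deployment("summit", {"tel-approver"}))
    registry.add(Deployment("science", {"square-approver"}))
    registry.set_recommended("summit", "tel-tag-A", "tel-approver")
    registry.set_recommended("science", "sci-tag-B", "square-approver")
    print(registry.recommended("summit"), registry.recommended("science"))
```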
30 min | The T&S/stack/JL layer cake
  • try using the stack → T&S container that T&S builds and layering the lab on top; this may need multi-user-mode work
  • Currently the cake is stack → lab → T&S
  • This is bad; how can we do better?
  • T&S is more integrated with DM than anticipated (this is good, but it creates entanglement)
30 min | Rapid turnaround for commissioning
  • there is going to be a daily nublado-tel container build that consists of: stack daily → T&S sanctioned → lab-tel
  • we need to establish a time budget under which the above is produced and available in < 3 hours, so that quick-fix turnaround is possible before the next night's observing
  • we should consult widely and revisit the times of the daily service builds to prioritize the start of data taking
  • during-night hotfixes are possible using eups within the fixer's container, and can be upstreamed for integration into the daily build for the next night
  • a similar approach could be taken for non-Science Platform containerized services (e.g., header service, OODS)
  • Time budget for fixing something reported overnight and having it available that night
  • Higher-frequency DM builds on top of slower-cadence T&S builds
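The < 3 hour target can be turned into a simple scheduling check: the chained builds (stack daily → T&S sanctioned → lab-tel) must fit within the budget, and the latest start time is the availability deadline minus their combined duration. The sketch below assumes the stages run strictly one after another and that their durations are known; the durations and deadline in the usage example are made-up placeholders, not measured build times, and the real deadline is still to be defined.

```python
from datetime import datetime, timedelta
from typing import Dict

# Target from the notes: the daily chain must be available in under 3 hours.
BUDGET = timedelta(hours=3)


def latest_start(stages: Dict[str, timedelta], deadline: datetime) -> datetime:
    """Latest time the first stage can start so the last one finishes by deadline.

    Assumes the stages run strictly one after another. Raises ValueError if
    their combined duration does not fit within BUDGET.
    """
    total = sum(stages.values(), timedelta())
    if total >= BUDGET:
        raise ValueError(f"chained builds take {total}, budget is under {BUDGET}")
    return deadline - total


if __name__ == "__main__":
    # Placeholder durations and deadline, NOT measured values.
    stages = {
        "stack daily": timedelta(minutes=90),
        "T&S sanctioned": timedelta(minutes=30),
        "lab-tel": timedelta(minutes=20),
    }
    print(latest_start(stages, datetime(2021, 3, 1, 16, 0)))
    # 2021-03-01 13:40:00
```

With these placeholder numbers the chain takes 2 h 20 min, leaving 40 minutes of slack inside the budget; any upstream delay eats directly into that slack.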
5 min | SQR-056 review
  • change SQR-056 to specify that the NTS is available for integration work outside the maintenance window if you ask in #ncsa-integ-teststand (but avoid night-time in Chile)
  • when the Base test stand is available, the Tucson test stand can be used as an integration environment
  • Maintenance windows are working well for Tel
  • NTS is considered both prod and int ← this is a problem
15 min | Walk-on issues
  • versioning for lab-tel containers has to reflect both the stack build and the t&s sanctioned version (user-visible)
  • SQuaRE to figure out how to make sure spawner page does not display containers built on obsolete sanctioned versions
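One way the lab-tel tag could carry both versions, and the spawner page could hide containers built on obsolete sanctioned versions, is sketched below. The `<stack tag>_c<cycle>` layout, the example tag `d_2021_03_15_c0019`, and the helper names `make_lab_tel_tag` and `displayable` are hypothetical; how the set of approved cycles is actually recorded is not specified in these notes. Tags that fail to parse are hidden as well, so a malformed tag cannot appear as a selectable image.

```python
import re
from typing import Iterable, List, Set

# HYPOTHETICAL tag layout "<stack tag>_c<cycle>", e.g. "d_2021_03_15_c0019",
# so the tag shows both the stack build and the T&S sanctioned cycle.
LAB_TEL_TAG = re.compile(r"(?P<stack>.+)_c(?P<cycle>\d+)")


def make_lab_tel_tag(stack_tag: str, cycle: int) -> str:
    """Combine a stack tag and a sanctioned cycle number into one tag."""
    return f"{stack_tag}_c{cycle:04d}"


def displayable(tags: Iterable[str], approved_cycles: Set[int]) -> List[str]:
    """Keep tags built on an approved cycle; unparseable tags are hidden too."""
    shown = []
    for tag in tags:
        match = LAB_TEL_TAG.fullmatch(tag)
        if match and int(match.group("cycle")) in approved_cycles:
            shown.append(tag)
    return shown


if __name__ == "__main__":
    tags = [make_lab_tel_tag("d_2021_03_15", 19),
            make_lab_tel_tag("d_2021_03_15", 18),
            "not-a-lab-tel-tag"]
    print(displayable(tags, approved_cycles={19}))
    # ['d_2021_03_15_c0019']
```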

16:30 PT | End

Action items

  • Discuss pinning requirements for nublado and the Stack with the Product Owners (especially GPDF); develop a plan or get the requirements (at least partially) waived - Frossie Economou, Kian-Tat Lim
  • Discuss the backward compatibility guarantee with the Product Owners (e.g., is the shell good enough?) - Frossie Economou, Kian-Tat Lim
  • Write a technote explaining how users get reproducibility via the Pipelines software even though notebooks are changing - Tim Jenness, Kian-Tat Lim, Frossie Economou
  • Write a technote explaining how an intermediate Science Pipelines + desirable third-party packages container can be change-controlled and built (but probably not tested, at least not extensively); this would be input to the sciplat-lab container - Kian-Tat Lim
  • Document the release (and release candidate) version string format (as used in container tags) in SQR-016, and incorporate mechanisms to enforce this format into the Jenkins official release Groovy code - Kian-Tat Lim
  • Write code to test that version/tag strings are in the correct format and to sort them as needed - Adam Thornton
  • Add the build datetime to lab container tags - Adam Thornton
  • Allow different "recommended" containers by deployment - Adam Thornton
  • Allow non-SQuaRE control over the "recommended" designation, with different approvers per deployment - Adam Thornton
  • Build a daily "tel" container containing the daily Science Pipelines build and a sanctioned TSSW build - Tiago Ribeiro
  • Attempt to use the daily "tel" container to build the sal-sciplat-lab containers; use the "cycle" number from the "tel" container as part of the sal-sciplat-lab container tag - Adam Thornton
  • Define a deadline for daily sal-sciplat-lab container availability - Michael Reuter
  • Adjust Jenkins to meet the sal-sciplat-lab build completion deadline, including upstream builds and containers - Kian-Tat Lim
  • Do not display sal-sciplat-lab containers with an unapproved cycle number in the container selection interface - Adam Thornton