This is a technical discuss-and-decide meeting. Participants are assumed to be familiar with the necessary background.
The goal is to resolve the identified operational issues surrounding stack and science platform container builds, or to set up actions that lead to their resolution.
Resolution includes:
A decision in this meeting (record in table below) - action for implementation ticket or doc update
Actioning (ticket) persons to write a technote outlining a rationale and proposing a decision
Actioning (ticket) someone to propose an RFC
This is an open meeting but non-agenda items will be directed to the walk-on sessions and timeboxed.
We need a plan to address or to waive off the requirement for pins (KT/FE)
Historically we used pip to install packages in containers
We now have switched to conda following stack upstream
What do we do about users (up to now we have told them to pip install --user)?
Lack of pinning in rubinenv - we have previously been asked to pin by PO
20 min
Backward (science build) compatibility in nublado
We have to clarify with Product Owners the backward compatibility guarantee in nublado; can we limit it to e.g. the command line?
We need a technote to explain how that guarantee is going to be met (TJ/KT/FE)
We are currently unable to guarantee backward compatibility as the current nublado machinery can’t always run old containers
Backward compatibility guarantees are meaningless unless tested
We should agree on:
Whether we really want to offer backward compatibility
What its limits are (e.g. all official stack releases since DR1? The last 2 years of official stack releases? etc.)
Whether to actually add those containers to RoboSimon and fix issues if they arise, and/or
Whether to routinely rebuild old stack release containers
20 min
Non-stack science packages in nublado
SQuaRE will create a process/configuration to cleanly separate service machinery packages from user-requested packages
SQuaRE will propose a process to DM-CCB that maintains provenance of package installation requests (and consider what testing can be added for 3rd party requested packages)
SQuaRE/KT to produce a technote describing how to manage 3rd parties technically, eg. with a stack→3rd party→JL cake
JL extensions by users for now have to be owned by SQuaRE. In the future pre-built extensions will be available in JL3 and may be a partial or complete solution to that problem
We are required to offer popular common astronomical packages in the nublado environment
Right now nublado machinery and user requested packages are commingled ← problem
Process for dealing with user requests is ad-hoc
15 min
Sims
lsst_sims support in nublado will wait until sims builds with conda
at which point it will be treated as a third party (see above)
meanwhile users who have asked for lsst_sims in nublado have access to lsst-dev and are not blocked
for summit use, for now, people can install in their home directory using eups
the new build engineer could be made available to help the Sims team transition to their conda model if they can't get there themselves
5 min
SQR-054 review
We've done most of SQR-054 already.
There's only one Python 3 on the system, and it comes from the stack.
We use conda for everything we can.
There's only one kernel, and it is based on the stack installation.
There are currently no pinned builds.
15 min
Walk-on issues
Naming/version convention for stack releases
multiple lab containers per stack container naming convention
document naming convention for official releases to be specified so it can be testable (KT)
SQuaRE will write a test class on the python side
each time we build a new version of a container we should change a version by upping a docker tag
the above is done for technical reasons and should not be visible to the user - newest is best
we should not depend on metadata in Docker Hub that is not available in the protocol through eg Nexus
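The testable naming convention discussed above could be checked along the following lines. This is a minimal Python sketch, not the agreed format: the tag layouts (`r21_0_0`, `r21_0_0_rc1`, `w_2021_13`, `d_2021_03_29`) and the names `parse_tag` and `newest_first` are assumptions made for illustration, since the official format is still to be documented.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical tag patterns; the official format is still to be specified.
_RELEASE = re.compile(
    r"^r(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)(?:_rc(?P<rc>\d+))?$"
)
_WEEKLY = re.compile(r"^w_(?P<year>\d{4})_(?P<week>\d{2})$")
_DAILY = re.compile(r"^d_(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})$")


class ParsedTag(NamedTuple):
    kind: str       # "release", "weekly", or "daily"
    version: tuple  # numeric components, most significant first
    tag: str        # the original tag string


def parse_tag(tag: str) -> Optional[ParsedTag]:
    """Return a ParsedTag, or None if the tag matches none of the patterns."""
    m = _RELEASE.match(tag)
    if m:
        # A final release sorts after all of its release candidates.
        rc = int(m["rc"]) if m["rc"] is not None else float("inf")
        version = (int(m["major"]), int(m["minor"]), int(m["patch"]), rc)
        return ParsedTag("release", version, tag)
    m = _WEEKLY.match(tag)
    if m:
        return ParsedTag("weekly", (int(m["year"]), int(m["week"])), tag)
    m = _DAILY.match(tag)
    if m:
        version = (int(m["year"]), int(m["month"]), int(m["day"]))
        return ParsedTag("daily", version, tag)
    return None


def newest_first(tags):
    """Drop malformed tags, group by kind, and sort newest first within a kind."""
    parsed = [p for p in map(parse_tag, tags) if p is not None]
    ordered = sorted(parsed, key=lambda p: (p.kind, p.version), reverse=True)
    return [p.tag for p in ordered]
```

Sorting on parsed numeric components rather than on the raw string keeps the ordering independent of any registry metadata, in line with the point about not depending on Docker Hub specifics.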
14:30 PT
Break
15:00 PT
T&S Focus
10 min
Separate process for designating recommended
SQuaRE will implement a way for the telescope deployments to have a different recommended per telescope deployment
Tel deployments need to have different recommendeds than Sci deployments
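One way to picture a per-deployment "recommended" designation, including the different approvers per deployment mentioned in the action items, is sketched below in Python. The deployment names, tags, approver groups, and the `set_recommended` helper are all hypothetical and only illustrate the shape of the data, not how SQuaRE will implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecommendedPolicy:
    tag: str              # container tag currently marked "recommended"
    approvers: frozenset  # who may change it for this deployment


# Hypothetical deployments, tags, and approver groups.
POLICIES = {
    "science": RecommendedPolicy("w_2021_10", frozenset({"square"})),
    "telescope": RecommendedPolicy("w_2021_08", frozenset({"tssw"})),
}


def set_recommended(policies, deployment, new_tag, requested_by):
    """Return a new policy table with the tag changed for one deployment."""
    policy = policies[deployment]
    if requested_by not in policy.approvers:
        raise PermissionError(
            f"{requested_by} may not change recommended for {deployment}"
        )
    updated = dict(policies)  # leave the caller's table untouched
    updated[deployment] = RecommendedPolicy(new_tag, policy.approvers)
    return updated
```

Keeping the tag and its approvers together per deployment means a change for the telescope deployments cannot silently alter what science deployments see.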
30 min
The T&S/stack/JL layercake
try using the stack→t&s container that t&s builds and layer lab on top - may need multi-usermode work done
Currently the cake is stack→lab→t&s
This is bad; how can we do better?
t&s more integrated with dm than anticipated (this is good, but there is entanglement)
30 min
Rapid turnaround for commissioning
there is going to be a daily nublado-tel container build that consists of: stack daily → t&s sanctioned → lab-tel
we need to establish a time-budget that results in the above being produced and available in < 3 hrs, so that a quick-fix turnaround before the next night's observing is possible
we should consult widely and revisit the times for the daily builds of services to prioritize start of data taking
during-night hotfixes are possible using eups within the fixer's container, and can be upstreamed for integration into the daily build for next night
similar approach could be taken for non-Science Platform containerized services (eg. header service, OODS)
Time budget for fixing something reported overnight and having it available that night
Higher frequency dm builds on slower cadence t&s builds
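The time budget above is simple arithmetic over the sequential build stages. The Python sketch below shows the check; the 3-hour limit comes from the notes, while the per-stage durations and the example deadline are invented placeholders that would need to be replaced with measured values.

```python
from datetime import datetime, timedelta

BUDGET = timedelta(hours=3)  # from the notes: available in < 3 hrs

# Hypothetical stage durations; real numbers have to be measured.
STAGES = [
    ("stack daily", timedelta(minutes=90)),
    ("t&s sanctioned layer", timedelta(minutes=30)),
    ("lab-tel", timedelta(minutes=40)),
]


def total_duration(stages):
    """Sum the durations of stages that run one after another."""
    return sum((duration for _, duration in stages), timedelta())


def fits_budget(stages, budget=BUDGET):
    """True if the whole chain finishes strictly within the budget."""
    return total_duration(stages) < budget


def latest_start(deadline, stages):
    """Latest time the first stage can start and still meet the deadline."""
    return deadline - total_duration(stages)
```

For example, with these placeholder durations the chain takes 2 h 40 min, so a container needed by 18:00 would require the stack daily build to start no later than 15:20.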
5 min
SQR-056 review
change SQR-056 to specify that NTS is available for non-maintenance window integration work if you ask #ncsa-integ-teststand (but avoid night-time in Chile)
when the base teststand is available, then the tucson test stand can be used as an integration environment
Maint windows working well for Tel
NTS is both considered prod and int ← this is a problem
15 min
Walk-on issues
versioning for lab-tel containers has to reflect both the stack build and the t&s sanctioned version (user-visible)
SQuaRE to figure out how to make sure spawner page does not display containers built on obsolete sanctioned versions
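Hiding containers built on obsolete sanctioned versions amounts to filtering tags by the version they embed. The Python sketch below assumes, purely for illustration, that a lab-tel tag ends in `_c<number>` to carry the sanctioned t&s version; that suffix and the names `cycle_of` and `displayable` are hypothetical.

```python
import re

# Hypothetical suffix carrying the sanctioned t&s version, e.g. "..._c0019".
_CYCLE = re.compile(r"_c(?P<cycle>\d+)$")


def cycle_of(tag):
    """Return the cycle number embedded in a tag, or None if there is none."""
    m = _CYCLE.search(tag)
    return int(m["cycle"]) if m else None


def displayable(tags, approved_cycles):
    """Keep only tags whose embedded cycle is currently approved.

    Tags without a cycle suffix are hidden as well, since None is never
    a member of approved_cycles.
    """
    return [t for t in tags if cycle_of(t) in approved_cycles]
```

Because the tag itself carries both the stack build and the sanctioned version, the spawner page could apply this filter without any additional metadata lookup.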
16:30 PT
End
Action items
Discuss pinning requirements for nublado and Stack with product owners (especially GPDF); develop a plan or get requirements (at least partially) waived - Frossie Economou, Kian-Tat Lim -
Discuss backward compatibility guarantee with product owners (e.g. is shell good enough?) - Frossie Economou, Kian-Tat Lim -
Write a tech note to explain how an intermediate Science Pipelines + desirable third-party packages container can be change-controlled and built (but probably not tested, at least not extensively); this would be input to the sciplat-lab container - Kian-Tat Lim -
Document release (and release candidate) version string format (as used in container tags) in SQR-016, and incorporate mechanisms to enforce this format into the Jenkins official release Groovy code - Kian-Tat Lim -
Write code to test that version/tag strings are in the correct format and to sort as needed - Adam Thornton -
Add build datetime to lab container tags - Adam Thornton -
Allow different "recommended" containers by deployment - Adam Thornton -
Allow non-SQuaRE control over "recommended" designation with different approvers per deployment - Adam Thornton -
Build a daily "tel" container containing the daily Science Pipelines build and a sanctioned TSSW build - Tiago Ribeiro -
Attempt to use the daily "tel" container to build the sal-sciplat-lab containers; use the "cycle" number from the "tel" container as part of the sal-sciplat-lab container tag - Adam Thornton -
Define a deadline for daily sal-sciplat-lab container availability - Michael Reuter -
Adjust Jenkins to meet sal-sciplat-lab build completion deadline, including upstream builds and containers - Kian-Tat Lim -
Do not display sal-sciplat-lab containers with an unapproved cycle number in the container selection interface - Adam Thornton -