Review plans for John Swinbank leaving the country / the project over the next few weeks.
At (home) desk in Seattle through next week (until 2020-09-25); expect to continue regular work.
Flying to NL 2020-09-30. At this point, Yusra AlSayyad (overall manager, focus on DRP) and Ian Sullivan (deputy, focus on AP) assume full management responsibilities for the Science Pipelines team.
In quarantine in NL through 2020-10-14; will continue tidying up loose ends & writing documentation, and will be available for questions, meetings, etc, on demand.
2020-10-15 onwards: stepping back from regular work on Rubin. Will continue to be available by mail (and Slack, probably) for questions, discussions, etc, on request.
Melissa Graham has proposed a model for technical support during construction, which spawned some discussion on RFC-703.
She has subsequently been developing these ideas into DMTN-155.
Do we have DMLT sign-off on these ideas? In particular, do they provide an adequate level of support to the community, without placing an excessive burden on the construction team?
How will these plans be communicated to the wider community?
The scope of this document is “science-level” questions, not technical support at the level of lost passwords etc.
The IT helpdesk will be provided by NOIRLab, SLAC, NCSA, etc.
(Leanne summarizes the DMTN; not transcribed here.)
Community engagement in operations is described in RTN-006.
How do we determine whether a given issue is an IT problem or a scientific issue?
Often, it will be “obvious”.
Where necessary, CET will provide triage through the Community Forum.
Request for more boilerplate text to use when a question has already been answered on Slack but needs to be moved to Community.
In general, people shouldn't answer on Slack, so ideally this doesn't happen.
But when it does, we should aim to copy & paste.
There is concern that running support through Community will increase the load on the DM team relative to Slack.
We expect the support model to continue to evolve based on experience from construction and DP0 (1, 2).
And the SST, CET, etc are very much open to feedback and suggestions.
This is not just an effort to prepare for ops, but also an effort to relieve growing pressure on the construction team.
There are requests for the CET to take a larger role sooner, but this is limited by its scope and funding source, which are both tied to operations.
We must make sure that we acknowledge contributors, especially external ones, and do not simply try to redirect them to Community.
It is important that community support be tied to user-facing documentation.
The SST/CET should provide feedback to the development teams about where documentation needs to be improved.
Further discussion should be redirected to #dm-sst on Slack.
We expect the CET to play a coordination role in documentation, but will require inputs from the Pipelines and Algorithms teams.
Detailed breakdown of responsibilities is TBD.
CET is already working on this for DP0 in conjunction with DESC.
Leanne Guy — update DMTN-155 to reflect how to move answers which have already been given on Slack onto Community. DMTN-155 updated - please review
John asks about LSST-dev, which can probably be fixed using lsstinstall. Are there other dependencies on newinstall? Seems not (most use containers which embed newinstall; changing the container build fixes all of these).
RHL asks about Telescope and Site - they build on top of our containers, so should be OK.
GPDF asks about the long-term stability/availability of conda-forge. KT thinks it has a multi-year horizon; conda has an interesting history and its future is partially supported by a commercial company. Conda-forge is much more community based and has a big community.
John - who is the product owner for the build system? Unfortunately it's KT both owning and managing. Does KT understand who all the stakeholders are? KT is confident he knows the people with Jenkins jobs and who the user base for lsstsw and newinstall is - will go to community in any case.
Kian-Tat Lim attach diagram to this Confluence page
How are validity ranges stored? Tim - uses the directory structure and filename. QE curves come from Camera directly and are imported. Jim - big wall in Gen3 between certified calibrations and those not yet certified. Export and import deserve the ??? - we need more research on that.
RHL - not sure squash is in there for e.g. images. CamGeom deprecation was slipped into the document; Jim and Tim want to do this, but surprised to see it in here. Otherwise happy with the document.
John - if there are technical comments it does not need all DMLT but then we are back to the outstanding action.
Colin - found it difficult to get a feel for what it's describing - KT's diagram is a huge help. This may be partially why DMLT have not commented in detail. John agrees on the content and gave similar feedback to Chris - but silence from DMLT was taken as "all OK", not befuddlement. If the latter, we should include the diagram and update.
Tim - defects are easy to handle; perhaps it's worth having a worked example. Jim asks if KT's diagram works for defects. Tim says yes, but there may be other approaches.
Jim - technote is good for the products which are fairly automatic (human yes/no), not the merged-by-human ones. John - where we do not know, we need to write that down. This is somewhat the case in this doc.
GPDF - are crosstalk corrections handled? Tim - yes. In a given CDB3 instance, when you replace a calibration, is it replaced (is it bi-temporal)? Jim - it's not, but the idea would be to have a new collection (new name), not to actually replace the old one.
RHL - all the special cases for detectors are not covered - it may not be as uniform and nice as this makes it out to be. It could be messier when we get to it ... so hesitate to sign off. Back to CameraGeom ....
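Jim's answer to GPDF above (a replacement calibration goes into a newly named collection rather than overwriting the old one) can be illustrated with a small sketch. Everything here is hypothetical and for illustration only: `CalibRegistry`, `CertifiedCalib`, the collection names, and the rule that validity ranges within one collection may not overlap are assumptions, not the Gen3 butler API or the design in DMTN-148.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CertifiedCalib:
    """One certified calibration, valid over the half-open range [begin, end)."""
    dataset: str
    begin: str  # ISO 8601 dates; these compare correctly as plain strings
    end: str


@dataclass
class CalibRegistry:
    """Hypothetical stand-in for a registry; NOT the Gen3 butler API."""
    collections: dict = field(default_factory=dict)

    def certify(self, collection, calib):
        # Assumption: validity ranges within one collection may not overlap.
        existing = self.collections.setdefault(collection, [])
        for other in existing:
            if calib.begin < other.end and other.begin < calib.end:
                raise ValueError(f"validity overlaps {other.dataset}")
        existing.append(calib)

    def find(self, collection, date):
        # Return the dataset whose validity range contains the given date.
        for calib in self.collections.get(collection, []):
            if calib.begin <= date < calib.end:
                return calib.dataset
        raise LookupError(f"no calibration valid on {date} in {collection}")


registry = CalibRegistry()
registry.certify("calib/v1", CertifiedCalib("defects-a", "2020-01-01", "2021-01-01"))
# An improved product is certified into a new, differently named collection;
# the old collection is left untouched and still answers lookups as before.
registry.certify("calib/v2", CertifiedCalib("defects-b", "2020-01-01", "2021-01-01"))
```

In this sketch, a lookup against `calib/v1` keeps returning the original product, so earlier processing remains reproducible, while new processing opts in to `calib/v2` by name.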
John Swinbank Arrange focused brainstorming meeting with RHL, Tim, Chris, Jim, Yusra, John to get DMTN-148 further updated. Should at least list all calibration cases even if not solved.
Jim - how we access calibrations is different to how they are written - may need Robert to propose an alternate design. There is a feasibility issue.
KT - best way forward?
Christopher Waters, Kian-Tat Lim Modify DMTN-148 with more diagrams (from KT) and explicit statements about which products it applies to, and which it does not apply to (and when).
From zoom:
John - where were we commenting on this document?
From John Daniel Swinbank to Everyone: (8:59 p.m.) https://github.com/lsst-dm/dmtn-148/pull/3
From Gregory Dubois-Felsmann to Everyone: (8:59 p.m.) Is what Jim said a couple of minutes ago about what happens in BG3 when a calibration is certified going to be included in DMTN-148?
From Tim J to Everyone: (9:13 p.m.) I think one of the things is that pipelines just need to be configured to use specific dataset types; that’s the optimal approach for a pipeline. Having every pipeline instead require a composite cameraGeom is overkill
From Tim J to Everyone: (9:14 p.m.) but from a commissioning perspective it’s clearly easier for Robert to have access to everything in one blob
From Robert Lupton to Everyone: (9:17 p.m.) I'm worried about notebooks, not pipelines. It's possible that pulling out a set of n parallel data products with the same dataId is OK, but it pushes the book-keeping onto the code. That's not too bad until the code starts by updating some of the values (e.g. the gain). Then the code becomes much more complicated, but if we just allowed setting values on the camera and doing a "put" makes the user's job much simpler. So it's a tradeoff. So that notebook may become a calib-products "pipeline". But a weird one
NCSA has instituted a 2FA requirement for the new lsst-login servers.
Either SSH with password + DUO
Or Kerberos + DUO
After authentication, a control connection can be used to avoid further authentication.
Kerberos renewable tickets can be used for 25 hours / one week without further renewal.
But DUO is still required.
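The control connection mentioned above corresponds to OpenSSH connection multiplexing. A minimal sketch, assuming the standard OpenSSH client options `ControlMaster`, `ControlPath`, and `ControlPersist`; the host name, socket directory, and persistence time are placeholders for illustration, not NCSA settings or policy:

```python
import os
import shlex


def multiplexed_ssh_command(host, persist="8h", socket_dir="~/.ssh/sockets"):
    """Build an ssh command line that reuses one authenticated connection.

    The first invocation authenticates as usual and becomes the master
    connection; later invocations using the same ControlPath are carried
    over it without a new authentication prompt.
    """
    # ssh expands %r, %h and %p to the remote user, host and port.
    control_path = os.path.join(os.path.expanduser(socket_dir), "%r@%h:%p")
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_path}",
        # Keep the master alive in the background after the last session closes.
        "-o", f"ControlPersist={persist}",
        host,
    ]


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(shlex.join(multiplexed_ssh_command("lsst-login.example.org")))
```

The socket directory has to exist before the first connection. How long such a connection may persist is exactly the policy question raised in the discussion below.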
At Princeton, it's possible to use DUO + an SSH keypair.
Concerns from NCSA that SSH keys stored without passphrases are less secure.
Use of DUO at NCSA is required by UIUC.
Question before the DMLT: how much should we care?
We presume that Wil has the authority to define policy and accept risks based on such a tradeoff.
It's not clear that this could overrule UIUC policy, though.
We assume there is a fair bit of discretion on behalf of NCSA security staff about how that policy is implemented.
We could make functional requests of NCSA (“we want persistent connections”), or implementation requests (“we want SSH keys”).
Unknown User (mbutler) — understand the parameter space for getting a “long term lease” on an SSH connection to NCSA, and discuss with Wil O'Mullane what wiggle room we have.
13:15
Generation 3 middleware plans and acceptance criteria
Aiming for “Gen 3 ready for general use” by November 1st.
Do not anticipate formal acceptance testing on this date; handover will be based on completed Jira tickets, rather than a test campaign.
However, functionality is regularly tested in CI.
First priority is schema changes; aim to resolve them quickly, since they are maximally disruptive (may require re-ingest).
Following this milestone, we should discourage use of Gen2 whenever possible.
This milestone will rely on a shared database.
However, it is expected that the system is usable at this stage; some things might still be easier in Gen2, but not many.
QuantumGraph generation time is being addressed before hitting this milestone.
Note that “feature parity” here is explicitly for middleware; Science Pipelines feature parity in Gen3 will come later, but is currently a high priority.
But outputs from Gen2 pipelines can be converted to Gen3 for analysis.
Leanne Guy — agree Gen3 acceptance tests for November 1. DM-26798
Yusra AlSayyad — provide a timeline for complete pipeline conversion to Gen3.
In discussion at the JDR, a couple of issues emerged surrounding DM's milestones:
The review recommended that our milestone tracking be more automated / streamlined;
Existing milestones are poorly defined (to the extent that the responsible T/CAMs don't know what they mean).
How can we address these?
Recording is on by consent of all for internal use.
Frossie says she did not hear it quite the same way (for the first point of slide 2): automation would be good, but we need a coherent story. It would be great to have automation for Level 3 milestones, but we are unlikely to get it.
From chat: problem is that the milestones are not written in quantifiable ways
Question about lag - yes updates lag by a month.
How do you know which milestones are depended on by others? .. in DMTN-158, which shows predecessors and successors.
Could add a line for predecessors and successors; Michelle/Yusra would like that.
John Swinbank add predecessor/successor line to milestones in DMTN-158
KT - AP Gen3 assumes all raws etc. are in the butler? Yes.
KT - alert packet cutout sizes are limited? Yes - more work to be done.
Tim - WCS: is it AP or DRP? Formally it's AP. Dave Berry has a contract to modify AST to export the YAML ASDF format, a WCS understandable by Astropy. This means any Astropy user can download a calexp and use our WCS.
GPDF - still outputting an approximate WCS in the FITS standard as well as the YAML? Yes, no change - ASDF has a FITS translation format; will try to use their scheme.
Michelle - running any AP pipelines at NCSA? Should NCSA start running them? That would be great; trying to move more to the DRP mode, but there have been a lot of things holding the team back. In next few months... Ian Eric ..
Fritz Mueller — write a SOW for APDB POC on a cloud provider.
Is there a plan to update estimated object counts?
Yes, although this primarily comes through the PST. No progress recently, but should look at this on a 6 month timescale.
DRP:
Seeing similar burnout issues to those reported by other teams.
Note that it's hard for Tim Jenness as middleware manager to keep track of what DRP (and other) team members are doing in their non-middleware time (including personal issues, etc); consider having Tim attend T/CAM meetings, or sprint planning with DRP (and other) team members.
Do we have a clear understanding of who is responsible for solving “the TAP schema problem”? Getting data ingested into the database visible in the TAP service.
Architecture has provided some tooling for this.
But linking those tools and providing appropriate metadata is the responsibility of Pipelines and DAX teams.
DAX will provide a Felis description of catalog data for ingestion.
Requires further work in FY21.
Wil will look further into how the responsibilities break down here; this may be an update to DMTN-155 (or it may be elsewhere; operational procedures?).
T/CAMs — at the next T/CAM call, consider plans for appropriately tracking schedule/variance in the era of Covid.
Kian-Tat Lim Convene a meeting with Colin, Tim, Robert, Yusra to resolve graph generation with per-dataset quantities (likely based on Consolidated DB work).
In May 2020 we were unable to make a 19.0.1 patch release because of incompatible changes to the build and release system since the 19.0.0 release. The Architecture team were tasked with updating and simplifying the build and release system to ensure that this couldn't happen again (ie, whatever changes are made to the underlying infrastructure, we should always – within reason – be able to reproduce and update old releases). This session is an opportunity to review the plans that were made and the progress towards implementing them.
As we move closer to operations, members of both Science Collaborations and the wider scientific community are taking an increasing interest in using our Science Pipelines and other software. We need to be able to provide them with technical support, without imposing an unreasonable burden on our on-project staff. In particular, in May of this year, specific concerns were noted about members of the community using Slack channels which were originally intended for technical discussion on the DM system to ask for technical support.
Providing a coherent approach to support is challenging, given the wide range of interests and skills in the community, limited on-project resources, and the need to provide a system which both supports the construction project now and which fully transitions into the System Performance department's Community Engagement team in the future.
How much progress have we made since May? Do we now have a coherent message on what support we are providing, and through which channels? Have we clearly communicated that message to the leadership of the various science collaborations?
Melissa Graham I (Leanne) might call on you to join this session