We discussed whether some DM work can be pushed into the future as effectively a “no cost extension” to construction; the practicalities of this are not clear...
In terms of what's acceptable to the agencies.
And in terms of whether there will be staffing in the operations era to work on this.
If there are people in your team/institute who would be interested in moving to AURA positions, please let Wil O'Mullane know.
There are practical issues here, regarding which states are acceptable for people to reside in, and whether there is office space available for them.
And of course the downsides to breaking up groups.
DM10 alert filters interface may actually represent a scope increase in 02C.03 in terms of providing a mini-broker interface.
We believe that the current portal is sufficient for commissioning data releases (after finishing the closeout plan).
We discussed the increased staffing load resulting from making commissioning data widely available: this is expected to be significant.
“As discussed at the PST F2F meeting in September, DM should review the names of people going into commissioning” (requested by Leanne Guy)
Not immediately clear whether this means permanent reassignment of staff, or those DM folks who are temporarily assigned to assist the Commissioning Team with specific activities.
The commissioning support activities listed are validation of the DM system, but verification of the LSST system.
None of the commissioning activities listed cover aspects of the system outside Science Pipelines.
Concern expressed about the impacts of people being reassigned to commissioning on the rest of the DM team.
Often the people who might be most effective with commissioning are also those most necessary for facilitating efforts within DM; concern expressed that this will have a disproportionate impact on the DM schedule.
Detailed definition of the contents of the commissioning tests is with Leanne Guy and Keith Bechtol. Expectation that they will often involve repeating tests that have been performed within DM.
Early Ops funding is now available, and during FY19 (i.e., this year) the ADs for both Data Facility and Science Operations (Margaret Gelman and Wil O'Mullane) are funded at 0.25 FTE. That's half an FTE coming out of DM management. How are we handling that?
Similarly there's a total of ~15 FTEs funded across Data Facility and Science Ops during FY20.
Please review:
The plans and schedule for transitioning staff;
The activities which the Operations Team will be carrying out with this effort, and how they relate to ongoing DM construction and the commissioning effort.
Some milestones (e.g. ops rehearsals) are effectively duplicated between DM and pre-operations funding; where possible, they will migrate from DM to pre-ops.
There's a worry that commissioning data may become available to the public more quickly than we currently plan; DM should be ready to scale up to address this.
And a feeling that simply making the data available for download will not adequately address this need.
But we acknowledge the cost and schedule impacts of this.
There's also a concern about what level of end-user support is implied by this.
How do we handle folks being required in DM and commissioning and pre-ops?
This is a matter of ongoing planning. There may be some overlap/double counting.
Discussion of how open the access to commissioning data & facilities should be, balancing getting input on commissioning from members of the community with the support load and “chaos” of wide access.
Wil O'Mullane — coordinate the writing of a memo describing what community DM can support during commissioning.
The proposal is to develop a policy for handling packages which have been developed externally and which their authors offer up for inclusion in pipeline processing, the Science Platform environment, or elsewhere in the DM system.
Obvious examples might be scientific algorithms contributed by the community.
There is history of external users refusing to make contributions like this due to the demands of DM engineering (code quality, review, tests, etc.).
Based on the modeling work done before this year's review, review the proposed product tree, the characterization of its components, and their relationship to the document tree.
“Inside pipelines, there are more or less five products” — not every Git repository or software package is a product.
“SW Products can depend on other SW products without containing them” — so there is a SW product that contains e.g. the Butler, which can be depended upon by other products.
SW Products are the unit of release.
Dependency relationships happen between SW products, rather than between SW packages.
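The product/package distinction above can be sketched as a toy model. This is purely illustrative (all product and package names are invented for the example): a SW Product groups several Git packages, is the unit of release, and declares dependencies on other products rather than on packages.

```python
from dataclasses import dataclass, field

# Hypothetical sketch, not the real product tree: a Product contains
# packages (Git repos) and depends on other Products, never on packages.

@dataclass
class Product:
    name: str
    packages: list = field(default_factory=list)    # contained packages
    depends_on: list = field(default_factory=list)  # other Products

middleware = Product("middleware", packages=["daf_butler", "pipe_base"])
pipelines = Product("science_pipelines",
                    packages=["pipe_tasks", "ip_isr"],
                    depends_on=[middleware])

def release_order(product, seen=None):
    """Return products in dependency order, each listed once."""
    seen = seen if seen is not None else []
    for dep in product.depends_on:
        release_order(dep, seen)
    if product not in seen:
        seen.append(product)
    return seen

print([p.name for p in release_order(pipelines)])
# ['middleware', 'science_pipelines']
```

Since products are the unit of release, a dependency-ordered walk like this is also the order in which releases would have to be cut.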
All DMLT: Review product tree in LDM-294 and provide feedback/corrections to Unknown User (gcomoretto).
Unknown User (gcomoretto) : to produce a technical note including all products from the product tree and their characterization.
The Data Facility can execute based on whatever version of the middleware the developers are using.
There's no requirement for pipeline developers to support Gen2 from the LDF point of view.
Most Data Facility work going forward is based on Pegasus; there is minimal ongoing support for DESDM.
Assuming availability of pipeline code, the LDF predicts that they could run pipelines in a “sustained processing” mode based on Butler G3 and Pegasus in mid-2019.
Two possible goals for BG3 priority:
Support for obs_lsst (to make RHL & Merlin's life easier)
Or to convert code to PipelineTask to enable execution at scale of e.g. HSC on the Data Facility.
The consensus is that the latter is the priority; agreed to prioritise the conversion of ci_hsc Tasks to PipelineTasks until end of Jan 2019 per Fritz Mueller's recommendation.
This would also meet Frossie Economou's immediate validate_drp use case.
We agreed that temporarily abandoning the shared-nothing model for execution might enable faster development.
We should also prepare for “plan B” by assembling a “mini-working-group” to consider wholesale technological change (mini-WG to consist of at least Kian-Tat Lim & Simon Krughoff; not to involve folks who are busy with the ci_hsc conversion).
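For context on why the PipelineTask conversion enables execution at scale, here is a purely conceptual sketch (invented classes, not the real lsst.pipe.base API): a PipelineTask declares its inputs and outputs up front, so a generic executor can build the I/O graph and drive data access itself, instead of each task talking to the data store directly.

```python
# Conceptual sketch only -- class and dataset names are invented and do
# not reflect the actual LSST middleware API.

class PipelineTaskSketch:
    # Declared I/O lets an external executor plan and parallelise runs.
    inputs = {"calexp": "per-detector calibrated exposure"}
    outputs = {"src": "per-detector source catalog"}

    def run(self, calexp):
        # Pure computation: no data-store access inside the task.
        return {"src": f"sources measured on {calexp}"}

def execute(task, datastore, data_id):
    """Toy executor: fetch declared inputs, run, store declared outputs."""
    args = {name: datastore[(name, data_id)] for name in task.inputs}
    results = task.run(**args)
    for name in task.outputs:
        datastore[(name, data_id)] = results[name]

store = {("calexp", 42): "calexp#42"}
execute(PipelineTaskSketch(), store, 42)
print(store[("src", 42)])  # sources measured on calexp#42
```

The key design point is the inversion of control: because I/O is declared rather than performed inside the task, the same task can be run by a single-node driver or by a workflow system at the Data Facility.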
Note: Further middleware discussion
A further, unscheduled, discussion of middleware took place at 09:00 on the following morning. Notes from it are recorded below. They supersede the above, and render some action items previously recorded here obsolete.
John Swinbank — designate and/or start a recruitment process for a “systems programmer” to act as long term middleware owner.
Atmospheric absorption data structure TBD; may be a lookup table, for example.
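If the atmospheric absorption data structure does end up as a lookup table, it could be as simple as sampled (wavelength, transmission) pairs with interpolation between samples. A minimal sketch, with made-up numbers purely for illustration:

```python
import bisect

# Illustrative only: all values are invented; the real data structure
# is TBD per the discussion above.

WAVELENGTH = [400.0, 500.0, 600.0, 700.0]     # nm, sorted ascending
TRANSMISSION = [0.70, 0.80, 0.85, 0.90]       # fractional transmission

def transmission_at(wl):
    """Linearly interpolate transmission at wavelength wl (nm),
    clamping outside the sampled range."""
    if wl <= WAVELENGTH[0]:
        return TRANSMISSION[0]
    if wl >= WAVELENGTH[-1]:
        return TRANSMISSION[-1]
    i = bisect.bisect_right(WAVELENGTH, wl)
    x0, x1 = WAVELENGTH[i - 1], WAVELENGTH[i]
    y0, y1 = TRANSMISSION[i - 1], TRANSMISSION[i]
    return y0 + (y1 - y0) * (wl - x0) / (x1 - x0)

print(transmission_at(550.0))  # midway between 0.80 and 0.85 -> 0.825
```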
Need to capture generation of a distortion model; this should be part of calibration products (ie, capture the output of Jointcal for use in pipelines processing).
Outstanding question is which calibration data has to be fed through the prompt processing system for on-the-fly adjustments of the observatory configuration.
The answer is “none” — where necessary, they can be deployed as part of T&S software on the mountain, not in a DM execution framework.
We expect the Commissioning Cluster will not be reliably available during operations, so would not be a good home for this. Worries were expressed that the future capabilities, and requirements for capabilities, provided by a Commissioning Cluster-like service are unclear.
John Swinbank / Robert Lupton — Ensure that the plans for how calibration products pipelines will be executed during operations are clear, including e.g. executing Jointcal to produce a distortion model for ingestion into the science pipelines.
Wil O'Mullane — Add a discussion of future requirements for Commissioning Cluster-like capabilities during the operational era to the AuxTel workshop in January.
Are there ways of providing greater uptime and lower latency using rolling upgrades, schema evolution with backfill, planned Observatory maintenance windows and daytime maintenance, etc.?
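One of the techniques mentioned above, schema evolution with backfill, can be sketched in a few lines. This is a generic illustration using sqlite3 with invented table and column names, not the actual consolidated DB schema: add a nullable column (an additive change, so existing readers keep working), backfill old rows in small batches to limit lock time, and only later enforce any new constraints.

```python
import sqlite3

# Table and column names are invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, raw_filter TEXT)")
conn.executemany("INSERT INTO visits (raw_filter) VALUES (?)",
                 [("g",), ("r",), ("i",)])

# Step 1: additive schema change only -- old readers/writers unaffected.
conn.execute("ALTER TABLE visits ADD COLUMN band TEXT")

# Step 2: backfill existing rows in small batches to bound lock time.
while True:
    rows = conn.execute(
        "SELECT id, raw_filter FROM visits WHERE band IS NULL LIMIT 2"
    ).fetchall()
    if not rows:
        break
    conn.executemany("UPDATE visits SET band = ? WHERE id = ?",
                     [(f.lower(), i) for i, f in rows])
    conn.commit()

print(conn.execute(
    "SELECT COUNT(*) FROM visits WHERE band IS NULL").fetchone())
# (0,)
```

The same add-then-backfill pattern is what makes rolling upgrades possible: at no point is the table unavailable to clients running against the old schema.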
What are the features and timeline for DBB?
Discussion about how databases are represented within the DBB.
The DBB has to know about different types of databases; an Oracle DB is replicated differently from other technologies, for example.
Qserv is not part of the DBB.
Even Qserv replication is entirely separate from the DBB.
How are spherical geometry queries performed within the consolidated DB? This remains unresolved.
ADQL support is required for performing TAP queries.
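For reference, a spherical-geometry constraint in standard ADQL (the kind of cone search a TAP query would need) looks like the following; the table and column names here are hypothetical:

```sql
-- Illustrative ADQL 2.0 cone search using the standard geometry
-- functions; Object/objectId/ra/decl are placeholder names.
SELECT objectId, ra, decl
FROM Object
WHERE CONTAINS(POINT('ICRS', ra, decl),
               CIRCLE('ICRS', 150.0, 2.2, 0.05)) = 1
```

How such predicates map onto indexes in the consolidated DB is exactly the unresolved question noted above.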
Should incorporate availability constraints into test plans for the DBB.
The TAP service will not be required to interface to the "live" PPDB, just the equivalent information in the Consolidated Database that derives from the PPDB. This implies that this data gets to the Consolidated DB with whatever latency would meet the requirements for making that data available through the TAP service. The baseline for doing so at this point is <= 24 hours, but Eric Bellm is working on use cases that could result in tightening that. (On the other hand, it was mentioned that DoD concerns may motivate keeping this at >= 24 hours.)
Tim Jenness — provide astro_metadata_translator for obs_lsst by December, but at lower priority than PipelineTask conversion of ci_hsc.
Kian-Tat Lim — draft an SLA document for each enclave and services within it.
Wil O'Mullane — add a milestone for the availability of a DAQ at the NCSA L1 test stand.
Margaret Gelman — Prepare for LDM-503-06 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
John Swinbank — Prepare for LDM-503-07 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Robert Gruendl — Prepare for LDM-503-09 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
John Swinbank — Prepare for LDM-503-09a by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Unknown User (mbutler) — Prepare for LDM-503-08 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Unknown User (mbutler) — Prepare for LDM-503-10 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Robert Gruendl — Prepare for LDM-503-11 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
John Swinbank — Prepare for LDM-503-11b by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Unknown User (mbutler) — Prepare for LDM-503-10b by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Robert Gruendl — Prepare for LDM-503-11a by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Robert Gruendl — Prepare for LDM-503-12 by i) ensuring that the description and comments in LDM-503 provide a concise summary of the aims and methodology of the milestone, and ii) ensuring that LDM-564 provides a complete list of prerequisites for this milestone.
Reached the consensus that yesterday's plan to deliver a complete BG3 and PipelineTask conversion of all tasks in ci_hsc by end January 2019 is unrealistic, given the other time commitments of major players.
Agreed on a variant of “option D” in Jim Bosch's notes, above. Specifically:
BG3 will be integrated with the existing CmdLineTask framework.
BG2 will be retired without waiting for all CmdLineTasks to be retired.
CmdLineTasks may never be fully retired. As new tasks are written, or old tasks are refactored to produce pipelines which are more “LDM-151-like”, they will be implemented as PipelineTasks, but no drive will be scheduled to complete the conversion of other CmdLineTasks.
The Data Facility confirms that they will be able to support an execution environment capable of driving both PipelineTask and CmdLineTask indefinitely.
Success for the current work will be declared when BG3 has been fully adopted. At this point:
Conversion of remaining tasks to the PipelineTask system (on an as-needed basis) becomes fully the responsibility of the Science Pipelines groups.
A long-term maintainer for the BG3 codebase must be found, either as a new hire or (potentially) from within the ranks of the Data Facility.
Simon Krughoff volunteered effort to assist in converting code of particular relevance to SQuaRE to PipelineTasks, likely starting with ProcessCcdTask. There was some follow-up discussion about having his expertise best deployed on integration with obs packages instead; this should be included in Fritz Mueller's plan (below).
Fritz Mueller — present a timeline for integration of the CmdLineTask framework with Butler Generation 3, and the subsequent deprecation of the BG2 system.
We're lacking L3 milestones in PMCS which describe the availability of services and capabilities which we know are coming, particularly later in construction.
Obvious examples include:
Butler Gen 3 / PipelineTask as the regular production environment;
Roll-out of the Pegasus WMS (or some other WMS);
Data Backbone capabilities;
There are no DAX milestones beyond the end of November 2018;
There are five SQuaRE milestones total, all of which refer to notebooks (and which are not necessarily SQuaRE deliverables — see below).
In addition, some existing milestones seem unclear about who is delivering what. For example:
“DM-SUIT-16: Commissioning DAC” — is that really a SUIT deliverable?
“DM-SQRE-5: Notebook service ready for general science” — is that really a SQuaRE deliverable?
I assert:
We should have milestones describing the delivery of effectively everything described in LDM-148.
Are there other things which are not in LDM-148 which we need to track?
“Standing up a service” milestones are all for the Data Facility, and should be dependent on prerequisite milestones for software delivery.
We should have brief (~few sentence) description for every milestone.
We might consider having a more formal verification procedure, which would relate successful completion of these milestones to test execution.
I suspect that such a procedure would lose us more in time & overheads than it gains us, but I am open to being convinced otherwise.
Can we produce a revised set of milestones? When is an appropriate due date — January 2019?
Agreed to target the due date of this work as February 2019.
Agreed further that it should be driven by the product tree (not just LDM-148). Note that:
This means work can't usefully begin until the product tree has been finalised (see discussion 2018-11-06 at 16:00).
We note that the product tree is never really “final”, as it will continue to evolve throughout construction, but we expect a substantially revised and updated version by the end of this year which will form the basis for this work.
Some milestones are not directly related to DM products. These might include delivery of documentation or supporting services. For this reason (and for reasons of visibility/transparency, and because the PMCS will remain the source of truth regarding milestones) we will not associate milestones with products in MagicDraw.
Expect the minimal set of milestones will be availability of code to run a service (i.e. a software artefact) and the availability of the service itself (i.e. that software deployed at the Data Facility). In some cases (e.g. Science Pipelines) it will make sense to have multiple intermediate milestones for software delivery.
John Swinbank — Following product tree updates, circulate to all T/CAMs a spreadsheet for collecting L3 milestones.
Feedback from DMLT members on the plans heard earlier.
Finalize cross-team priorities.
Agree development plans for S19.
DM Science:
Documents for the LSP review are listed in the charge, and will be delivered to reviewers two weeks in advance of the review.
Reviewers will cover both science and technical themes.
Alerts Key Numbers study is not re-defining or deriving key numbers, but rather making them available to the community with adequate context.
Not yet clear where 200 GB DESC Qserv test dataset will actually be hosted.
Architecture:
Sizing model work likely to happen as part of the LDF-operations funding; Arch ready to act in an advisory role. No due date.
Data Access and Database:
It is a DAX (Colin Slater) deliverable to ensure there's code for demonstrating that pipelines output matches the Science Data Model (but not for resolving discrepancies).
Data Facility:
Ongoing DESC DC2 processing based on informal discussions at LSST2018. This was news to most of the DMLT, but we agreed that it was a positive step.
Wil O'Mullane is the lead organizer for the AAS demo session; he will coordinate necessary resources with LDF etc; Robert Lupton can send suggestions for scale testing / widespread access to him(!).
Are there ways of providing greater uptime and lower latency using rolling upgrades, schema evolution with backfill, planned Observatory maintenance windows and daytime maintenance, etc.?
Work has been ongoing in the DM-SST to develop a formal DM data model and to refine the way in which LSE-163 (the DPDD), the cat package, and LDM-153 (the baseline schema) are generated and managed.
This should be presented to and agreed by the DMLT.
Early Ops funding is now available, and during FY19 (i.e., this year) the ADs for both Data Facility and Science Operations (Margaret Gelman and Wil O'Mullane) are funded at 0.25 FTE. That's half an FTE coming out of DM management. How are we handling that?
Similarly there's a total of ~15 FTEs funded across Data Facility and Science Ops during FY20.
What are the plans and schedule for transitioning staff?
Based on the modeling work done before this year's review, review the proposed product tree, the characterization of its components, and their relationship to the document tree.
We're lacking L3 milestones in PMCS which describe the availability of services and capabilities which we know are coming, particularly later in construction.
Obvious examples include:
Butler Gen 3 / PipelineTask as the regular production environment;
Roll-out of the Pegasus WMS (or some other WMS);
Data Backbone capabilities.
In addition, some existing milestones seem unclear about who is delivering what. For example:
“DM-SUIT-16: Commissioning DAC” — is that really a SUIT deliverable?
“DM-SQRE-5: Notebook service ready for general science” — is that really a SQuaRE deliverable?
We should have milestones describing the delivery of effectively everything in LDM-148.
The half hour time estimate is for Wil O'Mullane to state the above, set expectations, and develop a timeline for delivery. If we actually start creating milestones during the meeting, it could take arbitrarily long.
The proposal is to develop a policy for handling packages which have been developed externally and which their authors offer up for inclusion in pipeline processing, the Science Platform environment, or elsewhere in the DM system.
Obvious examples might be scientific algorithms contributed by the community.
There is history of external users refusing to make contributions like this due to the demands of DM engineering (code quality, review, tests, etc.).
The aim is to share some preliminary considerations on the release process, taking into account the current approach and looking at how it can evolve.
✅ (albeit in a potentially tightly squeezed slot...)