Logistics

Date



Location

This meeting will be virtual, via Zoom (connection details below).

Join Zoom Meeting

https://gemini.zoom.us/j/92414793405?pwd=QU9ueHdrNzJ3ZVFMbDB0Uk1sU09ndz09

Meeting ID: 924 1479 3405
Password: 210622

Attendees

Wil O'Mullane

Frossie Economou

Gregory Dubois-Felsmann

Colin Slater

Cristián Silva

Eric Bellm

Fritz Mueller

Ian Sullivan

Jim Bosch

Kian-Tat Lim

Leanne Guy

Unknown User (mbutler)

Unknown User (npease)

Richard Dubois

Robert Gruendl

Robert Lupton

Simon Krughoff

Tim Jenness

Yusra AlSayyad

Regrets

All Times PT.

Day 1, Tuesday June 22

Time (Project) | Topic | Coordinator | Pre-meeting notes | Running notes

Moderator: Kian-Tat Lim

Notetaker: Ian Sullivan

09:00 | Welcome | Wil O'Mullane
  • Introductory remarks
  • Review agenda and code of conduct

09:15 | Project news and updates | Wil O'Mullane
  • RHL: There is a possibility of travel down to Chile as well as to the office.
  • GPDF: At Caltech starting next week we can go back with almost no restrictions, including having visitors and using meeting rooms. Mandatory return to office will be September.
  • AURA is not asking about vaccination status; Princeton, SLAC, and UW will require vaccinations. Caltech requires reporting vaccination status, and vaccination is likely to be required upon FDA approval.
  • Camera official delivery date is August 19. RHL: it will not actually be done at that point; they will still be tinkering with the voltages. WOM: we can't realize or burn down the camera-related risks until they're done modifying it.
09:30 | Community broker selection | We are still resolving a few last issues in the SAC's preliminary broker selection report; no DMLT decision-making is needed right now.
  • KTL: Does the hybrid alert model make a difference in how many brokers we can support?
  • EB: We asked the broker teams if they wanted a subset of alerts, and they all requested the full stream with full alert packets.
  • TJ: This means that every broker wants the full postage stamp as well?
    • EB: Yes, though many have said they would be OK with a service to look up the images.
  • LPG: Steve K wants to answer this as an Operations question. We're not going to commission 6 and then run 5 in Operations.
  • KTL: It's also only a problem if everyone wants it at the same latency.
    • WOM: The six all do want all the data, within the 60s window.
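For a rough sense of scale on serving the full stream to six brokers within the 60 s window, here is a back-of-envelope sketch; all input numbers are illustrative assumptions, not project figures:

```python
# Back-of-envelope bandwidth for full-stream delivery to all brokers.
# All inputs are illustrative assumptions, not official Rubin numbers.
alerts_per_visit = 10_000        # assumed mean alert count per visit
alert_size_bytes = 100 * 1024    # assumed packet size incl. postage stamps
n_brokers = 6
window_s = 60                    # latency window discussed above

total_bytes = alerts_per_visit * alert_size_bytes * n_brokers
required_gbps = total_bytes * 8 / window_s / 1e9
print(f"{required_gbps:.2f} Gb/s sustained")  # prints "0.82 Gb/s sustained"
```

Under these assumed numbers the aggregate rate is under 1 Gb/s, which suggests the constraint is less raw bandwidth than guaranteeing simultaneous low-latency delivery to every broker.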
10:00 | Conda version pins | Jim Bosch

Doing less pinning in our conda envs lets users install their own things on top, at the expense of reproducibility. Could we start providing both pinned and unpinned versions of each conda env release? I think it's time to admit that we cannot satisfy all consumers with minimal pins, maximal pins, or even a carefully chosen balance, but I'm hoping we can simultaneously support two envs, each aimed at a different set of consumers.

KTL: We already have this. The only thing that's lacking is an easy way to create a newinstall environment with the fully-pinned versions. My version of Gabriele's lsstinstall script (currently on a branch of lsst/lsst) intends to provide this. Also note that stack (not RSP) containers are effectively pinned unless someone installs something on top.

(#1: reproducible, #2: extensible)

  • JB: If we require a reproducible stack, we may need to include a few additional packages in order to support users.
  • KTL: Cleaning up Gabriele's lsstinstall script. Working on it this morning, since Mario might be able to make use of it now.
  • KTL: Prior to conda 4.9, anything you installed required that the versions of any new packages exactly match the versions of all existing dependencies. Newer versions allow you to install additional packages and update versions, but then you have lost complete reproducibility.
  • KTL: The shared stack is a different problem: we can install additional packages that make developers' lives easier, but we also want a minimal development environment without any additional packages.
    • We can possibly do this for the shared stack, but not for the binary installs.
  • RHL: Why is this a problem, if I just install new things it shouldn't require changing the build of the stack?
    • FE: If a user pulls in a new package, it frequently includes updated dependencies that are already in the stack.
    • TJ: We have loads of flexibility. The problem is that we depend on lots of Python packages, and if a new package needs a newer version of one of them, that might break us.
    • KTL: Two ways to go about it: freeze all dependencies, or allow it to float.
    • KTL: There are ways to add packages to the lab containers or the shared stack, as long as they don't lock in incompatible versions.
    • TJ: Are you thinking of doing Rubin-extra in addition to Rubin-env?
      • KTL: Yes.
  • RHL: I would like to move away from the expectation that we tell people they must install their own packages in the RSP
    • FE: There is a process for people to add stuff to containers.
    • FE: In Operations, the RSP is a very slow moving environment tied to official releases.
    • RHL: I don't understand why we aren't more user friendly with our containers.
      • FE: We have to determine whether many users need new packages, or if it is just us/RHL. This is what the Data Previews are for, to determine what real users in the wild will need.
    • FE: The emerging model is that there are varying classes of deployments. For the Telescope environment, we might make the trade-off that we allow users to change the underlying configuration which might break it for everyone, in exchange for rapid development. For the science users, we need an absolutely stable environment.
    • KTL: It is possible to give users more of a choice, but it means we have more complicated builds.
    • RHL: It is great that the Telescope team may have a flexible environment, but I worry that will grow to include the entire commissioning team.
    • FE: My preferred model for Operations is that we have a separate enclave on the Data Facility for developers and one for the thousands of science users.
  • KTL: We need to have our standard Rubin-env for stable releases, and Rubin-env-extra for the additional packages.
    • WOM: We might need different Rubin-env-extra environments in different places.
    • KTL: That should be OK, we can have multiple sets.
    • JB: I'm willing to live with flexible notebooks that don't guarantee reproducibility, as long as I can always get a minimal build that does guarantee reproducibility.
    • FE: The problem is that some packages have dependencies in common with the stack, though it's rare. An additional problem is that we don't have a build engineer, so we don't have a dedicated person to solve this.
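The pinned-vs-unpinned idea above can be sketched as two conda environment files; the file names and version numbers here are hypothetical, not the actual rubin-env recipes:

```yaml
# rubin-env-pinned.yaml (hypothetical): reproducible, every version exact.
name: rubin-env-pinned
channels:
  - conda-forge
dependencies:
  - python=3.8.6
  - numpy=1.20.3
  - astropy=4.2.1
  # ...every transitive dependency pinned to an exact version

# rubin-env-loose.yaml (hypothetical): extensible, lower bounds only, so
# users can `conda install` extra packages at the cost of reproducibility.
name: rubin-env-loose
channels:
  - conda-forge
dependencies:
  - python>=3.8
  - numpy>=1.20
  - astropy>=4.2
```

A conda environment file holds a single document, so these would be two separate files; the comments mark the split. The pinned file answers the "#1: reproducible" consumer, the loose one the "#2: extensible" consumer.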

Pop-up topic: Are people happy with Gen 3?
  • LPG: I am very happy, and hear from a lot of scientists that they are as well.
  • SK: Very happy overall, but we must fix the error message when we get an empty quantum graph. It is hard to step through all the datasets to find what is missing, and the missing input is often in a late stage, so many tasks could have run successfully before it is hit. KSK: I'm not sure it's possible to do with logging; it may actually require more tooling.
  • RL: I am very happy Gen 3 is coming out, worried about how flexible it really is and whether we have tested all of it. There is no question that it is better than Gen 2.
  • RG: My concern is what will happen when you try to share amongst a lot of people.
    • TJ: You're worried about registry overload? RG: Yes
  • RD: From the USDF perspective, the Gen 3 butler raises data processing and data handling questions
    • TJ: I hope the execution butler will solve all these problems.
    • RD: Also worried about multi-site registration for the products
  • YA: Writing a fresh pipeline task is easy. Our struggles have been getting the same tasks to run the same way in both Gen 2 and Gen 3
    • YA: Hear a lot of complaints from the camera team, but not clear they're actionable. 


10:30 | Break

Moderator: Wil O'Mullane

Notetaker: Simon Krughoff

11:00 | DMTN-185 Provenance | Walk through the recommendations of the Provenance WG and identify which T/CAM(s) own which so they can accept or reject them
  • REC-EXP-2:
    • Tim: We have a way of associating images together, GROUPID. Things would get better if we had an "M out of N" header, because we don't know when to run define visits: we don't know when all the data have shown up.
    • RHL: This is really campaign management
    • Tim: Snaps can't be part of campaign management
    • RHL: It's part of it
    • Jim: This seems like perfect enemy of the good territory
    • Frossie: will create an extra meeting to hash this out
  • REC-EXP-3: Frossie will shepherd, but there is obviously a lot about observatory management that has slipped through the cracks.  Will need to bring together multiple sub-systems to hash things out
  • REQ-TEL-001: All data is exported, but could be exported to Kafka
  • REQ-TEL-003:
    • KTL: This is under consideration and is working through the chain
    • Frossie: Does this prevent CSCs from hard-coding firmware versions?
    • KTL: Will have to make sure that's part of the wording
  • REC-SW-2: Patrick, Tiago, Andy, and K-T should meet to hash out whether commanding configuration is in the plan
  • REQ-PTK-003:
    • Frossie: This seems a little scary
    • Jim: I don't think it's that bad except for setting up the right software
    • Tim: This is specifically running a part of the graph
    • Jim: We could provide some tooling to help do this
    • Tim: We have a requirement to do this because of the virtual data products
  • REQ-PTK-005:
    • Jim: If you replace URI with UUID, I think this is solved

DMTN-185 Post facto 2021-10-09

  • REQ-WFL-001: Done by Tim.  Butler datasets.
  • REQ-WFL-002: Ops campaign management project. BPS configuration and logs will be made available by Michelle Butler.  Any other workflow level (docker container version) information will be handled by the campaign management team.
  • REQ-WFL-003:
    • Tim: Campaign management need this
    • Jim: This is part of middleware
    • Tim: segv will not show up
    • Jim: Failed quanta and failed jobs are different.  Former from middleware, latter from BPS logs.
    • Frossie: Do we have the tooling to surface this information?
    • Tim: Yes; PanDA knows about job failures
    • Frossie: Tim owns making sure this information is surface-able
  • REQ-WFL-004: Panda pilot can surface CPU, memory, I/O info
  • REQ-WFL-005: Tim will make sure OS info is in base_packages (sp?).  This should include host node info to the level possible.  This may be via nodeId that means something unique to somebody
  • Frossie to add requirement for node ID inventory at the data centers
  • REC-FIL-001:
    • Gregory: The unique thing is the UUID
    • Tim: But this is not going into the header.  It means all formatters need to know how to write metadata and all readers will need to know that there is (could be) a UUID that should be used.
    • Frossie: If I ship a user a dataset, they have to be able to tell me back what dataset I shipped them.  Whether that is through UUIDs or some other mechanism, there needs to be a way
    • Tim: not all datasets know about metadata
    • Frossie: assuming all science datasets will have metadata is reasonable
  • REC-FIL-002:
    • Gregory will do the study in an ops capacity
  • REC-FIL-003:
    • Tim: This isn't a file level thing 
    • Frossie: Propose to strike this, on the grounds that it is an already-understood objective
    • Robert G.: We can strike it, but this is more about tooling later
    • Frossie will move this req to another place
  • REC-SRC-001:
    • K-T will do the census of flags to make sure we can fit in 64 bits for sources and 128 bits for objects with buffer
  • REC-SRC-002:
    • K-T will look into data release ids fitting in 4 bits
  • REC-SRC-003:
    • With the above two, K-T will look in general at whether 64 bits is sufficient for source IDs
  • REC-SRC-004:
    • Leanne will provide new language in the DPDD around footprints and heavy footprints and Gregory will collaborate
  • REC-MET-001:
    • Frossie will replace dataId with UUID and claim it
  • REC-MET-002 – Done
  • REC-MET-003:
    • Yusra will drive adding sufficient metadata to persisted Job objects that specific measurements can be looked up from the original butler repository from metadata in the Job. I.e. the repo root, run, collection, and dataId will all need to be knowable from the JSON persisted Job object.
  • REC-MET-004:
    • Yusra will describe how this is done currently with measurements not related to specific datasets like runtimes in jointcal and verify_ap
  • REC-MET-005:
    • Tim: There is no problem with having a special metric measurements backend to butler
    • Frossie will discuss with Yusra whether/how this will be pursued
  • REC-LOG-1:
    • Richard owns logging.  Frossie will coordinate
  • REC-LOG-2:
    • Frossie will make sure log management solutions are in place for all sites
  • REC-LOG-3:
    • Frossie will raise to DPLT
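The REC-SRC bit-budget checks above (flags in 64/128 bits, DR ids in 4 bits, 64-bit source IDs) amount to simple arithmetic. A sketch with hypothetical field widths, not the adopted ID layout:

```python
# Does a candidate 64-bit source ID layout fit? Field widths below are
# hypothetical, for illustration of the REC-SRC-001..003 checks only.
FIELDS = {
    "data_release": 4,     # REC-SRC-002: up to 2**4 = 16 data releases
    "detector": 9,         # e.g. 189 science CCDs fit in 8 bits; 9 adds margin
    "visit": 32,           # generous budget for an exposure/visit counter
    "source_counter": 19,  # per-detector-visit running source number
}

total_bits = sum(FIELDS.values())
assert total_bits <= 64, f"layout needs {total_bits} bits, over budget"
print(total_bits, "bits used,", 64 - total_bits, "to spare")
```

The actual census of flag bits and the chosen field widths are exactly what the K-T action items above are meant to determine.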
11:45 | User Batch | Impersonation or not? Inside K8s or outside? Integrated with DF systems or not? Could UWS be enough? Are we even ready to start discussing requirements or design? If not now, when?

See Level 3 Definition and Traceability for the collected relevant requirements (summary: they don't constrain this very well).

  • Frossie: Much of this is a lot of work. Would it be the worst thing in the world to offer batch that requires running exactly like production (e.g. using pipeline tasks)?
  • Tim: If we put user auth in Panda, this is basically trivial.  If we offer running arbitrary docker images, this gets way harder
  • KTL: Of course the standard HPC env is a shell prompt, not BPS
  • GPDF: I thought we would go just that route, e.g. batch submission from the command line.  It's late to do something more sophisticated unless we bring in someone else's system
  • Frossie: CADC's model is different from ours, so we can't borrow from them
  • Richard: We are adding cores throughout the project.  My suspicion is that most people won't do image processing, but will be doing random batch processing with results of queries
  • Eric: There is a steep learning curve with our pipelines code if we make them go that route
  • RHL: Colin's use case is the one I really want supported
  • GPDF: The community compute is meant to democratize access, not support large collaborations like DESC completely
  • Wil: I believe we have provided this via notebooks.  People do want dask or spark, but we need a solution that is controllable
  • Frossie: We have always talked about there being a TAC that will manage access
  • Leanne: In ops this is called the User Committee
  • Wil: We may get away without having to have a lot of process around allocation depending on usage patterns
  • Frossie: It is probably best to be legalistic about requirements so that we don't get caught in the situation where we are providing "nice to haves" at the expense of delivering the system we promised
  •  Leanne Guy will provide a reference-able document on interpretation of the user batch requirements that will define the minimum viable system we need to deliver. (Update: requirements will be presented at 2021-10-18 vF2F meeting) 
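Frossie's option of offering only "batch that runs exactly like production" would amount to users submitting ordinary BPS payloads against their own output collections. A minimal sketch of such a submission config follows; the pipeline path, repo, and collection names are hypothetical, and the exact keys should be checked against the ctrl_bps documentation:

```yaml
# user-batch.yaml (hypothetical): a user job reusing the production machinery.
pipelineYaml: "${DRP_PIPE_DIR}/pipelines/example-drp.yaml"  # assumed path
payload:
  payloadName: user_reprocessing
  butlerConfig: /repo/main              # assumed shared repo
  inCollection: HSC/defaults            # assumed input collection
  output: u/someuser/reproc             # user-owned output collection
  dataQuery: "tract = 9813 AND skymap = 'hsc_rings_v1'"  # assumed subset
```

Submission would then be e.g. `bps submit user-batch.yaml`; the impersonation, quota, and allocation questions from the discussion above live in whatever service wraps this.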
12:30 | Break
Moderator: Leanne Guy

Notetaker: Wil O'Mullane
13:00 | Prompt Processing | Kian-Tat Lim | Use OCPS or start building a more sophisticated execution system for USDF?
  • no detailed design for prompt processing - could use OCPS if we added an event trigger
  • RobertG worried about security (OGAs) not allowing this everywhere - baseline is USDF with secure links.
  • Worry about faro publication to SQuaSH being slow - Frossie and Leanne agree this is a bug, probably in the SQuaSH API, and it will be solved.
  • LPG: faro writes out single scalar quantities, should not be any issues with storage. 
  • GPDF reminds us that *originally* PP was going to be at the Base AND at the Archive/USDF.
  • RHL in favor of using OCPS for prompt - need access to SAL messages
  • Colin worries that OCPS does not cover all the open issues - but OCPS exists and could be a step in the right direction
  • Eric - if we moved to Chile, does the Cassandra prompt DB also need to move? Yes.
  • Tim - wherever it runs, you need to reflect this in the OODS; other problems like graph generation will have to be solved. But it's not in the planning.
  • Jim - the Gen3 problems are not hard if you don't use quantum graph generation, but a bit of time needs to be scheduled
  • RHL - if we generalize prompt processing a little, it will solve lots of problems currently in OCPS
  • Cristián - how much space at the summit? About one rack.
  • Frossie worries about running in Chile - the Ops IT situation is unclear, among many other problems. Do the minimum on the summit and throw it away; separate alerts from the OCPS use case. RHL - to say we are only doing sanity checks is not correct; we need multi-step scatter-gather.
  • Richard - sounds like a workflow engine; is PanDA an option to run prompt processing? UWS can be interfaced to anything, including PanDA.
  • Mostly between Tim and K-T - WOM wants to stay involved, to make sure PP does not get over-complicated.
  • Colin - how does he gain confidence that he will have prompt processing?
  • Tim - once DP0.2 is done PP is the priority.
  • KTL will develop a design document: DM-30854.
13:30 | Exposure table

The exposure table is a key piece of observatory metadata, but I have been unable to determine who is in charge of constructing it, and its lack is starting to block work. Gregory Dubois-Felsmann or I can give a brief overview of the state of play, but we should identify a way forward.

YA: also, what, if any, is the relationship with the pipeline-output CcdVisit and Visit tables?

  • Yusra, from Sci Pipes - parquet for visits is implemented (covers exposure); some things from the EFD, like mirror positions, are unclear, as is how to tie them in.
  • Tim - concern that visit and exposure are not the same; GPDF used both words separately, with different meanings. Most exposure info can come from the EFD (GPDF: plausible). What is the path from the EFD to a new (FITS) header, with each keyword in a table? GPDF says that is there. Need to get it back into Gen3 - naming needs to be fixed and homogenized. The Gen3 formatter needs to get the header from this system (per DR).
  • KT - lots of metadata are calculated at different times, up to a year later, so is it one thing or multiple things? GPDF - we need a technical architecture for this; it may need separate tables.
  • FE does not want to be pulled into this - there is aggregating in-stream formatting in Kafka, with a demo of this for weather data into a relational table. This should fulfil the needs above, but does not solve the data model question. CloudSQL on the IDF for Postgres.
  • Tim - will the butler registry at the USDF be kept up to date at low latency?
  • We can release pointings, but not pixels, faster than 24 hrs.
  • Yusra - how many tables? Should think of it partially as a data product output of the pipelines.
  • Richard - plots per exposure, or plots per multiple exposures? Tim - put them in the butler Gen3 repo.
  • KT - the other place is the LFA - but that holds other data sets we would otherwise have had in the butler.
  • Who is going to make this happen?
  • RHL would like to see it designed.
  • GPDF - there is substance in his two points; baseline those. (General agreement / no objections.)
  • Who takes responsibility for moving this onward?
14:30 (latest) | Close

Day 2, Wednesday June 23

Moderator: Wil O'Mullane

 Notetaker: Colin Slater
09:00 | AHM at PCW

What should we cover?

  • Rebaseline
  • Ops transition ..
  • Hands-on Gen3? (IDF or NCSA?)
  • Tim: Session 1) Gen3 Q&A for developers to ask question of middleware. Session 2) Helping the community switch from Gen2 to Gen3.
  • Simon: Most users either know Gen3 from start, or have already started 2→3 transition.
  • Ian: Good for some of the Gen3 power users (non middleware devs) to lead something from a user perspective. Wil: So a tutorial? Simon: hard to know what issues people are going to have, if we have a tutorial we should also have a Q&A.
  • Wil: Q&A for DM developers. Then Tutorial session. Then slots for "come in and ask question", open to anyone, "this is what I'm trying to do".
  • Yusra: Not great attendance with help/tutorial sessions at prior PCWs.
  • Jim: How many people who would be helped by this are actually planning to attend PCW. Wil: DP0.1 users coming online, some fraction of that might want this? KT: PCW planning on community, can use that to gauge.
    •  Ian Sullivan Discuss within Science Pipelines who should lead a Butler tutorial/QA session at PCW.
  • Tim: CET might already have good tutorials for Gen3.
  • KT: Review of how DM works w.r.t SIT-COM, urgent tickets.
    • Frossie Economou Prepare a "How DM works with SITCOM et al." presentation as part of the PCW DM All Hands session.
  • KT: PCW in Chile? Wil to discuss with Victor. Wil: Add slide to deck
  • RobertG: Docs on Gen3 are required for deprecation.
  • Gregory: DP0 "how it's going" session is canceled? Yes, we have many sessions with Delegates. Frossie will have a "Coffee with RSP Devs" session. Separate session for RSP Devs w/ other data centers.
  • Tim: concerned about duplication of effort between CET gen3 docs and DM gen3 docs. Tim and Leanne will resolve offline. Gregory similar concern. Wil, when does this link back up? After we get feedback from delegates, we'll know more about what was useful. Simon is working on updating the pipelines.lsst.io tutorial to gen3. Yusra: Task docs also exist, need refresh in the fall.
  • Frossie: Russ has a good tech talk on security, arrange with Cristian. Q&A on security.
09:45 | Status I

Team status and brief overview of epics to FY23 given to Kevin.

  • 10 minutes each - link slides in agenda below
  • Ordered by coordinates, north to south:
    • UW 47.65, -122.30
    • Princeton 40.34, -74.68
    • Urbana 40.11, -88.19
    • SF (Arch) 37.76, -122.43
    • Palo Alto 37.43, -122.15
    • Tucson 32.20, -110.96
    • Chile -29.91, -71.24

Prompt Processing

Data Release Production

LDF

Arch

  • Alert Production
    • RHL: is the plan to have AP prototype processing running at SLAC by next summer? Yes. Eric: Hope that AP effort serves as a forcing function. Fritz: Need to have compute on the floor. There are ways to find compute.
  • Data Release Production
  • NCSA
    • Is NTS going to Chile or Tucson? ITTN-30 gives the test stand plan. (CTS: Couldn't understand the answer on this, someone else should supply)
  • Arch
    • Gregory: Status of RFC-775? Jim hasn't gotten to writing the implementation tickets, will then adopt.
10:30 | Break

Moderator: Wil O'Mullane

Notetaker: Kian-Tat Lim

11:00 | Status II

DAX

DM Science Plans

SQuaRE Update

Chile IT and Networking

DAX:

  • Consider moving the schema browser to schema.lsst.io (using LSST-the-Docs infrastructure). Fritz Mueller. Ticketed at DM-25399.

Science:

SQuaRE:

IT: IT Update

12:00 | Wrap up - review actions
  • Schedule a follow-up meeting on the rest of the Provenance recommendations Frossie Economou 
12:30 | Close (latest)



Proposed Topics

Topic | Requested by | Time required (estimate) | Notes
DM AHM | 30min? | Discuss DM AHM at PCW in August.
DMTN-185 | 30-45min | Suggest we walk through recommendations of the Provenance WG and identify which T/CAM(s) owns which so they can accept or reject them
Exposure table | 45-60min

The exposure table is a key piece of observatory metadata but I have been unable to determine who is in charge of constructing it, and its lack is starting to block work. Gregory Dubois-Felsmannor I can give a brief overview of the state of play but we should identify way forward. 

Prompt Processing | 30-45min? | Use OCPS or start building a more sophisticated execution system for USDF?
User Batch | 30-45min? | Impersonation or not? Inside K8s or outside? Integrated with DF systems or not? Could UWS be enough? Are we even ready to start discussing requirements or design? If not now, when?
Conda version pins | 30min

Doing less pinning in our conda envs lets users install their own things on top at the expense of reproducibility.  Could we start providing both pinned and unpinned versions of each conda env release?  I think it's time to admit that we cannot satisfy all consumers with either minimal pins or maximal pins or even a carefully chosen balance, but I'm hoping we can simultaneously support two envs that each try to satisfy different consumers just as easily.

KTL: We already have this.  The only thing that's lacking is an easy way to create a newinstall environment with the fully-pinned versions.  My version of Gabriele's lsstinstall script (currently on a branch of lsst/lsst) intends to provide this.  Also note that stack (not RSP) containers are effectively pinned unless someone installs something on top.

Community broker selection | 15 minutes | Report back on community broker selection and discuss next steps.


Attached Documents


Action Item Summary

Description | Due date | Assignee | Task appears on
  • Frossie Economou: Will recommend additional Level 3 milestones for implementation beyond just the DAX-9 Butler provenance milestone.
15 Mar 2022 | Frossie Economou | DM Leadership Team Virtual Face-to-Face Meeting, 2022-02-15 to 17
  • Kian-Tat Lim: Convene a meeting with Colin, Tim, Robert, and Yusra to resolve graph generation with per-dataset quantities (likely based on Consolidated DB work).
18 Mar 2022 | Kian-Tat Lim | DM Leadership Team Virtual Face-to-Face Meeting, 2022-02-15 to 17
  • Frossie Economou: Write an initial draft in the Dev Guide of what "best effort" support means.
17 Nov 2023 | Frossie Economou | DM Leadership Team Virtual Face-to-Face Meeting - 2023-Oct-24
  • Yusra AlSayyad: Convene a group to redo the T-12 month DRP diagram and define scope expectations.
30 Nov 2023 | Yusra AlSayyad | DM Leadership Team Virtual Face-to-Face Meeting - 2023-Oct-24
11 Dec 2023 | Gregory Dubois-Felsmann | DM Leadership Team Virtual Face-to-Face Meeting - 2023-Oct-24