2019-10-14 DM SST Agenda and Meeting notes

Date & Time

14 Oct 2019 11:00 PDT

Location

Browser: https://bluejeans.com/103664856

Room System: Dial 199.48.152.152 or bjn.vc, then enter Meeting ID 103664856 or use the pairing code

Phone Dial-in:

+1 408 740 7256
+1 888 240 2560 (US Toll Free)
+1 408 317 9253 (Alternate Number)

Meeting ID: 103664856

Attendees

Regrets

Colin Slater (TVS-SMWLV meeting)
Melissa Graham

Discussion items

Item	Who	Notes	Conclusions and Action Items
Project/Science Updates	Leanne Guy	Scarlet workshop in Naples, 7-9 Oct. Robert Lupton, Leanne Guy and Peter Melchior were invited speakers. Leanne gave an overview of data management and the science platform; Robert spoke about the science pipelines and crowded fields. For most of the rest of the week, Peter ran a tutorial on using Scarlet (independently of the LSST stack). The attendees' interests are in crowded stellar fields. Scarlet is one part of this problem, not the whole solution. We could have run this on the LSP using the Stack. It was stressed that LSST is currently evaluating Scarlet as a deblender and that no decision has yet been made. The Italian community is very keen to maintain and grow its existing MOU with LSST into operations, beyond the current 15 PIs, and is looking for ways to get involved. For us, it was good to get an idea of the interests outside the DESC community.
Scientific datasets	Michael Wood-Vasey	DM-15448. Latest version of document: https://dmtn-091.lsst.io/v/DM-15448/index.html What is the motivation for having both a CI and a SMALL dataset? There seems to be a lot of overlap. Do we need something on that intermediate scale, or do we just need CI/MEDIUM/LARGE? The vision is that SMALL is still something that can be run on a single developer machine in a few hours; MEDIUM is the next scale, requiring a cluster somewhere. Having a dataset that can be run a few times a day while developing (SMALL) is useful, rather than having to wait overnight for results (MEDIUM). Consensus that a bigger MEDIUM, rather than removing SMALL, would differentiate them better and be useful. Is CI_HSC in fact 'SMALL' by this definition? CI_HSC is 8 GB (~30 CCDs from ~12 visits), so yes in that sense, but it currently takes ~8 hrs to run (not SMALL). Most of this is thought to be due to Jenkins processes. On a few cores on most machines it takes 45 mins to an hour, with most of the processing done in ~20 mins in this case. This needs to be profiled to understand; perhaps there are some inefficiencies in I/O. Reported to be faster under Gen3 (30 mins on 8 cores), but no detailed timings have been done. The technote does not address computational performance monitoring, only algorithmic/scientific performance monitoring. Even though we are not running tests in the context of an orchestration workflow on a known hardware configuration, it is nonetheless useful to know and track how long it takes to process these datasets. We should recommend doing that and add it to the document, especially for the MEDIUM and LARGE datasets. It would also be useful to know if a SMALL dataset suddenly starts taking twice as long to run. What is the tradeoff between individual developers knowing they should run a CI/SMALL dataset regularly to check they didn’t break something algorithmically, and a regular CI that goes through SQUASH?
Is that sufficient to catch regressions? The AP team looks weekly; is that sufficient? Agreement on the following for running datasets on Jenkins: CI level is required for a merge. SMALL is at developer discretion, with the understanding that they will fix any breaks. MEDIUM/LARGE are for algorithmic or larger scientific changes, and only when a change in output is expected, e.g. don’t run on HSC-RC2 unless an algorithmic change might be expected to produce a different output. This means that the current CI_HSC is 'SMALL' both in runtime and in usage. We could maybe make CI_HSC 20-50% smaller, but need to maintain a sufficiently interesting dataset for testing (area, depth, # epochs/patch). The MEDIUM dataset definition is satisfied by HSC-RC2. Lauren has put a lot of effort into defining this. Details are in the TN. This will be the main dataset that gives a balance between ‘scientifically interesting’ and ‘does not take too long to run’. A key point about this dataset is that it is not representative: Lauren intentionally included more edge cases. This makes it more interesting for scientific development and performance monitoring, but does mean that any predictions will be conservative or non-representative to some degree. We should bear this in mind when doing characterization or commissioning-level studies. It currently takes longer than a night to run. It is not clear how much this is limited by the current middleware's ability to balance over many more cores. It cannot be easily automated at this time as it requires some babysitting. The new middleware will address this, and automation should be possible with Gen3 and a workflow system. Hsin-Fang currently runs these on a monthly basis and is handing over to NCSA. Can we go back to fortnightly? LARGE: So far we have only run once or twice on PDR1. PDR2 is still coming out of Japan.
Simon Krughoff asks: is it within our remit to define a mechanism for identifying exactly what data are to be processed in each context?	Michael Wood-Vasey: Move CI_HSC data to SMALL in technote; Michael Wood-Vasey to add a statement to the document at GKE, 29 Nov 2019; Michael Wood-Vasey: Add comment on addressing computational performance, 29 Nov 2019
AOB	Leanne Guy	Reminder that we will have a special meeting on LOY1 alerts on Wednesday
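
The Scientific datasets discussion recommends tracking how long each dataset takes to process, and notes it would be useful to know if a SMALL dataset suddenly starts taking twice as long to run. A minimal sketch of such a check follows. It is an illustration only, not an existing Jenkins or SQUASH feature: the function name, the factor of 2, the minimum number of runs, and the sample timings are all assumptions made for this example.

```python
from statistics import median


def is_runtime_regression(history, latest, factor=2.0, min_runs=3):
    """Flag a run whose wall-clock time is at least `factor` times the
    median of earlier runs of the same dataset.

    history  -- earlier runtimes in minutes (hypothetical bookkeeping)
    latest   -- runtime of the newest run, in minutes
    factor   -- assumed threshold; 2.0 mirrors "twice as long"
    min_runs -- assumed minimum history needed to form a baseline
    """
    if len(history) < min_runs:
        return False  # too little history to judge
    return latest >= factor * median(history)


# Illustrative numbers only, not measured timings.
small_history = [44, 47, 45, 50]
print(is_runtime_regression(small_history, 95))  # slow run is flagged
print(is_runtime_regression(small_history, 60))  # modest slowdown is not
```

Using the median as the baseline keeps a single unusually slow run (for example, one on a busy Jenkins worker) from shifting the reference point, which matters when the hardware configuration is not fixed.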

List of SST tasks (Confluence)

Description	Due date	Assignee	Task appears on
Robert Lupton Clarify the meaning of time in the object table. 1 sentence description in sdm_schemas, can link to a short DMTN. Update 2022-02-09: Meeting to resolve this on 2022-02-21 28 Feb 2022	28 Feb 2022	Robert Lupton	2018-11-05 DM SST F2F Agenda and Meeting notes
Gregory Dubois-Felsmann check if SDM standardization is adequately represented in project documents, and whether DMTN-067 should be required. 31 Mar 2022	31 Mar 2022	Gregory Dubois-Felsmann	2022-02-14 DM-SST Virtual F2F Agenda and Meeting notes
Leanne Guy read RFC-842 and work out how much of this is in DM scope. Work with Eli Rykoff and Robert Lupton to make a plan to address this 28 Feb 2023	28 Feb 2023	Leanne Guy	2023-01-23 DM-SST Agenda and Meeting Notes
Leanne Guy talk to Steve R about presenting plans for the ShearObject table to PST and SciCollab chairs 20 Mar 2023	20 Mar 2023	Leanne Guy	2023-02-27 DM-SST Agenda and Meeting Notes
Jim Bosch Provide an example of a file containing a cell-based coadd for Gregory Dubois-Felsmann to examine to assess implications for firefly 31 Mar 2023	31 Mar 2023	Jim Bosch	2023-02-27 DM-SST Agenda and Meeting Notes
Leanne Guy talk to Gregory Dubois-Felsmann to review the original intent of the AFS-related Portal requirements before deciding on a course of action 29 May 2023	29 May 2023	Leanne Guy	2023-05-01 DM-SST Focus Meeting - Brokers in Commissioning
Leanne Guy Prepare to consult the PST on the question of providing compressed PVIs for AP outputs, to cover the period before the data become available in a DR. 02 Jun 2023	02 Jun 2023	Leanne Guy	2023-03-27 DM-SST Agenda and Meeting Notes
Jim Bosch Incorporate 30-60 day period for raws on disk into the strawman proposal and present to KT 26 Jun 2023	26 Jun 2023	Jim Bosch	2023-05-08 DM-SST Agenda and Meeting Notes
Parker Fagrelius Patrick Ingraham how long will it take to do a scan as described? No need to scan the whole WL range but will require additional points outside nominal lambda range. 30 Jun 2023	30 Jun 2023	Parker Fagrelius	2023-03-27 DM-SST Agenda and Meeting Notes
Colin Slater Gregory Dubois-Felsmann Robert Lupton Jeffrey Carlin Convene meeting/vf2f session on definition of dataset selection requirements in DMS-REQ-293 31 Jul 2023	31 Jul 2023	Colin Slater	2023-07-10 DM-SST Agenda and Meeting Notes
Eli Rykoff , Leanne Guy Develop a proposal for what calibration processing, hardware, data we actually need and what will be needed for DR1. This has implications for the ORR and for prioritisation of work in commissioning 31 Jul 2023	31 Jul 2023	Eli Rykoff	2023-01-30 DM-SST Agenda and Meeting Notes
Yusra AlSayyad will look to see if there is any effort to help on option 1 28 Aug 2023	28 Aug 2023	Yusra AlSayyad	2023-08-14 DM-SST Agenda and Meeting Notes
Jim Bosch Provide a physical example of what a up on cell table would look like for the Colin Slater / DAX team to review 31 Aug 2023	31 Aug 2023	Jim Bosch	2023-02-27 DM-SST Agenda and Meeting Notes
"What is the pathway to defining the data products that are required to meet DMS-REQ-0266" Jeffrey Carlin 30 Nov 2023	30 Nov 2023	Jeffrey Carlin	2023-10-23 DM-SST vF2F Agenda and Meeting Notes
Gregory Dubois-Felsmann Eli Rykoff to investigate splitting DMS-REQ-0298 (data access services for provenance) into separate 1a and 1b reqs. 30 Nov 2023	30 Nov 2023	Gregory Dubois-Felsmann	2023-10-23 DM-SST vF2F Agenda and Meeting Notes
Leanne Guy Gregory Dubois-Felsmann , possible others small group to write description of how our deployment strategy meets DMS-REQ-0297. 30 Nov 2023 2024-01-30, See DMS-REQ-0297 -- Deployment Strategy	30 Nov 2023	Leanne Guy	2023-10-23 DM-SST vF2F Agenda and Meeting Notes
Jeffrey Carlin follow up with KT on DMS-REQ-0176 and DMS-REQ-0315 to update/disaggregate this for latest base/summit infrastructure split. 30 Nov 2023	30 Nov 2023	Jeffrey Carlin	2023-10-23 DM-SST vF2F Agenda and Meeting Notes
Jim Bosch Follow up on the possibility of investigating further the ability to process 2 collections in parallel. 31 Jan 2024	31 Jan 2024	Jim Bosch	2023-12-04 DM-SST Agenda and Meeting Notes
Jeffrey Carlin Gregory Dubois-Felsmann Colin Slater Define verifications for the consistency of collections part of this requirements - e.g. OSS-118 31 Jan 2024	31 Jan 2024	Jeffrey Carlin	2023-12-04 DM-SST Agenda and Meeting Notes
Gregory Dubois-Felsmann Jim Bosch Setup a meeting between science platform and middleware regarding user storage for butler purposes.		Gregory Dubois-Felsmann	2023-10-23 DM-SST vF2F Agenda and Meeting Notes
