Robert: But what if these milestones don't reflect the work we actually need to do?
Frossie: There may be a couple of those, but by and large that doesn't seem to be the case with the current set of milestones
Michelle: Can we try to organize the completion process a little? E.g. put them all in a spreadsheet and identify next steps and responsible teams so that we can get a better handle on how to make progress
Wil: I have a sheet like that, though there are probably new ones. I can resurrect that document
Gregory: Looking at overdue ones in my areas, they are all in active development.
Frossie: Those are tractable. LDM-503-EFDc is a case where no single team can retire it on its own.
Gregory: Exactly. Maybe we should look for ones that haven't been started and that we may not even have architecture for
Fritz: My work is underrepresented by milestones relative to the work remaining.
Frossie: I'm worried about user databases, which does have a milestone but is a problem since it's one of those that require many teams.
Frossie: Maybe we should have each T-CAM classify their late milestones into: not relevant, mostly done, need to be moved to another location, etc.
Frossie Economou with T-CAMs will do some taxonomy to try to categorize late and soon to be due milestones
Wil O'Mullane will update the spreadsheet to remove done ones and add new ones
Yusra: Most milestones can be completed internally. Exceptions are AP-15 and DRP-24; we can't do anything more without precursor data.
Fritz: We did a big walkthrough and filed LCRs and moved the needle. Maybe we just need another one of those at the next F2F
Wil: We could do that
Gregory: Is there low hanging fruit?
Yusra: The ones in orange in Wil's spreadsheet are first guesses at the lowest hanging fruit
Frossie Economou will run a milestone "parade" for a time box starting at 09:00 project time on Thursday.
The output of the parade should be a fleshed out version of the spreadsheet.
Acceptance Test
Robert: The AuxTel data campaign should be included in the acceptance test.
Leanne: Agreed.
KT: Who runs the acceptance tests, and who organizes them? Are we using things already done, or is it new work?
Leanne: Organized by Jeff Carlin and Leanne; will need help from product owners. Should be able to execute unless a product owner wants to do it.
Frossie: Retiring pre-ComCam data should be fine. Retiring L2 when level 3 tests are done is also good.
Fritz: For databases there are scale requirements related to datasets, which are not the same as in operations.
Leanne: Run tests on the datasets available; they could stay in verification status until LSSTCam data is available.
Robert: Regarding KT's question, Robert can find SITCOM scientists who would like to be involved in acceptance tests.
Frossie: We can't wait for DR1, so performance requirements must be done "at scale". We could use an artificial load.
Fritz: Some other things may not appear until we have data production at scale.
Frossie: We could fulfill level 3s while level 2 activities are ongoing.
Wil: 1a and 1b are camera-focused, but we do need to prove at scale.
Ops Rehearsals
Frossie: This would be a different ops rehearsal than in the past: commissioning style, to find out whether how we do things is wrong.
Robert: Similar to what we do.
Frossie: The people involved now are experienced with AuxTel; ComCam is something new.
Leanne: It's more about actors and interaction than components. Should these be DM ops rehearsals, or Rubin's?
Wil: Would like them to be Rubin's. The next OR should focus on commissioning.
Robert: Could be good to do a "real" OR with more/new people involved.
KT: AuxTel is not using final components, so training people to work this way could be a problem later.
Robert: AuxTel is useful, and it is good for discovering what's missing.
Wil: We shouldn't hand over things that are not ready, e.g. the API to Alysha.
Leanne: Like the idea of a Rubin ops rehearsal.
Wil: We shouldn't push everything; there are some DM-only activities.
Network
KT: Can we do it now?
Cristian: I'd rather wait so we don't do the work twice.
Leanne: We can wait.
Cristian: If this is taking too much time, we can still do it.
Robert: Does base facilities verification include running pipelines on antu?
Wil: Not in scope, but we can do it.
KT: Base is about the facilities, not the services.
Wil: Base facilities were handed over to operations (NOIRLab).
Middleware
Frossie: Worried about Jim as middleware product owner; it could be too much load for Jim.
Jim: Already doing some of this.
Gregory: Backing up Jim.
Wil: Do we need to update the org chart?
Leanne: Already started updating it.
Wil: On org charts, product ownership of the LHN should be moved to Richard.
RSP Acceptance Test
Gregory: Running test campaigns; they are good because we always find something.
DM Science Validation
Wil: Verification is on the DM side; validation is done in conjunction with the rest.
Sizing
Robert: Sizing means memory, CPU, etc.
Gregory: The release field is a non-trivial problem.
Wil: Commissioning could be a continuous release process, with one final release for operations.
KT: Concerns about buying hardware for USDF given the lead times and timeline.
Richard: Hardware is ordered; it perhaps arrives in January.
KT: For a data release, if you need to patch, it is still the same data release because it replaces code.
Wil: Not a problem for commissioning.
Jim: During commissioning it shouldn't be a problem.
Wil: On the number of IDs in a patch: could they be split for ID purposes, i.e. make smaller patches? The science team can investigate.
Wil: DMTN-135 has good information about hardware.
If someone asks for "all the observations taking by Auxtel last night" there is no way to get that to them
We have ~4 different ways of representing this data.
We could take advantage of the butler registry, which has all of the information. Would have to expose it from the butler as a way to get to the tables and views that we need to build services that query the DB.
Can we expose the butler registry in this way?
Who is leading this effort? There is no general view into the metadata
JB: I don't actually hate this, the big thing that has changed is the Butler registry schema
RHL: I have exactly the opposite reaction. We have an enormous amount of things that must be captured here. Mixing that up with an operational database that has to run the system seems like a mistake
KTL: This has to be an operational database. We need something that includes both metadata that we know at the time of observation, as well as information that is calculated later.
FM: Frossie, how much of your concern is addressed by metadata tables that are already in the DPDD in combination with reformatted EFD?
KTL: The DPDD does not have any observational metadata (the tables you are thinking of are in the baseline schema)
WOM: We had a long data
GPDF: “Consolidated Database” was also in contraposition to Qserv
JB: I don't want to put much more into Butler code. It should either be completely separate, or written as an extension
FE: What I want to do is seed the views that I want. Can we use the butler registry for that?
TJ: I am supposed to write a technote on observation annotations, i.e. to let an observer flag the nights observations as bad
GPDF: I imagined that we would take advantage of the Butler's existing tables, but not mess with the butler itself. It could be read-only, and we would do views instead of joins.
GPDF: What we're trying to do is allow this be done in the live butler, not in the replica after a delay. This would allow an observer to enter annotations immediately instead of first having to wait for the Butler to update
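The read-only-views idea above can be pictured with a small sqlite3 sketch. The table names and columns here are illustrative stand-ins, not the actual Butler registry schema: annotations live in a separate table owned by the exposure-log service, and a view combines them with the registry's exposure records without ever writing to Butler tables.

```python
import sqlite3

# Stand-in for the live registry's exposure table (schema is illustrative).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE exposure "
    "(id INTEGER PRIMARY KEY, day_obs INTEGER, timespan_begin TEXT)"
)
con.executemany("INSERT INTO exposure VALUES (?, ?, ?)", [
    (1, 20211109, "2021-11-10T01:00:00"),
    (2, 20211109, "2021-11-10T01:05:00"),
])

# Annotations are owned by the exposure-log service, so the registry
# itself is only ever read.
con.execute("CREATE TABLE exposure_annotation (exposure_id INTEGER, flag TEXT)")
con.execute("INSERT INTO exposure_annotation VALUES (2, 'bad: dome lights on')")

# A view presents the combined picture; no Butler table is modified.
con.execute("""
    CREATE VIEW exposure_log AS
    SELECT e.id, e.day_obs, e.timespan_begin, a.flag
    FROM exposure e
    LEFT JOIN exposure_annotation a ON a.exposure_id = e.id
""")

rows = con.execute(
    "SELECT id, flag FROM exposure_log "
    "WHERE day_obs = 20211109 ORDER BY timespan_begin"
).fetchall()
print(rows)  # [(1, None), (2, 'bad: dome lights on')]
```

Ordering by the exposure's begin timestamp in the view also addresses the "which observation was last" problem, since the raw registry query has no inherent ordering.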
RHL: I worry that we are focusing on the technical implementation, rather than defining what we need first
FE: We all agree that we need an exposure log. Right now there is neither a technical path nor a management path unless we do something here
FE: It is a DM requirement to deliver an exposure table. Who is doing this?
KTL: That's the big problem. We know the data is coming, we just need a place to hold it.
WOM: We need to be careful to say that we would be making use of the butler, not that it would be a part of the butler.
JB: There are different levels of interface. If we want these things to be joined against butler tables, we need to be careful. It is not easy to do that, but it can be done.
TJ: I am only writing up the specific case of how you write up annotations, not how you deal with tracking
RHL: I think it is SITCOM's responsibility to lead this, and DM to build the backend according to their directions
KT: SITCOM can be product owners and work on front ends, but we need at least a prototype backend to work with them on. If no one wants to take this on, I can do it.
GPDF: The image metadata table issue is something we discussed as part of the image services, and is something I am already working on. I feel a lot of responsibility for the backend, though I can't do the front end.
RHL: I can provide a prototype of a backend from an obs log from HSC. I am happy to work with Architecture
JB: I'd like to weigh in on how this would interact with the Butler. Is there more than what is in the baseline schema?
WOM: No, that's the problem, we need the product owners to tell us what is missing
TJ: It is trivial to query the butler for all observations that were taken last night, but impossible to tell what the last observation was since there is no ordering
FE: Where does this live?
WOM: Arch will take care of this for construction, either K-T or Tim
FE: When does the data show up?
KT: during AP, so within 60s
FE: That is acceptable
GPDF: Which TCAM am I working with, and who is actually building something?
FE: Propose that we have the basic architecture for this presented at the next DMLT F2F.
KT: I think a prototype should be done by then, and I can make it. You will all hate it, but it will allow us to have a discussion
GPDF produced a Confluence page with requirements relevant to User-Generated Data Products and computing available to science users to produce them, but it only addressed high-level capabilities. User Batch has to address products derived from both catalogs and images (not all images: selected subset based on catalogs). 10% of capacity required for survey will be provided to users (as $, not necessarily same mix as production); much smaller than e.g. DESC needs.
Frossie: Does 10% include nublado? If so, then may not be much leftover for batch; if not, then nublado is uncosted increase
Batch will still be allocated (and thus maybe can be harder to use); can take into account whether users will make results public
Nublado has somewhat absorbed original birthright concept
Need to provide a processing framework for systematic runs over appropriate data. Quotas need to be able to go to groups as well as users.
Catalog use cases include training classifiers running on sharded Parquet. Next-to-data is therefore not really DAX/Qserv-related anymore. Dask in nublado works well; others like UW/LINCC are using Spark. Data Science community constantly producing new tools. (Could perhaps leverage LINCC work, but they may not be scalable enough for us.)
Frossie: need to answer these:
What resources are devoted to nublado vs. batch (and maybe vs. Dask)?
Hard to imagine fulfilling computing reserve with nublado; can't devote entire 10% to it
Specific asks from specific teams to build this system?
Tossing a SLURM queue at users does not meet requirements, but can be part of the solution. BPS + batch queues is designed around single-tenancy; needs work for multi-user. But may be generic enough for user processing.
Richard: 10% is ~500 cores, not much; users need unstructured compute.
Frossie: Project wants to reach out to a wide variety of users; can't have both birthright and batch. The difference is that batch utilization is by policy, nublado utilization is by user.
Tim: There is a chasm between PipelineTask with Butler and BPS vs. running arbitrary jobs. Isn't this why users on Google are easier, because they *can* pay with research grant money? So we are saying that arbitrary compute is not what we are doing, and user batch is only supported via BPS/Butler.
Wil: Need to sit down with Richard and figure out whether 10% needs to increase and how to divide it; his assumption has always been that it is PipelineTask and not general compute. Also need to add priorities and timelines for when users get what. Need to provide lots of alternatives to users.
KTL:
Do we need to divide up front?
No, can be elastic.
Can VO services provide interface for arbitrary jobs?
Not likely to be sufficiently scalable; don't want 100K jobs hitting VO services
Although not clear if users will have 100K jobs...
Don't yet know if we can hook up Google properly for bring-your-own — need a demonstration.
GPDF: We don't need a steering framework for "freeform compute", but we do need data access, and VO is not specified for this load; we cannot force everyone into the PipelineTask framework.
Frossie: We have to force everyone into the PipelineTask framework.
Colin: What pieces of software will people use, and how will they run them? Detailing the use cases will help.
GPDF: We could provide a command-line tool that (a) extracts a file, based on a (collection, dataId, type) triplet, to whatever scratch space is available to batch jobs, and/or (b) extracts a signed object-store URL that they can use in whatever code they have.
Tim: This is `butler retrieve-artifacts`.
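GPDF's triplet-to-URI tool might look something like the following sketch. Everything here is hypothetical: the `_DATASTORE` dict stands in for the real Butler datastore resolution, and `resolve_uri` is an invented helper, not an existing API; the real mechanism is `butler retrieve-artifacts`.

```python
import argparse

# Hypothetical lookup standing in for the Butler datastore; in reality the
# butler would resolve the triplet to a file path or a signed object-store URL.
_DATASTORE = {
    ("runs/ci", "visit=123", "calexp"): "s3://bucket/runs/ci/calexp/123.fits",
}


def resolve_uri(collection: str, data_id: str, dataset_type: str) -> str:
    """Return the artifact URI for a (collection, dataId, type) triplet."""
    return _DATASTORE[(collection, data_id, dataset_type)]


def main(argv=None):
    # Thin CLI wrapper so batch jobs can call this from shell scripts.
    parser = argparse.ArgumentParser(description="Print the URI of one dataset.")
    parser.add_argument("collection")
    parser.add_argument("data_id")
    parser.add_argument("dataset_type")
    args = parser.parse_args(argv)
    print(resolve_uri(args.collection, args.data_id, args.dataset_type))


if __name__ == "__main__":
    main()
```

A batch job would then do e.g. `python resolve.py runs/ci visit=123 calexp` and feed the printed URI to whatever code it runs, which is the "use it in whatever code they have" half of the proposal.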
Tidy up description of percentages in DMTN-135 to make clear that 10% includes all user computing Wil O'Mullane
Address increases in compute allocation to users if needed Wil O'Mullane
Requests for "Focus Friday" exceptions by developers in AP have generally been addressed by "save for later" or other tools. Should consider using scheduled messages every day to deal with timezones, not just Fridays. But some people can/will read and respond even outside normal work hours. Need to better understand culture issues in general; perhaps do a similar survey on a different issue for each meeting?
RHL: What will we do about the lack of documentation? Ian: Scheduling work for people to add documentation. Writing answers provided on Fridays directly into documentation helps.
Wil: everyone likes not having meetings; some relaxation of Slack rules might be considered; not an overwhelming push to change things
RHL: Future surveys should get better coverage from non-DM people.
One of the biggest obstacles we have to putting VO services into (Rubin data) production is the fact that we do not have an agreed format for Data IDs uniquely identifying our data products. This information exists in a dict, but we don't have a scheme for converting it to a string. Let's discuss the complications and come up with a plan.
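One possible scheme, sketched below, is to sort the dict's keys and percent-encode the values. The key names, separators, and encoding here are assumptions for illustration, not an agreed project standard; the point is only that a canonical ordering plus escaping makes the string unique and reversible.

```python
from urllib.parse import quote


def data_id_to_string(data_id: dict) -> str:
    """Serialize a data ID dict to a canonical string.

    Keys are sorted so the same dict always yields the same string, and
    values are percent-encoded so '=' or '+' inside a value cannot collide
    with the separators. This scheme is one possibility, not a standard.
    """
    return "+".join(
        f"{key}={quote(str(data_id[key]), safe='')}"
        for key in sorted(data_id)
    )


print(data_id_to_string({"visit": 903334, "detector": 22, "instrument": "HSC"}))
# detector=22+instrument=HSC+visit=903334
```

Any scheme the team settles on would need the same two properties: deterministic key ordering and unambiguous escaping of the delimiter characters.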
Low hanging fruit milestones and how can we claim them
We need a plan for getting observation metadata somewhere VO services can reach it - probably the Consolidated DB (though, crazy idea, the Butler registry seems to know most of this stuff?). Right now nobody seems to own this or be working on it, so we have to come up with an actionable plan.
I am getting more feedback from developers who are frustrated by some aspects of Focus Friday. Note that these concerns are mostly addressed by the open support channels and by showing them how to use Slack's "Schedule for later" feature.