Target date: 1-5 June 2020.
Robert Gruendl is leading this effort.
LDM-643 (Section 3) now has a first draft of a description for OPS rehearsal #2.
Draft test plan for OR2 can be found at: https://jira.lsstcorp.org/secure/Tests.jspa#/testPlan/LVV-P72
The pre-operations team feels that using AuxTel for this rehearsal would provide maximum end-to-end realism (though Covid-19 may prevent that).
Personnel needed to execute the rehearsal: Wil O'Mullane, Chuck Claver and Unknown User (mbutler), with the DM CAMs, should identify staff for the rehearsal.
- SIT-Com: Tiago, Patrick and Brian (possibly Michael Reuter for EFD). If AuxTel with LATISS is used, Tiago and Patrick should be on site in Chile.
- Data Production QA: Lauren, Yusra AlSayyad (since it looks like we will be defining procedures as we go), others ..
- Data Production Facility: Michelle B, Robert G
- TSSW: on call; who? Andy Clements
OPS rehearsal II discussion (2020 March 31; 15:00 Pacific; https://bluejeans.com/206063043)
Participants
- Unknown User (abauer)
- Chuck Claver
- Jeffrey Carlin
- Leanne Guy
- Michael Reuter
- Unknown User (mbutler)
- Patrick Ingraham
- Robert Blum
- Robert Gruendl
- Sandrine Thomas
- Wil O'Mullane
- Merlin Fisher-Levine
Agenda
- decide if we do it with simulations from the base or delay to summit reopening.
- basically set a schedule
- if we use LATISS agree ground rules with Patrick (not to interfere with engineering work)
- ComCam?
- give some clarity on personnel involved and what role they play
- give some clarity on goals such as
- which procedures we might develop
- which requirements we could potentially verify (Jeff Carlin)
Notes
Goals:
- Refer to LDM-643.
- “Prepare for ComCam on the sky”.
- How will observing proceed?
- How will changes be folded in?
- How will processing be incorporated into the daily operations of the system?
- Inform e.g. our approach to change control: how do you change how the DAC is read out? How do you fall back to a previous system?
- Exercise whatever systems we have available, or use shims to make the rehearsal possible.
- However, the real aim is the interaction between the various teams (summit, Tucson, LDF); reality of hardware etc is not fundamental.
- However, using hardware to make the rehearsal “hands-on” makes it more compelling.
- Previous rehearsal was based on simulated data.
- This is a commissioning rehearsal, rather than preparation for a steady state operations scenario.
- Expect to plan a twist for mid-rehearsal; assume that everything runs smoothly on the first day.
- Needs a “ringmaster” to make this happen.
Outstanding questions:
- Which instrument should we use to drive the rehearsal? Simulations, or AuxTel/LATISS, or ComCam?
- In the near term, AuxTel will be in stasis; ~months before it is back online.
- Probably ComCam will be running in La Serena at some level, but it's not clear what level of connectivity it will have.
- Wil reckons “quite a bit”; it's in the data center.
- A moving target in terms of availability.
- Could be used to generate biases, darks, flats.
- Could play fake images through the NCSA test stand. This obviously doesn't exercise ComCam hardware.
- Can inject “interesting” images, but it takes some effort.
- Decision: target ComCam in the Recinto.
- Backup plan: use test stand at NCSA if ComCam is not available. Specifically, the date takes priority over the hardware availability.
- What compute hardware do we use?
- The commissioning cluster should be available.
- Also want to exercise batch compute as part of this exercise at NCSA.
- When do we feel we need to run this rehearsal?
- Do we need to rethink ops rehearsals given our current situation? E.g. introducing more rehearsals in future, as things start coming back online.
- Note there is a pre-ops milestone which may fold in to this rehearsal.
- That milestone implies we want this done by the end of June.
- Aiming for May gives enough time to get ComCam up and running, and the summit will likely still be closed, so people will be available.
- There is still some managerial work to be worked out to get ComCam going.
- As above, date takes precedence, and we use the test stand if ComCam isn't available.
- Some uncertainty about the availability of code to perform ingestion by May.
- Note DMLT meeting, DES collaboration meeting in May.
- Decision: Doodle poll key individuals for three days in May.
- But can reconsider if ingestion is unavailable on that timescale.
- How long should the rehearsal last?
- Three days last time; allows time for handovers, debriefs, etc.
- Decision: the duration will be at least three nights (four calendar days).
- If the exercise is run during the day using ComCam in Recinto, then the equivalent of three nights is fine.
- Who are the key personnel?
- Chile: Kevin Reil, Brian Stalder (likely as observing scientist; alternatives are Sandrine Thomas, Chuck Claver, Kevin Reil, Tiago Ribeiro), Cristian Silva
- Tucson: Michael Reuter, Patrick Ingraham (& Chuck as observer).
- Execution staff / NCSA: Monika Adamow, JD Maloney, Robert Gruendl, Matt Kollross
- SDQA: Lauren MacArthur, Jeff Carlin
- Depends on exactly what pipelines get run, what high-level requirements can we test?
- Pipelines support (on call): Merlin Fisher-Levine
- Camera support (on call): Tony Johnson, Htut Khine Win, Steve
- Director: Bob Blum
- Several observers.
Next steps:
- Robert Gruendl & Patrick Ingraham to outline a plan, on this wiki page.
- Leanne Guy & Jeffrey Carlin will turn that into a test plan.
- John Swinbank to make sure that Merlin knows that this is in the offing.
- Michael Reuter to give T&SS a heads-up.
Plan Assuming ComCam available from Recinto/Base:
Prereq(s):
- ComCam functioning (cold with illumination sources)
- ComCam DAQ/Archiver produces appropriate headers
- CPP pipeline ready to process (bias, dark, flat?)
- Hexapod/Rotator simulators at a minimum (the more the better)
Rehearsal Outline/Script (repeat 3? times):
- Observations from ComCam:
- bias sequence
- dark sequence
- flat sequence?
- Transfer and ingest to OODS (base) and DBB (NCSA)
- CPP pipelines (triggered by hand?)
- SDQA and inspection within LSP
- Meet and discuss
Variance/Challenges:
- initially, whatever might come up
- Perform standard visit with ingest, display image post-ISR
- ISR quality is somewhat irrelevant
- Pretend bias is bad and new one is needed
- Recreate master bias at NCSA excluding image 4
- Get it to summit network, ingested and re-display the image with proper bias
- Take a new image and make sure it uses the new bias
- Simulate a normal commissioning characterization activity
- Verify readout noise during slew is the same as when not in motion
- Verify no pickup from motors etc
- Check each elevation, azimuth and rotator axis independently
- Analyze the data during the "day" at NCSA
- Verify readout noise during slew is the same as when not in motion
- A list of standard commissioning "challenges" that force on-the-fly actions will be compiled, increasing in difficulty
- Will not be shared with participants prior to starting the exercise
- Other daytime activities could include:
- could attempt a DAQ upgrade w/ reversion (test change control/failover)
- minor change in XML interface requiring re-deployment of a single component or two (not the entire system)
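Two of the challenges above lend themselves to a small numerical sketch: rebuilding a master bias while excluding one frame, and comparing read noise between two conditions. The following is a minimal illustration on synthetic arrays, not the CPP pipeline. The function names (`make_master_bias`, `read_noise_from_pair`), the median combine, and the simulated noise levels are all assumptions made for the example.

```python
import numpy as np


def make_master_bias(frames, exclude=()):
    """Median-combine bias frames, skipping the 1-based frame numbers in `exclude`."""
    skip = set(exclude)
    kept = [f for i, f in enumerate(frames, start=1) if i not in skip]
    if not kept:
        raise ValueError("no frames left to combine")
    return np.median(np.stack(kept), axis=0)


def read_noise_from_pair(bias_a, bias_b):
    """Estimate read noise (ADU) from the difference of two bias frames.

    Differencing removes fixed structure; the scatter of the difference
    is sqrt(2) times the per-frame noise.
    """
    diff = bias_a.astype(float) - bias_b.astype(float)
    return float(np.std(diff) / np.sqrt(2.0))


# Synthetic data (assumed values): 6 bias frames with 5 ADU noise.
rng = np.random.default_rng(0)
frames = [1000.0 + rng.normal(0.0, 5.0, size=(64, 64)) for _ in range(6)]
frames[3] = frames[3] + 200.0  # pretend "image 4" is the bad one

master_clean = make_master_bias(frames, exclude=[4])

# Hypothetical stationary vs. slewing pairs (same noise here by construction).
noise_still = read_noise_from_pair(frames[0], frames[1])
noise_slew = read_noise_from_pair(frames[4], frames[5])
print(f"read noise: still={noise_still:.2f} ADU, slew={noise_slew:.2f} ADU")
```

In a real test the two pairs would come from exposures taken with the mount stationary and during a slew on each axis; a difference well above the statistical scatter would suggest pickup.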
Potential Verification Elements (note: these might not be true verification of system requirements, but rather tests that elements can be verified):
- Networking
- Jeff Kantor asks if we might run a bit longer and test the backup network
- Data Acquisition, Archive, Ingest at ComCam Scale
- Data processing at ComCam Scale
- See also https://jira.lsstcorp.org/secure/Tests.jspa#/testPlan/LVV-P72 — will be DMTR-231 (not yet published)
OPS rehearsal II discussion (2020 May 20)
Run June 2-4 inclusive.
Meeting at noon each day (see also LDM-643).
Take data in the morning and talk about it at noon. Discuss next day.
ComCam is available! Some data transfers are happening, but they are not triggered from outside (aka OCS); they use the test stand directory structure and rsync with no acknowledgement.
Chuck asks how the data gets into the platform: via a cron job. The newer system should use fewer cron jobs.
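Since the rsync-based transfer gives no acknowledgement, one way to confirm arrival would be to compare checksums against a manifest written at the source. The sketch below assumes such a manifest exists as a mapping of relative path to SHA-256 digest. The manifest format, the function names, and the file names are hypothetical and not part of the current transfer system.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (relative POSIX path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_arrival(manifest, dest_root):
    """Return (missing, corrupted) lists of relative paths at the destination."""
    dest_root = Path(dest_root)
    missing, corrupted = [], []
    for rel, expected in manifest.items():
        target = dest_root / rel
        if not target.is_file():
            missing.append(rel)
        elif sha256_of(target) != expected:
            corrupted.append(rel)
    return missing, corrupted


# Demo with throwaway directories standing in for source and destination.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    (Path(src) / "bias_001.fits").write_bytes(b"bias-1")
    (Path(src) / "bias_002.fits").write_bytes(b"bias-2")
    manifest = build_manifest(src)
    # Only the first file has "arrived" at the destination.
    (Path(dst) / "bias_001.fits").write_bytes(b"bias-1")
    missing, corrupted = verify_arrival(manifest, dst)
    print("missing:", missing, "corrupted:", corrupted)
```

A check like this could run from the same cron job that drives the transfer, so that a missing or damaged file is reported rather than silently ignored.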
Image format: the images can be ingested but cannot be read in the same nublado instance, since TSSW is on an old version of the stack, so the SAL-aware nublado container cannot read them. The regular LSP should work. Felipe says they are all good.
- Robert Gruendl will check images in NCSA
The weekly stack should be able to process the images. Headers are not the same as those from CCS, so we need to check whether any header info is missing. Archiver-generated files are preferable but we may not get them. We can try camera-generated files for now: there is no filter info in the header, but pretty much all other info is there.
- Merlin Fisher-Levine will check images if Robert Gruendl points him at data
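The header check described above could start as a simple keyword audit. In this sketch a header is represented as a plain dictionary, and the keyword list `REQUIRED_KEYS` is an assumption for illustration, not the actual set the stack requires. The example header values are made up.

```python
# Hypothetical list of keywords; the real requirements come from the stack.
REQUIRED_KEYS = ("DATE-OBS", "EXPTIME", "IMGTYPE", "FILTER")


def missing_keywords(header, required=REQUIRED_KEYS):
    """Return the required keywords that are absent or empty in `header`."""
    return [key for key in required if header.get(key) in (None, "")]


# Made-up camera-generated header with no filter information.
camera_header = {
    "DATE-OBS": "2020-06-02T13:00:00",
    "EXPTIME": 0.0,  # a zero exposure time is a valid value for a bias
    "IMGTYPE": "BIAS",
}

gaps = missing_keywords(camera_header)
print("missing keywords:", gaps)
```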
Run calibration products generation and use those products on an image. This is not automatic, so perhaps generate them after each day?
Chuck suggests: bias, dark and flats with an illumination change; create master bias and dark, do ISR on the flat images and produce a master flat. Done manually for now, but in ops it's not necessarily automatic anyway.
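Chuck's suggested sequence can be sketched on synthetic arrays as follows. This illustrates the arithmetic only and is not the CPP pipeline: the median combine, the exposure-time scaling of the dark, the simulated signal levels, and all function names are assumptions.

```python
import numpy as np


def combine(frames):
    """Median-combine a list of equally shaped frames."""
    return np.median(np.stack(frames), axis=0)


def make_master_dark(darks, master_bias, exptime):
    """Bias-subtract the darks and return dark current per second."""
    return combine([d - master_bias for d in darks]) / exptime


def isr(raw, master_bias, master_dark, exptime):
    """Minimal instrument signature removal: subtract bias and scaled dark."""
    return raw - master_bias - master_dark * exptime


def make_master_flat(flats, master_bias, master_dark, exptime):
    """Apply ISR to each flat, combine, and normalise to unit mean."""
    corrected = combine([isr(f, master_bias, master_dark, exptime) for f in flats])
    return corrected / np.mean(corrected)


# Synthetic inputs (assumed values).
rng = np.random.default_rng(1)
shape = (32, 32)
bias_level, dark_rate, flat_signal = 1000.0, 2.0, 20000.0
response = 1.0 + 0.05 * rng.normal(size=shape)  # pixel-to-pixel response

biases = [bias_level + rng.normal(0, 5, shape) for _ in range(5)]
darks = [bias_level + dark_rate * 30.0 + rng.normal(0, 5, shape) for _ in range(5)]
flats = [
    bias_level + dark_rate * 10.0 + flat_signal * response + rng.normal(0, 5, shape)
    for _ in range(5)
]

master_bias = combine(biases)
master_dark = make_master_dark(darks, master_bias, exptime=30.0)
master_flat = make_master_flat(flats, master_bias, master_dark, exptime=10.0)
print(f"dark rate ~ {master_dark.mean():.2f} ADU/s, flat mean = {master_flat.mean():.3f}")
```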
Kevin or Petr can theoretically go in and change a lamp or such.
Success still looks like data arriving at NCSA and people looking at it.
- Robert Gruendl will provide a script and we will think about whether there is a blocker based on Archiver availability.
Proposed Outline/Script for OPS Rehearsal #2
| Simulated Time | Rehearsal Time (PST) | Activity | Actor(s) | Description |
|---|---|---|---|---|
| Day 0 | June 1, Noon | Pre-meeting | ALL | A hopefully quick session to make sure we have all our ducks in a row. |
| #Afternoon (DAY 1) | #Morning June 2 | | | |
| | 9am | Acquire afternoon calibration. | ObsOps | Simple Calibration Sequence: N? x bias, N? x flat, Dark sequence |
| | | Calibrations Products Pipeline | USDF | |
| | 11am | Examine Calibrations | SciOps | |
| | Noon | Afternoon Stand-up Operations Meeting. | ALL | |
| | | Close Out previous night's report | SciOps | |
| | | Reprocessing (if needed) | USDF | |
| | | Select Configurations and Calibration | SciOps | |
| #Night | | | | |
| #Afternoon (DAY 2) | #Morning June 3 | | | |
| | 9am | Acquire afternoon calibration. | ObsOps | Calibration Sequence (change illumination source) and repeat Day 1 |
| | | Calibrations Products Pipeline | USDF | |
| | 11am | Examine Calibrations | SciOps | |
| | Noon | Afternoon Stand-up Operations Meeting. | ALL | |
| | | Close Out previous night's report | SciOps | |
| | | Reprocessing (if needed) | USDF | |
| | | Select Configurations and Calibration | SciOps | |
| #Night | | | | |
| #Afternoon (DAY 3) | #Morning June 4 | | | |
| | 9am | Acquire afternoon calibration. | ObsOps | Calibration Sequence: N x Bias, M x Flat (interrupt or change illumination during sequence), N x Flat (take good sequence), Dark Sequence |
| | | Calibrations Products Pipeline | USDF | |
| | 11am | Examine Calibrations | SciOps | |
| | Noon | Afternoon Stand-up Operations Meeting. | ALL | |
| | | Close Out previous night's report | SciOps | |
| | | Reprocessing (if needed) | USDF | |
| | | Select Configurations and Calibration | SciOps | |
| #Night | | | | |
| #Afternoon (DAY 4) | #Morning June 5 | | | |
| | | Network configuration change | Kantor | The purpose being to test fail-over routing |
| | 10am | Acquire afternoon calibration. | ObsOps | One set of BIAS(?) |
| | | Verify Data Arrival and Archiving | USDF | |
| | | No Need for General Meeting | | |
| | next week | Write Tech Note to Summarize/Discuss Rehearsal | All | All really refers to |