Running RC2 via Gen3 + Pegasus + Oracle  

This run is done not as a production operator*, but as a special developer who shares a repo with other developers. The goal is not a polished, efficient service, but a starting point for study and further work. Beyond the presentation, the aim is to have something usable and maintainable for the following 3 months or so while the next set of work is ongoing.

Deliverables

Why These Deliverables

  1. Larger scale than ci_hsc

  2. Oracle admins can start seeing data flow and more easily provide feedback

  3. Increases visibility of Gen3 results

  4. RC processing still running under Gen2 is one of the blockers to deprecating Gen2

  5. Unblocks multi-registry work required for separation of production data from user data

  6. Provides insight for the Batch Processing Service design doc

Delivery Date

DMLT: June 4-6, 2019

To do  (Mid-level description to enable low level work and effort generation)

  1. Design how to work in Oracle for demo and 3+ months following
    1. Oracle schemas
      1. Distinct use cases will use distinct schemas (Registry developer, Pipetask developer, RC data user).
        1. RC runs - Oracle admins owned and maintained.
        2. Weekly ci_hsc data loaded by Oracle admins into shared schema.
        3. More frequent schema updates - User owned and maintained.
      2. Need easy way to update schema with latest changes or start from scratch (Gen3)
      3. Schema evolution (schema migration scripts?) (Gen3)
    2. Oracle Server setup (NCSA)
      1. Recoverability - shared registries increase risk - Nightly exports of every schema will be scheduled with 2 week retention until data volumes warrant a different approach.
      2. Availability - Standard maintenance windows and support during business hours.
      3. Authentication - Oracle wallets (initially created by db admins).
    3. Install software on lsst-dev: Oracle client software + cx_Oracle (NCSA)
  2. Increased testing:
    1. More tests in PipelineTasks, ctrl_mpexec, etc  (Jenkins + sqlite)
    2. Running ci_hsc with Gen3 (sqlite) in Jenkins
    3. Manual running of daf_butler tests against Oracle (pre-existing schema)
    4. Stretch goals:
      1. Jenkins running daf_butler tests against Oracle (full setup and teardown of schema)
      2. Running ci_hsc with Gen3 (oracle) in Jenkins
  3. Have different output DataStores for different users (details TBD) 
  4. daf_butler refactoring work to decrease additional changes needed to function with multiple RDBMS products (Gen3). 
  5. Oracle specific Butler changes (NCSA + Gen3, blocked by refactoring work) 
  6. Need RC2 initial repo (Gen3) 
    1. (prefer) Ingest raw executable (+ script to make it easier to start from scratch) (Gen3)
      1. Calib files may be ingest + script to set ranges
    2. Or conversion from Gen2 HSC-RC2 reprocessing runs (like we do with ci_hsc) (ChrisW)
      1. Set initial WCS (only explicit update, not select best)
  7. More Pipeline Tasks to convert to Gen3 (DRP)
    1. SkyCorrection (needs to be broken up into smaller tasks)
    2. JointCal (cannot be run on ci_hsc data set, needs more data)
  8. Change template to have unique filenames for RC runs 
    1. Hopefully just saving the templates to a file.   (NCSA) 
    2. Unknown if particular values in templates would require any Butler changes (Gen3)
  9. Batch Processing Service - NCSA 
    1. Assuming still using Andy’s pipetask as the activator 
    2. Need execution config (in particular cpu/memory requirements)
    3. Changes to allocNodes to set up HTCondor pool with partitionable slots
    4. Helpful status/monitoring scripts TBD 
    5. Note the following are blocked by Gen3 development and are not part of this deliverable:
      1. Must always start from beginning of submission (no retries or restarts) 
      2. Must be shared repo model (no job scratch, no Pegasus file transfer) 
  10. RC2 dataset challenges
    1. Single frame processing failures should not halt running
      1. current proposed solution: config option to always write files 
    2. Missing warp file should not halt running
      1. Ran into this with ci_hsc - config option exists to always write files
  11. ci_hsc/RC2 output usable from NCSA LSP 
    1. Oracle software accessible from NCSA LSP (NCSA + LSP/SQRE) 
    2. Not supporting Pegasus submissions from LSP for this milestone
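
Item 8 above asks for filename templates that give unique filenames for RC runs. A minimal sketch of a template check, assuming templates are Python format-style strings and that a field such as `run` or `collection` is what distinguishes one RC run from another. The field names, dataset types, and template strings below are hypothetical and are not taken from the Butler configuration.

```python
from string import Formatter

# Hypothetical field names assumed to make a filename unique per RC run.
RUN_FIELDS = {"run", "collection"}


def template_fields(template):
    """Return the set of placeholder names used in a format-style template."""
    return {
        # "{visit:08d}" parses to the name "visit"; "a.b" or "a[0]" reduce to "a".
        name.split(".")[0].split("[")[0]
        for _, name, _, _ in Formatter().parse(template)
        if name
    }


def templates_missing_run_field(templates, run_fields=RUN_FIELDS):
    """Return the dataset types whose template has no run-identifying field."""
    return sorted(
        dataset_type
        for dataset_type, template in templates.items()
        if not template_fields(template) & run_fields
    )


# Hypothetical templates, for illustration only.
EXAMPLE_TEMPLATES = {
    "calexp": "{run}/{datasetType}/{visit:08d}/{detector:03d}",
    "deepCoadd": "{datasetType}/{tract:d}/{patch}/{abstract_filter}",
}

if __name__ == "__main__":
    for dataset_type in templates_missing_run_field(EXAMPLE_TEMPLATES):
        print(f"{dataset_type}: template has no run-identifying field")
```

A check like this only covers the case where a config file change is enough. It does not say whether particular template values would require Butler changes.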

* Why the note about not being Production?

Current lsst-dev Oracle Instructions

Timeline

Thurs Gen3 meetings | ci_hsc/RC2 Running | NCSA - BPS | NCSA - Oracle | Gen3 | DRP

2/21/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Jim Bosch Oracle account
  Must provide updated weekly Gen3 science configs prior to NCSA run

2/28/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Completed: Init Oracle accounts+wallets (Nate - 03/01), nightly DB backups (03/04), weekly ingest of ci_hsc, install Oracle client and cx_Oracle on lsst-dev (03/01)

3/7/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Completed Filename template checking script

3/14/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Decision about how to support multiple RDBMSs. Completed code changes for sqlite side. Code ready to start making Oracle changes.

3/21/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Completed BPS v0.1 exec config, allocateNodes (partitionable slots)
  Completed unique filename templates ci_hsc (where only requires config file change)
  Completed: Easy way to initialize dev butler schema in Oracle

3/28/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.

4/4/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite3) in pegasus to provide feedback if things are no longer working.
  Completed: Oracle Butler works (no efficiency checks, just doesn't abort). Selecting Oracle schema, Table and view names case-insensitive on DB side.

4/11/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (sqlite) in pegasus to provide feedback if things are no longer working.
  Completed BPS v0.1 status/history scripts
  Completed: Scripts to initialize ci_hsc repo for a Gen3 run without latest weekly Gen2 outputs.

4/18/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (Oracle) in pegasus to provide feedback if things are no longer working.
  Completed: Mechanisms to create RC2 init repo

4/25/2019
  Completed ci_hsc gen2 run (sqlite to load into Oracle), ci_hsc gen3 run (Oracle) in pegasus to provide feedback if things are no longer working.
  Completed: RC2 init repo avail in Oracle
  Complete RC2 DRP pipeline includes always write output config options where needed.

4/25/2019
  Freeze: features, API, schema

5/2/2019
  Start running RC2 and reporting problems

5/9/2019

5/16/2019

5/23/2019

5/30/2019
  Completed: Can access Oracle Registry + GPFS DataStore from NCSA LSP

6/06/2019
  Milestone completed. Presentation during DMLT meeting June 04-06. Includes instructions, any software installs, etc.
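
The BPS execution config in the plan exists mainly to carry cpu/memory requirements, which an HTCondor pool with partitionable slots uses to size each job's share of a node. A rough sketch of how such a config might map to per-job resource requests follows. The config layout, the task label `makeWarp`, and all numbers are hypothetical placeholders, not the BPS v0.1 format. `request_cpus` and `request_memory` are standard HTCondor submit commands, and `request_memory` defaults to MiB.

```python
# Hypothetical execution config: defaults plus per-task overrides.
# Labels and values are placeholders, not measured requirements.
EXEC_CONFIG = {
    "default": {"request_cpus": 1, "request_memory_mb": 2048},
    "tasks": {
        "makeWarp": {"request_memory_mb": 8192},
    },
}


def resources_for(task_label, config=EXEC_CONFIG):
    """Merge the default resources with any override for one task label."""
    merged = dict(config["default"])
    merged.update(config["tasks"].get(task_label, {}))
    return merged


def submit_lines(task_label, config=EXEC_CONFIG):
    """Render the resource requests as HTCondor submit-description lines."""
    res = resources_for(task_label, config)
    return [
        f"request_cpus = {res['request_cpus']}",
        f"request_memory = {res['request_memory_mb']}",
    ]


if __name__ == "__main__":
    for label in ("makeWarp", "someOtherTask"):
        print(label, submit_lines(label))
```

Keeping a default plus per-task overrides means only the tasks with unusual needs have to be listed. How these values would be passed through Pegasus is not shown here.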