Running RC2 via Gen3 + Pegasus + Oracle
Not as a production operator*, but as a special developer who shares a repo with other developers. The goal is not a polished, efficient service, but a starting point for study and further work. Beyond the presentation itself, the aim is to have something usable and maintainable for roughly the next 3 months while the next set of work is ongoing.
Deliverables
Results of running Gen3 DRP pipeline on RC2 HSC data on lsst-dev
Using Oracle as backend for registry of shared butler repository
Pegasus for workflow
Hsin-Fang can run Gen3 as part of the monthly HSC-RC2 reprocessing runs
Incorporate into procedures with lower expectations than Gen2 (failure rate, usefulness of outputs, operability, etc)
Instructions for friendly-user developers using Gen3 Butler with Oracle
Including accessing weekly ci_hsc repo and RC outputs
Assuming running in Pegasus would still be too rough for most friendly-user developers
Maintainable Registry code that does not require each RDBMS backend to keep its own copy of large portions of the code, updated on every code change
Why These Deliverables
Larger scale than ci_hsc
Oracle admins can start seeing data flow and more easily provide feedback
Increases visibility of Gen3 results
Gen2 RC running is one of the blockers to deprecating Gen2
Unblocks multi-registry work required for separation of production data from user data
Provides insight for the Batch Processing Service design doc
Delivery Date
DMLT: June 4-6, 2019
To do (mid-level description to enable low-level work and effort estimation)
- Design how to work in Oracle for demo and 3+ months following Oracle schemas
- Distinct use cases will use distinct schemas (Registry developer, Pipetask developer, RC data user).
- RC run schemas - owned and maintained by Oracle admins.
- Weekly ci_hsc data loaded by Oracle admins into a shared schema.
- Schemas needing more frequent updates - owned and maintained by the user.
- Need an easy way to update a schema with the latest changes or to start from scratch (Gen3). Schema evolution (schema migration scripts? (Gen3))
- Oracle Server setup (NCSA)
- Recoverability - shared registries increase risk. Nightly exports of every schema will be scheduled, with 2-week retention, until data volumes warrant a different approach.
- Availability - Standard maintenance windows and support during business hours.
- Authentication - Oracle wallets (initially created by db admins).
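One possible shape for the nightly export mentioned above, sketched as a crontab entry written out from shell. The schema name (RC2_SCHEMA), Data Pump directory object (DUMP_DIR), and wallet alias (gen3_admin_wallet) are illustrative assumptions, not the admins' actual configuration:

```shell
#!/bin/sh
# Hypothetical sketch: write a crontab entry for a nightly Data Pump export
# of one registry schema. All names below are placeholders for illustration;
# 14-day pruning of old dumpfiles would be handled separately.
cat > nightly_export.cron <<'EOF'
# 02:00 nightly export of the RC2 registry schema via an admin wallet alias.
0 2 * * * expdp /@gen3_admin_wallet schemas=RC2_SCHEMA directory=DUMP_DIR \
  dumpfile=rc2_%U.dmp logfile=rc2_nightly.log reuse_dumpfiles=yes
EOF
```

Connecting via a wallet alias (`/@gen3_admin_wallet`) keeps credentials out of the crontab itself, consistent with the wallet-based authentication above.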
- Install software on lsst-dev: Oracle client software + cx_Oracle (NCSA)
- Increased testing:
- More tests in PipelineTasks, ctrl_mpexec, etc (Jenkins + sqlite)
- Running ci_hsc with Gen3 (sqlite) in Jenkins
- Manual running of daf_butler tests against Oracle (pre-existing schema)
- Stretch goals:
- Jenkins running daf_butler tests against Oracle (full setup and teardown of schema)
- Running ci_hsc with Gen3 (oracle) in Jenkins
- Have different output DataStores for different users (details TBD)
- daf_butler refactoring work to decrease additional changes needed to function with multiple RDBMS products (Gen3).
- Oracle specific Butler changes (NCSA + Gen3, blocked by refactoring work)
- Need RC2 initial repo (Gen3)
- (Preferred) Raw-ingest executable (+ a script to make it easier to start from scratch) (Gen3). Calib files may require ingest + a script to set validity ranges.
- Or conversion from the Gen2 HSC-RC2 reprocessing runs (as we do with ci_hsc) (ChrisW). Set initial WCS (explicit update only, not selecting the best).
- More Pipeline Tasks to convert to Gen3 (DRP)
- SkyCorrection (needs to be broken up into smaller tasks)
- JointCal (cannot be run on ci_hsc data set, needs more data)
- Change template to have unique filenames for RC runs
- Hopefully just saving the templates to a file. (NCSA)
- Unknown if particular values in templates would require any Butler changes (Gen3)
- Batch Processing Service - NCSA
- Assuming still using Andy’s pipetask as the activator
- Need execution config (in particular cpu/memory requirements)
- Changes to allocNodes to set up HTCondor pool with partitionable slots
- Helpful status/monitoring scripts TBD
- Note the following are blocked by Gen3 development and are not part of this deliverable:
- Must always start from beginning of submission (no retries or restarts)
- Must be shared repo model (no job scratch, no Pegasus file transfer)
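As a starting point for the execution config noted above, a minimal HTCondor submit description with explicit cpu/memory requests might look like the following. The executable name, argument, and resource values are illustrative assumptions, not a settled design:

```shell
#!/bin/sh
# Hypothetical sketch: generate a minimal HTCondor submit description with
# explicit cpu/memory requests for one pipetask job. All values below are
# placeholders; real requirements would come from the execution config.
cat > pipetask_quantum.sub <<'EOF'
executable     = run_pipetask.sh
arguments      = $(quantum_id)
request_cpus   = 1
request_memory = 4096M
log            = pipetask_$(Cluster).log
output         = pipetask_$(Cluster).$(Process).out
error          = pipetask_$(Cluster).$(Process).err
queue
EOF
```

Explicit `request_cpus`/`request_memory` values are what lets the partitionable slots mentioned above carve a node into appropriately sized pieces per job.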
- RC2 dataset challenges:
- Single frame processing failures should not halt running
- current proposed solution: config option to always write files
- Missing warp file should not halt running
- Ran into this with ci_hsc - config option exists to always write files
- ci_hsc/RC2 output usable from NCSA LSP
- Oracle software accessible from NCSA LSP (NCSA + LSP/SQRE)
- Not supporting Pegasus submissions from LSP for this milestone
* Why the note about not being Production?
Missing separation of production data from user data (requires user write access to production schema)
The outputs of a production pipeline should not be directly written to the production Data Backbone (or central database in general) to allow the Batch Production Service to:
Minimize database connections
Use various methods for retries and restarts
Many Batch Production Service features are missing, some of which are blocked by not-yet-implemented Gen3 features.
Current lsst-dev Oracle Instructions
- For this milestone, no attempts will be made to make Oracle a part of the lsst_stack.
- Oracle instantclient and cx_Oracle are currently installed on lsst-dev in /project/production/oracle.
- Example Oracle environment settings are in /project/production/oracle/oracle_env-v1.sh. Should not affect environment set up by LSST stack.
- Untar the Oracle wallet tarball given to you by an admin into a directory under your home directory.
- The admin will also have given you a net service name for the wallet (e.g., gen3_cred_yourlogin_1). If you were given a whole tnsnames.ora file, the net service name is the top/outermost key.
- You will also need sqlnet.ora and tnsnames.ora files (make sure the path in sqlnet.ora points to the wallet files).
- Set environment variable TNS_ADMIN to point to the directory where the *.ora files live.
- Test connection:
- sqlplus:
sqlplus /@<net service name>
select user from dual;
quit;
- Use the test python program, which prints which user you connected to Oracle as if the connection succeeds (Note: python3 must be in your path; if you haven't already, source /software/lsstsw/stack/loadLSST.bash):
/project/production/oracle/test_conn.py <net service name>
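The wallet setup steps above can be sketched as a short shell snippet. The directory layout, host/port/service values, and the net service name are illustrative placeholders, not real credentials; substitute the values your admin gave you:

```shell
#!/bin/sh
# Hypothetical sketch of the wallet/TNS_ADMIN setup described above.
# Directory names, host/port/service, and the net service name
# (gen3_cred_yourlogin_1) are placeholders for illustration.
set -e

WALLET_DIR="$HOME/oracle_wallet"
mkdir -p "$WALLET_DIR"
# tar xf gen3_cred_yourlogin_1.tar -C "$WALLET_DIR"   # wallet tarball from admin

# sqlnet.ora must point at the directory holding the wallet files.
cat > "$WALLET_DIR/sqlnet.ora" <<EOF
WALLET_LOCATION = (SOURCE = (METHOD = FILE) (METHOD_DATA = (DIRECTORY = $WALLET_DIR)))
SQLNET.WALLET_OVERRIDE = TRUE
EOF

# Minimal tnsnames.ora; the outermost key is the net service name.
cat > "$WALLET_DIR/tnsnames.ora" <<'EOF'
gen3_cred_yourlogin_1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.ncsa.edu)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = gen3svc))
  )
EOF

# sqlplus and cx_Oracle locate the *.ora files via TNS_ADMIN.
export TNS_ADMIN="$WALLET_DIR"
echo "TNS_ADMIN=$TNS_ADMIN"
```

With TNS_ADMIN set this way, `sqlplus /@gen3_cred_yourlogin_1` (the connection test above) resolves the alias without a password prompt.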