- Launch an instance based on the AMI "centos_condor_w38"
- To avoid running as "nobody" and file permission issues in /home/centos/, add UID_DOMAIN = $(EC2PublicIP) to /etc/condor/config.d/local
- wget https://lsst-web.ncsa.illinois.edu/~hchiang2/aws_poc/pegasus_w38.tar
- Untar
- cd pegasus_w38
- rm output/* (I simply shouldn't have included them in the tarball. There are a few more unnecessary files too.)
- Edit sites.xml: modify PEGASUS_HOME in two places
  <profile namespace="env" key="PEGASUS_HOME">/usr/</profile>
- Add ~/.lsst/db-auth.yaml ~/.aws/config ~/.aws/credentials
- Ensure they have the correct mode (600)
- Add ~/.condor/privateKeyFile ~/.condor/publicKeyFile and chmod them to 600 as well
- Edit wf.dax and change OUTCOL to a new string; this is a new collection name.
- ./run_peg.sh
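The PEGASUS_HOME edit in sites.xml can also be scripted. A minimal sketch, assuming the profile element is written exactly in the form quoted in the step above (namespace attribute before key); set_pegasus_home and update_sites_file are hypothetical helper names, not part of Pegasus:

```python
import re

# Matches <profile namespace="env" key="PEGASUS_HOME">VALUE</profile>.
# Assumption: the attributes appear in this order, as in the snippet in these notes.
_PROFILE_RE = re.compile(
    r'(<profile\s+namespace="env"\s+key="PEGASUS_HOME"\s*>)(.*?)(</profile>)',
    re.DOTALL,
)


def set_pegasus_home(xml_text, new_home):
    """Return (updated_text, count) with every PEGASUS_HOME env profile set to new_home."""
    # A function replacement keeps backslashes in new_home from being read as escapes.
    return _PROFILE_RE.subn(lambda m: m.group(1) + new_home + m.group(3), xml_text)


def update_sites_file(path, new_home):
    """Rewrite the file at path in place and return the number of profiles changed."""
    with open(path) as fh:
        text = fh.read()
    updated, count = set_pegasus_home(text, new_home)
    with open(path, "w") as fh:
        fh.write(updated)
    return count
```

Calling update_sites_file("sites.xml", "/usr/") should return 2 if the file matches the "two places" noted above; any other count suggests the file differs from what this sketch assumes and should be edited by hand.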
With the new AMI:
- Launch centos_condor_w38_master (ami-091a340d2d1c97d99)
- Use the security groups HTCondorAnnex-SecurityGroup-SecurityGroup-O6WZKYYYQ04C (Allow SSH and HTCondor from anywhere) and launch-wizard-1 (for RDS)
- For RDS, need port 5432 tcp with the security group as the source
- For Condor, need ports 9618 and 22
- ssh into the instance
- Add 5 credential files ~/.lsst/db-auth.yaml ~/.aws/config ~/.aws/credentials ~/.condor/privateKeyFile ~/.condor/publicKeyFile and chmod 600
- aws s3 cp --recursive s3://pegasus38 pegasus38
- cd pegasus38
- Ensure PEGASUS_HOME is correct (/usr/share/pegasus/) in sites.xml
- Edit wf.dax to change the output collection name (-o UNIQUE)
- chmod +x run_peg.sh
- ./run_peg.sh
- To get workers via Annex:
- Edit the file annex-spot-w38-r5-new.json to have the correct AMI-ID ("ImageId": "ami-040a9d3a75ff0447c")
- condor_annex -slots 1 -aws-spot-fleet-config-file /home/centos/pegasus38/annex-spot-w38-r5-new.json -annex-name test1 -duration 1
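Setting the AMI ID in the annex JSON file by hand is easy to get wrong. A rough sketch of doing it from Python follows; it makes no assumption about where "ImageId" sits in the file and simply sets every key with that name. set_image_id and update_annex_config are hypothetical helper names, not part of HTCondor or the AWS tools:

```python
import json


def set_image_id(node, ami_id):
    """Recursively set every "ImageId" value in parsed JSON; return how many were set."""
    count = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ImageId":
                node[key] = ami_id  # overwriting an existing key is safe while iterating
                count += 1
            else:
                count += set_image_id(value, ami_id)
    elif isinstance(node, list):
        for item in node:
            count += set_image_id(item, ami_id)
    return count


def update_annex_config(path, ami_id):
    """Load the JSON file at path, set every ImageId, write it back, return the count."""
    with open(path) as fh:
        config = json.load(fh)
    count = set_image_id(config, ami_id)
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
    return count
```

For example, update_annex_config("/home/centos/pegasus38/annex-spot-w38-r5-new.json", "ami-040a9d3a75ff0447c") rewrites the file in place. A return value of 0 means no "ImageId" key was found, so the file should be checked before running condor_annex.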
Also good to do:
- To help debug condor_annex, add COLLECTOR_DEBUG = D_COMMAND to /etc/condor/config.d/local and then run condor_reconfig -collector before trying to use condor_annex; this can tell us if the same collector is seeing both requests
- Run export _CONDOR_ANNEX_GAHP_DEBUG=D_FULLDEBUG before running condor_annex as well, which would allow us to check which collector the Lambda function is checking.
- Where to find the log?
- Run condor_config_val LOG to find the log directory (usually /var/log/condor); the file will be called CollectorLog
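Settings such as COLLECTOR_DEBUG = D_COMMAND (or UID_DOMAIN = $(EC2PublicIP) from the first set of steps) can be appended to the local config file without duplicating them on repeated runs. A minimal sketch, assuming the file uses plain "NAME = value" lines; ensure_setting is a hypothetical helper, not an HTCondor tool:

```python
def ensure_setting(path, name, value):
    """Append 'name = value' to the file at path unless that exact line is present.

    Returns True if the file was modified, False if the setting was already there.
    """
    line = f"{name} = {value}"
    try:
        with open(path) as fh:
            content = fh.read()
    except FileNotFoundError:
        content = ""  # treat a missing file as empty; it is created below
    if line in (existing.strip() for existing in content.splitlines()):
        return False
    with open(path, "a") as fh:
        # Avoid gluing the new setting onto an unterminated last line.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True
```

Writing to /etc/condor/config.d/local normally needs root, and the change only takes effect after condor_reconfig -collector as noted above. The check is an exact-line match, so an existing COLLECTOR_DEBUG line with a different value is left in place and a second line is appended after it.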
How the workflow was made:
- Use this to create a quantum graph: https://github.com/lsst/ci_hsc_gen3/blob/w.2019.38/bin/pipeline.sh
- Set up the ci_hsc_gen3 repo for the configs
- Point the -b argument to the wanted butler.yaml
- Instead of doing "run", do "qgraph -q ciHsc_w38.pickle"
- Set up the ticket branch u/hfc/DM-21390 of ctrl_mpexec
- pipetask -b PATH/butler.yaml -o whatever qgraph --show workflow --qgraph PATH/ciHsc_w38.pickle --save-qgraph input/quantum > wf
  (Notes: the ticket is in progress and the command line interface will change.)
- To generate a dax file:
- aws s3 cp s3://hsc-rc2-test1-hfc/pegasusize.py .
- python pegasusize.py