Time, Date & Place

14:00 Pacific 2019-05-30 on Amazon Chime https://chime.aws/1930107527

Attendees

Kian-Tat Lim

Michelle Gower

mbutler

Greg Daues

Steve Pietrowicz

Hsin-Fang Chiang

ChrisM

Aaron

GregT

Miron

Discussion Items

  • Sanjay, GregD, and Hsin-Fang had a good time at HTCondor Week, thanks to the HTCondor team. 
    • GregT showed Hsin-Fang & GregD how to start an HTCondor pool on AWS; notes here
  • Using HTCondor Annex & Spot 
    • Data on EFS; ran a larger (mock) workflow than ci_hsc; not yet on Spot 
    • Copying 4 GB into EFS took 20 minutes. Question to the AWS folks: what is the best way to load large datasets into EC2? 
    • That sounds slower than it should be; regardless, we should try to move off EFS as soon as possible. 
    • If we really have to stay with EFS, there are utilities to move data in parallel
    • Should start loading a larger dataset into Amazon storage; the hope is to put it directly into S3 as a repo (see the upload sketch below this item).  (action) Hsin-Fang will ask Dino
    • The next dataset is ~50,000 files of ~20 MB each (roughly 1 TB total)
    • (action) Hsin-Fang will try GregT's steps for the annex run
    • Next step: a demo running on Spot
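
As a rough illustration of loading a large dataset into S3 in parallel (not the team's actual tooling), the sketch below uses boto3 with a thread pool; the bucket name and local path are hypothetical placeholders. For a one-off load, aws s3 sync or aws s3 cp --recursive would be simpler alternatives.

    # Hedged sketch: parallel upload of a local repo into S3 with boto3.
    # BUCKET and SRC_ROOT are placeholders, not real project names.
    import pathlib
    from concurrent.futures import ThreadPoolExecutor

    import boto3

    BUCKET = "example-test-repo"                 # hypothetical bucket
    SRC_ROOT = pathlib.Path("/data/mock_repo")   # hypothetical dataset root

    s3 = boto3.client("s3")

    def upload_one(path: pathlib.Path) -> str:
        # Preserve the repo's relative layout as the S3 object key.
        key = path.relative_to(SRC_ROOT).as_posix()
        s3.upload_file(str(path), BUCKET, key)
        return key

    files = [p for p in SRC_ROOT.rglob("*") if p.is_file()]
    # Many ~20 MB files move much faster with concurrent uploads than serially.
    with ThreadPoolExecutor(max_workers=16) as pool:
        for key in pool.map(upload_one, files):
            print("uploaded", key)
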
  • Using S3 as the data storage 
    1. Butler S3 datastore 
    2. HTCondor S3 plug-in for data transfer, shared-nothing
      • Likely need new utilities in Butler, such as a local SQLite registry, to do shared-nothing. Do we need to provide URLs? Other prerequisites?
      • HTCondor needs to know the S3 URLs of each job's inputs and outputs
      • Need to create a local Butler repo; however, this is not in the LSST DM schedule for the next few months
      • (stopgap) One shared registry. Need three new utilities: URL generation, S3-to-POSIX datastore copy, and POSIX-to-S3 copy (see the sketch below this item)
        • Run as a POSIX datastore locally on the workers
        • LSST is unlikely to have the resources in the near future to do this work in BPS, though
      • Hsin-Fang still owes GregT an example without S3 and will provide one
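
To make the stopgap concrete, here is a rough, illustrative sketch (not Butler or BPS code) of the three utilities discussed above: S3 URL generation, S3-to-POSIX copy, and POSIX-to-S3 copy. Bucket and path names are hypothetical.

    # Hedged sketch of the three stopgap utilities; illustrative only.
    import os

    import boto3

    s3 = boto3.client("s3")

    def make_s3_url(bucket: str, dataset_path: str) -> str:
        # Generate the S3 URL that a job (or HTCondor's file transfer) is given.
        return f"s3://{bucket}/{dataset_path}"

    def s3_to_posix(bucket: str, key: str, local_root: str) -> str:
        # Fetch an input dataset from S3 into a local POSIX datastore layout.
        dest = os.path.join(local_root, key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket, key, dest)
        return dest

    def posix_to_s3(local_path: str, bucket: str, key: str) -> None:
        # Push an output dataset from the local POSIX datastore back to S3.
        s3.upload_file(local_path, bucket, key)

In this picture a worker job would pull its inputs with s3_to_posix, run against a local POSIX datastore, and push outputs back with posix_to_s3, while the single shared registry remains the source of truth.
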
  • Q: operationally, how do we avoid destroying our data on S3 while running tests? 
    • Bucket sync (keep a synced backup copy of the bucket)
    • Read-only / delete protection on the bucket (see the sketch below)
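
One hedged way to get the delete protection mentioned above: enable object versioning and attach a bucket policy that denies object deletion. The sketch below uses boto3; the bucket name is a placeholder, and the policy would need refinement (e.g. exempting an admin role) before real use.

    # Hedged sketch: protect a test bucket against accidental data loss.
    import json

    import boto3

    BUCKET = "example-test-repo"  # hypothetical bucket
    s3 = boto3.client("s3")

    # Versioning keeps old object versions, so an accidental overwrite or
    # delete during a test run is recoverable.
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Deny DeleteObject for all principals: test jobs can read and write
    # objects but cannot remove them.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "NoDeletes",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:DeleteObject",
                "Resource": f"arn:aws:s3:::{BUCKET}/*",
            }
        ],
    }
    s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
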