Please brain-dump here requests, requirements and suggestions for moderate-to-large scale processing tasks, storage needs, Science Platform service expansion, etc. that we'll need to undertake during FY2019 (October 2018 through September 2019). These will be used to inform Data Facility procurement.
Summary | Requested by | Estimated compute or storage requirements | Comments |
---|---|---|---|
Commissioning team (incl. associated grad students) and Camera team (Science Platform service expansion) | | Up to ~50 active users. Note that stack-club currently has about 8 active users from commissioning and operations writing notebooks on lspdev. | Commissioning and Camera team scientists using lspdev for analysis of test images. |
Science Collaboration Users (Science Platform service expansion) | | Up to ~40 active SC users. Note that stack-club currently has about 8 active users from science collaborations writing notebooks on lspdev. | LSST science collaboration users are becoming active users of the SUIT notebook aspect on lspdev, especially via stack-club. While we are not scoped to support a large community of users on lspdev, early feedback from a small subset is valuable. I'm thinking of about 40 science users from across all the Science Collaborations. We need to distinguish between user access to PDAC (intended for phases of formal user testing of full LSP prototypes, interspersed with periods of integration work during which user access would be closed) and user access to lsst-lspdev, which is intended as an ongoing resource for DM and LSST project staff development efforts but has been extended to a larger community. |
Storing spectrograph test data at LDF | | ~40 MB/image, 100 images/day (upper estimate) | Spectrograph test data from Tucson, to be made available to the wider DM and commissioning teams. Eventually AuxTel data from the summit as well. |
DESC 1.2i | Leanne Guy | Subset of ~25 sq deg of the final DESC DC2 dataset. ~2000 images, ~20 TB, ~1/3 the size of HSC PDR1. | NCSA will host this for DESC, and DM will use this DC2 subset for testing with LSST-like images and for Qserv testing (KPM50). |
Gaia DR2 | | ~1.2 TB, 1.7 billion rows | Understood that Gaia DR2 was already planned for? http://cdn.gea.esac.esa.int/Gaia/gdr2/ |
WISE single-epoch | Gregory Dubois-Felsmann | 19-20 G rows, or 20-21 TB, per year: about 60 TB of raw table data, plus indexes, etc., needed to load Years 2-4. | Once the simplified bulk-loading tools from the Database group are complete, we wish to load Years 2 and beyond of the NEOWISE single-epoch photometry (data up through Year 4 are currently available), from https://irsa.ipac.caltech.edu/data/download/ (NEOWISE download page). At least Year 2 was supposed to have been accounted for in earlier space requests, but this should be verified. Year 5 may be released during FY19. |
HSC RC reprocessing | John Swinbank and the DRP team | Should be possible to get these from Hsin-Fang Chiang based on FY18 activities. | Assuming that this will continue through FY19 at approx. the same cadence as during FY18. |
HSC PDR1 reprocessing | | Again, estimate based on FY18 | Based on past performance, expect 2-3 full reprocessing runs during this year. We should also plan to load the whole of PDR1 into the PDAC LSP; however, there are concerns about making the data available outside the DM team (we have been asked to limit its use to "engineering" purposes). |
HSC PDR2 | | | I believe that PDR2 will be available in summer 2019, and we'll likely want to reprocess it. Since that is late in FY19 anyway, and allowing some margin for the release date slipping, it may be premature to budget for it. |
Storing camera data at LDF | | Estimated 60-200 TB to bulk transfer. | Need to get estimates for expected ongoing rate and volume. |
SSD for Firefly servers in lspdev | | 2 x 512 GB | We want to use SSDs for the Firefly server cache to improve file access performance, assuming two Firefly servers, each with a 512 GB SSD. |
Mini-broker testing | Eric Bellm | Per discussions at LSST2018 with Unknown User (mbutler), 3 dedicated K8 nodes | We want to develop the mini-broker architecture and operations concept and understand its performance limits without interfering with other K8 uses. |
Regular AP reprocessing | | Unknown User (emorganson) to estimate based on DES SN dataset | AP analogue of the HSC RC reprocessing. |
DAX Web Services | | 60 cores | Ideally, these requirements will fit with other machines that are deployed in the commons, but in real terms this is roughly two physical machines. Shared disk in GPFS is desired for storing asynchronous outputs from the web services. |
PDAC DB node updates | | 35 nodes, ~14 cores/node; 40 TiB storage/node w/ RAID controllers; at least 384 GiB RAM/node | We need to take the PDAC Qserv instance to the next step on the glide path to production scale; we would like to run KPM50 and KPM75 at NCSA on NCSA infrastructure as well as at CC-IN2P3. The plan is to get 35 new nodes with contemporary hardware and use those for KPM50; later in the year we'd join these with the current nodes into an even larger system for KPM75. Note: no new Qserv czar node is anticipated to be needed in FY19; the two existing ones should be adequate. |
Alerts DB | | ~20 cores; ~5 TB storage configurable for object store; budget as capacity in k8s commons? | For prototyping of the Alerts DB (generated alerts, probably NoSQL, not the PPDB). Although there are currently no milestones anchoring this, it seems reasonable to expect some development activity and budget some capacity for this. |
Summit shared filesystem | Kian-Tat Lim | 4 TB SSD; one NFS server machine | Concerned about reliability and uptime, but this should be good enough to start with. |
K8 commons expansion to support Science Platform development | | 3x expansion over current usage to about 60 32-core nodes (from the current 20) | |
Object Storage for services running on the Kubernetes commons | Frossie Economou | 5 TB storage to back an object store service; object store service OR appropriate privileges for us to deploy a k8s-hosted minio service | |
Jenkins test system for release manager | | 1 node-equivalent | |
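
Several rows above give rates or per-node figures rather than totals. The sketch below shows one way to turn those into yearly or aggregate storage numbers. It is only a rough illustration: the function names are hypothetical, and the decimal units (1 TB = 1,000,000 MB) and the 365-day year are assumptions, not figures from the requesters.

```python
# Rough storage arithmetic for table rows that state explicit volumes.
# Assumptions (not from the table): decimal units and a 365-day year.

MB_PER_TB = 1_000_000


def spectrograph_tb_per_year(mb_per_image=40, images_per_day=100, days=365):
    """Upper-estimate yearly volume of spectrograph test data, in TB."""
    return mb_per_image * images_per_day * days / MB_PER_TB


def wise_raw_tb(tb_per_year_range=(20, 21), years=3):
    """Raw table volume for NEOWISE Years 2-4 (three years), as (low, high) TB."""
    low, high = tb_per_year_range
    return low * years, high * years


def pdac_storage_tib(nodes=35, tib_per_node=40):
    """Aggregate local storage across the requested PDAC DB nodes, in TiB."""
    return nodes * tib_per_node


if __name__ == "__main__":
    print(f"Spectrograph: {spectrograph_tb_per_year():.2f} TB/year")
    print(f"WISE Years 2-4 raw tables: {wise_raw_tb()} TB")
    print(f"PDAC local storage: {pdac_storage_tib()} TiB")
```

With the defaults taken from the table, the spectrograph row works out to about 1.5 TB per year, and the WISE figure of 60-63 TB is consistent with the "about 60 TB" quoted in that row (indexes not included).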
1 Comment
Brian Van Klaveren
I did some poking around and tried to create some totals and machine configurations based on the following use cases above:
From that, I've come up with the following estimates:
Cores and local storage:
Shared Storage:
Breakdown of core usage:
Some possible machine configurations:
32 cores:
40 cores:
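
One way to tally core requests from the table and convert them into machine counts for 32-core and 40-core options is sketched below. This is a minimal illustration, not the calculation behind the estimates in this comment. It uses only rows that state core counts explicitly; treating DAX Web Services and the Alerts DB as commons-hosted is an assumption, and the names `CORE_REQUESTS`, `COMMONS_HOSTED` and `machines_needed` are hypothetical.

```python
import math

# Core counts stated explicitly in the table. Rows given only in nodes
# without a per-node core count (mini-broker, Jenkins) are left out.
CORE_REQUESTS = {
    "DAX Web Services": 60,
    "Alerts DB": 20,
    "PDAC DB nodes": 35 * 14,             # 35 nodes at ~14 cores/node
    "K8 commons growth": (60 - 20) * 32,  # 40 additional 32-core nodes
}

# Assumption: these services would run on shared commons machines.
COMMONS_HOSTED = ("DAX Web Services", "Alerts DB")


def machines_needed(cores, cores_per_machine):
    """Whole machines required to supply the given number of cores."""
    return math.ceil(cores / cores_per_machine)


if __name__ == "__main__":
    total = sum(CORE_REQUESTS.values())
    commons = sum(CORE_REQUESTS[name] for name in COMMONS_HOSTED)
    print(f"Total cores across listed rows: {total}")
    for size in (32, 40):
        count = machines_needed(commons, size)
        print(f"{commons} commons-hosted cores on {size}-core machines: {count}")
```

For the 60-core DAX request alone, both machine sizes give two machines, which matches the "roughly two physical machines" noted in that row.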