
Please brain-dump here requests, requirements and suggestions for moderate-to-large scale processing tasks, storage needs, Science Platform service expansion, etc. that we'll need to undertake during FY2020 (October 2019 through September 2020). These will be used to inform Data Facility procurement.

Each request below should provide the following fields:

- Summary: what is needed?
- Requested by: lead person to ask questions of.
- Estimated compute or storage requirements: try to be as accurate as you can.
- Comments: why is this needed; where should it go; ...

Qserv for AuxTel/ComCam commissioning data, connected to lsp-stable

- Requested by: Fritz Mueller, Kian-Tat Lim
- Estimated compute or storage requirements: servers with internal disk, plus one (or two?) head node with SSD. Open questions: how much internal disk in each server? Is it one head node or two? How much SSD is needed in the head node, and should it be ordered like the one on lsp-int today (same as PDAC)?
- Comments: This will provide Qserv access on the -stable side of the LSP for commissioning and AuxTel data (a connection sketch follows below).
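As context for what "Qserv access for the -stable side of LSP" means for users, here is a minimal sketch of how commissioning/AuxTel catalogs on the new Qserv could be reached from a notebook. Qserv exposes a MySQL-protocol endpoint, so any MySQL client library works; the hostname, port, user, database, and table/column names below are placeholders, not the real deployment values.

```python
# Hedged sketch: query a Qserv instance over its MySQL-protocol interface.
# All connection parameters and schema names are illustrative assumptions.
import pymysql

conn = pymysql.connect(
    host="qserv-stable.example.ncsa.edu",  # hypothetical head-node address
    port=4040,                             # assumed Qserv proxy port
    user="qsmaster",                       # assumed read-only account
    database="comcam_commissioning",       # hypothetical database name
)
with conn.cursor() as cur:
    # Simple box-search-style query; table and column names are illustrative.
    cur.execute(
        "SELECT objectId, ra, decl FROM Object "
        "WHERE ra BETWEEN 55.0 AND 56.0 AND decl BETWEEN -31.0 AND -30.0 "
        "LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```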
APDB machines

- Estimated compute or storage requirements: a couple of servers (with failover?) using shared disk resources, or internal disks replicated between the servers; or possibly just one server for now, since this is a test deployment.
- Comments: Alert Production Database (APDB) systems.


LSP development (lsp-int)

- Estimated compute or storage requirements: 8 nodes total.
- Comments: The current 4 nodes are not enough for Dask testing; the count needs to double (see the Dask sketch below).
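For reference, this is the kind of Dask scale test that motivates doubling lsp-int from 4 to 8 nodes. The worker counts, memory limits, and array sizes are illustrative assumptions; on the real lsp-int the Client would attach to a scheduler provisioned by the Science Platform rather than a local cluster.

```python
# Hedged sketch of a Dask scale test; parameters are placeholders.
from dask.distributed import Client
import dask.array as da

# Local stand-in for the lsp-int Dask cluster; adjust n_workers to match
# what the node pool can actually provide.
client = Client(n_workers=8, threads_per_worker=4, memory_limit="4GB")

# Toy catalog-scale computation: per-column standardized scatter over a
# large chunked array.
x = da.random.random((2_000_000, 100), chunks=(100_000, 100))
result = ((x - x.mean(axis=0)) / x.std(axis=0)).std(axis=0).compute()
print(result[:5])

client.close()
```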
Stack Club / LSP-club support (lsp-stable)

- Estimated compute or storage requirements: probably double the current setup capacity, at most, for 2020.
- Comments: Currently sized to support ~40 accounts. Once we can serve catalogs (Qserv + Parquet), expected on a 2020 timescale, many other Science Collaborations (SCs) will start to take an interest and we will receive requests for more user accounts. Dask usage is unlikely to increase before we can serve catalog datasets (e.g., Gaia, HSC, DESC DC2).

Optimized server pool for Firefly operations (UNCONFIRMED)

- Estimated compute or storage requirements: 3-4 servers per heavily used LSP cluster? Probably with fast (i.e., SSD) local disk.
- Comments: Experience suggests that Firefly servers running on the existing "vanilla" Kubernetes cluster nodes run significantly (2-10x) more slowly than the existing dedicated server on lsst-demo; the reason is not fully understood. Experience at IPAC shows that performance is substantially improved by ensuring that jumbo frames are supported at all layers of the Kubernetes virtualization stack. We have asked for this to be applied at NCSA and are waiting to do further debugging until that has been done (a simple end-to-end jumbo-frame check is sketched below). Performance may also turn out to be significantly affected by the availability of fast local disk on the server nodes (as is available on lsst-demo), but this will be hard to assess until the network performance is improved.
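One way to sanity-check jumbo-frame support end to end, once NCSA applies the change, is to send a datagram larger than a standard 1500-byte frame with the "don't fragment" bit set and see whether it is accepted. This is a rough diagnostic sketch, not the actual NCSA procedure; the peer address and payload size are illustrative assumptions, and the IP_MTU_DISCOVER constants are defined by hand because the Python socket module does not expose them.

```python
# Hedged sketch: probe jumbo-frame support by sending an oversized UDP
# datagram with DF set (Linux only). Target host and port are placeholders.
import errno
import socket

# Linux constants from linux/in.h, not exposed by the socket module.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DO = 2   # set DF; never fragment locally

PEER = ("lsp-stable-node.example.org", 9000)   # hypothetical target node
PAYLOAD = b"x" * 8000                          # > 1500 bytes, < 9000-byte jumbo MTU

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
try:
    sock.sendto(PAYLOAD, PEER)
    print("datagram accepted; local interface MTU is jumbo-capable")
    # An intermediate hop with a 1500-byte MTU only shows up after ICMP
    # "fragmentation needed" feedback, so repeat the send, or use
    # `ping -M do -s 8972 <host>` for a definitive path check.
except OSError as exc:
    if exc.errno == errno.EMSGSIZE:
        print("EMSGSIZE: jumbo frames not supported on this interface/path")
    else:
        raise
finally:
    sock.close()
```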