Please brain-dump here any requests, requirements, and suggestions for moderate-to-large-scale processing tasks, storage needs, Science Platform service expansion, etc. that we will need to undertake during FY2020 (October 2019 through September 2020). These will be used to inform Data Facility procurement.
Summary | Requested by | Estimated compute or storage requirements | Comments |
---|---|---|---|
What is needed? | Lead person to direct questions to | Try to be as accurate as you can. | Why is this needed? Where should it go? ... |
Qserv for AuxTel/ComCam commissioning data, connected to lsp-stable: servers with internal disk. Questions for Fritz Mueller and Kian-Tat Lim: How much internal disk is needed in each server? I assume a head node is needed; is it 1 or 2? How much SSD is needed in the head node? Shall I order one like the one on lsp-int today (same as PDAC?)? | Servers + internal disks; 1(?) head node with SSD | This will provide Qserv access to commissioning data and AuxTel data on the -stable side of the LSP. | |
APDB machines | A couple of servers (for failover?) with shared disk resources? Or internal disks replicated between servers? Or 1 server for now, since this is a test? | Alert processing database systems | |
LSP development (lsp-int) | Add the equivalent of 5 more nodes to the integration cluster. | | |
Stack-club / LSP-club support (lsp-stable) | Add the equivalent of 10 more nodes. | | |
Optimized server pool for Firefly operations (UNCONFIRMED) | 3-4 servers per heavily used LSP cluster? Would probably request fast (i.e., SSD) local disk. | Experience suggests that Firefly servers running on the existing "vanilla" Kubernetes cluster nodes are significantly slower (2-10x) than the existing dedicated server on lsst-demo. The reason is not fully understood. Experience at IPAC shows that performance improves substantially when jumbo frames are supported at all layers of the Kubernetes virtualization stack. We have asked for this to be applied at NCSA and are holding off on further debugging until that has been done. It may turn out that performance is also significantly affected by the availability of fast local disk on the server nodes (as is available on lsst-demo), but this is difficult to assess until the network performance is improved. | |
Jenkins | Add the equivalent of 2 nodes for dedicated Jenkins execution. | | |
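
The Firefly entry in the table notes that jumbo frames need to be supported at every layer of the Kubernetes virtualization stack. As a rough aid for the follow-up debugging, the sketch below lists network interfaces on a Linux node whose MTU is below a jumbo-frame threshold. It is only an illustration: the 9000-byte threshold, the function names `read_mtus` and `below_jumbo`, and the choice to skip the loopback interface are assumptions, not something agreed with NCSA or IPAC. It reads the standard Linux sysfs path `/sys/class/net/<interface>/mtu`.

```python
from pathlib import Path

# Assumed jumbo-frame MTU; 9000 is a common value but is not stated in the table.
JUMBO_MTU = 9000


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface: mtu} read from Linux sysfs; empty dict if unavailable."""
    mtus = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mtus
    for iface in sorted(root.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            # Skip entries that are not interfaces or cannot be read.
            continue
    return mtus


def below_jumbo(mtus, threshold=JUMBO_MTU, ignore=("lo",)):
    """Return the interfaces (excluding ignored ones) whose MTU is below threshold."""
    return {
        name: mtu
        for name, mtu in mtus.items()
        if name not in ignore and mtu < threshold
    }


if __name__ == "__main__":
    for name, mtu in below_jumbo(read_mtus()).items():
        print(f"{name}: MTU {mtu} is below {JUMBO_MTU}")
```

Running this on a cluster node and again inside a pod would show whether the host NIC, the bridge, and the pod interface all report the larger MTU. It only inspects configured values; it does not verify that jumbo frames actually pass end to end.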