
Please brain-dump here requests, requirements and suggestions for moderate-to-large scale processing tasks, storage needs, Science Platform service expansion, etc. that we'll need to undertake during FY2020 (October 2019 through September 2020). These will be used to inform Data Facility procurement.

Each request below should provide the following fields:

- Summary: what is needed?
- Requested by: lead person to ask questions of.
- Estimated compute or storage requirements: try to be as accurate as you can.
- Comments: why is this needed; where should it go; ...

Qserv for AuxTel/ComCam commissioning data, connected to lsp-stable

- Requested by: Fritz Mueller, Kian-Tat Lim
- Estimated compute or storage requirements: servers with internal disk, plus one (or two?) head node with SSD. Open questions: how much internal disk in each server? Is it one head node or two? How much SSD is needed in the head node, and should it be ordered like the one on lsp-int today (same as PDAC)?
- Comments: This will provide Qserv access on the -stable side of the LSP for commissioning and AuxTel data (a connection sketch follows below).
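As context for what "Qserv access for the -stable side of LSP" means for users, here is a minimal sketch of how commissioning/AuxTel catalogs on the new Qserv could be reached from a notebook. Qserv exposes a MySQL-protocol endpoint, so any MySQL client library works; the hostname, port, user, database, and table/column names below are placeholders, not the real deployment values.

```python
# Hedged sketch: query a Qserv instance over its MySQL-protocol interface.
# All connection parameters and schema names are illustrative assumptions.
import pymysql

conn = pymysql.connect(
    host="qserv-stable.example.ncsa.edu",  # hypothetical head-node address
    port=4040,                             # assumed Qserv proxy port
    user="qsmaster",                       # assumed read-only account
    database="comcam_commissioning",       # hypothetical database name
)
with conn.cursor() as cur:
    # Simple box-search-style query; table and column names are illustrative.
    cur.execute(
        "SELECT objectId, ra, decl FROM Object "
        "WHERE ra BETWEEN 55.0 AND 56.0 AND decl BETWEEN -31.0 AND -30.0 "
        "LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```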
APDB machines

- Estimated compute or storage requirements: a couple of servers (with failover?) using shared disk resources, or internal disks replicated between the servers; or possibly just one server for now, since this is a test deployment.
- Comments: Alert Production Database (APDB) systems.


LSP development (lsp-int)

- Estimated compute or storage requirements: 8 nodes total.
- Comments: The current 4 nodes are not enough for Dask testing; the count needs to double (see the Dask sketch below).
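For reference, this is the kind of Dask scale test that motivates doubling lsp-int from 4 to 8 nodes. The worker counts, memory limits, and array sizes are illustrative assumptions; on the real lsp-int the Client would attach to a scheduler provisioned by the Science Platform rather than a local cluster.

```python
# Hedged sketch of a Dask scale test; parameters are placeholders.
from dask.distributed import Client
import dask.array as da

# Local stand-in for the lsp-int Dask cluster; adjust n_workers to match
# what the node pool can actually provide.
client = Client(n_workers=8, threads_per_worker=4, memory_limit="4GB")

# Toy catalog-scale computation: per-column standardized scatter over a
# large chunked array.
x = da.random.random((2_000_000, 100), chunks=(100_000, 100))
result = ((x - x.mean(axis=0)) / x.std(axis=0)).std(axis=0).compute()
print(result[:5])

client.close()
```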
Stack Club / LSP-club support (lsp-stable)

- Estimated compute or storage requirements: probably double the current setup capacity, at most, for 2020.
- Comments: Currently sized to support ~40 accounts. Once we can serve catalogs (Qserv + Parquet), expected on a 2020 timescale, many other Science Collaborations (SCs) will start to take an interest and we will receive requests for more user accounts. Dask usage is unlikely to increase before we can serve catalog datasets (e.g., Gaia, HSC, DESC DC2).

Optimized server pool for Firefly operations (UNCONFIRMED)

- Estimated compute or storage requirements: 3-4 servers per heavily used LSP cluster? Probably with fast (i.e., SSD) local disk.
- Comments: Experience suggests that Firefly servers running on the existing "vanilla" Kubernetes cluster nodes run significantly (2-10x) more slowly than the existing dedicated server on lsst-demo; the reason is not fully understood. Experience at IPAC shows that performance is substantially improved by ensuring that jumbo frames are supported at all layers of the Kubernetes virtualization stack. We have asked for this to be applied at NCSA and are waiting to do further debugging until that has been done (a simple end-to-end jumbo-frame check is sketched below). Performance may also turn out to be significantly affected by the availability of fast local disk on the server nodes (as is available on lsst-demo), but this will be hard to assess until the network performance is improved.
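One way to sanity-check jumbo-frame support end to end, once NCSA applies the change, is to send a datagram larger than a standard 1500-byte frame with the "don't fragment" bit set and see whether it is accepted. This is a rough diagnostic sketch, not the actual NCSA procedure; the peer address and payload size are illustrative assumptions, and the IP_MTU_DISCOVER constants are defined by hand because the Python socket module does not expose them.

```python
# Hedged sketch: probe jumbo-frame support by sending an oversized UDP
# datagram with DF set (Linux only). Target host and port are placeholders.
import errno
import socket

# Linux constants from linux/in.h, not exposed by the socket module.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DO = 2   # set DF; never fragment locally

PEER = ("lsp-stable-node.example.org", 9000)   # hypothetical target node
PAYLOAD = b"x" * 8000                          # > 1500 bytes, < 9000-byte jumbo MTU

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
try:
    sock.sendto(PAYLOAD, PEER)
    print("datagram accepted; local interface MTU is jumbo-capable")
    # An intermediate hop with a 1500-byte MTU only shows up after ICMP
    # "fragmentation needed" feedback, so repeat the send, or use
    # `ping -M do -s 8972 <host>` for a definitive path check.
except OSError as exc:
    if exc.errno == errno.EMSGSIZE:
        print("EMSGSIZE: jumbo frames not supported on this interface/path")
    else:
        raise
finally:
    sock.close()
```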