Remote: Fabrice Jammes
- PDAC master and worker nodes at NCSA first became available to the team last week
- Fabrice working with the NCSA admins since that time to smooth out configuration issues
- NCSA has been very responsive in addressing issues in a timely fashion since the cluster became available
- Fabrice has adapted qserv installation scripts for the NCSA cluster, and so far, integration tests have run successfully on 10 of the worker nodes
- Docker storage driver configuration on the cluster is using loopback instead of overlay mode; this is likely to cause stability issues with docker and will need to be addressed
- Other remaining issues (currently being addressed by NCSA) include missing docker on some nodes, slow ssh connections between some nodes, and the docker installation location
- John wrapping up some bug fixes and will switch his focus to working with Fabrice on PDAC qserv deploy
Stripe 82 Dataset
- Igor attended a meeting with SUIT group and Yusra to discuss Stripe 82 catalog schema for the PDAC, provided SUIT with links and info
- Yusra would like to do some QA on the merged catalog to verify completeness and de-duplication
- SUIT has been given access to the PDAC catalog staging DB on NCSA nebula
- After consultation with John, Igor has decided it best to partition the Stripe 82 data into 340 strips. This represents a 16x smaller chunk area than has been used on the IN2P3 cluster, and shrinks the Stripe 82 chunks from an unmanageable ~100G to a more manageable ~2-15G. This would result in ~320K chunks for full sky, but for Stripe 82 only ~1K of these will be non-empty. Overlap remains 1 arcmin.
- Partitioning run to GPFS has started, ~60% complete at this time
- Calexp transfer from IN2P3 IRODS to NCSA is ongoing, ~50% complete at this time, requiring periodic baby-sitting by Igor
- Calexp transfer from NCSA NFS will be started this week
- SUIT has requested coadds be included in the PDAC. This amounts to a few additional TB of data from IN2P3. Transfer underway, ~50% complete at this time. Igor to speak with Greg Daues to track down the complementary NCSA coadds.
- Brian is making progress on a unified DAX service container, and anticipates having something deployable by the end of this week.
- Currently waiting for some machine configuration issues to be resolved, but NCSA is working on them and they should be resolved soon.
- Backup plan is to cut over to running services on bare metal temporarily on the cluster if any surprise roadblocks arise with the containerized version.
- Current plan is to set up dbserv in front of a monolithic DB instance, loaded with a slice of the data that Igor has been producing. This way dbserv can be presented to SUIT for integration with partial catalog data, but with the intended PDAC delivery schema.
- imgserv and metaserv to be configured after dbserv, looking like first week of October
- AndyS has been reading existing code and design documents to become familiar with the current state of the task framework and intended design improvements.
- Has not yet been able to connect with Gregory (a combination of Andy's CERN trip and Gregory's general busy-ness) to discuss design requirements in detail, though there has been some email exchange.
- AndyS has done some testing with the existing L1/AP db prototype on one of the new Dell layered SSD/NVMe super-nodes at IN2P3. Current testing with a week's worth of simulated data seems likely to meet performance goals (compared to spinning-disk tests, which definitely would not). Remains to be seen how this will scale. Measurements and plots coming; working on this whenever blocked on SuperTask work.
- Two of the composite-data-set issues (DM-7469 and DM-7719) are in or headed to review.
- Swift object store storage format plugin prototype has run aground on the need to refactor the camera mapper.
- John has a fix for the missing-rows-under-load bug up for review
- John has a fix for the slow-query-boot bug going up for review
- Ganglia shows some significant CPU under-utilization in some circumstances on the IN2P3 cluster; John has opened an issue and will investigate further
- Vaikunth has been participating in some prep work for the X-SWAP contract renewal
- Vaikunth and Fabien continue investigation of oddities with data sampling in the ELK stack on the IN2P3 cluster
- Vaikunth has additionally had some difficulties with the run-queries.py script not waking after sleeps; currently investigating
- Fabrice mentioned that MariaDB has re-issued their MaxScale proxy under a new "Business License"; Jacek took a quick look and noticed that this has some problematic terms (limits on free instance counts, etc). While we don't currently use MaxScale, this might be an indication that they are "testing" this license; if they roll it out to MariaDB itself, we may need to consider jumping to a different MySQL fork.
- Jacek also mentions that while reading up on MySQL 5.7 changes, there are several that might create some work for us ahead (GRANT no longer supported, install db going away).
- IN2P3 continues to make progress on large-scale-qserv-cluster-on-demand-in-cloud research project.