Date

Attendees

Local: Brian Van Klaveren, Unknown User (npease), Andy Salnikov, Igor Gaponenko, Vaikunth Thukral, Andy Hanushevsky, Fritz Mueller (for second half)

Remote: Fabrice Jammes

Discussion items

Docker Infrastructure / Webserv

  • DM-6376 - Brian still working on it; the image needs to be pushed to Docker Hub, and we need to check with Fritz about an official webserv repo. Make an LSST-wide repo or our own for now?

  • Friday meeting with NCSA about infrastructure - no discussion of docker, mostly talked about kubernetes
  • Talk to KT about a session for system-wide docker infrastructure at AHM - with Josh/Square, us/Qserv, NCSA

  • DM-7103 is blocked because it depends on the docker repo mentioned above

 

Butler

  • DM-6988 and DM-7123 composite dataset requirements and design - writeup for design requirements etc is up on confluence for comments
  • Some people have commented already, waiting on Gregory's comments to finish DM-6988. Other comments will dictate status of DM-7123
  • Working on making policies pretty (DM-7178)

 

Qserv cluster and CI

  • Setting up Qserv cluster for Igor @ NCSA with OpenStack VMs

  • DM-7126 needs to be reviewed soon since Fabrice will be on vacation in the second half of August

  • CI build was broken - fixed now, with updated documentation up for review by John and AndyH (DM-7168)

  • Docker updated to v1.12 - new ticket for updating docker API, which breaks Swarm. Shmux still works, but we are moving away from it anyway, and the new interface is much simpler

 

L1DB Proto

  • AndyS has mysql-server and postgres-server running on ccqserv124, injecting data to see performance - both are pretty bad

  • Try changing the indexing scheme to look for improvements. Not all indices in the baseline schema are useful; test changing those

  • PK may need changing on old tables.
  • Takes a long time to generate data and go through it to measure performance - may need another machine running at the same time to test both DBs
    (Vaikunth to check with John on how to release one node from test cluster and let Andy know by Thursday)
  • Try not writing directly to DB, maybe smart cache or work with memory to avoid disk I/O (current bottleneck) - monitoring also shows that cpu-wait-io is quite large on ccqserv124

  • Overall looks like improvements are marginal through these optimizations, probably need SSDs or some better disk options
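
The cpu-wait-io observation above can be checked by hand on a Linux host. What follows is a minimal sketch, not the monitoring actually used on ccqserv124: it assumes the standard aggregate "cpu" line of /proc/stat, and the function names are made up for illustration. It returns the share of CPU time spent waiting on I/O between two samples.

```python
# Sketch only: assumes the Linux /proc/stat layout, where the aggregate line is
#   cpu user nice system idle iowait irq softirq steal [guest guest_nice]
# Function names are hypothetical, not part of any existing tool.

def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # guest and guest_nice are skipped: they are already counted in user/nice.
    return [int(x) for x in fields[1:9]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is the iowait counter
```

To use it, read the first line of /proc/stat twice, a few seconds apart, while the data injection is running, and pass both lines to iowait_fraction. A value that stays high would be consistent with disk I/O being the bottleneck.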

 

Data loader and MySQL dump

  • Igor working mostly on DM-7053: assembling a complete catalog of Stripe82 data by copying backups from in2p3 (4TB) to NCSA; 1 table left

  • Dumped some data already to NCSA, next step was to set up DB at lsst-db machine but ran out of space. No solution currently so trying VM now

  • After VM testing and fixes, the dumped data loads into MariaDB at NCSA at up to 1.2Gbps. The loading process seems to be capped by MariaDB constraints
  • The data was dumped in SQL format rather than CSV, which could be a cause of the slowdowns

  • Need to make a ticket to resolve duplicated data issue (Igor Gaponenko)

 

X-SWAP

  • Made new cpu-wait-on-disk-I/O plots for John. Custom hostname removals/additions adapted

  • Mostly same as last week, implementing queryID into pipeline

  • Re-setting up tests with uniform sampling frequency
  • Vaikunth needs to talk to John about how to release 1 node from cluster setup (to give to AndyS)

 

XrootD

  • Finishing migration to 4.4 for XrootD

  • AndyH is reviewing Fabrice's documentation

  • Consider rpm while using eups tags? Probably not

  • Cleaning up orphan tickets - merging a branch doesn't auto-close its PR unless the ticket branch is also updated on GitHub before the merge

  • Working on DM-4473, finishing soon but have deployment issues

 

Other

  • AndyH on vacation next week
  • All hands meeting next week
  • Igor needs to talk to Greg Daues @ NCSA about calibrated images
  • Fritz sharing T/CAM presentation for NSF/DOE review