Upcoming SUI/T & DB integration cluster at NCSA
- Meetings are underway with NCSA to specify hardware for an integration environment to be hosted at NCSA. This will give the project additional resources, similar in scale to those provided by IN2P3.
- Hardware requirements are based on our experience at IN2P3, on some previously specified development/testing needs (DB Hardware Planning), and on the resources that would be necessary to host an instance of the PanStarrs dataset.
- Starting to look like two 25-node clusters on bare metal, plus a handful of additional servers for DAX web services, plus a handful of additional servers for SUI.
- Initially to be hosted behind firewall at NCSA with VPN access.
- NCSA has funds and is looking to get hardware orders out towards the end of April.
Plan refresh during X16
- We need to clean / refresh / sanity-check our long-range plan (through construction, at least to FY20) during the X16 cycle, adding additional detail where clear. Other teams are working on this as well.
- Mark complete things we have done in passing, remove things that are no longer relevant, add things known to be needed but missing.
- Improve descriptions of epics and stories where we now have more detail.
- Double check that story points are specified consistently as uninterrupted time and not calendar-time-including-overhead.
- Check that sizing of ongoing bucket epics like "Qserv Refactoring" is sensible.
- Prioritize which features could be delayed or dropped if we start to get squeezed.
- Fritz to work 1-1 or in small groups with devs to refresh the plan, so everybody isn't tortured by endless, mind-numbing, planning meeting(s).
- Mike has been working on doc refresh for secondary index in LDM-135; pull-request to integration branch coming soon.
- Nate has some Butler doc for LDM-463; needs to rebase ticket branch then pull-request to integration branch.
- Brian's prototype rework of webserv to support a TAP-like interface is ready for review.
- Webserv API will change in a non-backward-compatible way with these commits. Brian has talked to Tatiana about this; feedback is that this will be okay if we can maintain the previous server instance during the transition period.
- Includes doc update, and some ancillary PEP8 cleanup.
- Nate is working on persistence config improvements.
- Nate, Fritz, and Kian-Tat Lim to meet soon to specify stories for next upcoming sprint.
- Serge has finished sphgeom Python wrap, and is now digging in on spatial indexing utilities.
- AndyH working on memmanreal changes to support new shared-scan code.
- John and Vaikunth still having difficulties with runqueries.py on the cluster:
- John has observed a problem with dropped client connections when issuing a batch of low-volume queries, though proxy+czar still seems to be up.
- John has observed what appears to be some sort of intermittent memory leak on the proxy+czar.
- Strange perf. problems, even with czar-local queries like "SHOW VARIABLES".
- John working with valgrind to try to get some info to help track these down.
L1 database design
- AndyS is back from vacation and continuing to dig in on design requirements.
- Thinking now on what kind of indexing will be required to support both region-based and time-based queries.
- Will start making up some dummy data soon to work out details of the necessary schemas.
- Fabrice has cloned data to the second 25-node cluster, and it is now available for use.
- Enhancements made to deployment scripts to generalize for use on multiple clusters; pull-request coming soon.
- Change needed so wmgr can read master hostname from env var.
- Latest rumors now put arrival of data in the September time frame.
- IN2P3 continues to work on necessary logging/monitoring package configuration.
- Vaikunth talked with John about the new queryid feature; plan is to gather stats per query type by time-slicing query issue and logging queries.
- Percona session on server monitoring may prove useful.
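A minimal sketch of the per-query-type, time-sliced stats idea mentioned above. The record layout (timestamp, query type, duration), the function name, and the 60-second slice are all assumptions for illustration; nothing here reflects the actual queryid log format or the tooling Vaikunth and John settle on.

```python
from collections import defaultdict


def stats_by_type_and_slice(records, slice_seconds=60):
    """Group query records into fixed time slices per query type.

    `records` is an iterable of (timestamp_seconds, query_type,
    duration_seconds) tuples. This layout is hypothetical, not the
    real log schema.
    """
    buckets = defaultdict(list)
    for ts, qtype, duration in records:
        # Round the timestamp down to the start of its time slice.
        slice_start = int(ts // slice_seconds) * slice_seconds
        buckets[(qtype, slice_start)].append(duration)

    # Summarize each (query type, slice) bucket.
    return {
        key: {
            "count": len(durations),
            "mean": sum(durations) / len(durations),
            "max": max(durations),
        }
        for key, durations in buckets.items()
    }
```

Keying on (query type, slice start) keeps the summary small and makes it easy to line up slices against whatever monitoring output is collected for the same time window.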
Vertical partitioning tests
- Test results seem to indicate that MyISAM joins tables in query order, while InnoDB joins by decreasing table size.
- Vaikunth experimenting now with optimizer tuning parameters.
- Will continue these experiments. Verify split between query compilation and query run. Verify that some of these experiments aren't running with cached plans.
- We have a couple of good leads now, one pretty far along, the other at the introduction/screening stage.
- Python job listing is in progress – there was some iteration on the description to make it meet the requirements of the Stanford staffing department, but they should have what they need now.
- Connection timeout fix seems good! Needs an additional error check added, then going up for review.
- Alex Szalay stopped by earlier in the week to chat about some new developments with MS SQLServer.
- Brian mentions that SUI/T team would really like a continuous deployment model for DAX services as we move forward with more integration (e.g. on the upcoming NCSA integration cluster). We should add stories for this to our plan for the next cycle and figure out what we might need from SQuaRE.