Introduction

ATTENTION: please make sure you have read the parent document to which this one is linked in order to understand the general concepts of the semi-parallel catalog loading protocol which is being used here!

This document explains the specific steps which had to be taken in the context of loading a family of the WISE catalogs into the Qserv instance at NCSA/PDAC. There have been two loading efforts in this context, each resulting in catalogs stored in a separate database. Both are covered by DM-9372:

  • wise_00 : the LSST equivalent of the Object and ForcedSource tables
  • wise_ext_01 : the so-called Reject table, which is similar to the Object table except that it contains faint or rejected objects which didn't make it into the Object table

All examples shown below were taken from the first loading effort. All loading operations were performed from the master node lsst-qserv-master01 of the PDAC cluster. The rest of this document explains the specific commands which had to be executed and the methods used for monitoring the results of each stage.

Deploying and configuring the tools

Cloning the tools from GitHub to the GPFS file system at the head machine lsst-dev of the LSST Verification cluster:

% mkdir -p /datasets/gapon/development/
% cd /datasets/gapon/development/
% git clone https://github.com/lsst-dm/db_pdac_wise.git

The next step was to modify the dataset specification file to configure a number of key parameters used by the loading tools. This included:

  • the path to the location of the input data on the GPFS file system as seen by the master and worker nodes of the cluster
  • the name of the new database
  • the names of all tables which will be loaded

The package has two such files, each tuned for the corresponding target database:

% ls -al /datasets/gapon/development/db_pdac_wise/scripts/ 
-rw-r--r-- 1 gapon grp_202  732 May 25 13:29 dataset.bash.wise_00
-rw-r--r-- 1 gapon grp_202  617 May  8 01:20 dataset.bash.wise_ext_00

Hence the first step before loading one of those databases is to create a symbolic link pointing to the right file:

% cd /datasets/gapon/development/db_pdac_wise/scripts/
% ln -s dataset.bash.wise_00 dataset.bash

This is what the configuration file should look like:

% cat /datasets/gapon/development/db_pdac_wise/scripts/dataset.bash
...
# Output database and tables

OUTPUT_DB="wise_00"

OUTPUT_OBJECT_TABLE="allwise_p3as_psd"
OUTPUT_FORCED_SOURCE_TABLE="allwise_p3as_mep"

OUTPUT_NONPART_TABLES="allwise_p3am_cdd allwise_p3as_cdd allsky_4band_p1bm_frm allsky_3band_p1bm_frm allsky_2band_p1bm_frm"

# Data to be loaded into new database

INPUT_DATA_DIR="/datasets/gapon/production/wise_catalog_load/production_load"

All scripts found in the scripts/ subfolder of the package rely upon these parameters.

Another important file, which is not supposed to be modified under normal circumstances, contains hard-wired parameters of the PDAC cluster and Qserv. This is the relevant fragment of that file:

% cat /datasets/gapon/development/db_pdac_wise/scripts/env.bash
...
# Custom versions of the LSST Stack and the latest version of the Qserv
# management scripts.

BASE_STACK="/datasets/gapon/stack"
QSERV_PKG="/datasets/gapon/development/qserv"

# Qserv deployment parameters (adjust accordingly)

MYSQL_PASSWORD="CHANGEME"

MASTER="lsst-qserv-master01"
SSH_MASTER="qserv-master01"

WORKERS=`seq --format 'lsst-qserv-db%02.0f' 1 30`
SSH_WORKERS=`seq --format 'qserv-db%02.0f' 1 30`

QSERV_MYSQL_DIR="/qserv/data/mysql"
QSERV_DUMPS_DIR="/qserv/data/dumps/$OUTPUT_DB"

# The default location for the log files created on Qserv node

LOCAL_TMP_DIR="/tmp/$OUTPUT_DB"
LOCAL_LOG_DIR="$LOCAL_TMP_DIR/log"
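
For reference, the two seq expressions above simply enumerate the node names lsst-qserv-db01 through lsst-qserv-db30 (and the matching qserv-db01 ... qserv-db30 SSH aliases), for example:

% seq --format 'lsst-qserv-db%02.0f' 1 3
lsst-qserv-db01
lsst-qserv-db02
lsst-qserv-db03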


Creating the database

Script:

% /bin/sudo -u qserv /datasets/gapon/development/db_pdac_wise/scripts/run_create_database.bash \
--delete \
--verbose \
>& run_create_database.log&

NOTE: the --delete option will delete the database (and all relevant data) if it already existed.

Inspecting the log files (on the master node):

% ls -al /tmp/wise_00/log/qserv-*_create_database.log

Each log file should report:

deleting database: wise_00
creating database: wise_00
configuring access privileges for database: wise_00
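
One quick way to confirm that no node stopped short of the last step (a hedged check, not part of the original procedure) is to list any log files missing that final message; the command should print nothing:

% grep -L 'configuring access privileges' /tmp/wise_00/log/qserv-*_create_database.log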


Configuring CSS

This stage requires running two scripts:

% /bin/sudo -u qserv /datasets/gapon/development/db_pdac_wise/scripts/setup_css.bash >& setup_css.log
% /bin/sudo -u qserv /datasets/gapon/development/db_pdac_wise/scripts/setup_css_nonpart.bash >& setup_css_nonpart.log

Verifying results:

% source /datasets/gapon/stack/loadLSST.bash
% setup -t qserv-dev qserv_distrib; setup -j -r /datasets/gapon/development/qserv
% qserv-admin.py "DUMP EVERYTHING;" | grep wise_00
...

NOTE: the last command should show new CSS entries related to the database wise_00.

Loading the director (allwise_p3as_psd) table

Script:

% nohup /bin/sudo -u qserv \
  /datasets/gapon/development/db_pdac_wise/scripts/run_load_object.bash \
  --verbose \
  >& run_load_object.log&


Inspecting log files after the completion of the operation to make sure all went well (the log files are located on the *master* node):

% tail /tmp/wise_00/log/qserv-db*_load_object.log | grep 'Finished'

...

This should report exactly 30 lines (one per worker node) containing ** Finished loading **
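
To reduce this check to a single number (a hedged variant, assuming the message appears exactly once per worker log), one can count the matching lines directly; the result should be 30:

% grep 'Finished loading' /tmp/wise_00/log/qserv-db*_load_object.log | wc -l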

Generating and loading the secondary index

This operation has two phases:

  1. harvesting the (source_id,chunkId,subChunkId) triplets from all object (allwise_p3as_psd) tables across all worker nodes. The results will be stored locally as TSV files (one file per object table) on each worker node; an example row is shown right after this list
  2. transferring the TSV files from the worker nodes to the master node and loading them into the secondary index table
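
Each line of a dumped TSV file holds one such triplet. A hypothetical example row (the subChunkId value here is made up purely for illustration; the chunk number is taken from one of the file names shown later):

0075m016_ac51-028478	100016	42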

This script performs the first phase:

% nohup \
  /bin/sudo -u qserv \
    /datasets/gapon/development/db_pdac_wise/scripts/run_dump_triplet.bash \
    --verbose \
    >& run_dump_triplet.log&

There are a few progressively more detailed ways to monitor the status of the operation. The first one is to track the scripts which may still be running on the worker nodes by launching this from the master node:

% for worker in `seq --format="qserv-db%02g" 1 30`; do
    ssh -n $worker 'ps -ef  | grep dump_triplet'
done

Then, if no active scripts were found on the nodes, one should inspect the log files for signs of any issues (for example, the scripts might have terminated and reported errors):

% for worker in `seq --format="qserv-db%02g" 1 30`; do
    ssh -n $worker 'for f in `ls /tmp/wise_00/log/dump_triplet_chunks.chunks_*.log`; do tail -1 $f; done'
done
Dumping triplets of chunk: 21772
Dumping triplets of chunk: 27203
Dumping triplets of chunk: 30849
Dumping triplets of chunk: 34698
...

It may also make sense to count the number of successfully completed operations by counting the relevant rows in the log files. The resulting number must match the total number of unique chunks in the partitioned catalog:

% for worker in `seq --format="qserv-db%02g" 1 30`; do
    ssh -n $worker 'cat /tmp/wise_00/log/dump_triplet_chunks.chunks_*.log'
done | grep "Dumping triplets of chunk: " | wc -l
146322

Finally, if there is any doubt about the success of the operation, one can also inspect the per-chunk log files at:

% for worker in `seq --format="qserv-db%02g" 1 30`; do
    ssh -n $worker 'for f in `ls /tmp/wise_00/log/dump_triplet.*.log`; do tail $f; done'
done

The second (loading) phase is implemented by this script:

% nohup \
  /bin/sudo -u qserv \
    /datasets/gapon/development/db_pdac_wise/scripts/load_secondary_index.bash \
    --verbose \
    --get-remote-triplets \
    >& load_secondary_index.log&

IMPORTANT: option --get-remote-triplets is required to pull the TSV files with the previously dumped triplets from the worker nodes. Otherwise previously cached files will be assumed, and if no files are found in the local (master's) cache then the resulting secondary index table will be empty.

The progress of the operation can be monitored by inspecting the script's log file:

% cat load_secondary_index.log
nohup: ignoring input
Pulling triplets from the worker nodes:
  qserv-db01
  ...

One may also check the TSV files retrieved from the worker nodes and stored in this local directory of the master node:

% /bin/sudo -u qserv ls -al /qserv/data/dumps/wise_00/ | grep tsv
-rw-r--r-- 1 qserv qserv   1223093 Jun 10 16:01 idx_object_100016.tsv
-rw-r--r-- 1 qserv qserv   1246813 Jun 10 16:01 idx_object_100046.tsv
-rw-r--r-- 1 qserv qserv   1247068 Jun 10 16:01 idx_object_100136.tsv
-rw-r--r-- 1 qserv qserv   2178682 Jun 10 16:01 idx_object_100166.tsv
-rw-r--r-- 1 qserv qserv   1467867 Jun 10 16:01 idx_object_100196.tsv
-rw-r--r-- 1 qserv qserv   1241449 Jun 10 16:01 idx_object_100226.tsv
...

A simple test after the completion of the file retrieval phase of the index loading operation is to count the TSV files. This number should match the total number of unique chunks in the catalog:

% /bin/sudo -u qserv ls -al /qserv/data/dumps/wise_00/ | grep tsv | wc -l
146322
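
If there is any remaining doubt, one can also (a rough, optional sketch that is not part of the original procedure) count the total number of triplet rows across all retrieved TSV files; this should match the number of objects in the director table (747634026 according to the count query shown later in this document):

% /bin/sudo -u qserv bash -c 'cat /qserv/data/dumps/wise_00/idx_object_*.tsv | wc -l'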

Loading the dependent (allwise_p3as_mep) table

Script:

% nohup /bin/sudo -u qserv \
  /datasets/gapon/development/db_pdac_wise/scripts/run_load_forced_source.bash \
  --verbose \
  >& run_load_source.log&

Inspecting log files after the completion of the operation to make sure all went well (the log files are located on the master node):

/tmp/wise_00/log/*_load_forced_source.log

This should report exactly 30 lines (one per worker node) containing ** Finished loading **

Loading non-partitioned tables

Running this script from the master node:

% nohup /bin/sudo -u qserv \ 
  /datasets/gapon/development/db_pdac_wise/scripts/run_load_nonpart.bash \
  --verbose \
  >& run_load_nonpart.log&

Checking the log files (on master) after the completion of the loading operation:

% tail /tmp/wise_00/log/qserv-db*_load_nonpart.log | grep 'Finished'
...

Setting up the empty chunk list

The empty chunk list file was generated by a simple Python script to include a list of chunk numbers in the range of 0 to 1000000, except those which were found in the input dataset at the pre-staging area. The list was installed on the master node at:

% /bin/sudo -u qserv ls -l /qserv/data/qserv/ | grep wise_00
-rw-r--r-- 1 qserv qserv 5922929 Feb 23 23:55 empty_wise_00.txt

TODO: explain how to generate the list from scratch.
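
In the meantime, here is a rough sketch of how such a list could be regenerated. This is not the original Python script: it is a bash equivalent which assumes (hypothetically) that the partitioned input files are named chunk_<id>.txt directly under INPUT_DATA_DIR, which may differ from the actual layout of the pre-staging area:

# enumerate all candidate chunk numbers in the 0..1000000 range
seq 0 1000000 | sort > /tmp/all_chunks.txt

# extract the chunk numbers which are present in the pre-staged input dataset
ls /datasets/gapon/production/wise_catalog_load/production_load \
    | sed -n 's/^chunk_\([0-9]*\)\.txt$/\1/p' \
    | sort -u > /tmp/loaded_chunks.txt

# the empty chunks are those which never appear in the input
comm -23 /tmp/all_chunks.txt /tmp/loaded_chunks.txt | sort -n > empty_wise_00.txt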

Enabling the new database in Qserv

This is the last stage of the database loading protocol, which needs to be completed before proceeding to the final tests. The database is enabled in *Qserv* by running this script (from the master node):

% /bin/sudo -u qserv \
  /datasets/gapon/development/db_pdac_wise/scripts/run_qserv_enable_database.bash \
  --verbose \
  >& run_qserv_enable_database.log&

Testing results

Issues

In the course of the initial testing of the dataset, three blocking issues with *Qserv* were discovered. The issues were triggered by the scale or the table schemas of the dataset.

  • (resolved) DM-9672 - the cluster hangs with a very large number of chunks due to a deadlock in an implementation of the worker component
  • (resolved) DM-9706 - the WISE object identifiers are strings, which is not supported by the Qserv implementation
  • (open) DM-9736 - double quotes around column names in SQL statements cause a parser error. This prevents users from quoting SQL reserved words like dec (for declination) when making queries against the allwise_p3as_psd or allwise_p3as_mep tables. A quick workaround to the problem was to modify the schema of all relevant tables to rename those columns to decl.

After these problems were discovered, work began on improving Qserv.

Performance and sizes of the tables

Once the first two of the previously mentioned problems were resolved, a few simple queries were successfully launched against the new dataset in order to test the performance of the *Qserv* service:

SELECT COUNT(*) FROM wise_00.allwise_p3as_psd;
+----------------+
| SUM(QS1_COUNT) |
+----------------+
| 747634026      |
+----------------+
1 row in set (6 min 6.76 sec)
SELECT COUNT(*) FROM wise_00.allwise_p3as_mep;
+----------------+
| SUM(QS1_COUNT) |
+----------------+
| 41997706856    |
+----------------+
1 row in set (6 min 30.62 sec)

The secondary index optimization

The next set of queries tested the secondary index:

SELECT source_id FROM wise_00.allwise_p3as_psd WHERE source_id='0075m016_ac51-028478';
+----------------------+
| source_id            |
+----------------------+
| 0075m016_ac51-028478 |
+----------------------+
1 row in set (0.40 sec)
SELECT source_id FROM wise_00.allwise_p3as_psd WHERE source_id IN ('0075m016_ac51-028478');
+----------------------+
| source_id            |
+----------------------+
| 0075m016_ac51-028478 |
+----------------------+
1 row in set (0.34 sec)
SELECT source_id FROM wise_00.allwise_p3as_psd WHERE source_id IN ('0075m016_ac51-028478','0000m016_ac51-000001');
+----------------------+
| source_id            |
+----------------------+
| 0000m016_ac51-000001 |
| 0075m016_ac51-028478 |
+----------------------+
2 rows in set (0.35 sec)


These can be compared with the poor performance of the full-table scan query yielding the same result:

SELECT source_id FROM wise_00.allwise_p3as_psd WHERE source_id LIKE '0075m016_ac51-028478';
+----------------------+
| source_id            |
+----------------------+
| 0075m016_ac51-028478 |
+----------------------+
1 row in set (6 min 17.70 sec)

Loading the Reject table into database wise_ext_01

This is essentially the same procedure, which needs to be repeated after switching to the other configuration file (note the -f option below, which replaces the symbolic link created earlier):

% cd /datasets/gapon/development/db_pdac_wise/scripts/
% ln -sf dataset.bash.wise_ext_00 dataset.bash

NOTE: this database has just one director table; it doesn't have any other partitioned (dependent) or non-partitioned (fully replicated) tables.

Also watch for the changed database name (wise_ext_00 instead of wise_00) when looking for the log files. The same rule applies to the name of the secondary index table.
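
For example, the check of the director table load for this database would look for the same message in the correspondingly renamed logs (a direct adaptation of the command shown earlier for wise_00):

% tail /tmp/wise_ext_00/log/qserv-db*_load_object.log | grep 'Finished'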

