This tutorial was adapted from instructions for executing the Summer 2013 Data Release Production (DRP) on XSEDE platforms, and test runs on the LSST development cluster. This particular data challenge used a database backend to select images based on quality (e.g., seeing, airmass, quality flag), store and retrieve calculated zeropoints, and store and retrieve centroids for forced photometry. The preferred way to retrieve these data is through the Butler. If you are using a more recent version of the stack, a database is unnecessary to make co-adds and perform forced photometry.
To follow this particular demo, however, you must have write access to a database server (default is NCSA). The database server must contain the input SDSS
SeasonFieldQuality_Test table; the server will also be used to create the output catalogs of results. If you do not have write access to the NCSA database or wish to run locally, please follow the instructions for setting up a local database server compatible with this demo at Setup a Database for Stripe82 Demo.
You must create a policy file under your home directory, in a sub-directory called .lsst. This is a DB authorization file that presents login credentials to whatever DB server is being used. To do this, create a $HOME/.lsst/db-auth.paf file with the following content:
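The sketch below shows the expected shape of the file for the NCSA default server; the host and port match the defaults mentioned in this tutorial, while the user and password entries are placeholders you must replace with your own credentials:

```
database: {
    authInfo: {
        host: lsst-db.ncsa.illinois.edu
        port: 3306
        user: <your MySQL username>
        password: <your MySQL password>
    }
}
```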
If you are using a local database server, adapt to point to localhost and your local port number. Authorization information for more than one DB server may be included, if relevant.
The .lsst directory must have 700 permissions, and db-auth.paf must have 600 permissions (i.e., go-rwx in both cases).
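These permissions can be set as follows (the touch is only needed if you have not yet created the credentials file with the content shown above):

```shell
# Create the directory and credentials file if they do not already exist,
# then restrict access to the owner only.
mkdir -p $HOME/.lsst
touch $HOME/.lsst/db-auth.paf
chmod 700 $HOME/.lsst
chmod 600 $HOME/.lsst/db-auth.paf
```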
Load the LSST Environment
You must have the LSST Stack installed on your system (see LSST Stack Installation) to proceed. The commands listed in the code blocks below primarily assume you are using the bash shell; analogous commands for (t)csh should work as well. If you have not already done so, load the LSST environment, where $INSTALL_DIR is the directory where the LSST Stack was installed.
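With a bash shell, loading the environment typically looks like the following (loadLSST.bash is the stack's conventional setup script; (t)csh users would source loadLSST.csh instead):

```shell
source $INSTALL_DIR/loadLSST.bash
```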
Setup the packages necessary for processing data:
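The exact list depends on the stack version; the eups setup commands below are a sketch, with package names assumed from the tasks used later in this tutorial:

```shell
# Package names are assumptions based on the tasks used in this demo.
setup pipe_tasks
setup obs_sdss
setup datarel
setup ap
```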
It is convenient to define an environment variable for the path to your local processing directory:
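For example ($HOME/stripe82demo and the variable name DEMO_DIR are arbitrary choices; use any path with enough disk space):

```shell
# DEMO_DIR will denote the local processing directory for this tutorial.
export DEMO_DIR=$HOME/stripe82demo
mkdir -p $DEMO_DIR
```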
Download Tutorials Package
This tutorial makes use of some utilities in the tutorials package. Fetch it from the LSST source code repository to a working directory, and set up an environment variable:
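A sketch of the fetch, assuming you exported DEMO_DIR as your processing directory; the repository URL is a placeholder for the current location of the LSST tutorials package:

```shell
cd $DEMO_DIR
# Substitute the actual URL of the LSST tutorials repository.
git clone <tutorials repository URL> tutorials
export TUTORIAL_DIR=$DEMO_DIR/tutorials
```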
Create a Repository
The first step is to create a data repository, which will contain the input images, a camera-specific mapper of files in the repository, and a registry of metadata in the repository. First, create the directory structure and define an environment variable for the location of the repository:
The mapper is specific to a particular camera and is specified in a file which, for SDSS image data, consists of the following single line of content:
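A sketch of both steps, assuming the Gen2 Butler convention of a single-line _mapper file naming the SDSS mapper class, and a directory layout chosen for this tutorial (DEMO_DIR is the processing directory defined earlier):

```shell
# Create the input repository and point SDSS_DATA_DIR at it.
export SDSS_DATA_DIR=${DEMO_DIR:-$HOME/stripe82demo}/input
mkdir -p $SDSS_DATA_DIR

# The mapper file contains the single line naming the SDSS camera mapper.
echo "lsst.obs.sdss.sdssMapper.SdssMapper" > $SDSS_DATA_DIR/_mapper
```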
Define the Region of Interest
Create a SkyMap
Create a SkyMap, which is the mapping from the sky region of interest to the output geometry of the Co-Adds. In this example the patches will cover only a portion of the declination range, but the geometry itself is defined over the full range, -1.2° to +1.2°, which spans 12 patches. The output will appear in a scratch directory, tempSkyMap_dir, which will be created by the task if needed.
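A sketch of the invocation, assuming $SDSS_DATA_DIR points at the input repository created above:

```shell
# Build the SkyMap in a scratch output repository.
makeSkyMap.py $SDSS_DATA_DIR --output tempSkyMap_dir
```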
Given this SkyMap, the next step is to identify those patches that cover it in the range of interest 5.0° < RA < 5.2° and 0.05° < Dec < +0.25°, using the reportPatches.py task in the pipe_tasks package (a small boundary will be added; this selection anticipates input fields from only two camcols). The results are stored in an ASCII file:
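A sketch of the invocation; the config parameter syntax and output file name are assumptions, and the dummy --id value is supplied only because the parser requires one (see the note below):

```shell
# Report the patches overlapping the region of interest.
reportPatches.py tempSkyMap_dir --id tract=0 patch=0,0 \
    --config raDecRange="5.0, 0.05, 5.2, 0.25" > patchList_r.txt
```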
Although the patches do happen to reside in SkyMap Tract=0, the required --id argument is not actually used by the task.
Identify Images to Process
The next step is to generate a list of r-band images that overlap the selected region of the SkyMap, using the reportImagesToCoadd.py task in the pipe_tasks package.
This script makes use of the
SeasonFieldQuality_Test table in a mysql database.
Both the N and S strips are selected. No images are excluded on the basis of the SDSS (weather) quality metric, so that they will be available in a later step for forced-source photometry:
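A sketch of the invocation; the config parameter names shown are assumptions, standing in for whatever selection options your stack version provides:

```shell
# List r-band fields overlapping the region; both strips, no quality cut.
reportImagesToCoadd.py $SDSS_DATA_DIR --id filter=r tract=0 \
    --config select.strip="Both" > rawInputs.txt
```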
The select.database option may be specified in the above task to point to a different database server than the default (on the LSST cluster). However, the selected server must contain a copy of the SDSS SeasonFieldQuality_Test table.
Regrettably, the output file rawInputs.txt does not exactly match the expected format for the next step. Use the refCoaddList.py script from the tutorials package to trim the unneeded information and insert the "--id" command-line option:
Run the script:
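For example (the script's location within the tutorials package, and the output file name rawInputs_r.txt, are assumptions):

```shell
# Reformat rawInputs.txt into "--id ..." lines for the processing step.
$TUTORIAL_DIR/bin/refCoaddList.py rawInputs.txt > rawInputs_r.txt
```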
Acquire the Supporting Data
You will need to download the relevant Stripe 82 data from the SDSS Archive and create a data repository. For this example, the
rawInputs_r.txt file contains the field identifiers of all exposures falling within the region of interest, over 700 SDSS fields. You will also need to install the reference astrometry/photometry catalog.
Disk Space Requirements
The SDSS input data and the output results (and intermediate results) will require 35 GB of disk space on the machine where you perform your processing. Most of the space is consumed by intermediate products.
The standard flow for Data Release Production processing is:
- Perform basic image processing on the input images, which is very lightweight since good calibrations were performed by the SDSS pipeline. Specifically, the DR7 products include the following for each field, from which fully qualified calexp images will be constructed:
- fpC - flat-fielded science frames
- fpM - data quality pixel mask
- psField - PSF characterization
- asTrans - initial astrometric calibration
- Perform photometric calibration using the magnitudes of stars in the reference SDSS Stripe 82 Standard Star Catalog of Ivezic et al. 2007.
- Create a deep Co-add image of a region of sky from the calexp images.
- Detect sources on the Co-Add images.
- Run forced-photometry on the input calexp images at the world coordinates of the Co-Add sources.
- Associate astronomical objects from multi-epoch source detections.
- Create a database of the photometric results.
Some of the tasks in datarel, which are needed for some processing steps below, are not yet included in the user's PATH, which means the full path to each task must be specified. This will change in a future release. In the meantime it is convenient to define an environment variable for the path to these tasks:
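For example ($DATAREL_DIR is defined by eups once datarel is set up; the variable name TASKS_DIR is a choice made for this tutorial):

```shell
export TASKS_DIR=$DATAREL_DIR/bin
```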
SDSS science images (fpC files) must be preprocessed before they can be co-added. First, set up multi-shapelet source measurement:
The core processing task is processCcdSdss.py, which requires some configuration parameters. These configs ensure that the same apCorrRadius is used consistently for all steps. Since there are a lot of fields to process (over 700 in this example), this task can be run with multiple processes in parallel, assuming your hardware has multiple cores to support it. It is handy for the rest of this tutorial to define an environment variable to denote the number of cores you wish to make available for parallel processing:
Here, processing jobs are started for each input listed in the rawInputs_r.txt file; the outputs will be stored in the subdirectory calexp_dir (which will be created if necessary).
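Since the fields are independent, the jobs can be fanned out with standard shell tools. A minimal sketch, assuming each line of rawInputs_r.txt holds the --id arguments for one field and $SDSS_DATA_DIR points at the input repository:

```shell
export NPROCS=4   # number of cores to devote to parallel processing

# Run one processCcdSdss.py job per line of rawInputs_r.txt, at most
# $NPROCS jobs at a time; outputs go to calexp_dir.
xargs -P $NPROCS -I{} sh -c \
    'processCcdSdss.py $SDSS_DATA_DIR {} --output calexp_dir' \
    < rawInputs_r.txt
```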
The following task takes ~50 min on a 2006-era iMac with 4 cores and 8 GB of memory, and will consume ~20 GB of disk space to store intermediate files. Do not delete or compress these files until after the forced-photometry step.
The processing will generate a substantial amount of normal output. If you wish to restrict the output, add the command-line option "-L <LogLevel>", where LogLevel can be, in decreasing quantity of output, INFO (the default) or FATAL. Note that specifying a level other than INFO has the effect of turning off the information that would be written to the --logdest destination. Redirect the output to avoid having it fill your screen.
Create a Results Database
The metadata from the single-frame preprocessing must be loaded into a database so that the assembleCoadd task has access to the photometric zeropoints calculated in processCcdSdss.py. First, create the database:
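A sketch using the mysql client; <name> is your MySQL username, and the database name shown is illustrative (note the username prefix convention described below):

```shell
mysql --host lsst-db.ncsa.illinois.edu -u <name> -p \
    -e "CREATE DATABASE <name>_stripe82demo"
```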
The --host argument may be given the IP address of a different database server, if an alternative is to be used. For the lsst-db.ncsa.illinois.edu server, the convention is for the database name to be prefixed with the current user's MySQL username (denoted <name> in the example) to match privilege grant rules. (Also, there are many extant databases on this server, so the convention helps avoid name collisions.) Now create an output directory for the csv files, and start the ingest:
An alternative host may be specified with the
--host option. Since this task generates a lot of output (one line per SDSS field, plus additional information), you might want to re-direct output to a log file as shown.
Create Co-Add Images
The geometry of a Co-Add image is defined by a sky map, which is a set of large tracts (essentially, large exposures) subdivided into patches (which are subregions of approximately the size of a science image). Create the SkyMap using an existing data directory as input and a new directory for output. That new directory then becomes both the input and output for subsequent tasks.
Make the Co-Add Temp Exposures
To create a Co-Add image, the input calexp images must be mapped to the output geometry. To create the input file, insert 'filter=r' into each patch entry in the patch list.
With the output geometry defined, the next step is to warp the calexp images to the relevant output patch(es) with
makeCoaddTempExp.py. This is also done in a production:
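A sketch of the production (here <options> stands for the per-patch dataId and config arguments assembled in the previous step; the output directory name is an assumption):

```shell
# Warp the calexps onto the output patch geometry.
makeCoaddTempExp.py tempSkyMap_dir --output coadd_dir <options>
```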
Assemble the Co-Adds
The next step is to form the Co-Add image for each patch from the warped calexps. For the Summer 2013 Data Challenge, runs with low non-astrophysical backgrounds were identified to use as references for matching the backgrounds of the input images. They also met the following SDSS quality thresholds for all camcols/filters/fields that covered their respective tracts:
In addition, the reference-run data were visually inspected to omit images with rapidly varying airglow. Tract 0 in the output geometry of the Co-Adds covers 0° < RA < 90°, so run 5823 was selected for rows with odd y-indices, and run 6955 for rows with even y-indices. Create the list of patches:
Now run a production to assemble the Co-Add image for each patch. The co-adds are computed as a mean, with background subtracted and outliers rejected:
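A sketch of the invocation (here <options> stands for the per-patch dataIds and the background-matching configs discussed below; redirecting the copious output to a log file is advisable):

```shell
# Assemble the outlier-rejected, background-subtracted mean co-add for
# each patch.
assembleCoadd.py coadd_dir --output coadd_dir <options> >& assembleCoadd.log
```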
The scaleZeroPoint.selectFluxMag0.host option needs to be set if a different MySQL database server is used; select.database should be the SDSS quality database.
The background-matching parameters for the above production are the following:
- maxMatchResidualRatio: see table below
- maxMatchResidualRMS: see table below
- scaleZeroPoint.selectFluxMag0.database: specifies the database of metadata, which includes the zero-points of the flux scale for each field
For details see S13 production notes. The filter-dependent parameters, based upon a quality analysis for the Summer2013 production, should have the following values:
Create Catalog of Co-Add Sources
Now process the Co-Added images to detect and measure properties of Co-Add sources, using the
processConfig config file (see above):
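A sketch of the invocation; the task name processCoadd.py and the dataId placeholder <options> are assumptions, while -C is the standard command-line-task flag for supplying a config file:

```shell
# Detect and measure sources on each co-add patch; no --output is given,
# so catalogs land alongside the co-adds.
processCoadd.py coadd_dir -C processConfig <options>
```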
Because no output is specified, the Co-Add source catalogs will go into the same directory as the Co-Adds. (If you prefer, add
--output /path/to/new/catalog/dir/ to specify a separate directory for the catalogs.) Details of the configuration choices may be found on the LSST wiki.
The next step is to ingest all of the Co-Add sources into the database. Create a staging directory to facilitate the process:
Perform Forced Photometry
The argument parser for the next task, forcedPhot.py, is different from that of processCcdSdss.py in that it does not accept the "rerun" field in the dataIds. The following command will create a new input file without the rerun field, in preparation for running a production to perform forced photometry:
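A minimal sketch of the transformation with sed; the dataId values and file names are illustrative (in practice rawInputs_r.txt is the list produced earlier):

```shell
# Example dataId line as produced by the earlier steps (values illustrative):
echo "--id run=5823 camcol=2 field=100 rerun=40 filter=r" > rawInputs_r.txt

# Strip the rerun=... field, which forcedPhot.py does not accept:
sed -e 's/ *rerun=[0-9]*//' rawInputs_r.txt > forcedPhotInputs_r.txt
```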
The tutorials package includes a configuration file (
forcedPhotConfig.py), which will be needed to perform forced photometry (in parallel if possible):
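A sketch of the parallel production, assuming the rerun-stripped dataId list is in forcedPhotInputs_r.txt (a name chosen for illustration) and the config file lives at the path shown within the tutorials package (an assumption):

```shell
# Run forced photometry at the co-add source positions, $NPROCS jobs at
# a time.
xargs -P $NPROCS -I{} sh -c \
    'forcedPhot.py calexp_dir -C $TUTORIAL_DIR/forcedPhotConfig.py {}' \
    < forcedPhotInputs_r.txt
```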
Loading the Science Database
With the image processing completed, only a few final steps are needed to make all the catalogs available in the database. These steps should be run after merging all the result data (sources, coadd [deep] sources, and forced sources).
First, create directories to contain the output:
Now ingest the forced-source photometry (this may take a while, and will generate a line of output for each field). The following task first converts forced photometry source tables to CSV files suitable for loading into MySQL, then performs the ingest:
As each field is processed, it is typical to see a few to several thousand sources denoted "good." The penultimate task is to associate the Co-Add sources with the single frame sources, using the
sourceAssoc.py task in the ap package:
This will take quite a long time to run (~90 min on the LSST cluster; perhaps several hours on a modest desktop machine) and produce about 5 lines of output for every input field. Now ingest the associated sources into the database:
Finally, use MySQL to enable the database keys, which will make your tables much more useful for scientific inquiries. Here, <hostname> is the address of the DB server you have been using (lsst-db.ncsa.illinois.edu by default). The SQL statements to enable the keys have the syntax:
They are included in the tutorials package. Execute the SQL:
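For example (the location and name of the .sql file within the tutorials package, and the database name, are assumptions):

```shell
mysql --host <hostname> -u <name> -p <name>_stripe82demo \
    < $TUTORIAL_DIR/enableKeys.sql
```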
You may query the Database to view the results of this tutorial. See Example Catalog Queries to inspire your thinking.
For convenience, the sequence of setup commands and task invocations is given below, with minimal intervening explanation: