
The SDSS images from Stripe 82 have been used in Data Release Productions (DRPs) to assess the scientific performance of the LSST Stack. Scientist users may also find it valuable to re-create a portion of an SDSS DRP for their own science. This tutorial demonstrates how to do that for a small portion of Stripe 82 in the r-band, and in the process illustrates several concepts that apply to processing general astronomical imaging data with the LSST Stack, including parallel processing, the creation of Co-Add images, forced photometry, source association, and the ingest of catalogs into a database.

The SDSS Stripe 82 DRP covered a 2.5° wide stripe, centered on the celestial equator and spanning -40° < RA < 55°, in all 5 survey passbands. This tutorial will show how to process a small sub-region of SDSS Stripe 82 in the r-band; the extension to other bands is straightforward. Processing the data requires some preparation, as described below.



This tutorial was adapted from instructions for executing the Summer 2013 Data Release Production (DRP) on XSEDE platforms, and test runs on the LSST development cluster. This particular data challenge used a database backend to select images based on quality (i.e. seeing, airmass, quality flag), store and retrieve calculated zeropoints, and store and retrieve centroids for forced photometry. The preferred way to retrieve these data is through the Butler. If you are using a more recent version of the stack, a database is unnecessary to make co-adds and perform forced photometry.

To follow this particular demo, however, you must have write access to a database server (default is NCSA).  The database server must contain the input SDSS SeasonFieldQuality_Test table; the server will also be used to create the output catalogs of results. If you do not have write access to the NCSA database or wish to run locally, please follow the instructions for setting up a local database server compatible with this demo at Setup a Database for Stripe82 Demo.

You must create a policy file in a sub-directory of your home directory called $HOME/.lsst. This is a DB authorization file that presents login credentials to whatever DB server is being used. To do this, create a $HOME/.lsst/db-auth.paf file with the following content:

Content of db-auth.paf policy file
database: {
    authInfo: {
        host: <database server host name>
        port: 3306
        user: <your mysql user name>
        password: <your mysql password>
    }
    authInfo: {
        host: <second database server host name>
        port: 3306
        user: <your mysql user name>
        password: <your mysql password>
    }
}

If you are using a local database server, adapt to point to localhost and your local port number. Authorization information for more than one DB server may be included, if relevant. 

The $HOME/.lsst directory must have 700 permissions and db-auth.paf must have 600 permissions (go-rwx in both cases).
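The permission requirements above can be applied with standard shell commands. This is a minimal sketch, assuming the policy file is named db-auth.paf as in the preceding step; `touch` only creates the file if it does not exist yet and leaves existing content untouched.

```shell
# Create the configuration directory if needed and restrict it to the owner.
mkdir -p "$HOME/.lsst"
chmod 700 "$HOME/.lsst"

# Make sure the policy file exists, then restrict it to the owner as well.
touch "$HOME/.lsst/db-auth.paf"
chmod 600 "$HOME/.lsst/db-auth.paf"

# Inspect the result: the mode strings should show no group or other access.
ls -ld "$HOME/.lsst" "$HOME/.lsst/db-auth.paf"
```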

Initial Steps 

Load the LSST Environment

You must have the LSST Stack installed on your system (see LSST Stack Installation) to proceed. The commands listed in the code blocks below primarily assume you are using the bash shell; analogous commands for (t)csh should work as well. If you have not already done so, load the LSST environment:

source $INSTALL_DIR/loadLSST.bash          # bash users

where $INSTALL_DIR is the directory where the LSST Stack was installed. 

Set up the packages necessary for processing data:

setup obs_sdss -t Winter2014      # tasks for SDSS-specific data management
setup pipe_tasks -t Winter2014    # tasks for pipeline execution
setup datarel -t Winter2014       # tasks for DB management & parallel processing

It is convenient to define an environment variable for the path to your local processing directory: 

export DEMO_DIR=/path/to/your/processing/directory

Download Tutorials Package

This tutorial makes use of some utilities in the tutorials package. Fetch it from the LSST source code repository to a working directory, and set up an environment variable: 

cd /path/to/install/directory
git clone
export TUT_DIR=$PWD/tutorials/sdssDrpTutorial

Create a Repository

The first step is to create a data repository, which will contain the input images, a camera-specific mapper of files in the repository, and a registry of metadata in the repository. First, create the directory structure and define an environment variable for the location of the repository:

cd $DEMO_DIR                      # the repository lives under the processing directory
mkdir Stripe82
cd Stripe82 && mkdir runs
export DATA_DIR=$DEMO_DIR/Stripe82/runs

The mapper is specific to a particular camera and is specified in a file which, for SDSS image data, consists of the following single line of content:

echo "lsst.obs.sdss.sdssMapper.SdssMapper" > runs/_mapper

Define the Region of Interest 

Create a SkyMap

Create a SkyMap, which is the mapping from the sky region of interest to the output geometry of the Co-Adds. In this example the patches will cover only a portion of the declination range, but the geometry itself is defined over the full range, -1.2° to +1.2°, which spans 12 patches. The output will appear in a scratch directory, tempSkyMap_dir, which will be created by the task if needed.

  --config,1.2 \,1907 \ \ \
  --output tempSkyMap_dir

Given this SkyMap, the next step is to identify those patches that cover it in the range of interest, 5.0° < RA < 5.2° and 0.05° < Dec < +0.25°, using a task in the pipe_tasks package (a small boundary will be added; this selection anticipates input fields from only two camcols). The results are stored in an ASCII file:

tempSkyMap_dir \
  --config raDecRange="5.0, 0.05, 5.2, 0.25" \
  --id tract=0 patch=0,0 > patches.txt

Although the patches do happen to reside in the SkyMap Tract=0, the required --id argument is not actually used by this task.

Identify Images to Process

The next step is to generate a list of r-band images that overlap the selected region of the SkyMap, using a task in the pipe_tasks package.

This script makes use of the SeasonFieldQuality_Test table in a mysql database. 

Both the N and S strips are selected. No images are excluded on the basis of the SDSS (weather) quality metric, so that they will be available in a later step for forced-source photometry:

tempSkyMap_dir \
  --config raDecRange="5.0, 0.05, 5.2, 0.25" \
  select.strip=Both \
  select.quality=None select.rejectWholeRuns=False \
  showImageIds=True \
  --id filter=r > rawInputs.txt 
# For other filters, use the same task and arguments above, except e.g.: 
  --id filter=g > rawInputs.txt

The parameters and select.database may be specified in the above task to point to a different database server than the default (on the LSST cluster). However, the selected server must contain a copy of the SDSS SeasonFieldQuality_Test table.

Regrettably, the output file rawInputs.txt does not exactly match the expected format for the next step. Use the script from the tutorials package to trim the unneeded information and insert the "--id " command line option:

Run the script:

$TUT_DIR/python/ rawInputs.txt > rawInputs_r.txt
cp rawInputs_r.txt ./Stripe82       # This list will be needed to download data from SDSS

Acquire the Supporting Data

You will need to download the relevant Stripe 82 data from the SDSS Archive and create a data repository. For this example, the rawInputs_r.txt file contains the field identifiers of all exposures falling within the region of interest, over 700 SDSS fields. You will also need to install the reference astrometry/photometry catalog. 

Disk Space Requirements

The SDSS input data and the output results (and intermediate results) will require 35 GB of disk space on the machine where you perform your processing. Most of the space is consumed by an intermediate product, the calexp images.
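Before downloading anything, it may be worth confirming that the target file system has enough room. The sketch below is not part of the original tutorial; it compares the 35 GB figure quoted above with the free space reported by `df`, and the variable names required_kb and avail_kb are illustrative.

```shell
# 35 GB expressed in 1024-byte blocks, the unit reported by `df -Pk`.
required_kb=$((35 * 1024 * 1024))

# Free space on the file system holding the processing directory
# (falls back to the current directory if DEMO_DIR is unset).
avail_kb=$(df -Pk "${DEMO_DIR:-.}" | awk 'NR==2 {print $4}')

if [ "$avail_kb" -ge "$required_kb" ]; then
  echo "OK: $((avail_kb / 1024 / 1024)) GB available"
else
  echo "Warning: only $((avail_kb / 1024 / 1024)) GB available; about 35 GB is needed"
fi
```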


The standard flow for Data Release Production processing is:

  1. Perform basic image processing on the input images, which is very light-weight since good calibrations were performed by the SDSS pipeline. Specifically the DR7 products include the following for each field from which fully qualified calexp images will be constructed:
    1. fpC - flat-fielded science frames
    2. fpM - data quality pixel mask
    3. psField - PSF characterization
    4. asTrans - initial astrometric calibration 
  2. Perform photometric calibration using the magnitudes of stars in the reference SDSS Stripe 82 Standard Star Catalog of Ivezic et al. 2007.
  3. Create a deep Co-Add image of a region of sky from the calexp images.
  4. Detect sources on the Co-Add images.
  5. Run forced-photometry on the input calexp images at the world coordinates of the Co-Add sources.
  6. Associate astronomical objects from multi-epoch source detections. 
  7. Create a database of the photometric results.

Pending Update

Some of the tasks in datarel, which are needed for some processing steps below, are not yet included in the user's PATH, which means the full path to the task must be specified. This will change in a future release. In the meantime it is convenient to define an environment variable for the path to these tasks:

export DR_PATH=$DATAREL_DIR/bin/ingest 

Single-Frame Measurement

SDSS science images (fpC files) must be preprocessed before they can be co-added. First set up multi-shapelet source measurement:

setup meas_extensions_multiShapelet --keep 

The core processing task requires some configuration parameters. These configs ensure that the same apCorrRadius is used consistently for all steps. Since there are rather a lot of fields to process (over 700 in this example), this task can be run with multiple processes in parallel, assuming your hardware has multiple cores to support it. It is handy for the rest of this tutorial to define an environment variable to denote the number of cores you wish to make available for parallel processing:

export NCORES=$((sysctl -n hw.ncpu || (test -r /proc/cpuinfo && grep processor /proc/cpuinfo | wc -l) || echo 2) 2>/dev/null)
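If the one-liner above is hard to read, the same idea can be spelled out step by step. This alternative is a sketch rather than part of the original tutorial: it assumes `getconf _NPROCESSORS_ONLN` is available (it is on most Linux and macOS systems, although POSIX does not guarantee it) and falls back to 2 cores, as the original command does.

```shell
# Ask the system how many processors are online; suppress errors if unsupported.
NCORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null)

# Fall back to a conservative default when the query fails or returns non-numeric output.
case "$NCORES" in
  ''|*[!0-9]*) NCORES=2 ;;
esac

export NCORES
echo "Using $NCORES cores"
```

Setting NCORES to a smaller value by hand is reasonable on a shared machine, since each parallel process also needs its own share of memory.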

Here, processing jobs are started for each input listed in the rawInputs_r.txt file; the outputs will be stored in the directory $DEMO_DIR/calexp_dir (which will be created if necessary).

The following task takes ~50 min on a 2006-era iMac with 4 cores and 8 GB of memory, and will consume ~20 GB of disk space to store intermediate files. Do not delete or compress these files until after the forced-photometry step.

$DATA_DIR/ \
  --output $DEMO_DIR/calexp_dir \
  --configfile $TUT_DIR/config/ \
  @rawInputs_r.txt \
  -j $NCORES \
  --logdest procCcdLog.txt 

The processing will generate a substantial amount of normal output. If you wish to restrict the output, add the command-line option "-L <LogLevel>" where LogLevel can be, in decreasing quantity of output: INFO (the default), WARN, or FATAL. Note that specifying a level other than INFO has the effect of turning off the information that would be written to the --logdest. Re-direct the output to avoid having lots of output fill your screen. 

Create a Results Database

The metadata from the single-frame preprocessing must be loaded into a database so that the assembleCoadd task has access to the photometric zeropoints calculated in processSdssCcd. First, create the database: 

export DB_NAME="<name>_Stripe82_demo"    # handy handle for the DB name
$DR_PATH/ $DB_NAME --camera=sdss

Add a --host argument with the value of the IP address for a different database server, if an alternative is to be used. On the default server, the convention is for the database name to be prefixed with the current user's MySQL username (denoted <name> in the example) to match privilege grant rules. (There are also many existing databases on this server, so the convention helps to avoid name collisions.) Now create an output directory for the csv files, and start the ingest:

mkdir $DEMO_DIR/ingestProcessed_csv_dir
$DR_PATH/ \
  --database=$DB_NAME \
  $DEMO_DIR/ingestProcessed_csv_dir \
  $DEMO_DIR/calexp_dir \
  --camera=sdss >& ingestProcessed_csv_log.txt

An alternative host may be specified with the --host option. Since this task generates a lot of output (one line per SDSS field, plus additional information), you might want to re-direct output to a log file as shown. 

Create Co-Add Images

The geometry of a Co-Add image is defined by a sky map, which is a set of large tracts (essentially, large exposures) subdivided into patches (which are subregions of approximately the size of a science image). Create the SkyMap using an existing data directory as input and a new directory for output. That new directory then becomes both the input and output for subsequent tasks. 

Make the Co-Add Temp Exposures

To create a Co-Add image, the input calexp images must be mapped to the output geometry. To create the input file, insert 'filter=r' into each patch in patches.txt.

sed -e 's/^\(--id \)\{0,1\}/--id filter=r /' patches.txt > patches_r.txt    # keep (or add) the --id prefix
$DEMO_DIR/calexp_dir \
  --config,1.2 \,1907 \ \ \
  --output coadd_r_dir
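The effect of the sed step that builds patches_r.txt can be checked on a couple of sample lines before it is run on the real file. This is a sketch under stated assumptions: the sample dataIds and the scratch directory workdir are hypothetical, and because the exact output format of the patch-listing task is not shown here, the expression accepts lines with or without a leading "--id ".

```shell
# Scratch directory so the experiment does not touch the real patches.txt.
workdir=$(mktemp -d)

# Hypothetical sample lines: one without and one with a leading "--id ".
printf '%s\n' 'tract=0 patch=153,4' '--id tract=0 patch=153,5' > "$workdir/patches.txt"

# Strip an optional leading "--id " and prepend "--id filter=r " to every line.
sed -e 's/^\(--id \)\{0,1\}/--id filter=r /' "$workdir/patches.txt" > "$workdir/patches_r.txt"

cat "$workdir/patches_r.txt"
```

Both sample lines come out in the form "--id filter=r tract=0 patch=X,Y", which is the shape an @file argument list needs.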

With the output geometry defined, the next step is to warp the calexp images to the relevant output patch(es). This is also done in a production:

$DEMO_DIR/calexp_dir \
  --config bgSubtracted=False \
  warpAndPsfMatch.warp.warpingKernelName='lanczos4' \
  warpAndPsfMatch.warp.cacheSize=0 \
  --output $DEMO_DIR/coadd_r_dir \
  @$DEMO_DIR/patches_r.txt \
  -j $NCORES >& mkCoaddExp_log.txt

Assemble the Co-Adds

The next step is to form the Co-Add input images for each patch from the warped calexps. For the Summer 2013 Data Challenge, runs with low non-astrophysical backgrounds were identified to use as references for matching the backgrounds of the input images. They also met the following SDSS quality thresholds for all camcols/filters/fields that covered their respective tracts:

  •  quality > 2
  •  isblacklisted=0
  •  psfWidth < 2

In addition, the reference run data were visually inspected to omit images with rapidly varying airglow. Tract 0 in the output geometry of the Co-Adds covers 0° < RA < 90°; run 5823 was selected for rows with odd y-indices, and run 6955 for rows with even y-indices. Create the list of patches:

sed -e 's/\([13579]$\)/\1 run=5823/' patches_r.txt | sed -e 's/\([02468]$\)/\1 run=6955/' > patches_r_runs.txt
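The two-stage sed pipeline above assigns a reference run according to the parity of the last digit on each line. The sketch below applies it to two hypothetical patch lines in a scratch directory (workdir is an illustrative name) so the result can be inspected safely.

```shell
# Scratch directory with two hypothetical patch lines (even and odd y-index).
workdir=$(mktemp -d)
printf '%s\n' '--id filter=r tract=0 patch=153,4' \
              '--id filter=r tract=0 patch=153,5' > "$workdir/patches_r.txt"

# Odd last digit gets run=5823; even last digit gets run=6955.
sed -e 's/\([13579]$\)/\1 run=5823/' "$workdir/patches_r.txt" \
  | sed -e 's/\([02468]$\)/\1 run=6955/' > "$workdir/patches_r_runs.txt"

cat "$workdir/patches_r_runs.txt"
```

The second sed does not touch lines already tagged by the first, because "run=5823" itself ends in an odd digit. Only the final digit of the y-index matters for parity, so multi-digit indices are handled as well.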

Now run a production to assemble the Co-Add image for each patch. The Co-Adds are computed as a mean, with background subtracted and outliers rejected:

$DEMO_DIR/coadd_r_dir \
  --config maxMatchResidualRatio=1.2 maxMatchResidualRMS=0.5 scaleZeroPoint.selectFluxMag0.database=$DB_NAME \
  @$DEMO_DIR/patches_r_runs.txt \
  -j $NCORES >& assembleCoadd_log.txt

Note that and need to be set if a different MySQL database server is used; select.database should be the SDSS quality database. 

 The background-matching parameters for the above production are the following: 

  • maxMatchResidualRatio: See table below
  • maxMatchResidualRMS: See table below
  • scaleZeroPoint.selectFluxMag0.database: specifies the database of metadata, which includes the zero-points of the flux scale for each field. 

For details see S13 production notes. The filter-dependent parameters, based upon a quality analysis for the Summer2013 production, should have the following values: 


Create Catalog of Co-Add Sources

Now process the Co-Added images to detect and measure properties of Co-Add sources, using the processConfig config file (see above):

$DEMO_DIR/coadd_r_dir --configfile $TUT_DIR/config/ \
  @$DEMO_DIR/patches_r.txt \
  -j $NCORES

Because no output is specified, the Co-Add source catalogs will go into the same directory as the Co-Adds. (If you prefer, add --output /path/to/new/catalog/dir/ to specify a separate directory for the catalogs.) Details of the configuration choices may be found on the LSST wiki.

The next step is to ingest all of the Co-Add sources into the database. Create a staging directory to facilitate the process: 

mkdir $DEMO_DIR/ingestCoadd_r_csv_dir
# Ingest only the Deep Co-Add type
$DR_PATH/ --camera=sdss \
  --database=$DB_NAME \
  $DEMO_DIR/ingestCoadd_r_csv_dir $DEMO_DIR/coadd_r_dir \
  --coadd-names=deep \
  --create-views

Perform Forced Photometry

The argument parser for the next task does not accept the "rerun" field in the dataIds. The following command will create a new input file without the rerun field, in preparation for running a production to perform forced photometry:

cut -d " " -f1,2,3,4,6 rawInputs_r.txt > forcedPhotInputs_r.txt
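To see what the cut command does, it can be applied to a single sample line. The dataId below is hypothetical: its values and key order are invented for illustration, with the only assumption taken from the command itself being that the rerun entry is the fifth space-separated field.

```shell
# Hypothetical dataId line with "rerun" as the fifth space-separated field.
sample='--id run=5823 camcol=3 field=100 rerun=40 filter=r'

# Keep fields 1-4 and 6, which drops the fifth (rerun) field.
trimmed=$(printf '%s\n' "$sample" | cut -d " " -f1,2,3,4,6)

echo "$trimmed"
```

Because cut selects by position, it is worth checking one or two real lines of rawInputs_r.txt to confirm that rerun really is the fifth field before trusting the result.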

The tutorials package includes a configuration file, which will be needed to perform forced photometry (in parallel if possible):

$DEMO_DIR/calexp_dir --output $DEMO_DIR/forcedPhot_dir --configfile $TUT_DIR/config/ \
  --config references.dbName=$DB_NAME references.filterName=r \
  @$DEMO_DIR/forcedPhotInputs_r.txt \
  -j $NCORES >& forcedPhot_log.txt

Loading the Science Database

With the image processing completed, only a few final steps are needed to make all the catalogs available in the database. These steps should be run after merging all the result data (sources, coadd [deep] sources, and forced sources).

First, create directories to contain the output:

mkdir $DEMO_DIR/forcedPhot_csv_dir
mkdir $DEMO_DIR/sourceAssoc-csv

Now ingest the forced-source photometry (this may take a while, and will generate a line of output for each field). The following task first converts forced photometry source tables to CSV files suitable for loading into MySQL, then performs the ingest:

$DR_PATH/ --camera=sdss \
  --database=$DB_NAME \
  --coadd-name=deep \
  --create-views \
  $DEMO_DIR/forcedPhot_csv_dir $DEMO_DIR/forcedPhot_dir >& ingestForcedSource_log.txt

As each field is processed, it is typical to see a few to several thousand sources denoted "good." The penultimate task is to associate the Co-Add sources with the single frame sources, using a task in the ap package:

$DEMO_DIR/coadd_r_dir \
  --output $DEMO_DIR/sourceAssoc_dir \
  --config measSlots.modelFlux='flux.gaussian' >& sourceAssoc_log.txt

This will take quite a long time to run (~90 min on the LSST cluster; perhaps several hours on a modest desktop machine) and produce about 5 lines of output for every input field. Now ingest the associated sources into the database: 

$DR_PATH/ -d $DB_NAME \
  --camera sdss \
  --create-views \
  -j $NCORES \
  $DEMO_DIR/sourceAssoc-csv $DEMO_DIR/sourceAssoc_dir  >& ingestSourceAssoc_log.txt

Finally, use MySQL to enable the database keys, which will make your table much more useful for scientific inquiries. Here, <hostname> is the address of the DB server you have been using, lsst-db.ncsa.illinois.edu by default. The SQL statements to enable the keys have the syntax:

ALTER TABLE <table name> ENABLE KEYS;

They are included in the tutorials package. Execute the SQL: 

mysql $DB_NAME -h <hostname> -u <user> -p < $TUT_DIR/enable_keys.sql

You may query the Database to view the results of this tutorial. See Example Catalog Queries to inspire your thinking. 


For convenience, the sequence of setup commands and task invocations is given below, with minimal intervening explanation: 

Summary of DRP processing commands
# Processing Command Summary
# Tutorial: Co-Add and forced-photometry of a small portion of Stripe 82
# Load the environment, set up packages for processing:
source /path/to/your/lsstInstallDir/loadLSST.bash      # modify for your installation
setup obs_sdss -t Winter2014
setup pipe_tasks -t Winter2014
setup datarel -t Winter2014
# Be sure to define the path to the processing directory on your system:
export DEMO_DIR=/path/to/your/working/directory
# Fetch the tutorials package
cd /path/to/install/directory
git clone
export TUT_DIR=$PWD/tutorials/sdssDrpTutorial

# Create a repository:
cd $DEMO_DIR
mkdir Stripe82
cd Stripe82 && mkdir runs
export DATA_DIR=$DEMO_DIR/Stripe82/runs
echo "lsst.obs.sdss.sdssMapper.SdssMapper" > runs/_mapper
# Create a SkyMap
cd $DEMO_DIR
$DATA_DIR/ --config,1.2,1907 --output tempSkyMap_dir

# Identify patches that cover the SkyMap for the RA/Dec range:
tempSkyMap_dir --config raDecRange="5.0, 0.05, 5.2, 0.25" --id tract=0 patch=0,0 > patches.txt

# Identify SDSS fields to process (finds 715 exposures, of which 702 are retained):
tempSkyMap_dir --config raDecRange="5.0, 0.05, 5.2, 0.25" select.strip=Both select.quality=None select.rejectWholeRuns=False showImageIds=True --id filter=r > rawInputs.txt

# Reformat the output of the rawInputs file:
$TUT_DIR/python/ rawInputs.txt > rawInputs_r.txt
cp rawInputs_r.txt ./Stripe82

## Acquire the data:
# Install the Astrometry Catalog
curl -O 
tar xzf sdss-2012-05-01-0.tgz 
eups declare -r sdss-2012-05-01-0 astrometry_net_data sdss-2012-05-01-0
setup astrometry_net_data sdss-2012-05-01-0 --keep

# Retrieve SDSS images:
cd $DEMO_DIR/Stripe82
$TUT_DIR/python/getRetrieveList rawInputs_r.txt retrieve.txt
wget -r -b -R "index.html*" -np -nH --cut-dirs=1 -P ./runs -i retrieve.txt

# Create a registry:
runs
mv registry.sqlite3 ./runs

# Set environment variable for number of cores to use in parallel processing:
export NCORES=$((sysctl -n hw.ncpu || (test -r /proc/cpuinfo && grep processor /proc/cpuinfo | wc -l) || echo 2) 2>/dev/null)
# Process the input SDSS images:
setup meas_extensions_multiShapelet --keep
$DATA_DIR/ --output $DEMO_DIR/calexp_dir --configfile $TUT_DIR/config/ @rawInputs_r.txt -j $NCORES --logdest procCcdSdss_log.txt

# Prepare DB for ingest; be sure to select your own DB name, prepended by your MySQL username:
export DR_PATH=$DATAREL_DIR/bin/ingest
export DB_NAME="<username>_Stripe82_demo2"
$DR_PATH/ $DB_NAME --camera=sdss

# Ingest metadata from the calexp's
mkdir $DEMO_DIR/ingestProcessed_csv_dir
$DR_PATH/ --database=$DB_NAME $DEMO_DIR/ingestProcessed_csv_dir $DEMO_DIR/calexp_dir  --camera=sdss >& ingestProcessed_csv_log.txt

# Create the mapping from SDSS fields to the output patches:
sed -e 's/^\(--id \)\{0,1\}/--id filter=r /' patches.txt > patches_r.txt
$DEMO_DIR/calexp_dir --config,0.25,1907 --output coadd_r_dir

# Warp the calexp's to the output geometry, and store the temporary exposures:
$DEMO_DIR/calexp_dir --config bgSubtracted=False warpAndPsfMatch.warp.warpingKernelName="lanczos4" warpAndPsfMatch.warp.cacheSize=0 --output $DEMO_DIR/coadd_r_dir @$DEMO_DIR/patches_r.txt -j $NCORES >& mkCoaddExp_log.txt

# Assemble the Co-Adds
sed -e 's/\([13579]$\)/\1 run=5823/' patches_r.txt | sed -e 's/\([02468]$\)/\1 run=6955/' > patches_r_runs.txt
$DEMO_DIR/coadd_r_dir --config maxMatchResidualRatio=1.2 maxMatchResidualRMS=0.5 scaleZeroPoint.selectFluxMag0.database=$DB_NAME @$DEMO_DIR/patches_r_runs.txt -j $NCORES >& assembleCoadd_log.txt

# Process the Co-Adds to measure sources
$DEMO_DIR/coadd_r_dir --configfile $TUT_DIR/config/ @$DEMO_DIR/patches_r.txt -j $NCORES >& processCoadd_log.txt

# Ingest all of the Co-Add sources into the DB. 
mkdir $DEMO_DIR/ingestCoadd_r_csv_dir
$DR_PATH/ --camera=sdss --database=$DB_NAME $DEMO_DIR/ingestCoadd_r_csv_dir $DEMO_DIR/coadd_r_dir --coadd-names=deep --create-views 

# Perform forced photometry
cut -d " " -f1,2,3,4,6 rawInputs_r.txt > forcedPhotInputs_r.txt
$DEMO_DIR/calexp_dir --output $DEMO_DIR/forcedPhot_dir --configfile $TUT_DIR/config/ --config references.dbName=$DB_NAME references.filterName=r @$DEMO_DIR/forcedPhotInputs_r.txt -j $NCORES >& forcedPhot_log.txt

# Create Science database
mkdir $DEMO_DIR/forcedPhot_csv_dir
mkdir $DEMO_DIR/sourceAssoc-csv
$DR_PATH/ --camera=sdss --database=$DB_NAME --coadd-name=deep --create-views $DEMO_DIR/forcedPhot_csv_dir $DEMO_DIR/forcedPhot_dir >& ingestForcedSource_log.txt

# Associate the Co-Add sources with the single-frame sources
$DEMO_DIR/coadd_r_dir --output $DEMO_DIR/sourceAssoc_dir >& sourceAssoc_log.txt
# Ingest the associated sources into the DB
$DR_PATH/ -d $DB_NAME --camera sdss --create-views -j $NCORES $DEMO_DIR/sourceAssoc-csv $DEMO_DIR/sourceAssoc_dir >& ingestSourceAssoc_log.txt
# Enable the table keys:
mysql $DB_NAME -h <hostname> -u <user> -p < $TUT_DIR/enable_keys.sql