
This page is currently under development: text may change at any time! The content has not yet been reviewed for accuracy and completeness.

pipe_base is the base package for pipeline tasks. 

Key Design Features

Data are processed by a pipeline task, which is usually a subclass of CmdLineTask, by calling the parseAndRun method. parseAndRun parses the command-line arguments and executes the parsed command. It also persists the configuration used for the task, and in most cases the metadata generated by the task (including timing data). Each task may call subtasks (instances of Task or CmdLineTask) as it sees fit. Data are passed between a parent task and its subtasks by subroutine calls. 

Most tasks have a run method that performs the primary data processing. The run method for a subclass of CmdLineTask typically receives a single butler data reference, but subtasks may use any arguments deemed appropriate. Each task's run method should return a pipe_base Struct; this allows named access to returned data, which provides safer evolution than relying on the order of returned values. 
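
As an illustration of why named results are safer than positional ones, here is a minimal sketch. It uses Python's standard types.SimpleNamespace as a stand-in for the pipe_base Struct so that it runs without the LSST stack; the run function and its field names (total, mean, count) are hypothetical.

```python
from types import SimpleNamespace


def run(values):
    """Toy 'run' method: compute simple statistics and return them by name.

    SimpleNamespace is only a stand-in for a pipe_base Struct here.
    """
    total = sum(values)
    mean = total / len(values) if values else float("nan")
    return SimpleNamespace(total=total, mean=mean, count=len(values))


result = run([1.0, 2.0, 3.0])
# Callers pick fields by name, so adding a new field later does not
# break code that unpacks the result.
print(result.mean, result.count)
```

Because callers access result.mean rather than a tuple position, a later version of run can return additional fields without affecting existing callers.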

Some useful general tasks are found in the pipe_tasks package, in the python directory. Tasks meant to be run from the command line (subclasses of CmdLineTask) also have a short script in the bin directory. Package-specific tasks belong in the relevant package, e.g. ip_isr has an IsrTask.

Pipeline tasks may process multiple data IDs in parallel, using the multiprocessing library. Support for this is built into the argument parser and the code that runs command-line tasks (lsst.pipe.base.TaskRunner). 

Argument Parser

The first argument to a task must be the path to the input repository (or --help for command-line tasks). For example: 

myTask path/to/input --id ...  # valid: input path is the first argument
myTask --id ... path/to/input  # INVALID: an option comes before the input path

To shorten the input, calib, and output paths, see Environment Variables below. Data are specified as key=value pairs for one of the data identifier arguments (these typically end in "Id"; when there is only one, it is typically called --id); for details see the next subsection. If there is more than one data identifier argument, the data identifiers are by default handled independently.

You may show the config, subtasks and/or data using --show. By default --show quits after printing the information, but --show run allows the task also to run. For example:

--show config data tasks  # shows the config, data and subtasks, and then quits
--show tasks run          # shows the subtasks and then runs the task.

For long or repetitive command lines you may wish to specify some arguments in separate text files. See Argument Files below for details.

Specifying Data IDs

The data identifier arguments are used to specify IDs for input and output data. The ID keys depend on the camera and on the data product in question. For example, for lsstSim, calibrated exposures are identified by the following keys: visit, filter, raft and sensor (and a given visit has exactly one filter). Omit a key to specify all values of that key. For example, for visit number 54123:

--id visit=54123           # specifies all rafts and sensors
--id visit=54123 raft=1,0  # specifies all sensors for raft 1,0

To specify multiple data IDs you may separate values with ^ (a character that does not have special meaning to the unix command parser). The result is the outer product (all possible combinations). For example: 

--id visit=54123^55523 raft=1,1^2,1

specifies four IDs: visits 54123 and 55523 of rafts (1,1) and (2,1). By default (but depending on the application), you may specify a data identifier argument as many times as you like. Each one is treated independently. Thus the following example specifies all sensors for four combinations of visit and raft, plus all sensors for one raft of two other visits: 

--id visit=54123^55523 raft=1,1^2,1 --id visit=623459^293423 raft=0,
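
The outer-product rule described above can be sketched in a few lines of Python. This is not the pipe_base parser itself: expand_data_ids is a hypothetical helper, values are kept as plain strings, and only the ^ separator described in this section is handled.

```python
import itertools


def expand_data_ids(id_args):
    """Expand key=value arguments with ^-separated values into data ID dicts.

    Hypothetical helper for illustration; values are left as strings.
    """
    keys = []
    value_lists = []
    for arg in id_args:
        key, _, value = arg.partition("=")
        keys.append(key)
        # Split only on ^, so a raft such as "1,1" stays intact.
        value_lists.append(value.split("^"))
    # The outer product yields every combination of the listed values.
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


ids = expand_data_ids(["visit=54123^55523", "raft=1,1^2,1"])
for data_id in ids:
    print(data_id)
```

Since each data identifier argument is treated independently, several of them could be handled by calling the helper once per argument and concatenating the resulting lists.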

Argument Files

You may specify long or repetitive command-line arguments in text files and reference those files using @path syntax. The contents of the files are identical to the command line, except that long lines must not have a continuation character (\). For example if the file foo.txt contains: 

--id visit=54123^55523 raft=1,1^2,1
--config someParam=someValue --configfile configOverrideFilePath

You can then reference it with @foo.txt and include it with other command-line arguments:

inputPath @foo.txt --config anotherParam=anotherValue --output outputPat
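
For readers curious how @path expansion can work, the sketch below uses the standard argparse module, whose fromfile_prefix_chars option reads arguments from a file. It is a stand-alone illustration and not necessarily how the pipe_base parser is implemented; ArgFileParser, build_parser and demo are hypothetical names, and the options are a simplified subset.

```python
import argparse
import os
import shlex
import tempfile


class ArgFileParser(argparse.ArgumentParser):
    """ArgumentParser whose @file lines are split like a shell command line."""

    def convert_arg_line_to_args(self, arg_line):
        # By default argparse treats each line as ONE argument; split it instead.
        return shlex.split(arg_line)


def build_parser():
    """Build a simplified parser (illustrative subset of options)."""
    parser = ArgFileParser(prog="myTask", fromfile_prefix_chars="@")
    parser.add_argument("input")
    parser.add_argument("--id", nargs="*", action="extend", default=[])
    parser.add_argument("--config", nargs="*", action="extend", default=[])
    parser.add_argument("--configfile", nargs="*", action="extend", default=[])
    parser.add_argument("--output")
    return parser


def demo():
    """Write the foo.txt example to a temporary file and parse it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        arg_file = os.path.join(tmp_dir, "foo.txt")
        with open(arg_file, "w") as f:
            f.write("--id visit=54123^55523 raft=1,1^2,1\n")
            f.write("--config someParam=someValue --configfile configOverrideFilePath\n")
        return build_parser().parse_args(
            ["inputPath", "@" + arg_file, "--config", "anotherParam=anotherValue"]
        )


args = demo()
print(args.id)
print(args.config)
```

The arguments read from the file are merged with those given directly, so args.config ends up holding both someParam=someValue and anotherParam=anotherValue. The "extend" action requires Python 3.8 or later.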

Overriding Config

The argument parser automatically loads specific config override files based on the camera name and its obs package; see Automatically Loading Config Override Files. In addition, you can specify config override files on the command line using --configfile, and override some (but not all) config parameters by specifying values on the command line using --config. For example:

--config str1=foo str2="fancier string" int1=5 intList=2,4,-87 float1=1.53 floatList=3.14,-5.6e7

A config override file can set values that --config cannot, for example a file that contains root.strList = "first string", "second string". There are important limitations on using --config; use a config override file to get around them:

  • For items in registries you can only specify values for the active (current) item.
  • You cannot specify values for lists of strings.
  • You cannot specify a subset of a list; you must specify all values at once.
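
To show why an override file is more flexible than --config, here is a minimal sketch. It assumes, based on the root.strList example above, that an override file is Python code evaluated with the name root bound to the task's config. The SimpleNamespace config and the apply_override helper are hypothetical stand-ins, not the real config classes.

```python
from types import SimpleNamespace


def apply_override(config, override_source):
    """Execute override code with the name 'root' bound to the config object.

    Hypothetical helper: mimics the assumed behavior of a config override file.
    """
    exec(override_source, {"root": config})
    return config


# Stand-in config object with two illustrative fields.
config = SimpleNamespace(strList=(), int1=0)

override_source = (
    'root.strList = "first string", "second string"\n'
    "root.int1 = 5\n"
)
apply_override(config, override_source)
print(config.strList)
print(config.int1)
```

Because the file is ordinary Python, it can assign a list of strings or modify part of a list, which the key=value syntax of --config cannot express.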

Automatically Loading Config Override Files

When a pipeline task is run, two camera-specific configuration overrides are loaded, if found: first one for the obs package, then one for the camera. (There are two because in some cases an obs package may contain data for multiple cameras.) These files may override configuration parameters or even replace subtasks with camera-specific variants (e.g. for instrument signature removal). The configuration override files are, in order:

  • obs_path/config/
  • obs_path/config/camera_name/

 where the path elements are:

  • task_name: the name of the pipeline task, e.g. "processCcd"
  • camera_name: the name of the camera used to obtain the image, e.g. "lsstSim"
  • obs_path: the path to the obs package for the camera, e.g. "obs_lsstSim"

Environment Variables

The command parser uses environment variables PIPE_INPUT_ROOT, PIPE_CALIB_ROOT, and PIPE_OUTPUT_ROOT, if available, to make it easier to specify the input, calib and output data repositories. Each environment variable is used as a root directory for relative paths and ignored for absolute paths. The default value for each of these environment variables is the current working directory. For example: 

mytask cameraname foo   # use $PIPE_INPUT_ROOT/foo as the input repository (or ./foo if $PIPE_INPUT_ROOT is undefined)
mytask cameraname .     # use $PIPE_INPUT_ROOT (= $PIPE_INPUT_ROOT/.) as the input repository
mytask cameraname /a/b  # use /a/b as the input repository ($PIPE_INPUT_ROOT is ignored for absolute paths)
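
The resolution rule above (root directory for relative paths, ignored for absolute paths, current working directory as the default) can be sketched as follows. resolve_repo_path is a hypothetical helper written for illustration, not part of pipe_base, and the /data/repos root is an invented example value.

```python
import os


def resolve_repo_path(path, env_var, environ=None):
    """Resolve a repository path against the root named by env_var.

    Hypothetical helper: relative paths are joined to the root (the current
    working directory if env_var is unset); absolute paths are returned as is.
    """
    environ = os.environ if environ is None else environ
    root = environ.get(env_var, os.getcwd())
    # os.path.join discards 'root' when 'path' is absolute.
    return os.path.normpath(os.path.join(root, path))


example_env = {"PIPE_INPUT_ROOT": "/data/repos"}  # invented example value
print(resolve_repo_path("foo", "PIPE_INPUT_ROOT", example_env))
print(resolve_repo_path(".", "PIPE_INPUT_ROOT", example_env))
print(resolve_repo_path("/a/b", "PIPE_INPUT_ROOT", example_env))
```

The same helper would apply to PIPE_CALIB_ROOT and PIPE_OUTPUT_ROOT by passing a different env_var.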