This page is currently under development: text may change at any time! The content has not yet been reviewed for accuracy and completeness.
pipe_base is the base package for pipeline tasks.
Key Design Features
Data are processed by a pipeline task, which is usually a subclass of CmdLineTask, by calling the parseAndRun method. parseAndRun parses the command-line arguments and executes the parsed command. It also persists the configuration used for the task, and in most cases the metadata generated by the task (including timing data). Each task may call subtasks (instances of Task or CmdLineTask) as it sees fit. Data are passed between a parent task and its subtasks by subroutine calls.
Most tasks have a run method that performs the primary data processing. The run method for a subclass of CmdLineTask typically receives a single butler data reference, but subtasks may use any arguments deemed appropriate. Each task's run method should return a pipe_base Struct; this allows named access to returned data, which provides safer evolution than relying on the order of returned values.
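The benefit of returning results by name can be illustrated with a minimal sketch. It does not import pipe_base: types.SimpleNamespace stands in for Struct, and the task class, its run arguments and the field names (mean, count) are all hypothetical.

```python
from types import SimpleNamespace


class ExampleStatsTask:
    """Hypothetical task used only for illustration; not part of pipe_base."""

    def run(self, values):
        # Return results by name instead of as a positional tuple, so that
        # fields can be added later without breaking existing callers.
        count = len(values)
        mean = sum(values) / count if count else float("nan")
        return SimpleNamespace(mean=mean, count=count)


result = ExampleStatsTask().run([1.0, 2.0, 3.0])
print(result.mean, result.count)
```

A caller that reads only result.mean keeps working if run later returns additional fields, which is not true of code that unpacks a tuple by position.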
Some useful general tasks are found in the pipe_tasks package, in the python directory. Tasks meant to be run from the command line (subclasses of CmdLineTask) also have a short script in the bin directory. Package-specific tasks belong in the relevant package, e.g. ip_isr has an IsrTask.
Pipeline tasks may process multiple data IDs in parallel, using the multiprocessing library. Support for this is built into the argument parser and the code that runs command-line tasks.
The first argument to a task must be the path to the input repository (or --help for command-line tasks). For example:
To shorten the output paths see Environment Variables below. Data are specified as key=value pairs for one of the data identifier arguments (typically ending in "Id"; when there is only one, it will typically be called --id); for details see the next subsection. If there is more than one data identifier argument, then the data identifiers are by default handled independently.
You may show the config, subtasks and/or data using --show. By default --show quits after printing the information, but --show run allows the task also to run. For example:
For long or repetitive command lines you may wish to specify some arguments in separate text files. See the next subsection for details.
Specifying Data IDs
The data identifier arguments are used to specify IDs for input and output data. The ID keys depend on the camera and on the data product in question. For example for lsstSim, calibrated exposures are identified by the following keys:
sensor (and a given visit has exactly one filter). Omit a key to specify all values of that key. For example, for visit number 54123:
To specify multiple data IDs you may separate values with ^ (a character that has no special meaning to the Unix command parser). The result is the outer product (all possible combinations). For example:
specifies four IDs: visits 54123 and 55523 of rafts (1,1) and (2,1). By default (but depending on the application), you may specify a data identifier argument as many times as you like. Each one is treated independently. Thus the following example specifies all sensors for four combinations of visit and raft, plus all sensors for one raft of two other visits:
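The outer-product rule described above can be sketched in a few lines of Python. This is an illustration only, not the pipe_base parser: the function name expand_data_ids is hypothetical, values are kept as strings, and the textual form of a raft value (1,1) is an assumption.

```python
import itertools


def expand_data_ids(id_args):
    """Expand 'key=v1^v2' strings into the outer product of data IDs.

    Hypothetical helper for illustration; the real parser may also
    validate keys and convert value types.
    """
    keys = []
    value_lists = []
    for arg in id_args:
        # Split only on the first '=' so values may contain other characters.
        key, _, values = arg.partition("=")
        keys.append(key)
        value_lists.append(values.split("^"))
    # One data ID per combination of values, in left-to-right key order.
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


ids = expand_data_ids(["visit=54123^55523", "raft=1,1^2,1"])
for data_id in ids:
    print(data_id)
```

Because each data identifier argument is treated independently, repeating the argument would correspond to calling such a function once per occurrence and concatenating the resulting lists.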
You may specify long or repetitive command-line arguments in text files and reference those files using @path syntax. The contents of the files are identical to the command line, except that long lines must not have a continuation character (\). For example if the file
You can then reference it with @foo.txt and include it with other command-line arguments:
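The @path mechanism resembles a feature of Python's standard argparse module, and the following sketch shows the idea using argparse alone. It is not the pipe_base argument parser: the class name ArgFileParser, the positional argument name, the repository name input_repo and the file contents are assumptions made for the demonstration.

```python
import argparse
import os
import shlex
import tempfile


class ArgFileParser(argparse.ArgumentParser):
    """Hypothetical parser that reads extra arguments from @file references."""

    def convert_arg_line_to_args(self, arg_line):
        # argparse's default treats each line as one argument; splitting
        # shell-style lets a line hold several arguments, as on a command line.
        return shlex.split(arg_line, comments=True)


parser = ArgFileParser(fromfile_prefix_chars="@")
parser.add_argument("input")
parser.add_argument("--id", nargs="*", action="append")

with tempfile.TemporaryDirectory() as tmp_dir:
    arg_file = os.path.join(tmp_dir, "foo.txt")
    with open(arg_file, "w") as f:
        # Each line is written as it would appear on the command line,
        # without continuation characters.
        f.write("--id visit=54123 raft=1,1\n")
        f.write("--id visit=55523\n")
    args = parser.parse_args(["input_repo", "@" + arg_file])

print(args.input, args.id)
```

Arguments read from the file are spliced into the command line at the position of the @path reference, so they can be mixed freely with arguments given directly.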
The argument parser automatically loads specific config override files based on the camera name and its obs package. See Automatically Loading Config Override Files. In addition, you can specify config override files on the command line using --configfile and override some (but not all) config parameters by specifying values on the command line using --config. A config override file sets parameters on root, for example:
root.strList = "first string", "second string"
There are important limitations on using --config; use a config override file to get around these issues:
- For items in registries you can only specify values for the active (current) item
- You cannot specify values for lists of strings
- You cannot specify a subset of a list; you must specify all values at once
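The root.strList line above suggests why an override file can do what the command line cannot: the file can hold an arbitrary Python expression. The sketch below assumes that an override file is ordinary Python executed with the name root bound to the task's config. The config object here is a plain stand-in, not a real pipe_base or pex_config class, and the field doWrite and the file name override.py are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a task config; the field names are hypothetical.
config = SimpleNamespace(strList=[], doWrite=True)

# Text of a config override file, as it might appear on disk.
override_text = 'root.strList = "first string", "second string"\n'

# Assumption: the override file is run as Python with `root` bound to the config.
exec(compile(override_text, "override.py", "exec"), {"root": config})

print(config.strList)
```

Under that assumption, a list of strings is just a Python sequence assigned in one statement, which avoids having to encode quoting and separators in a single command-line value.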
Automatically Loading Config Override Files
When a pipeline task is run, two camera-specific configuration overrides are loaded, if found; first one for the obs package then one for the camera. (There are two because in some cases an obs package may contain data for multiple cameras). These files may override configuration parameters or even replace subtasks with camera-specific variants (e.g. for instrument signature removal). The configuration override files are, in order:
where the path elements are:
- task_name: the name of the pipeline task, e.g. "processCcd"
- camera_name: the name of the camera used to obtain the image
- obs_path: the path to the obs package for the camera
Environment Variables
The command parser uses environment variables such as PIPE_OUTPUT_ROOT, if available, to make it easier to specify the output data repositories. Each environment variable is used as a root directory for relative paths and ignored for absolute paths. The default value for each of these environment variables is the current working directory. For example:
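The root-directory rule can also be sketched in Python. This is a minimal illustration of the behavior described above, not the command parser's code: the function name resolve_repo_path and the directory names are hypothetical.

```python
import os


def resolve_repo_path(path, env_var):
    """Resolve a repository path against a root taken from an environment variable.

    Hypothetical helper: relative paths are joined to the root, absolute
    paths are returned unchanged, and the root defaults to the current
    working directory when the variable is not set.
    """
    root = os.environ.get(env_var, os.getcwd())
    # os.path.join discards `root` when `path` is absolute.
    return os.path.join(root, path)


# Demonstration with a made-up root directory.
os.environ["PIPE_OUTPUT_ROOT"] = os.path.join(os.sep, "data", "out")
print(resolve_repo_path("run1", "PIPE_OUTPUT_ROOT"))
print(resolve_repo_path(os.path.join(os.sep, "scratch", "run1"), "PIPE_OUTPUT_ROOT"))
```

With the variable set, a short relative name is enough on the command line, while an absolute path still works exactly as written.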