Directory Structure

├── bin                     # Shell scripts
├── bin.src                 # Command line python scripts
├── docs                    # Non-API documentation
├── pipelines               # YAML pipeline definition files (may have subdirs)
├── python                  # All tested python code lives here
│   └── lsst
│       └── faro
│           ├── summary     # Tasks that summarize other metric measurements
│           ├── base        # Base classes that others can inherit from
│           ├── preparation # Tasks that prepare input for other downstream tasks
│           ├── measurement # Tasks that compute metric measurements
│           ├── scripts     # Code invoked by command line python scripts in the bin directory
│           └── utils       # Utility code used by other classes
├── tests                   # Unit tests for code under the python directory
│   └── data                # Data serving unit tests
└── ups                     # EUPS directory

Class Name Conventions

The name of a connections class is the name of the measurement class it is associated with, plus "Connections" (e.g., MatchedCatalogTaskConnections is the connections class for MatchedCatalogTask).

Always use the singular form of the dataset type in names. For example, use "Catalog" instead of "Catalogs", "Tract" instead of "Tracts."

Instead of using "MatchedCatalog" in class names, specify what has been matched (e.g., MatchedTract, MatchedPatch, or MatchedVisit).

Measurement tasks:

Measurement tasks and their associated config classes have names such as MetricnameTask/MetricnameTaskConfig (e.g., PA1Task/PA1TaskConfig for metric PA1). The names are CamelCase with the first letter capitalized.

  • Classes that can calculate multiple metrics by means of different configs use names like AMxTask, where "x" denotes that multiple metrics (AM1, AM2, AM3; i.e., x=1, 2, or 3) can be calculated from the class AMxTask. (This is mostly going to apply to KPMs that have defined names such as AM1, ABF1, etc.)
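The naming pattern above can be sketched in Python. This is only an illustration of the convention, not code from `faro`: the helper `related_class_names` is hypothetical, and it simply follows the examples given here (PA1Task/PA1TaskConfig and MatchedCatalogTaskConnections).

```python
def related_class_names(metric_name: str) -> dict:
    """Return the task, config, and connections class names for a metric.

    Hypothetical helper: it only encodes the naming pattern described
    in this section and is not part of faro.
    """
    task = f"{metric_name}Task"
    return {
        "task": task,
        "config": f"{task}Config",
        "connections": f"{task}Connections",
    }


# A class that covers several metrics uses the "x" placeholder, e.g. AMx.
print(related_class_names("PA1"))
print(related_class_names("AMx"))
```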

The name of the module containing the class definitions is CamelCase with a leading uppercase letter. Modules group classes by the dataset type on which metrics will be calculated, and the dataset type should appear in the module name. For example, one module contains measurement tasks that are designed to operate on matched catalogs from visits, while another contains tasks to calculate metrics per visit.

Catalog generation tasks:

Tasks that create/compile the datasets to be passed to measurement tasks currently have names like MatchedCatalogTractTask (and associated Config and Connections classes).

I propose rethinking this naming for 2 reasons:

  1. to capture the action that is performed by the task,
  2. to shorten the names a bit

For example, MatchedCatalogTractTask performs the matching of catalogs from coadds at the tract scale. Because "matching" implies that catalogs are being combined, we can get rid of "Catalog" from the name (to address point #2 above). To address point #1, start the name with the action that is being performed (i.e., "match"), perhaps something like MatchTractTask?

Aggregation tasks:

Aggregation tasks have names of the form DatasetTypeAggregationTask (e.g., MatchedCatalogsAggregationTask).

Because these class names can get very long, I propose the following changes:

  1. Get rid of the implied "catalogs",
  2. Shorten "Aggregation" to "Agg"

For example, instead of MatchedCatalogsTractAggregationTask, use MatchedTractAggTask.
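The two proposed changes are mechanical enough to express as a short Python sketch. The function `shorten_aggregation_name` is a hypothetical helper written for this note; it is not part of `faro`, and it assumes the word "Catalog"/"Catalogs" only ever appears as the implied dataset word.

```python
import re


def shorten_aggregation_name(class_name: str) -> str:
    """Apply the two proposed changes to an aggregation class name."""
    # 1. Drop the implied "Catalog" or "Catalogs".
    shortened = re.sub(r"Catalogs?", "", class_name)
    # 2. Shorten "Aggregation" to "Agg".
    return shortened.replace("Aggregation", "Agg")


print(shorten_aggregation_name("MatchedCatalogsTractAggregationTask"))
```

Names that contain neither word pass through unchanged, so the helper can be run over a whole list of class names.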

Black: summary as currently implemented (21 January 2021)

Blue: proposed

| Data Unit | Example Metrics | Inputs (Assembly) / Preparation (Prep) | Analysis (Analysis) / Measurement (Meas) | Aggregation (Agg) / Summary (Roll-up) |
|---|---|---|---|---|
| Generic catalog analysis (used as base classes) | number sources | CatalogAssemblyBaseClass (??) | | |
| Matched sources analysis within patch, single-band | | Base: MatchedCatalogTask(MatchedBaseTask) | Base: MatchedCatalogAnalysisTask(CatalogAnalysisBaseTask) | Base: MatchedCatalogsAggregationTask(CatalogsAggregationBaseTask) |
| Matches sources within patch, multi-band | | Base: MatchedCatalogMultiTask(MatchedBaseTask) | Base: MatchedMultiCatalogAnalysisTask(CatalogAnalysisBaseTask) | (not implemented??) |
| Matched sources within tract, single-band | | Base: MatchedCatalogTractTask(MatchedTractBaseTask) | Base: MatchedCatalogTractAnalysisTask(CatalogAnalysisBaseTask) | Base: MatchedCatalogsTractAggregationTask(CatalogsAggregationBaseTask) |
| Matched sources within tract, multi-band | | (not implemented??) | (not implemented??) | (not implemented??) |
| Sources within visit | | | Base: VisitAnalysisTask(CatalogAnalysisBaseTask) | Base: VisitAggregationTask |
| Objects within patch | | | Base: PatchAnalysisTask(CatalogAnalysisBaseTask) | Base: PatchAggregationTask(CatalogsAggregationBaseTask) |
| objects within patch, multi-band | | | | |
| Objects within tract, single-band | | | Base: TractAnalysisTask(CatalogAnalysisBaseTask) | (not implemented??) |
| Objects within tract, multi-band | stellar locus width | N/A | Base: TractAnalysisMultiFiltTask(TractAnalysisTask) | (not implemented??) |
| DIA sources (per-visit?) | | | | |
| DIA objects (not sure the partitioning, per-patch?) | | | | |
| Solar System Objects (not sure the partitioning, per-patch?) | | | | |
| Single-visit image | ghost image control | N/A | VisitImageMeasTask | |
| Coadd image, single-band (per-patch?) | ghost image control | N/A | PatchImageMeasTask | |
| Coadd image, multi-band (per-patch?) | | | | |
| Template image (subset of coadd image??) | | | | |
| Injected sources per visit (subset of sources within visit??) | transfer function for individual visits | | | |
| Injected sources within patch / tract (subset of objects within patch / tract) | transfer function for coadd | | | |
| Injected sources DIA (subset of DIA??) | transfer function for DIA | | | |
| Injected sources SSO (subset of SSO??) | transfer function for SSO | | | |
| Map (per-dataset?) | coverage map, survey properties | N/A | MapMeasTask (??) | |
| Database Query (per-dataset?) | random sampling of source or object tables | N/A | QueryMeasTask (??) | |
| Calibration Products (per-dataset?) | filter bandpass performance | | | |

Anomalies (21 January 2021):

  • TractAnalysisMultiFiltTask
  • MchCatTractAggTaskConnections
  • MatchedCatalogMultiTask, MatchedMultiCatalogAnalysisTask

Intermediate Data Products

Current dataset types:



  • Are these the intermediate data product names we want? Consistency with class names, etc.

Metric Name Conventions

KPMs should use the short names that are assigned to them in requirements documents (e.g., "PA1" or "GhostAF").

Metrics that are not KPMs should have names that describe what they are meant to capture. These names should be camelCase with a leading lower-case letter. (A made-up example could be something like "colorOutlierFrac.") Note: abbreviating phrases is encouraged for brevity (as with "Frac" in the previous example).
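A quick way to check a proposed non-KPM metric name against this convention is a regular expression. The sketch below is an assumption about how the rule could be encoded (a lower-case first letter followed only by letters and digits); `is_valid_non_kpm_name` is a hypothetical helper, not something `faro` provides.

```python
import re

# Assumed encoding of the convention: leading lower-case letter,
# then letters or digits only (no underscores, hyphens, or spaces).
CAMEL_CASE = re.compile(r"^[a-z][A-Za-z0-9]*$")


def is_valid_non_kpm_name(name: str) -> bool:
    """Return True if the name is camelCase with a leading lower-case letter."""
    return CAMEL_CASE.match(name) is not None


for candidate in ("colorOutlierFrac", "ColorOutlierFrac", "color_outlier_frac"):
    print(candidate, is_valid_non_kpm_name(candidate))
```

KPM short names such as "PA1" deliberately fail this check, since they follow the names assigned in the requirements documents instead.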

`faro` names each dataset type as "metricvalue_"+`connections.package`+`connections.metric`, where the .package and .metric are defined in the pipeline yaml. The "package" is intended to specify what metrics package from `verify_metrics` contains this metric's definition (e.g., `validate_drp`). To facilitate comparison to these metric definitions, many of the `connections.metric` names contain things like "_design" to specify the level to compare against (i.e., min/design/stretch goals).
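As a rough illustration of how such a dataset type name is assembled, consider the sketch below. It assumes the three parts are joined with underscores (as in the "metricvalue_aggname_package_metric" form mentioned in the notes further down); the function `metric_dataset_type_name` is hypothetical, and the package and metric values are only examples.

```python
def metric_dataset_type_name(package: str, metric: str) -> str:
    """Build a metric-value dataset type name from the pipeline settings.

    Assumption: the parts are joined with underscores. `package` and
    `metric` stand in for connections.package and connections.metric.
    """
    return f"metricvalue_{package}_{metric}"


print(metric_dataset_type_name("validate_drp", "PA1"))
```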

Current dataset types:


Note that for a given dataset type, the measurements in different units of data (e.g., tract, band) are distinguished by their data ID. One can find the dimensions of a dataset type as follows:


DimensionGraph({band, instrument, skymap, tract})


  • Do we need the package name in the dataset type name?
  • Do we need design/stretch etc. in dataset type name?
  • How do we want to distinguish the aggregated results from the "per-unit-data" results?
    • Having a convention for the prefix will be important. For now, we are considering "metricvalue_[granular]_" and "metricvalue_summary_". It would be helpful to have distinct prefixes for the granular and summary metric values. We are not sure what the best name is for the "granular" metric values.
  • How do we want to distinguish the same metric, but different units of data, e.g., patch vs. tract?
  • How do we want to indicate instances of the same metric, but with different configurations?

Pipeline Name Conventions

There are three "stages" to calculating metrics with `faro`: generating/compiling the input data, measuring the metric (i.e., "analysis"), and aggregating the measured values. A full pipeline can then chain any number of these steps as needed.

metrics_pipeline_*.yaml: A pipeline consisting of calls to other pipelines (via "imports") that perform the separate stages of metric calculation begins with the phrase `metrics_pipeline`. An example would be `metrics_pipeline_matched.yaml`, which executes `gen_inputs_matched.yaml`, `analysis_matched.yaml`, and `agg_matched.yaml`.

If, as in the above example, the pipeline operates on a specific type of dataset or calculates a particular type of metric, this should be made clear in the pipeline name (to the extent possible; in this example, the metrics being calculated are all based on matched visits, so it is named `*_matched.yaml`). This is done for all types of pipelines so that it is clear which ones may need to be included together in a `metrics_pipeline`.

gen_inputs_*.yaml: The name of a pipeline that gathers the data on which measurements are performed begins with `gen_inputs` (short for "generate inputs"). The proposed replacement is assembly_*.yaml.

analysis_*.yaml: The "analysis" pipelines are the ones that execute the metric measurements (i.e., perform analysis tasks).

agg_*.yaml: "agg" is short for aggregation - these are the pipelines that aggregate values into rolled-up, summary metrics.
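The prefix convention above makes it possible to tell a pipeline's role from its file name alone. The following Python sketch shows the idea; `pipeline_stage` and the stage labels are hypothetical and exist only for illustration.

```python
from typing import Optional

# Prefixes taken from the conventions above; the labels are illustrative.
STAGE_PREFIXES = {
    "metrics_pipeline_": "full pipeline",
    "gen_inputs_": "input generation",
    "analysis_": "analysis",
    "agg_": "aggregation",
}


def pipeline_stage(filename: str) -> Optional[str]:
    """Return the stage implied by a pipeline file name, or None if unknown."""
    if not filename.endswith(".yaml"):
        return None
    for prefix, stage in STAGE_PREFIXES.items():
        if filename.startswith(prefix):
            return stage
    return None


for name in ("metrics_pipeline_matched.yaml", "gen_inputs_matched.yaml",
             "analysis_matched.yaml", "agg_matched.yaml"):
    print(name, "->", pipeline_stage(name))
```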

Utility function name conventions

Utility function names are all lower case (snake_case where more than one word is needed). To the extent possible, a name should explain what the function is designed to return. For example, the functions in the module of photometric repeatability routines have names such as `calc_phot_repeat`.

NOTE: This is not currently true of most of the functions adapted from validate_drp – most of those have camelCase names such as `calcPhotRepeat`. 
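Renaming the inherited camelCase functions to the snake_case convention is a mechanical transformation. The sketch below shows one way to compute the new name; `to_snake_case` is a hypothetical helper, and it assumes names without runs of consecutive capitals (acronyms would need extra handling).

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase function name to snake_case."""
    # Insert an underscore before every capital letter except at the start,
    # then lower-case the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


print(to_snake_case("calcPhotRepeat"))
```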


What is the purpose of in metric_pipeline_tasks ? It doesn't seem to be used.


  • The three steps of metric measurement are assembly → analysis → aggregation. (We're proposing "assembly" instead of "gen_inputs".)
    • A: Proposal is to use: Preparation, Measurement, Summary
  • Change Multi → XBand ("cross-band"; i.e., multiple filters)?
    • A: E.g. AB2. Proposal is to use MultiBand
  • In dataset type definitions, do we need the metrics package name?  [Are all metrics required to have a definition in a .yaml metrics definition file somewhere (as most currently do in `verify_metrics`)?]

    • A: Yes, we would like to retain this convention but we want to update the metrics package name to be pointing more towards function than implementation (e.g., associated with a requirements document)
    • E.g., validate_drp.yaml → srd_performance.yaml, with the non-normative metrics in validate_drp.yaml separated out into (say) dm_metrics.yaml.
  • In dataset type definitions, do we need design/stretch, etc.? [This typically won't change how the metric is measured. Could this instead be packaged in the Measurement object itself?]

    • A: We would like to avoid having the specification details in the dataset type name. Note that there are some requirements that have different thresholds for minimum, design, and stretch, and hence the measurement is done in a different way.
  • In dataset type definitions, what conventions do we want for the prefix? For the summary metrics?
    • A:

Progress tracking:

  • lsst/faro/base/
  • lsst/faro/measurement/
  • lsst/faro/preparation/
  • lsst/faro/scripts/
  • lsst/faro/summary/
  • lsst/faro/utils/
  • pipelines


  • The file lsst/faro/measurement/ contains some classes that are Summary tasks rather than Measurement tasks. Propose that we create a new file in the summary directory to hold these general summary tasks, lsst/faro/measurement/
  • In lsst/faro/measurement/, NumpyAggTask names the output Measurement with metricvalue_aggname_package_metric. Do we want this naming behavior? (Is this task used anywhere? If not, maybe it should be? It seems to be a generally useful task.)

  • Not sure what to do here: we moved the “Analysis” tasks (e.g., into lsst/faro/preparation . Our proposal was to rename “Analysis” to “Meas”. But we already have in lsst/faro/measurement (it’s the one that contains, e.g., PA1Task.) This potential naming ambiguity will be confusing. Should we rename the one that defines the particular KPM measurements (in measurements)? We could call it MatchedCatalogMetricTasks instead?
    • I moved all of the "Analysis" tasks to lsst/faro/measurement, and renamed them to things like Files containing individual metrics are now called (for example).