Andy Salnikov and I talked about the representation of the data collected by the common Activator code from all the define_quanta() calls during pre-flight.

We started with a work plan for the Activator for a particular campaign, roughly like this (omitting many preceding and following steps):

We then considered what to do with the results returned.

Imagine two classes, Dataset and Job.  (We'll probably need different names; Dataset is already taken, and Job leads to confusion with "batch job", one of which may be associated with many "Jobs".)

A Dataset object represents just that, a single dataset, defined by a fully-specified DataId and a Butler dataset type (and implicitly mapped to a concrete external artifact by the Repository configuration in force).

It has two additional attributes: 

A Job object represents a single application of run_quantum() for a single SuperTask in the Pipeline.  It has three relevant attributes:

In addition, the data structures produced by the common Activator code should then include:

We don't currently propose that the graph explicitly include Job-to-Job dependency links; these can be derived by traversing the Job-to-inputs-to-producer links.
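As a concreteness aid only (not a design), the two node types and the derived Job-to-Job dependencies might be sketched in Python roughly as follows.  The class names DatasetNode and JobNode are placeholders (per the naming caveat above), and the attribute names (data_id, dataset_type, producer, task_name, inputs, outputs) are assumptions, since the attribute lists above are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DatasetNode:
    """A single dataset: a fully-specified DataId plus a Butler dataset type."""
    data_id: dict                             # fully-specified DataId
    dataset_type: str                         # Butler dataset type name
    producer: Optional["JobNode"] = None      # the Job that produces this dataset, if any


@dataclass(eq=False)  # identity-based hashing so JobNodes can live in sets
class JobNode:
    """One application of run_quantum() for a single SuperTask in the Pipeline."""
    task_name: str                                             # which SuperTask this Job runs
    inputs: List[DatasetNode] = field(default_factory=list)    # datasets consumed
    outputs: List[DatasetNode] = field(default_factory=list)   # datasets produced


def job_dependencies(job: JobNode) -> Set[JobNode]:
    """Derive Job-to-Job dependencies by traversing Job -> inputs -> producer,
    rather than storing explicit Job-to-Job edges in the graph."""
    return {ds.producer for ds in job.inputs if ds.producer is not None}
```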

This is a concept, not yet a Python design.  It makes sense to look at existing workflow-management packages to see whether their data models can accommodate it.  This is especially relevant because we want to be able to persist the graph and reconstruct it, so that the Pre-flight phase can be separated from the "submit units of work" phase.
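A minimal illustration of that separation, assuming (purely for the sketch) that pickling the node objects is an acceptable persistence mechanism; the real format is TBD and might instead come from an adopted workflow-management package:

```python
import pickle


def save_execution_graph(jobs, path):
    """Persist the execution-plan graph at the end of Pre-flight."""
    with open(path, "wb") as f:
        pickle.dump(jobs, f)


def load_execution_graph(path):
    """Reconstruct the graph later, in a separate 'submit units of work' phase."""
    with open(path, "rb") as f:
        return pickle.load(f)
```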

The generation of this graph is done by common code shared by all Activators.

The intent is for this graph then to be usable by all concrete Activators (e.g., the Level 2 / DRP production system) as the input for their construction of a concrete execution plan.  It is at that stage, then, that a concrete Activator could determine, e.g., that it was going to wrap up 100 CCDs' worth of ISR-SuperTask "Jobs" into a single batch job.  The concrete Activator would then be responsible for building the resulting batch-job-level aggregate DAG.
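A rough sketch of what that aggregation could look like, assuming the JobNode/DatasetNode shapes from the earlier sketch and an illustrative fixed-size chunking policy (the real grouping rules would be Activator-specific):

```python
from collections import defaultdict


def chunk_jobs(jobs, chunk_size=100):
    """Bundle fine-grained Jobs into batch jobs, e.g. 100 CCDs' worth of
    ISR-SuperTask Jobs per batch job.  The fixed-size policy is illustrative only."""
    return [jobs[i:i + chunk_size] for i in range(0, len(jobs), chunk_size)]


def batch_level_dag(batches):
    """Build the batch-job-level aggregate DAG: batch A depends on batch B when
    some Job in A consumes a dataset whose producer Job belongs to B."""
    owner = {id(job): idx for idx, batch in enumerate(batches) for job in batch}
    edges = defaultdict(set)   # batch index -> set of batch indices it depends on
    for idx, batch in enumerate(batches):
        for job in batch:
            for ds in job.inputs:
                if ds.producer is None:
                    continue               # raw/external input; no upstream Job
                dep_idx = owner.get(id(ds.producer))
                if dep_idx is not None and dep_idx != idx:
                    edges[idx].add(dep_idx)
    return edges
```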

Concrete Activators might not end up using the persisted form of the common execution-plan graph, but might instead restate the graph, after some post-processing, in an Activator-specific form and persist it that way.  This is TBD.  It is not a requirement that the output of one Activator's Pre-flight phase be usable by a different concrete Activator's Run phase!

The MVP version of CmdLineActivator will do both Pre-flight and Run together and is not required to be able to persist its execution plan.  It would be useful, though, if a later version added that capability, allowing the plan to be persisted as an ancillary output, or even allowing a user to optionally perform the two phases separately; this should facilitate testing, among other things.  The default behavior should continue to be to run both phases under a single command.
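One possible command-line shape for such a later version, shown only as a sketch; the option names and the phase functions below are hypothetical placeholders, not a proposed interface:

```python
import argparse
import pickle


def run_preflight():
    """Placeholder for the common Pre-flight code that builds the execution-plan graph."""
    return []


def run_execution(graph):
    """Placeholder for the Run phase that executes the plan."""
    pass


def main(argv=None):
    parser = argparse.ArgumentParser(description="CmdLineActivator phase control (sketch)")
    parser.add_argument("--preflight-only", action="store_true",
                        help="stop after Pre-flight, persisting the execution plan")
    parser.add_argument("--plan", metavar="FILE",
                        help="skip Pre-flight and run from a previously persisted plan")
    args = parser.parse_args(argv)

    if args.plan:                               # Run phase only, from a saved plan
        with open(args.plan, "rb") as f:
            graph = pickle.load(f)
    else:
        graph = run_preflight()                 # default: Pre-flight ...
        if args.preflight_only:
            with open("execution_plan.pickle", "wb") as f:
                pickle.dump(graph, f)           # ... persisted as an ancillary output
            return
    run_execution(graph)                        # ... followed immediately by Run


if __name__ == "__main__":
    main()
```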