Gen3 Notes:   

  • The new unique id design has 2 separate id values: one indicating the "origin" of ingestion and one an incrementing id.   For example, the data backbone would get one origin id for every endpoint, and each staff member and end user would get their own (these could perhaps be their LSST/NCSA ids).
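A minimal sketch of how such a two-part id might behave (the class and field names here are assumptions for illustration, not the actual Gen3 schema):

```python
from dataclasses import dataclass

# Hypothetical composite dataset id: every ingesting entity (a data
# backbone endpoint, a staff member, an end user) holds its own "origin"
# value, and ids increment independently within each origin.
@dataclass(frozen=True)
class DatasetId:
    origin: int  # identifies where/by whom the dataset was registered
    id: int      # autoincrement counter local to that origin

class IdFactory:
    """Hands out ids for a single origin (e.g. one user's LSST/NCSA id)."""
    def __init__(self, origin: int):
        self.origin = origin
        self._next = 0

    def next_id(self) -> DatasetId:
        self._next += 1
        return DatasetId(self.origin, self._next)
```

Because ids from different origins can never collide, data registered at two endpoints can later be combined without renumbering.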

  • Both the subset and merge implementations depend on ids (will the new unique id method be implemented first, so that we don't have to rewrite subset and merge?).

  • Subset or merge use cases may want the Butler to also transfer the files.   What is the Gen3 mechanism for transferring files between Butlers?   Does one just provide the correct transfer object in addition to the source or destination Butler?   Is there configuration specifying that these two Butlers use this transfer object?

Note:   Butler subset and merge functionality excludes replication of data from database tables not maintained by the Butler itself (e.g., the Registry and Datastore-internal tables are included, but other science tables, such as loaded catalogs, are not).

-- Subset --

At a high level, to "subset a Gen3 Butler" is just what it sounds like: given a source Gen3 Butler (Registry and Datastore), replicate a subset of its data into a separate Gen3 Butler (Registry and Datastore).

Subset Details

  • The databases supporting the source and destination Butlers may be different RDBMS.
  • The Datastores of the source and destination Butlers may be different (e.g., POSIX vs Object Store).
  • Sometimes replication of the files is handled by a different entity (e.g.,  Workflow Management System or Bulk Download).
  • The destination Butler may be empty or already have data (including some of the data to be replicated).
  • Sets of replicated data are defined not only by datasets but also by what the Registry needs in order to function in the various use cases.   For example, the Registry information needed to run a PipelineTask should not require all of the dataset information for everything in the inputs' provenance chain, whereas a full Registry includes information on intermediate datasets to complete the provenance chain, etc.

Subset Use Cases




| # | Use case | Enough Registry to run PipelineTask(s) | Full Registry | Update Datastore (transfer separate) | Transfer file |
|---|----------|----------------------------------------|---------------|--------------------------------------|---------------|
| 1 | Batch Processing Service | X | | X | |
| 2 | Bulk Download (entire release) | | X | X | |
| 3 | Bulk Download (certain dataset types from release) | | X | X | |
| 4 | From DBB to OODS (need to confirm OODS will store data originating from DBB) | | X (if need full provenance already in DBB) | | ?? |
| 5 | End-user/developer wanting a subset of data on their machine | | X? | | X |
| 6 | Want files replicated, but share registry??? | | | | |




Subset Design

    • Assumptions:   
      • Include all datasetType definitions, as opposed to only those used for inputs and outputs, under the assumption that this is not a large amount of data.
    • Inputs
      • src butler obj
      • dest butler obj
      • list of input dataset ids (do we need full datasetRefs?) or collection name
      • list of output datasetRefs (can be empty list)
      • Type of Registry subset:  run, full
      • Transfer files: true/false (or transfer object?)
    • Outputs
      • The dest butler obj is updated so that it contains a superset of the requested data.   The unique id design means that any information copied into the dest butler keeps the same unique id it had in the source butler.
    • Process
      • TBD
  • Example: Batch Processing Service:
    • src butler obj = Butler looking at the Production DBB (Oracle, POSIX (or Rucio))
    • dest butler obj = empty Butler with job-scratch butler config (sqlite3, POSIX)
    • Has a list of input datasetRefs and a list of output datasetRefs that it wants in a single subset, obtained from some "pre-flight" mechanism
    • Type of Registry subset = run
    • Transfer files = False/None
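The inputs and the Batch Processing Service example above could be sketched as a Python signature. All names here (`subset_butler`, `RegistrySubsetType`) are invented for illustration, and the body is a stub since the Process is still TBD:

```python
from enum import Enum
from typing import Iterable, Optional

class RegistrySubsetType(Enum):
    RUN = "run"    # just enough Registry to run the PipelineTask(s)
    FULL = "full"  # include intermediate datasets for the full provenance chain

def subset_butler(src_butler, dest_butler,
                  input_ids: Iterable, output_refs: Iterable,
                  registry_subset: RegistrySubsetType,
                  transfer: Optional[object] = None) -> None:
    """Replicate the requested datasets from src_butler into dest_butler.

    Ids are preserved: under the unique-id design, copied records keep
    the same id they had in the source Butler.  If ``transfer`` is None,
    file movement is left to an external entity (e.g. a Workflow
    Management System or Bulk Download).
    """
    raise NotImplementedError("Process is TBD")
```

The Batch Processing Service call would then look roughly like `subset_butler(dbb_butler, scratch_butler, input_refs, output_refs, RegistrySubsetType.RUN, transfer=None)`.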


-- Merge --

At a high level, a Butler merge means taking data from a source Butler and adding it to a destination Butler.   The difference from subset is that a merge updates the ids of the copied data as if they were originally registered with the destination Butler.   This allows the use of temporary scratch butlers inside compute jobs, etc.

Merge Details

  • The databases supporting the source and destination Butlers may be different RDBMS.
  • The Datastores of the source and destination Butlers may be different (e.g., POSIX vs Object Store).
  • The destination Butler may be empty or already have data (including some of the data to be replicated).
  • Sometimes replication of the files is handled by a different entity (e.g.,  Workflow Management System or Bulk Download).

Merge Use Cases




| # | Use case | Full Registry | Update Datastore (transfer separate) | Transfer file |
|---|----------|---------------|--------------------------------------|---------------|
| 1 | Batch Processing Service | X | X | |
| 2 | User/developer wanting to merge own multiple butlers into one* | X | ? | ? |

* e.g., staff may have a laptop butler, a near-OODS butler, a near-DBB@chile butler, and a near-DBB@ncsa butler.   Users may also have more than one butler where they do work.   The current unique id solution would give each user one id.   Can we solve this merge use case by giving each user enough "origin" ids, which would turn this merge use case into a subset use case?

Merge Design

    • Inputs
      • src butler obj
      • dest butler obj
      • list of dataset ids (do we need full datasetRefs?) or collection name
      • Transfer files: true/false (or transfer object?)
    • Outputs
      • The dest butler obj is updated so that it contains a superset of the requested data.   Any information copied into the dest butler receives new ids per the unique id design.
    • Process
      • TBD
  • Example: Batch Processing Service:
    • src butler obj = job scratch butler (sqlite3, POSIX)
    • dest butler obj = either an entire-run scratch butler (sqlite3, POSIX) or the Production DBB Butler (Oracle, POSIX (or Rucio))
    • Uses the output collection name to select the data
    • Transfer files = False/None
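The merge inputs above could likewise be sketched as a stub signature (all names invented for illustration; the Process is still TBD). The key contract difference from subset is that ids are reassigned, and that the data to merge is selected by exactly one of a dataset-id list or a collection name:

```python
from typing import Iterable, Optional

def merge_butler(src_butler, dest_butler,
                 dataset_ids: Optional[Iterable] = None,
                 collection: Optional[str] = None,
                 transfer: Optional[object] = None) -> None:
    """Copy the requested data from src_butler into dest_butler,
    assigning NEW ids in the destination (unlike subset, which
    preserves ids).  Exactly one of ``dataset_ids`` or ``collection``
    selects what to merge; if ``transfer`` is None, file movement is
    left to an external entity.
    """
    if (dataset_ids is None) == (collection is None):
        raise ValueError("give exactly one of dataset_ids or collection")
    raise NotImplementedError("Process is TBD")
```

The Batch Processing Service example would then be roughly `merge_butler(job_scratch_butler, dbb_butler, collection=output_collection, transfer=None)`.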

-- Gen3 Butler Subset/Transfer/Merge for Shared-Nothing Workflow  --

Use Case

  1. Identify within shared Registry/Datastore the content needed for executing one or more Quanta, and dump this to one or more files.
    1. Input to this operation is a QuantumGraph (which is probably a subset of a larger QuantumGraph).
    2. Output is a set of files written to a filesystem accessible from the Python Butler client, and a list (in Python? In another file?) of input actual-Dataset files.
  2. Transfer the input dump files and input actual-Dataset files to a worker node (done by e.g. Pegasus, not Butler code).
  3. Construct a local (limited, SQLite) Registry and (Posix) Datastore given the input dump files and input actual-Dataset files.
  4. Run the QuantumGraph on the worker node (done by some single-node activator, probably a generalization of LaptopActivator, not Butler code).  Outputs and provenance are written into the local Registry and Datastore.
  5. Identify local Registry/Datastore content that should be sent back to the shared Registry/Datastore (i.e. don't need to re-transfer inputs).  Dump this to one or more files on the worker node filesystem.
    1. Input to this operation is ... some kind of diff between what was originally transferred (which doesn't need to be transferred back) and what exists now?  Some other data structure output by the local Butler, which knows what has been put, and could "record" that for playback to the central Registry/Database?
    2. Output is a set of files written to the worker-node filesystem, and a list of output actual-Dataset files.
  6. Transfer the output dump files and output actual-Dataset files to some common location(s) (done by e.g. Pegasus, not Butler code).
  7. Merge content from the output dump files and output actual-Dataset files into the shared Registry/Datastore.
    1. This will require regenerating (autoincrement) dataset_ids and execution_ids while maintaining their relationships.
    2. We may want to further modify the Registry content when inserting it, particularly to remove/change Collections and Runs.
    3. There may be different logic for outputs from nodes that experienced some kind of processing failure.
    4. To the extent the shared Registry SQL interface is implemented as views, this logic may need to be specific to that Registry implementation (or even not considered part of any Registry) rather than something provided by the base SqlRegistry class.
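Step 7.1 amounts to building an old-id → new-id map while re-inserting records, then rewriting every foreign-key reference through that map so relationships survive the renumbering. A minimal sketch with invented record shapes (real code would cover execution_ids and quanta as well):

```python
def remap_ids(datasets, provenance, start_id=1000):
    """Re-insert records with fresh autoincrement ids.

    ``datasets`` is a list of dicts, each with a local 'dataset_id';
    ``provenance`` is a list of (producer_id, input_id) pairs among them;
    ``start_id`` stands in for the shared Registry's autoincrement state.
    Returns the re-keyed datasets and provenance edges as the shared
    Registry would see them after the merge.
    """
    id_map = {}
    new_datasets = []
    next_id = start_id
    for rec in datasets:
        id_map[rec["dataset_id"]] = next_id
        new_datasets.append({**rec, "dataset_id": next_id})
        next_id += 1
    # Rewrite every foreign-key reference through the same map so the
    # producer/input relationships survive the renumbering.
    new_provenance = [(id_map[p], id_map[i]) for p, i in provenance]
    return new_datasets, new_provenance
```

The same map must be applied to every table that references the remapped ids before any row is inserted into the shared Registry.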

Design Questions

  • Transfer from shared to local looks like a pretty standard SQL dump, with some filtering on what rows should be dumped.
    • Do we want to just dump INSERT INTO statements or CSV?  This may depend on how much we need to specialize at either end for Oracle vs. SQLite.
    • To separate the dump operation from the subset operation, it'd be nice to have some kind of in-memory representation for the output of the subset, which would then let us use the same code to dump subsets generated from many different kinds of inputs.  Question mostly for Andy Salnikov: could we use "iterator-over-PreFlightUnitsRow" (i.e. what selectDataUnits outputs) as the intermediate representation?
      • Can we go from a QuantumGraph to iterator-over-PreFlightUnitsRow?
      • Can we implement a useful dump operation by iterating on PreFlightUnitsRow?
      • Can we write something to adapt an arbitrary (but carefully-written) custom SQL query into an iterator over PreFlightUnitsRow?  This would also let us plug custom SQL into the QuantumGraph generation code in pipe_supertask, which would support a longstanding Michelle Gower use case.
    • We probably only need to be able to dump limited-Registry content to meet the shared-nothing workflow use case, but it'd be nice to be able to dump full-Registry content to support other transfer use cases.
  • Right now, the subset-and-dump needed to transfer back from local worker to shared feels like something qualitatively different: it's more like a record of put calls that we would want to selectively re-play in the shared system.
    • Is "replay" the right model for this?
    • Is there any way we could make it share subset-and-dump code with the transfer-from-shared-to-local case?  If so, is sharing that code still harder than implementing "replay"?
  • To what extent can we make merge Registry-implementation generic?  If it only needs to call Registry.addDataset, for example, then a highly-customized, view-based Oracle Registry might be able to implement just that method and use a common Registry merge implementation.
  • We also need to transfer the QuantumGraph itself (presumably by making it round-trip serializable in its own right).  There may be information in the QuantumGraph (e.g. spatial regions) that would not be transferred to the local node by just dumping limited Registry content.
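As a concreteness check on the INSERT-INTO-vs-CSV question above, emitting portable INSERT statements from an iterator of row mappings might look like the following (table and column names invented; real code would need per-RDBMS type and quoting rules for Oracle vs. SQLite):

```python
def dump_inserts(table, columns, rows):
    """Yield one INSERT statement per row.

    Values are rendered in a lowest-common-denominator form intended to
    load into either SQLite or Oracle; only NULL, numbers, and quoted
    strings are handled in this sketch.
    """
    def render(v):
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"  # escape quotes

    cols = ", ".join(columns)
    for row in rows:
        vals = ", ".join(render(row[c]) for c in columns)
        yield f"INSERT INTO {table} ({cols}) VALUES ({vals});"
```

Feeding this from an iterator-over-PreFlightUnitsRow (if that becomes the intermediate representation) would let one dump routine serve subsets generated from many kinds of inputs.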

