My biggest concern with this proposal is that it implies a provenance relationship where one usually, but not always, exists, though this concern applies to any approach that mimics the Gen2 behavior. The Gen3 butler tracks provenance per dataset rather than per repo or per collection, so it has the information needed to answer questions like "which calexps were used to generate this coadd" much more precisely, and that is often what a user is really asking when retrieving a dataset from a parent repository. In fact, naively using the collection search path in Gen3 is slightly more likely to mislead about provenance than parent lookup was in Gen2, because Gen2's support for multiple parent repositories never made it into the CmdLineTask driver and was therefore used extremely rarely (and conflicts between parent collections or repos are the primary way collection/repo provenance can differ from per-dataset provenance). That said, such confusion about provenance should still be quite rare, and is probably best addressed by providing intuitive (and well-documented) APIs for asking provenance questions properly.
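A toy sketch may make the distinction between search-path lookup and per-dataset provenance concrete. `ToyRegistry`, `Dataset`, and the collection names below are hypothetical and do not reflect the actual Gen3 butler API. The sketch assumes only that collections are searched in order and that each dataset records the IDs of its inputs. When two collections on the search path both contain a calexp with the same data ID, the search path returns the calexp from the earlier collection, while the coadd's recorded inputs point at the other one.

```python
from dataclasses import dataclass


# Hypothetical, simplified model for illustration only; NOT the daf_butler API.
@dataclass(frozen=True)
class Dataset:
    dataset_id: int
    dataset_type: str
    data_id: tuple  # e.g. (("visit", 1), ("detector", 10))
    collection: str
    inputs: frozenset = frozenset()  # per-dataset provenance: IDs of inputs


class ToyRegistry:
    def __init__(self):
        self._datasets = {}

    def add(self, dataset):
        self._datasets[dataset.dataset_id] = dataset

    def find(self, dataset_type, data_id, search_path):
        """Return the first match, searching collections in path order."""
        for collection in search_path:
            for ds in self._datasets.values():
                if (ds.collection == collection
                        and ds.dataset_type == dataset_type
                        and ds.data_id == data_id):
                    return ds
        return None

    def inputs_of(self, dataset):
        """Return the datasets actually recorded as inputs to `dataset`."""
        return [self._datasets[i] for i in sorted(dataset.inputs)]


registry = ToyRegistry()
data_id = (("visit", 1), ("detector", 10))
old_calexp = Dataset(1, "calexp", data_id, "rerun/a")
new_calexp = Dataset(2, "calexp", data_id, "rerun/b")
# The coadd was built from the calexp in rerun/a.
coadd = Dataset(3, "deepCoadd", (("tract", 0), ("patch", 5)), "rerun/coadd",
                inputs=frozenset({1}))
for ds in (old_calexp, new_calexp, coadd):
    registry.add(ds)

search_path = ["rerun/coadd", "rerun/b", "rerun/a"]
naive = registry.find("calexp", data_id, search_path)
actual = registry.inputs_of(coadd)
print(naive.collection)                 # rerun/b
print([d.collection for d in actual])   # ['rerun/a']
```

The naive lookup finds the newer calexp in `rerun/b`, while the coadd's own provenance identifies the calexp in `rerun/a`. This is the kind of collection conflict that makes search-path results a poor proxy for per-dataset provenance.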
Immutable DatasetTypes, DataIds, and DatasetRefs
The objects we use to describe datasets can each exist in several states, generally progressing from "unvalidated/incomplete" to "validated/complete":