The Gen3 versions of the AP and DRP pipelines are already able to do at least as much as their Gen2 counterparts, and we currently have no reason to believe that they perform any worse at anything.
We also now have the green light to start removing the Gen2 versions of things, at least according to the release stability promised we've made in the past.
At present, however, some important test coverage still relies on Gen2 middleware and CmdLineTasks - this is a problem in its own right, because it means we're testing things under unrealistic, non-production conditions, and reliance on those tests make it harder to change our tasks in Gen3-only ways. This page attempts to capture all cases in which we use Gen2 as a test harness, and transform this information into a plan for removing different aspect of Gen2 middleware.
Known Gen2 integration tests and multi-task unit tests
ci_hsc_gen2
This:
- Runs the full Gen2 DRP pipeline on HSC data.
- Tests that the expected output datasets are actually produced and meet very minimal expectations.
- Runs and tests the Gen2 → Gen3 repository conversion tools.
- Tests the pipe.base.ShimButler transition convenience code.
(1) is already done by ci_hsc_gen3
.
(2) is not, and this is probably the single biggest blocker for removing CmdLineTask
inheritance from most hybrid PipelineTask
/CmdLineTasks
; we cannot remove ci_hsc_gen2
until ci_hsc_gen3 includes those checks (
-
DM-28806Getting issue details...
STATUS
).
(3) and (4) are worth keeping around as long as long as the rest of ci_hsc_gen2
has to stay around, since they provide coverage for tools that may still be useful while we are converting Gen2 test code to Gen3.
pipe.tasks.mocks and nopytest_test_coadds.py
This test utility subpackage and the (only) test that uses it provide a lot of early coverage for our coaddition and coadd-processing tasks. They use very simple (i.e. Gaussian stars, little or no noise) simulations, in order to check that the bookkeeping for CoaddPsf is exactly right on a small amount of data, rather than statistically right on a lot of noisy data. There's a lot of infrastructure here just to mimic a Gen2 CameraMapper, but it's also tied up with the code that does the simulations.
The test script has also attracted a number of miscellaneous test methods that simply want to check something about the outputs of the tasks the test runs.
Porting all of this coverage to Gen3 would be a lot of work, and I'm skeptical that it's all useful:
- The zero-noise, simple-PSF simulation framework was intended to be useful for more than
CoaddPsf
, but no other tests made use of that (that I know of, at least). - While that test was helpful in getting
CoaddPsf
up and running initially,CoaddPsf
still shipped with a critical bug in weights that the test was too simple to catch, and a much bigger problem withCoaddPsf
involving sigma-clipping plagued us for a while, too. My takeaway from all of this is that for something likeCoaddPsf
, large-scale integration tests on realistic data (~RC2 or test-med-1, not justci_hsc
orci_imsim
) are necessary; if we have those, they are also sufficient, and a smaller test like this is only useful as a way to catch some problems early, especially when there is high churn. CoaddPsf
isn't experiencing high churn at all, and in fact we hope to drop it in favor of cell-based coadds someday instead of develop it further.- The coverage from running the other tasks here is again mostly just coverage that they run at all, and hence it duplicates coverage we get from
ci_hsc_gen3
andci_imsim
. Nobody should be making changes to these tasks without running thoseci_
packages before merge, and is probably developing against real data, not this test, so I don't think discovering failures "earlier" from this test is a strong argument.
So, my tentative recommendation here is to port the test methods that perform checks on the processing outputs to ci_hsc_gen3
and/or ci_imsim
, and scrap the rest.
pipe_tasks' test_processCcd.py
This runs ProcessCcdTask on data from obs_test. It seems to mostly check that the tasks run at all, but also has some purely empirical regression tests ("we got this value once; check that the new one isn't worse").
obs_test is a pure-Gen2 thing, and the pipelines_check
package runs the exact same tasks already. Let's move the checks to tests in that package, and drop this test entirely.
Guidelines for Gen2 content removal
TODO: what conditions need to be met first?
TODO: what are the typical things to remove?
Removing CmdLineTask inheritance from hybrid tasks
Removing Gen2 support from obs packages
Removing Gen2 support from obs_base
Removing Gen2 support from pipe_base
Removing daf_persistence
Post-Removal cleanups
TODO