Ingest

Catalog products

  • Ingested into temporary database
    • For use in other DRP steps or by SDQA
    • "Patch" updates happen by execution of new Tasks
      • Must be possible to reproduce entire DRP, including any "patches"
  • Level 1 products ingested into internal Level 1 database
  • Level 2 products ingested into Qserv database
    • A batch can be removed (with queries disabled), then reingested after a "patch" update Task
    • Need to track batches and their status
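
The batch handling above can be sketched as a small status tracker. This is only an illustration under stated assumptions: the `BatchRegistry` class, its method names, and the batch id are hypothetical, and nothing here reflects an actual Qserv interface.

```python
from enum import Enum


class BatchStatus(Enum):
    INGESTED = "ingested"
    REMOVED = "removed"


class BatchRegistry:
    """Hypothetical in-memory tracker for ingest batches and their status."""

    def __init__(self):
        self.queries_enabled = True
        self._status = {}    # batch id -> BatchStatus
        self._history = []   # ordered (batch id, action) log, kept for reproducibility

    def ingest(self, batch_id):
        self._status[batch_id] = BatchStatus.INGESTED
        self._history.append((batch_id, "ingest"))

    def remove(self, batch_id):
        # Removal is only allowed while queries are disabled.
        if self.queries_enabled:
            raise RuntimeError("disable queries before removing a batch")
        if self._status.get(batch_id) is not BatchStatus.INGESTED:
            raise KeyError(f"batch {batch_id!r} is not currently ingested")
        self._status[batch_id] = BatchStatus.REMOVED
        self._history.append((batch_id, "remove"))

    def status(self, batch_id):
        return self._status[batch_id]

    def history(self):
        return list(self._history)


registry = BatchRegistry()
registry.ingest("batch-001")
registry.queries_enabled = False   # disable queries before touching the batch
registry.remove("batch-001")
registry.ingest("batch-001")       # reingest after the "patch" update Task has run
registry.queries_enabled = True
```

Keeping the ordered history alongside the current status is one way to make the sequence of ingests and "patches" replayable.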

EFD

  • Transformed within 24 hours into Level 1 Science Data Archive EFD
    • Transformation includes removal of sensitive data (e.g. personnel-relevant log entries)
    • Transformation includes restructuring schema to be more science-query-friendly
      • Adding join keys
      • Denormalizing
      • Creating views
    • Note: the Science Data Archive EFD may not actually be relational. A NoSQL document database or a BigTable/Hypertable-style store might be more appropriate.
  • Cleansed and transformed for Level 2 EFD as part of annual CPP
    • Cleansing includes flagging of invalid data
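
The transformation and cleansing steps above could look roughly like the following sketch. The record layout, the topic names, the `exposure_id` join key, and the validity range are all assumptions made for illustration; the real EFD schema is not described here.

```python
# Hypothetical topic names treated as sensitive and removed during transformation.
SENSITIVE_TOPICS = {"personnel_log"}


def transform(records, exposures):
    """Drop sensitive records and add an exposure_id join key to the rest."""
    out = []
    for rec in records:
        if rec["topic"] in SENSITIVE_TOPICS:
            continue
        exposure_id = None
        for exp in exposures:
            # Attach the exposure whose time window contains the record.
            if exp["start"] <= rec["time"] < exp["end"]:
                exposure_id = exp["exposure_id"]
                break
        out.append({**rec, "exposure_id": exposure_id})
    return out


def flag_invalid(records, low, high):
    """Mark records whose value falls outside [low, high] instead of deleting them."""
    return [{**rec, "invalid": not (low <= rec["value"] <= high)} for rec in records]


raw = [
    {"topic": "dome_temperature", "time": 10.0, "value": 12.5},
    {"topic": "personnel_log", "time": 11.0, "value": "operator note"},
    {"topic": "dome_temperature", "time": 40.0, "value": -999.0},
]
exposures = [
    {"exposure_id": 1, "start": 0.0, "end": 30.0},
    {"exposure_id": 2, "start": 30.0, "end": 60.0},
]

level1 = transform(raw, exposures)                   # Level 1 style transformation
level2 = flag_invalid(level1, low=-50.0, high=60.0)  # Level 2 style cleansing
```

Flagging rather than dropping invalid values keeps the cleansed data traceable back to the original records.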

Image products

  • Ingested into archive, including provenance
  • Metadata ingested into archive metadata database
  • Accessible by internal image services

SDQA

Catalog products

  • Zeroth pass is metrics produced by Tasks
    • Automatic flagging of metrics outside threshold
  • First pass is in temporary database
    • Automatic metric generation (for metrics using data from multiple Task executions)
  • Second pass is in the Qserv database for Level 2 products and in the internal Level 1 database for Level 1 products
    • Verification of completeness and consistency
    • Large-scale analyses across entire dataset
    • Verification of performance for end users
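
As a rough illustration of the zeroth and first passes above, the sketch below flags per-Task metrics that fall outside a threshold and then derives one metric from several Task executions. The metric names, threshold values, and Task ids are hypothetical.

```python
from statistics import median

# Hypothetical acceptable ranges, keyed by metric name: (low, high).
THRESHOLDS = {"psf_fwhm_arcsec": (0.4, 1.5)}


def flag_metrics(metrics, thresholds):
    """Zeroth pass: return the names of metrics outside their allowed range."""
    flagged = []
    for name, value in metrics.items():
        if name in thresholds:
            low, high = thresholds[name]
            if not (low <= value <= high):
                flagged.append(name)
    return flagged


def aggregate_metric(task_metrics, name):
    """First pass: combine one metric across multiple Task executions."""
    return median(m[name] for m in task_metrics.values() if name in m)


task_metrics = {
    "task-a": {"psf_fwhm_arcsec": 0.8},
    "task-b": {"psf_fwhm_arcsec": 1.9},
    "task-c": {"psf_fwhm_arcsec": 1.0},
}

flags = {task: flag_metrics(m, THRESHOLDS) for task, m in task_metrics.items()}
overall_fwhm = aggregate_metric(task_metrics, "psf_fwhm_arcsec")
```

Here only `task-b` is flagged, while the aggregated value is computed from all three executions.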

Image products

  • Most SDQA is performed on metadata
  • Inspect outliers using internal cutout service
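
One possible way to pick the outliers to inspect is a robust cut on a metadata column, sketched below. The choice of median absolute deviation, the `n_mads` cutoff, and the image ids and background values are assumptions for illustration, not a method stated in these notes.

```python
from statistics import median


def find_outliers(values, n_mads=5.0):
    """Return keys whose value lies more than n_mads MADs from the median."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # Degenerate case: treat any value that differs from the median as an outlier.
        return [k for k, v in values.items() if v != center]
    return [k for k, v in values.items() if abs(v - center) > n_mads * mad]


# Hypothetical per-image sky background values taken from the metadata database.
background = {
    "img-1": 100.0,
    "img-2": 102.0,
    "img-3": 98.0,
    "img-4": 101.0,
    "img-5": 250.0,
}

to_inspect = find_outliers(background)
```

The returned image ids would then be the candidates to view through the internal cutout service.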
