
Date

Attendees

Goals

This is a design sprint to kick off the LSST Science Pipelines documentation reboot. Our goal is to create a tangible vision of what the Science Pipelines documentation will be. The questions we want to answer are:

  • Who are the users of Science Pipelines documentation? What does each group want to get out of the Science Pipelines and its documentation? Do those needs conflict? Do we need to prioritize one user group in the initial implementation?
  • What are the boundaries of the Science Pipelines documentation (the site at https://pipelines.lsst.io)? What are adjacent documentation projects that the Science Pipelines documentation might link against?
  • What's the curriculum for learning the Science Pipelines? What concepts does the Science Pipelines documentation site need to cover? How are these concepts organized (hierarchically, or as a bottom-up information network)? Do different types of users need specific entry points into the documentation and into the Science Pipelines itself?
  • What kinds of content are we going to produce? What do the templates for these topic types look like?
  • How are concepts unique to Science Pipelines, like tasks and command line tasks, documented in both a code and information architecture sense?

Intended Sprint Products

These are suggested products and outcomes from the sprint:

  • A map of the Science Pipelines site. This map should resolve into individual HTML/reStructuredText documents (topics, in Every Page is Page One terminology). Each topic should be annotated with:
    • Topic name.
    • Content purpose and scope.
    • Topic type (i.e., template).
    • Adjacent topics (topics that link into this page; topics this page will link out to).
  • Topic types and templates. Each template shapes how a given type of topic is written. Examples include: API reference, task, command line task, tutorial project, conceptual overview, and recipe. See Every Page is Page One, Chapter 9: EPPO Topics Conform to a Type.
  • Timelines for both content and documentation infrastructure.

Prep Work/Background Reading Material

Meeting Logistics

  • Tuesday December 6: campfire chat at Bentley's or elsewhere.
  • Wednesday December 7. 9:00 am to 5:00 pm. LSST Workroom.
  • Thursday December 8. 9:00 am to 5:00 pm. LSST Workroom.
  • Friday December 9. 9:00 am to 5:00 pm (or as participants depart). LSST Workroom.

Discussion items

Time | Item | Who | Notes
 

What is the scope of the "Science Pipelines" documentation site?

  • Note that important obs packages are outside lsst_distrib; omitting the obs packages would reduce the current usability of the Pipelines documentation.
 
  • Technical constraint: tightly coupled packages should be documented together since docs will be versioned tightly with the codebase (docs embedded in Git; also known as 'docs as code'). This is an LSST the Docs feature: https://sqr-006.lsst.io.
  • We agreed that much of the middleware (packages beyond those developed by the Princeton and UW teams) should be included in pipelines.lsst.io because of its tight API integration with the pipelines, including:
    • task/supertask framework
    • butler
    • logging
    • display packages
  • Example: document the Butler API in pipelines.lsst.io, but document the DAX service elsewhere.
    • Butler is an API that has implementations for different backends.
    • Document each backend implementation.
    • Document how to write a new implementation.
  • Example: document the Firefly display package in pipelines.lsst.io, but document Firefly itself elsewhere.
  • There is a list of obs packages that will be supported. These will be included in pipelines.lsst.io.
  • lsst.validate packages will be in pipelines.lsst.io.
  • Can all packages in the lsst Python namespace (excluding simulations) be considered part of pipelines.lsst.io? In other words, is pipelines.lsst.io effectively the documentation for the "lsst" Python package?
  • Think of pipelines.lsst.io as documentation for the open source project that might be used in other contexts besides LSST AP and DRP pipelines (other observatories, building L3 data products). Data release documentation will specify exactly how the Science Pipelines were used to build a data release.
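The Butler example above implies three documentation targets: the front-end API, each backend implementation, and a guide to writing a new implementation. The toy sketch below illustrates that split. It is hypothetical throughout: `StorageBackend`, `InMemoryBackend`, `ToyButler`, and `make_data_id` are illustrative names, not the real `lsst.daf.persist` Butler API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

# Hypothetical types and classes for illustration only; this is NOT the
# real lsst.daf.persist Butler API.
DataId = Tuple[Tuple[str, Any], ...]


def make_data_id(**keys: Any) -> DataId:
    """Normalize data ID keywords into a hashable, order-independent key."""
    return tuple(sorted(keys.items()))


class StorageBackend(ABC):
    """Backend interface: what a "how to write an implementation" page documents."""

    @abstractmethod
    def read(self, dataset_type: str, data_id: DataId) -> Any:
        """Return the stored object for (dataset_type, data_id)."""

    @abstractmethod
    def write(self, obj: Any, dataset_type: str, data_id: DataId) -> None:
        """Persist obj under (dataset_type, data_id)."""


class InMemoryBackend(StorageBackend):
    """One concrete backend; each such class would get its own page."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, DataId], Any] = {}

    def read(self, dataset_type: str, data_id: DataId) -> Any:
        return self._store[(dataset_type, data_id)]

    def write(self, obj: Any, dataset_type: str, data_id: DataId) -> None:
        self._store[(dataset_type, data_id)] = obj


class ToyButler:
    """Front-end API: documented once, independent of the backend in use."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def get(self, dataset_type: str, **data_id: Any) -> Any:
        return self.backend.read(dataset_type, make_data_id(**data_id))

    def put(self, obj: Any, dataset_type: str, **data_id: Any) -> None:
        self.backend.write(obj, dataset_type, make_data_id(**data_id))


butler = ToyButler(InMemoryBackend())
butler.put([1.0, 2.0], "src", visit=1, ccd=5)
print(butler.get("src", ccd=5, visit=1))  # keyword order does not matter
```

Because `ToyButler` only talks to the abstract interface, its page would not need to change when a backend is added; only a new implementation page would.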

Boundary between Pipelines docs and the Developer Guide

Should the pipelines documentation cover developer and build-oriented topics currently in the DM Developer Guide? Do pipelines users need to be able to create Stack packages to make Level 3 data products?

  • lsstsw and lsst-build
  • Structure of Stack packages (including sconsUtils and EUPS details)
  • etc?
  • developer.lsst.io is intended to define policies and practices specific to DM staff. We can't use it as documentation for end users.
  • If the build and packaging system is described in pipelines.lsst.io, it could be awkward for other software projects, like Qserv and Sims, that also depend on EUPS, sconsUtils, lsst-build, lsstsw, etc.
  • However, putting build and packaging documentation in pipelines.lsst.io probably makes the most sense for astronomers extending the stack. pipelines.lsst.io is already where astronomers will look to learn how to write new packages against the Pipelines API. Overall, we can simply establish pipelines.lsst.io as the place where build and packaging are fundamentally documented.
 Science Pipelines docs and LDM-151 
  • LDM-151 is where we're designing and planning the stack.
  • Eventually it will grow to say what the Stack is.
  • pipelines.lsst.io will also say what the Stack is.
  • LDM-151 is change controlled: not continuously deployed like the Stack documentation.
  • What if LDM-151 is kept as a record of the Stack used for reviews and related communities? And most users only use pipelines.lsst.io?
  • This needs to be discussed by DM/TCT leadership.
  • Existing proposal: https://ldm-493.lsst.io/v/v1/index.html#change-controlled-design-documents (suggests that content is transplanted and single sourced in design docs).
 

Who are our users?

  • What user group should be prioritized?
  • What are common activities that this group wants to achieve? What documentation will assist with that?
  • Where do the needs of different groups overlap?
 
  • DM developers in construction
    • Need API references most.
    • Currently learn APIs by introspection or reading the source and code that uses an API. Doxygen isn't useful.
    • Descriptions of how tasks fit together (both API-wise, and higher-level concepts; even LDM-151-level).
    • Examples to help us develop a package on top of lower-level APIs.
    • Run tasks for validating processing; run on verification clusters.
    • DM is the biggest consumer of pipelines.lsst.io.
  • Construction-era science collaborations (sims users?)
    • Currently consumers of Sims (MAF).
    • Many won't contribute to the pipelines stack.
    • May want to give feedback. Need algorithmic descriptions.
  • DESC
    • Already processing real imaging data with the stack.
    • Want to contribute feedback (knowledge), e.g., on algorithms.
    • Want to contribute packages, e.g., Twinkles.
    • Want to implement a measurement algorithm and compare its performance against the stack's default ("factory") algorithms.
    • Need:
      • developer docs (to support development)
      • algorithm background (to comment on)
      • how to run pipelines on their own infrastructure.
  • LSST operators/scientists in operations
    • DRP may want an internal ops guide (out of scope)
    • Science directorate will have similar needs to DM developers now.
  • 'DataSpace' users in operations
    • SDSS experience: Small queries to subset data. Complex queries to get objects of interest. Use cut-out service to give context to catalogs.
    • Will want to run tasks on a subset of image data. Customize our algorithms.
    • Use the Butler to get/put datasets within their storage quota (open question).
    • Develop and test algorithms that may be proposed for incorporation in DRP.
  • Other observatories/surveys

 

Summary

  • DM developer needs generally match the needs of all other groups, possibly except for some conceptual framing documentation. DM developers will be API-oriented, whereas new users will need more conceptual docs.
  • Priorities still need to be set.
 

EUPS Packages as units of organization

  • It's natural to organize documentation (to some extent) according to units of EUPS packages, given that doc content should live with code. Should every EUPS package have a topic page and be linked from the homepage (like the astropy docs do for sub-packages)? Are there exceptions where documentation that may live in an EUPS package should actually be organized altogether independently of EUPS package structure?
  • What should typical in-package documentation look like? See https://validate-drp.lsst.io as a prototype, and https://docs.astropy.org in general.
  • To what extent should documentation refer to EUPS packages (e.g., afw) versus Python namespaces (lsst.afw)?
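One way to frame the EUPS-versus-namespace question: many Stack packages appear to follow a convention in which an underscore-separated EUPS name maps onto a dotted namespace under `lsst` (for example, `pipe_base` and `lsst.pipe.base`). The helper below is a sketch that assumes this convention. Not every EUPS package follows it (third-party dependencies packaged with EUPS, for instance, have no `lsst` namespace), so it accepts explicit overrides. `eups_to_namespace` is a hypothetical name.

```python
from typing import Dict, Optional


def eups_to_namespace(eups_name: str,
                      overrides: Optional[Dict[str, str]] = None) -> str:
    """Guess the Python namespace documented for an EUPS package.

    Assumes the convention that underscores in the EUPS name become dots
    under the ``lsst`` namespace. Packages that break the convention can be
    mapped explicitly through ``overrides``.
    """
    if overrides and eups_name in overrides:
        return overrides[eups_name]
    return "lsst." + eups_name.replace("_", ".")


for name in ["afw", "pipe_base", "daf_persist", "obs_cfht"]:
    print(f"{name:12} -> {eups_to_namespace(name)}")
```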

 

 

 

  • Document at the level of the Python module (e.g., afw.image, afw.table, pipe.base), not necessarily at the Git repository level.
  • Docs live inside packages and package docs can be built locally and independently of the full pipelines.lsst.io site.
  • However, the pipelines.lsst.io homepage can arrange docs for modules into topical groupings.
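Since docs live inside packages but the homepage arranges them by topic, the homepage can act as a thin index over per-module pages. The sketch below renders such an index as reStructuredText, with one Sphinx `toctree` per topical group. The grouping dict, the `<module>/index` path layout, and `render_homepage` are assumptions for illustration, loosely based on the draft Homepage Outline.

```python
from typing import Dict, List

# Hypothetical topical grouping of module docs for the homepage.
HOMEPAGE_GROUPS: Dict[str, List[str]] = {
    "Data structures": ["lsst.afw.image", "lsst.afw.table"],
    "Data access framework": ["lsst.daf.base", "lsst.daf.persist"],
    "Validation framework": ["lsst.validate.base", "lsst.validate.drp"],
}


def render_homepage(groups: Dict[str, List[str]]) -> str:
    """Render one section with a toctree per group.

    Assumes each module's docs are published at ``<module>/index``.
    """
    lines: List[str] = []
    for title, modules in groups.items():
        lines += [title, "=" * len(title), ""]
        lines += [".. toctree::", "   :maxdepth: 1", ""]
        lines += [f"   {module}/index" for module in modules]
        lines.append("")
    return "\n".join(lines)


print(render_homepage(HOMEPAGE_GROUPS))
```

Keeping the grouping in one place means packages can still build their own docs independently, while only the homepage decides how modules are grouped.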
 

What is the structure of the documentation homepage?

 

Frameworks.

  • obs
  • meas
  • modelling
  • tasks
  • Butler/Data Access Framework
  • Data structures
  • geometry
  • display
  • log
  • debug
  • validate
  • Build system

 

Twinkles workflow.

 

Homepage structure.

 

Where should concepts of science interest (such as algorithm details) be documented?

  • Docstrings of code that implements algorithms?
  • Tasks/Command line task interface references?
  • Concept topics that then introduce task/API references?
  • To what extent are LSST design documents (e.g., LDM-151) cross-linked and referenced?
 
  • Algorithms don't map 1:1 onto Python/C++ APIs. This indicates that algorithms should be described at a higher level.
  • Tasks might be the best home for algorithm documentation.
  • Need for higher-level overviews that describe "processing topics" that link to composed tasks.
 

How should examples and tutorials be produced?

  • All tutorial and in-text examples need to be runnable.
  • How do we leverage the example/ directories?
  • Should documentation pages essentially be written as Jupyter notebooks?
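If every in-text example must be runnable, one low-cost check is to write examples in Python's interactive `>>>` style and execute them with the standard-library `doctest` module (Sphinx also ships a `sphinx.ext.doctest` extension for this purpose). The sketch below runs the examples embedded in a page's text and reports doctest's failed/attempted counts; `check_examples` is a hypothetical helper and the sample page is illustrative.

```python
import doctest


def check_examples(text: str, name: str = "docs-page") -> doctest.TestResults:
    """Run the ``>>>`` examples embedded in documentation text.

    Returns doctest's (failed, attempted) counts; failures are reported
    on standard output.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name,
                              filename="<docs>", lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


page = """
Summing pixel values in a list:

>>> values = [1, 2, 3]
>>> sum(values)
6
"""
results = check_examples(page)
print(f"{results.failed} failed out of {results.attempted} examples")
```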
  
 

How should C++/Python API reference documentation be produced?

  
 

Listing topic types and templates

  • What are all the distinct types of things we'll need to document?
  • What should each type of content look like?
 

Preliminary listing.

 Community.lsst.org and the docs 
  • Approach 1: use community.lsst.org as a draft for docs: see new content on the forum, write the docs, and then post a link to the new doc in the original topic. This is a matter of culture and process.
  • Approach 2: auto-link to community.lsst.org topics from documentation pages. This can be done by looking for Community topics that link to the documentation site, and by looking for certain watch words embedded in the metadata of each reStructuredText page. DocEng will build this.
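Approach 2 combines two matching rules: a topic is related to a page if it links to that page, or if it mentions one of the page's watch words. The sketch below shows both rules. The `Topic` fields, the `:watch-words:` field name, the page URL, and both helper functions are hypothetical; Sphinx does treat a field list at the top of a document as file-wide metadata, which is one plausible place for such words.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Topic:
    """A simplified, hypothetical view of a community.lsst.org topic."""
    title: str
    body: str
    links: List[str] = field(default_factory=list)


def parse_watch_words(rst_text: str) -> Set[str]:
    """Read a hypothetical ``:watch-words:`` field from a page's metadata."""
    for line in rst_text.splitlines():
        if line.startswith(":watch-words:"):
            value = line.split(":", 2)[2]
            return {w.strip().lower() for w in value.split(",") if w.strip()}
    return set()


def related_topics(page_url: str, watch_words: Set[str],
                   topics: List[Topic]) -> List[Topic]:
    """Return topics that link to the page or mention one of its watch words."""
    matches = []
    for topic in topics:
        text = f"{topic.title} {topic.body}".lower()
        links_here = any(link.startswith(page_url) for link in topic.links)
        mentions = any(word in text for word in watch_words)
        if links_here or mentions:
            matches.append(topic)
    return matches


page_url = "https://pipelines.lsst.io/butler/"  # hypothetical page URL
page_source = ":watch-words: Butler, dataRef\n\nButler overview\n===============\n"
topics = [
    Topic("How do I get a calexp from the butler?", "..."),
    Topic("Workshop logistics", "Room booking details."),
    Topic("Docs question", "See the page below.", links=[page_url + "index.html"]),
]
for topic in related_topics(page_url, parse_watch_words(page_source), topics):
    print(topic.title)
```

Plain substring matching is deliberately simple; a real implementation would likely need word-boundary matching to avoid false positives.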
Homepage Outline

LSST Science Pipelines

  • Installation and setting up
  • Processing data: a tutorial
  • Release Notes
  • Community and getting help
  • How to report issues
  • How to contribute

Processing Data

Data repositories

  • ...

Single Frame

  • Processing
  • Measurement

Coaddition

  • Processing
  • Measurement

Difference imaging

  • Processing
  • Measurement

Multi-epoch datasets

  • Processing
  • Measurement

Post-processing

  • ...

Frameworks

Data structures

  • Overview
  • lsst.afw.image - Images
  • lsst.afw.table - Tables
  • ...

Geometry framework

  • Overview
  • lsst.afw.geom - Geometry primitives
  • ...

Measurement framework

  • Overview
  • ...

Modelling framework

  • Overview
  • ...

Task framework

  • Overview
  • ...

Display framework

  • Overview
  • ...

Data access framework

  • Overview
  • lsst.daf.base
  • lsst.daf.persist

Observatory interfaces

  • Overview
  • Building observatory interfaces
  • CFHT (obs_cfht)
  • HSC (obs_hsc)
  • ...

Validation framework

  • Overview
  • lsst.validate.base
  • lsst.validate.drp

Logging

  • lsst.log

Debug

  • lsst.debug

Build and continuous integration

  • Packaging
  • Dependency packaging
  • CI datasets
  • utils
  • sconsUtils
  • lsstsw

 

Module topic

lsst.module.name — Readable name

Context establishment paragraph.

Links to related modules, pages, and disambiguation.

Design/High Level Overview

(if necessary)

Command line tasks

  • Overview (if necessary)
  • Listing of command line task pages

Tasks

  • Overview (if necessary)
  • Listing of tasks

Using the lsst.module.name API

  • Links to API concept pages
  • Include only if the module has a C++/Python API

Python API reference

  • list of API object reference pages

C++ API reference

  • list of API object reference pages

Packaging

  • Link to EUPS package/GitHub repository
  • Dependencies: auto-generated graph/list of EUPS dependencies

Related documentation

  • Linked design documents
  • Linked technotes
  • Linked papers
  • Linked Community conversations
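Because each module page follows the same template, a skeleton can be generated, including the Packaging section's dependency list. The sketch below assumes dependencies are read from the package's EUPS table file, where `setupRequired(...)` lines declare required packages. It lists only direct required dependencies; a full graph would need to recurse through each dependency's own table file. `required_packages`, `heading`, and `render_module_topic` are hypothetical helpers, and the table contents are illustrative, not taken from a real package.

```python
import re
from typing import List

# EUPS table files declare dependencies with lines such as
# ``setupRequired(pex_config)``; this pattern captures the package name.
_SETUP_REQUIRED = re.compile(r"^\s*setupRequired\(\s*([\w.]+)")


def required_packages(table_text: str) -> List[str]:
    """Return the direct required dependencies declared in a table file."""
    deps = []
    for line in table_text.splitlines():
        match = _SETUP_REQUIRED.match(line)
        if match:
            deps.append(match.group(1))
    return deps


def heading(text: str, char: str) -> List[str]:
    """Return a reStructuredText section title with a matching underline."""
    return [text, char * len(text), ""]


def render_module_topic(module: str, readable_name: str, table_text: str) -> str:
    """Fill in a skeleton module topic page as reStructuredText."""
    lines = heading(f"{module} - {readable_name}", "=")
    lines += [".. Context establishment paragraph; links to related modules.", ""]
    for section in ["Command line tasks", "Tasks", f"Using the {module} API",
                    "Python API reference", "C++ API reference"]:
        lines += heading(section, "-") + [".. TODO", ""]
    lines += heading("Packaging", "-")
    lines += [f"- {dep}" for dep in required_packages(table_text)]
    lines += [""] + heading("Related documentation", "-") + [".. TODO", ""]
    return "\n".join(lines)


# Illustrative table file contents, not taken from a real package.
table = """\
setupRequired(base)
setupRequired(pex_config)
setupOptional(display_ds9)
envPrepend(PYTHONPATH, ${PRODUCT_DIR}/python)
"""
print(render_module_topic("lsst.module.name", "Readable name", table))
```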

 

Action items

  •  