API for Data Access Services (v0 - Archived)

Reference document for LSST data products:

General comments about URI structure

  • Start with /<serviceType>. Supported service types: "meta", "db", "image"
    • all "meta" URIs are redirected to MetaServ
    • "db" URIs are redirected to different places (L1, Qserv) depending on database level
    • all "image" URIs are redirected to ImageCutout
  • <serviceType> is followed by API version. Examples: meta/v3, /image/v0, db/v17. A new version will be assigned each time there is a breaking change. New additions to the API are considered a non-breaking change.
  • Retrieving images
    • image kind + image Id uniquely identify an image. 
    • Examples of supported image kinds: bias, calexp, colorForEpo, deepCoadd, raw, template, etc.
    • By default, a returned image will contain all 3 planes: data, mask and variance. To select a subset, fetch one plane per request using "plane=<value>" parameter
  • For URIs that list things, only the first <maxResultsPerPage> items are shown by default if the list is long. To override this, use "start=<x>&count=<y>", e.g. "start=2000&count=1000" to get elements 2000-2999.
  • For all URIs one can use Accept request-header field to receive data in appropriate format. Formats we anticipate to support (default shown underlined, in bold):
    • for images: image/fits, application/hdf5, image/jpeg
    • for metadata: text/html, text/csv, text/tsv, application/fitsTable, application/json
    • for database query result: text/csv, application/fitsTable, application/IPAC table format, application/json, application/VOtable
  • Many requests can be run asynchronously (in background). These requests are marked with "**" next to "GET", which means that "GET" should be replaced with "POST". POST will return a resource id which can then be used to check the status and retrieve results.
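The URI conventions above (service type, version, paging parameters, Accept header) can be sketched in Python as follows. This is only an illustration: the host in BASE_URL and the helper name build_request are assumptions made for this example and are not part of the API. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; this document does not specify where the services run.
BASE_URL = "http://localhost:5000"


def build_request(service, version, path="", accept="application/json", **params):
    """Build (but do not send) a GET request for /<serviceType>/<version>/<path>."""
    uri = f"{BASE_URL}/{service}/{version}"
    if path:
        uri += "/" + path.strip("/")
    if params:
        # Query parameters follow "?" and are separated by "&".
        uri += "?" + urlencode(params)
    # The Accept header selects the representation of the returned data.
    return Request(uri, headers={"Accept": accept}, method="GET")


req = build_request("meta", "v0", "image/L2/DR1/coadd", start=2000, count=1000)
print(req.full_url)
print(req.get_header("Accept"))
```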

General information about output:

  • Get image or get image cutout will return url of the result. Result can be:
    • a status, e.g., "processing"
    • an error, e.g. "Image not found"
    • the requested full images, or cutout
  • In case of URI that allows start/count parameters, return values will include
    • a flag indicating whether there are more results
    • a flag indicating whether the results are "stable" (e.g. if one selects results 0-1000, and then 1000-2000, for some tables, such as Level 1, the 0-1000 might be different from what was returned when we request 1000-2000). Hint: appropriate result sorting might alleviate this problem.
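A client could use the "more results" flag to walk through a long list one start/count window at a time. The sketch below assumes the flag is returned under a key named "more" and the page content under "items". Both names are hypothetical, since the document does not fix the field names. A stand-in function replaces the real service call.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every item by requesting consecutive start/count windows.

    fetch_page(start, count) is supplied by the caller. It performs the
    request and returns a dict with the hypothetical keys "items" and "more".
    """
    items, start = [], 0
    while True:
        page = fetch_page(start, page_size)
        items.extend(page["items"])
        if not page["more"]:
            return items
        start += page_size


def fake_fetch_page(start, count, total=2500):
    """Stand-in for a real service call, used only to exercise the loop."""
    stop = min(start + count, total)
    return {"items": list(range(start, stop)), "more": stop < total}


all_items = fetch_all(fake_fetch_page)
print(len(all_items))
```

Note that this loop does not address the "stable" flag: if results are not stable, consecutive windows may overlap or skip entries.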

Unclassified:

  • coadds should be addressable either by the tract/patch unique identifier (as currently) or by spatial region (with no unique identifier). There may need to be additional parameters like "filter" or "airmass".

Open questions, comments, concerns:

  • It is not possible to retrieve all images meeting certain criteria regardless of image kind through a single query

Related pages/ticket(s):

  • DM-1694
  • DM-1916
  • DM-2453
  • DM-1868
  • DM-3477
  • DM-3484
  • DM-3478
  • DM-3479
  • DM-3480


# | API | Full Description | Optional Parameters | Returned JSON structure | Examples of Returned Result

GET /
List services.

Array of strings
["db", "image", "meta"]

Metadata Service (metaserv) API
M1 GET /meta
List API versions for "meta".

Array of strings
["v0", "v1"]
M2 GET /meta/v0
List types served for v0 of "meta" API.

Array of strings
{
  "result": ["db"]
}
M3 GET /meta/v0/db
List levels of databases.

Array of strings
["dc", "L1", "L2", "L3", "dev"]
M4 GET /meta/v0/db/L1?containing=%Stripe82%
List databases available for a given level, containing substring "Stripe82"
  • start=0
  • count=1000
  • containing (show only names containing a given substring / regexp)
Array of strings

for L1: ["live", "userDB"]

for L2: ["DR1", "DR2"]

for L3: ["joe_myDb", "bill_test1", "mike_scratch56"]
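A literal "%" inside a query string has to be percent-encoded as "%25" when the request is actually sent. The sketch below shows how the M4 request could be assembled so that the "containing" pattern survives transport. Whether the service interprets "%" as a wildcard is an assumption based on the example URI above.

```python
from urllib.parse import urlencode

pattern = "%Stripe82%"  # pattern from the M4 example
# urlencode percent-encodes reserved characters, so "%" becomes "%25".
query = urlencode({"containing": pattern, "start": 0, "count": 1000})
uri = "/meta/v0/db/L1?" + query
print(uri)
```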

M5 GET /meta/v0/db/L3/joe_myDb
Retrieve information about L3 database "joe_myDb"


Array containing 2 dictionaries. Keys for 1st:

  • name
  • owner
  • connectionHost
  • connectionPort

Keys for 2nd:

  • key-value pairs representing user annotations
[{"name": "joe_myDb", "owner": "joe", "host": "lsst10", "port": "3360"}, {}]
M6 GET /meta/v0/db/L2/DC_W13_Stripe82/tables
List tables for L2 database "DC_W13_Stripe82"
  • containing (show only names containing given keyword)
Array of strings

Example of results (truncated for formatting)

{
  "results": [
    [
      "AvgForcedPhot"
    ],
    [
      "AvgForcedPhotYearly"
    ],
    [
      "DeepCoadd"
    ],
    [
      "DeepCoadd_Metadata"
    ],
    [...]
  ]
}
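In the M6 example each table name arrives wrapped in its own one-element array. A client that wants a flat list of names could unwrap it as in this sketch, which uses the example response above with the truncation marker left out.

```python
import json

# Example M6 response from above, without the "[...]" truncation marker.
response_text = """
{
  "results": [
    ["AvgForcedPhot"],
    ["AvgForcedPhotYearly"],
    ["DeepCoadd"],
    ["DeepCoadd_Metadata"]
  ]
}
"""

payload = json.loads(response_text)
# Each entry is a one-element list; take its only member.
table_names = [row[0] for row in payload["results"]]
print(table_names)
```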

M7 GET /meta/v0/db/L3/joe_myDb/tables/Object
Retrieve information about table "Object" in L3 database "joe_myDb"

Array of two dictionaries. Keys for 1st:

  • name
  • description

Keys for 2nd:

  • key-value pairs representing user annotations
[{"name": "Object", "descr": "this is my object table"}, {}]
M8 GET /meta/v0/db/L2/DC_W13_Stripe82/tables/Science_Ccd_Exposure/schema
Retrieve schema for table "Science_Ccd_Exposure" in L2 database "DC_W13_Stripe82".


String containing output from "SHOW CREATE TABLE"

Truncated for formatting:

{
  "result": [
    "Science_Ccd_Exposure",
    "CREATE TABLE `Science_Ccd_Exposure` (\n `scienceCcdExposureId` bigint(20) NOT NULL,\n `run` int(11) NOT NULL,\n ... PRIMARY KEY (`scienceCcdExposureId`),\n ...) ENGINE=MyISAM DEFAULT CHARSET=latin1"
  ]
}
M9 GET /meta/v0/image
List levels of images.

Array of strings
["DC", "L1", "L2", "L3", "dev"]
M10 GET /meta/v0/image/L1
List image collections available in a given <level>
Array of strings
["DR1", "DR2", "ktl/test20150202"]
M11 GET /meta/v0/image/L2/DR1
List image kinds available in a given collection
Array of strings
["raw", "fpCoadd", "deepCoadd", "diffIm", "template", "calExp"]
M12 GET /meta/v0/image/L2/DR1/coadd?start=200&count=100
List coadd images (200-299) for L2 DR1
  • start=0
  • count=1000
  • owner
  • createAfter
  • createBefore
  • more TBD
Array of strings
["url/of/im1", "url/of/im2"]
M13 GET /meta/v0/image/L2/DR1/coadd/12345
Retrieve information about a coadd image identified by imageId = 12345


Dictionary. Keys:

  • url
  • owner
  • more TBD


{"url": "url/of/img", "owner": "tom"}

Database Query (dbserv) API
DB1 GET /db/v0/tap
<Nothing>


DB2 POST** /db/v0/tap/sync?query=SELECT+id,ra,decl+FROM+myDb.Object+WHERE+flux=3.2
Run a given query on L2 DR1 database
  • sql

2 rows from "select deepForcedSourceId,scienceCcdExposureId" would look like:

{
  "result": {
    "metadata": {
      "elements": [
        {
          "datatype": "long",
          "name": "deepForcedSourceId"
        },
        {
          "datatype": "long",
          "name": "scienceCcdExposureId"
        }
      ]
    },
    "table": {
      "data": [
        [
          8404051561545729,
          125230127
        ],[
          8404051561545730,
          125230127
        ]
      ]
    }
  }
}
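The result structure above keeps column names under "metadata" and row values under "table". A client could pair them up as in the following sketch, which assumes only the layout shown in the example. The helper name rows_as_dicts is made up for this illustration.

```python
import json

# The two-row example response from above.
response_text = """
{
  "result": {
    "metadata": {
      "elements": [
        {"datatype": "long", "name": "deepForcedSourceId"},
        {"datatype": "long", "name": "scienceCcdExposureId"}
      ]
    },
    "table": {
      "data": [
        [8404051561545729, 125230127],
        [8404051561545730, 125230127]
      ]
    }
  }
}
"""


def rows_as_dicts(payload):
    """Pair each row in "table" with the column names listed in "metadata"."""
    result = payload["result"]
    names = [element["name"] for element in result["metadata"]["elements"]]
    return [dict(zip(names, row)) for row in result["table"]["data"]]


rows = rows_as_dicts(json.loads(response_text))
for row in rows:
    print(row)
```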
DB3
TBD, see DM-3477
Retrieve query type for a given query



Image Query (imgserv) API (see also Image Service and Image Cutout Details)
I1 GET /image/v0/
<nothing>




I2 GET /image/v0/654/explain
Return cost estimate of asynchronous query identified by a resourceId (returned through "POST /image/...")

String (for now)
TBD
I3 GET /image/v0/654/status
Retrieve status of asynchronous request identified by a given resourceId (returned through "POST /image/...")


Dictionary. Keys:

  • status
  • startTime
  • progress
[{"status": "running", "startTime": "2015/05/14 16:43:21", "progress": "34%"}]
I4 GET /image/v0/654/results
Retrieve results of asynchronous request identified by a given resourceId (returned through "POST /image/...")


Array of strings

["/nfs/lsst/L3/jack/scratch/img1", "/nfs/lsst/L3/jack/scratch/img2", "/nfs/lsst/L3/jack/scratch/img3"]
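Taken together, I3 and I4 suggest a simple polling loop for asynchronous requests: check the status until the request finishes, then fetch the results. The sketch below is only an illustration. The terminal states "done" and "error" are assumptions (only "running" appears in this document), the status is treated as a plain dictionary as described for I3, and FakeClient is an in-memory stand-in for a real HTTP client.

```python
import time


def wait_for_results(client, resource_id, poll_seconds=1.0, max_polls=60):
    """Poll /image/v0/<id>/status until done, then fetch /image/v0/<id>/results."""
    for _ in range(max_polls):
        status = client.get(f"/image/v0/{resource_id}/status")
        if status["status"] == "done":  # assumed terminal state
            return client.get(f"/image/v0/{resource_id}/results")
        if status["status"] == "error":  # assumed failure state
            raise RuntimeError(f"request {resource_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"request {resource_id} did not finish")


class FakeClient:
    """In-memory stand-in for an HTTP client: 'running' twice, then 'done'."""

    def __init__(self):
        self.polls = 0

    def get(self, path):
        if path.endswith("/status"):
            self.polls += 1
            return {"status": "running" if self.polls < 3 else "done"}
        return ["/nfs/lsst/L3/jack/scratch/img1"]


results = wait_for_results(FakeClient(), 654, poll_seconds=0)
print(results)
```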

I5 GET** /image/v0/L2/DR7/coadd
Retrieve all coadd images for L2 DR7
  • start=0
  • count=1000
  • plane (supported: data, mask, variance)
Array of strings
["/nfs/lsst/L2/coadds/coad001", "/nfs/lsst/L2/coadds/coad002", "/nfs/lsst/L2/coadds/coad003", "/nfs/lsst/L2/coadds/coad004"]
I6 GET** /image/v0/L2/DR1/coadd/12345?plane=mask
Retrieve "mask" plane of a full "coadd" image from L2 DR1, identified by imageId = 12345
  • plane (supported: data, mask, variance)
Image
I7 GET /image/v0/L2/DR1/coadd/12345?plane=data
GET /image/v0/L2/DR1/coadd/12345?plane=mask
Retrieve a multi-extension FITS file containing coadd identified by imageId = 12345, and the corresponding mask.
  • plane (supported: data, mask, variance)
Image
I8 GET** /image/v0/L2/DR1/coadd/12345/cutout?x=1&y=2&width=30&height=30
Retrieve a cutout of a "coadd" image identified by imageId = 12345. The cutout area: 30x30 pixels centered around (1,2)
  • plane (supported: data, mask, variance)
Image
I9 GET** /image/v0/L2/DR1/calexp/12345/cutout?x1=1&y1=1&x2=2&y2=2
Retrieve a cutout of an image identified by imageId. Corners of the cutout: (1,1), (2,2)
  • plane (supported: data, mask, variance)
Image
I10 GET** /image/v0/L2/DR1/calexp/12345/cutout?plane=data&ra=1&dec=1&deltaRa=2&deltaDec=2
Retrieve "data" plane of a cutout of an image identified by imageId centered around (ra,dec) = (1,1) with a box size 2x2 arcmin.
  • plane (supported: data, mask, variance)
Image
I11 GET /image/v0/L2/DR1/calexp/12345/cutout?ra=1&dec=1&widthAng=10&heightAng=10
Retrieve a cutout of a "calexp" image identified by imageId = 12345. The heightAng and widthAng are in arcseconds.
  • Natural 3-plane results
Image
I12 GET /image/v0/L2/DR1/calexp/12345/cutout?ra=1&dec=1&widthPix=30&heightPix=30
Retrieve a cutout of a "calexp" image identified by imageId = 12345. The heightPix and widthPix are in pixels.
  • Natural 3-plane results
Image
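The cutout requests I8 through I12 share one path pattern and differ only in their query parameters. A client-side helper along these lines could assemble them. The function name cutout_uri is hypothetical, and the helper does not check which parameter combinations the service actually accepts.

```python
from urllib.parse import urlencode

SUPPORTED_PLANES = ("data", "mask", "variance")


def cutout_uri(level, collection, kind, image_id, plane=None, **region):
    """Build a cutout URI; `region` holds one of the parameter sets listed above."""
    params = dict(region)
    if plane is not None:
        if plane not in SUPPORTED_PLANES:
            raise ValueError(f"unsupported plane: {plane}")
        params["plane"] = plane
    path = f"/image/v0/{level}/{collection}/{kind}/{image_id}/cutout"
    return path + "?" + urlencode(params)


# Pixel cutout centered on (x, y), as in I8.
pixel_uri = cutout_uri("L2", "DR1", "coadd", 12345, x=1, y=2, width=30, height=30)
# Sky-coordinate cutout restricted to the "data" plane, as in I10.
sky_uri = cutout_uri("L2", "DR1", "calexp", 12345, plane="data",
                     ra=1, dec=1, deltaRa=2, deltaDec=2)
print(pixel_uri)
print(sky_uri)
```

The helper appends "plane" after the region parameters, so the parameter order can differ from the examples above. Query parameter order should not matter to the service.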

28 Comments

  1. Getting URI design right is arguably the hardest part of RESTful service implementation. There is a lot of debate on whether API version string should be a part of URI or not. I'm not going to tell you which way is better, but we have to think about stability and what our clients are supposed to do when API version changes, what changes are allowed, and for how long we need to support old versions.

    I think for URIs that return short lists of items we should not require supporting itemsPerPage and page parameters, that will simplify both client and server side.

    For error status ReST typically uses HTTP return codes with optional message in the response body (text/plain or any structured format).

    Regarding image types - if image ID includes its type then we probably do not want to expose type as a separate resource. Then "GET /image/v0/full" would return image IDs of all types (and ID information will have type explicitly or implicitly).

    "GET /image/v0/cutout/calexp/id=12345&x1=1&y1=1&x2=2&y2=2" - this does not look quite correct. If we call "/image/v0/cutout/calexp" a resource then URI should look like "GET /image/v0/cutout/calexp?id=12345&x1=1&y1=1&x2=2&y2=2", but if an image in the cutout service is a resource then "GET /image/v0/cutout/calexp/12345?x1=1&y1=1&x2=2&y2=2" is better. Can cutout service merge multiple images? If yes then single image ID probably does not make much sense.

    Are we going to support multiple representations of returned data like XML/JSON/whatever?

  2. I think all optional/keyword parameters should be query parameters, after a "?" and separated by "&".  We should not use ";" as a separator.  For the "plane=" parameter in particular, I think it's better to retrieve all planes together (no "plane=") or retrieve the planes individually ("plane=X") rather than allow combinations.  (Note that a fragment identifier, which might otherwise seem more logical, should not be used here as fragments are a client-side concept only and are not sent to the server unless something like JavaScript is used.)

    1. K-T, I implemented your comment about planes. I am not sure why you added the comment about query parameters, what you suggested is exactly how I envisioned and designed it (but perhaps I mis-documented it?).

      1. I think K-T is saying that he does not want to see path parameters (separated from the path by ";") used at all in the API.

        Note that this is a separate issue from the relatively common acceptance of ";" as a separator in a list of query parameters, which I was not suggesting we adopt.

        1. Right, I think I got rid of it completely. Did I miss any place?

  3. Unknown User (xiuqin)

    HDF5 file format was discussed a lot at 2014 ADASS and IVOA. Shall we consider it as one of the supported data formats?

  4. Data release selection in queries:

    I see that the /db/... queries take a "?" query parameter "db" with an example value of "DR1", i.e., a data release selector.  A couple of remarks:

    • Will this query parameter be provided for all the Level 2 image data products, e.g., for retrievals of coadded images?
      • If so, then it needs an equivalent to the M4 "GET /meta/v0/db/<type>" query.
    • I assume the "db" query parameter defaults to the most recent data release.
      • Will the M4 query return an indication of which "?db=" value is the current default?
    • I assume that the numeric-identifier components of the various paths are unique only within a single data release.  That means that eventually, in user documentation, we should make sure that they understand that they can't scan through different releases' versions of the same image (for example) just by varying the "?db=" parameter.  
    • Are the identifiers also unique within a particular type (i.e., "raw", "template", "coadd", "calexp", etc.)?
    1. I'm moving discussion about this to DM-1916

  5. Distinguishing L1 and L2 versions of reprocessed data products

    Since most or all of the L1 data products will be regenerated in each data release, the catalog and image APIs should presumably allow the user to distinguish between the two.  I see how this could be done for catalogs - the "?db=" parameter presumably allows selecting something like "L1" (for the actively updated Level 1 database) in addition to the above-documented "DR1", "DR2", etc.  Will the L2 table names for the reprocessed L1 data products be generally expected to be the same as for L1?  (Barring the discovery of a serious issue that requires revision of the schema for the reprocessing.)

    How will the L1 and reprocessed-L1 image data products be distinguished?

     

    1. I'm moving discussion about this to DM-1916

      1. DM-1916

  6. For DB3: Are you planning on using a special connection to the database which enforces limits to SELECT statements only?

    I've mentioned this before, but I haven't really elaborated on why it's important: GET requests should have no side effects. HTTP clients, browsers, and proxies are written with this assumption. The implication of this assumption is that it allows clients and proxies to perform an "optimization" where they submit a request twice without explicitly notifying you. Chrome is especially notorious at this: If a server doesn't finish the GET request, and 5 minutes have passed, Chrome will resend the same request again. Or, if a response is unable to be delivered, which especially seems to happen over wifi, a client might send the request again even though the server believes it has sent a response. There is one way of mitigating this, by recording the time of the last get request and making sure to use an If-Unmodified-Since header in every request, but that's messy and still a violation of the HTTP spec.

     

    From RFC 2616, Section 8.1.4

    This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails.

    A real world example of this happening:

    1. A Web Application on a laptop over wifi performs a GET request to a server.
    2. Wifi connection drops out. Chrome detects this.
    3. Wifi connection reestablished. Chrome doesn't bother waiting for response from original request, and resends the previous request. (Chrome may even discard the first response, I'm not sure.)

     

    That said, I think there are reasons why DB3 could be used, namely user queries, but the preferred method should be something similar  to the following:

    1. Establish a query object by POST'ing your query to the query resource
    2. If it's necessary to have a cursor for a large result set, especially a forward-only cursor, create a cursor for the query object (i.e. POST /query/1234/cursor).
      1. If it's forward-only, you should repeat POSTs to that cursor to consume its input. When the cursor has reached the end, it may notify you that the cursor is closed. Again, a GET request to consume from the cursor could potentially be executed twice.
      2. Otherwise, it's fine to perform GET requests then explicitly DELETE the cursor when you are done. Of course, this ends the lifecycle of the query as well.
    3. If an explicit cursor is not necessary, you may instead perform GET requests to /query/1234/results to retrieve information.

     

    1. I agree that it's essential that any GET be read-only, and thus likely limited to SELECT queries only.  Even CREATE TABLE AS SELECT is problematic.

      But I'm hoping that Qserv (or the Web service around it) will be smart enough in terms of query/result caching to be able to deal with multiple repeated queries as occur in the scenarios above with no loss of performance.

  7. Other coadds; coverage maps

    The DPDD (LSE-163) says (p. 52)

    We will retain smaller sections of all generated coadds, to support quality assessment and targeted science. Retained sections may be positioned to cover areas of the sky of special interest such as overlaps with other surveys, nearby galaxies, large clusters, etc. 

    regarding a number of coadd types that will not be preserved in full (e.g., short-period coadds, best-seeing coadds, PSF-matched coadds).

    Will the "smaller sections" of these coadds be made available through the API?

    How will coverage maps, for these and the standard coadds, be represented and made available through the API?

    1. Yes, those coadds need to be available through the API.  I'm expecting coverage maps to be represented as a separate image type and possibly as a separate plane of the normal coadd as well (e.g. shortPeriodCoadd would return an MEF with image, mask, variance, and coverage; shortPeriodCoadd_coverage could also be requested separately).

      That's for coverage as depth maps, obviously.  I'm expecting that coverage in terms of which specific visits went into computing any particular coadd pixel will be represented only in the database.  It's possible that the CoaddPsf datasets that contain this information could also be made available as non-image files.

      1. What layer of the system will be responsible for generating low-resolution rollups of coverage maps (a question that could be generalized to all images)?  Will the API generically support rescaling?  What about zooming all the way out to all-sky maps?  Will the API serve Aitoff (or something similar) projections of all-sky images (whether dynamically generated and cached, or statically generated)?

        1. I was expecting the image service to do cutouts and small mosaics at full resolution, not rollups or all-sky images.  There may be specialized data products for those produced for EPO that would also be available via the image service.  Otherwise, I would expect the SUI to produce them (dynamically or statically).

          1. Re: "... I would expect the SUI to produce them (dynamically or statically)."

            Presumably by invoking well-established DM stack code, though.  Will/does the stack have code for all-sky map projections?

            1. As David says below, I'm not sure this presumption is correct.  As far as I know, afw does not currently have code for downscaling in general or all-sky map projections, nor are there specific plans for such.  I would worry that there are many ways to do these, depending on the desired application, so they'd best be handled as extensions or plugins if they're part of the Stack at all.

  8. Unknown User (ciardi)

    Kian-Tat Lim, Gregory Dubois-Felsmann, Unknown User (xiuqin), Trey Roby: we should decide what and where we want these kinds of image products and displays.  It was my understanding/view that the data products produced and listed in the data products document are the data products from which we had to work.  If there was another set of images (e.g., lower resolution images) that we would need to produce those in a manner consistent with the needs of the UI - for example, we may decide to use WWT to handle the all-sky images for zooming/panning etc and then use a firefly tool to handle a specific cutout or mosaic.

  9. Unknown User (ciardi)

    Kian-Tat Lim: Hi KT - I agree; what I am really advocating for is a definition of what we (the royal we) want.  As we are identifying the design for the SUI and the needs for the displays/functions etc,  we should identify as a whole what we want as a project -

  10. The text above mentions "application/VOtable".  Strictly speaking the recommended MIME type is apparently "application/x-votable+xml".  See for instance the "MIME Type" section in the VOTable standard.

  11. With reference to DB3 - the ability to submit a piece of ADQL/SQL as a query - we think we need an interface for validating a query without actually running it, if only to support the required SUIT function of letting users compose ADQL/SQL queries by hand.  Validation at least in the sense of checking ADQL (or SQL) syntax, ensuring that references to tables and table columns are valid, etc.

    1. In order to richly support VOTable by including UCDs, when available and possible, it's on the radar to do this sort of thing in the dbserv ADQL parser (i.e. my python PoC) by also leveraging the metaserv database. So, we will get semantic validation for free and it'd be quite easy to eventually expose a validation/explain endpoint in the API which checks that the implied schema in the query is valid.

      However, you might not get a good error message for a query like SELECT "a FROM x , for example, because implementing good error messages for syntax errors takes quite a bit more work than validating tables/columns on well-formed queries.

      1. If we forgot about ADQL for the moment, just in the underlying qserv system, when thinking about an SQL query being submitted to the proxy/master, is the master able to fully validate the query or does it effectively pass some of it untested to the shard servers?  I.e., could the validation be done by asking the master to do all the work short of actually sending out the shard queries?

        1. The czar already does that sort of validation before it sends out queries, but right now it does dispatch all valid queries. The czar would need to effectively implement EXPLAIN which would execute the planning stage and validate tables/columns, then return an error or some information about how it will execute the queries (i.e. how many chunks are to be queried against). I don't think it's in any plans to have the czar deduce UCDs, however.

  12. I don't think the discussion in DM-1916 about permitting coadd cutout requests without an imageID was ultimately reflected in this page.

  13. We need to understand how we will be referring to Calibration Data Products in the API.