API for Data Access Services (v0 - Archived)
Reference document for LSST data products:
General comments about URI structure
- Start with /<serviceType>. Supported service types: "meta", "db", "image"
  - all "meta" URIs are redirected to MetaServ
  - "db" URIs are redirected to different places (L1, Qserv) depending on database level
  - all "image" URIs are redirected to ImageCutout
- <serviceType> is followed by the API version. Examples: /meta/v3, /image/v0, /db/v17. A new version will be assigned each time there is a breaking change. New additions to the API are considered non-breaking changes.
- Retrieving images
  - image kind + image Id uniquely identify an image.
  - Examples of supported image kinds: bias, calexp, colorForEpo, deepCoadd, raw, template, etc.
  - By default, a returned image will contain all 3 planes: data, mask and variance. To select a subset, fetch one plane per request using the "plane=<value>" parameter.
- For URIs that list things, if the list is long, by default only the first <maxResultsPerPage> items will be shown. To override this, use "start=<x>&count=<y>", e.g. "start=2000&count=1000" to get elements 2000-2999.
- For all URIs one can use the Accept request-header field to receive data in the appropriate format. Formats we anticipate supporting (default shown underlined, in bold):
  - for images: image/fits, application/hdf5, image/jpeg
  - for metadata: text/html, text/csv, text/tsv, application/fitsTable, application/json
  - for database query results: text/csv, application/fitsTable, application/IPAC table format, application/json, application/VOtable
- Many requests can be run asynchronously (in the background). These requests are marked with "**" next to "GET", which means that "GET" should be replaced with "POST". POST will return a resource id, which can then be used to check the status and retrieve results.
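The asynchronous flow in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: the status and results paths follow rows I3 and I4 of the table below, the POST is assumed to return the resource id directly, the status is assumed to be a dictionary with a "status" key, and "done" is an assumed terminal value. The `http` callable is a hypothetical transport, replaced here by a fake so the sketch runs without a server.

```python
import time
from typing import Any, Callable, List, Tuple

# Hypothetical transport signature: http(method, path) -> decoded JSON body.
Http = Callable[[str, str], Any]


def run_async(http: Http, path: str, poll_seconds: float = 1.0,
              max_polls: int = 60) -> Any:
    """Submit a "**" request with POST, poll its status, then fetch results."""
    # Assumption: the POST response decodes to the resource id itself.
    resource_id = http("POST", path)
    base = "/image/v0/{}".format(resource_id)
    for _ in range(max_polls):
        status = http("GET", base + "/status")
        # Assumption: any state other than "running" is terminal.
        if status["status"] != "running":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("request {} is still running".format(resource_id))
    if status["status"] != "done":  # "done" is an assumed terminal value
        raise RuntimeError("request {} ended as {!r}".format(
            resource_id, status["status"]))
    return http("GET", base + "/results")


def make_fake_http() -> Tuple[Http, List[Tuple[str, str]]]:
    """Build a fake transport that reports "running" twice, then "done"."""
    calls: List[Tuple[str, str]] = []
    states = iter(["running", "running", "done"])

    def http(method: str, path: str) -> Any:
        calls.append((method, path))
        if method == "POST":
            return 654
        if path.endswith("/status"):
            return {"status": next(states)}
        return ["/nfs/lsst/L3/jack/scratch/img1"]

    return http, calls


fake_http, log = make_fake_http()
print(run_async(fake_http, "/image/v0/L2/DR7/coadd", poll_seconds=0))
print(log)
```

Injecting the transport keeps the polling logic independent of any particular HTTP library, which seems a reasonable choice while the API is still being designed.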
General information about output:
- Get image or get image cutout will return the URL of the result. The result can be:
  - a status, e.g. "processing"
  - an error, e.g. "Image not found"
  - the requested full image or cutout
- For URIs that allow start/count parameters, return values will include
  - a flag indicating whether there are more results
  - a flag indicating whether the results are "stable" (e.g. if one selects results 0-999 and then 1000-1999, for some tables, such as Level 1, the content of the first range might have changed by the time the second range is requested). Hint: appropriate result sorting might alleviate this problem.
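A client-side paging loop over start/count could look roughly like the sketch below. The page does not name the two flags, so the response keys "items", "more" and "stable" are hypothetical, as is the `fetch_page` callable standing in for a real request.

```python
import warnings
from typing import Any, Callable, Dict, Iterator

# Hypothetical page shape: {"items": [...], "more": bool, "stable": bool}.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_items(fetch_page: FetchPage, count: int = 1000) -> Iterator[Any]:
    """Yield every item of a listing, requesting `count` items per page."""
    start = 0
    while True:
        page = fetch_page(start, count)
        if not page.get("stable", True):
            # The listing may shift between requests; a fixed sort order on
            # the server side would be the real remedy.
            warnings.warn("results from start={} may be unstable".format(start))
        for item in page["items"]:
            yield item
        if not page["more"]:
            return
        start += count


def fake_fetch(start: int, count: int) -> Dict[str, Any]:
    """Stand-in for a request such as GET .../coadd?start=<start>&count=<count>."""
    data = ["img{}".format(i) for i in range(2500)]
    chunk = data[start:start + count]
    return {"items": chunk, "more": start + count < len(data), "stable": True}


print(sum(1 for _ in iter_items(fake_fetch)))
```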
Unclassified:
- coadds should be addressable either by unique identifier (currently tract/patch) or by spatial region (with no unique identifier). There may need to be additional parameters like "filter" or "airmass".
Open questions, comments, concerns:
- It is not possible to retrieve all images meeting certain criteria regardless of image kind through a single query
Related pages/ticket(s):
- DM-1694
- DM-1916
- DM-2453
- DM-1868
- DM-3477
- DM-3484
- DM-3478
- DM-3479
- DM-3480
# | API | Full Description | Optional Parameters | Returned JSON structure | Examples of Returned Result |
---|---|---|---|---|---|
| | GET / | List services. | | Array of strings | ["db", "image", "meta"] |
Metadata Service (metaserv) API | | | | | |
M1 | GET /meta | List API versions for "meta". | | Array of strings | ["v0", "v1"] |
M2 | GET /meta/v0 | List types served for v0 of "meta" API. | | Array of strings | { "result": ["db"] } |
M3 | GET /meta/v0/db | List levels of databases. | | Array of strings | ["dc", "L1", "L2", "L3", "dev"] |
M4 | GET /meta/v0/db/L1?containing=%Stripe82% | List databases available for a given level, containing substring "Stripe82" | | Array of strings | for L1: ["live", "userDB"] for L2: ["DR1", "DR2"] for L3: ["joe_myDb", "bill_test1", "mike_scratch56"] |
M5 | GET /meta/v0/db/L3/joe_myDb | Retrieve information about L3 database "joe_myDb" | | Array containing 2 dictionaries. Keys for 1st: ... Keys for 2nd: ... | [{"name": "joe_myDb", "owner": "joe", "host": "lsst10", "port": "3360"}, {}] |
M6 | GET /meta/v0/db/L2/DC_W13_Stripe82/tables | List tables for L2 database "DC_W13_Stripe82" | | Array of strings | Example of results (truncated for formatting) { "results": [ [ "AvgForcedPhot" ], [ "AvgForcedPhotYearly" ], [ "DeepCoadd" ], [ "DeepCoadd_Metadata" ], [...] ] } |
M7 | GET /meta/v0/db/L3/joe_myDb/tables/Object | Retrieve information about table "Object" in L3 database "joe_myDb" | | Array of two dictionaries. Keys for 1st: ... Keys for 2nd: ... | [{"name": "Object", "descr": "this is my object table"}, {}] |
M8 | GET /meta/v0/db/L2/DC_W13_Stripe82/tables/Science_Ccd_Exposure/schema | Retrieve schema for table "Science_Ccd_Exposure" in L2 database "DC_W13_Stripe82". | | String containing output from "SHOW CREATE TABLE" | Truncated for formatting: { "result": [ "Science_Ccd_Exposure", "CREATE TABLE `Science_Ccd_Exposure` (\n `scienceCcdExposureId` bigint(20) NOT NULL,\n `run` int(11) NOT NULL,\n ... PRIMARY KEY (`scienceCcdExposureId`),\n ...) ENGINE=MyISAM DEFAULT CHARSET=latin1" ] } |
M9 | GET /meta/v0/image | List levels of images. | | Array of strings | ["DC", "L1", "L2", "L3", "dev"] |
M10 | GET /meta/v0/image/L1 | List image collections available in a given <level> | | Array of strings | ["DR1", "DR2", "ktl/test20150202"] |
M11 | GET /meta/v0/image/L2/DR1 | List image kinds available in a given collection | | Array of strings | ["raw", "fpCoadd", "deepCoadd", "diffIm", "template", "calExp"] |
M12 | GET /meta/v0/image/L2/DR1/coadd?start=200&count=100 | List coadd images (200-299) for L2 DR1 | | Array of strings | ["url/of/im1", "url/of/im2"] |
M13 | GET /meta/v0/image/L2/DR1/coadd/12345 | Retrieve information about a coadd image identified by imageId = 12345 | | Dictionary. Keys: ... | {"url": "url/of/img", "owner": "tom"} |
Database Query (dbserv) API | | | | | |
DB1 | GET /db/v0/tap | <Nothing> | | | |
DB2 | POST** /db/v0/tap/sync?query=SELECT+id,ra,decl+FROM+myDb.Object+WHERE+flux=3.2 | Run a given query on L2 DR1 database | | | 2 rows from "select deepForcedSourceId,scienceCcdExposureId" would look like: { "result": { "metadata": { "elements": [ { "datatype": "long", "name": "deepForcedSourceId" }, { "datatype": "long", "name": "scienceCcdExposureId" } ] }, "table": { "data": [ [ 8404051561545729, 125230127 ],[ 8404051561545730, 125230127 ] ] } } } |
DB3 | | Retrieve query type for a given query | | | |
Image Query (imgserv) API (see also Image Service and Image Cutout Details) | | | | | |
I1 | GET /image/v0/ | <nothing> | | | |
I2 | GET /image/v0/654/explain | Return cost estimate of asynchronous query identified by a resourceId (returned through "POST /image/...") | | String (for now) | TBD |
I3 | GET /image/v0/654/status | Retrieve status of asynchronous request identified by a given resourceId (returned through "POST /image/...") | | Dictionary. Keys: ... | [{"status": "running", "startTime": "2015/05/14 16:43:21", "progress": "34%"}] |
I4 | GET /image/v0/654/results | Retrieve results of asynchronous request identified by a given resourceId (returned through "POST /image/...") | | Array of strings | ["/nfs/lsst/L3/jack/scratch/img1", "/nfs/lsst/L3/jack/scratch/img2", "/nfs/lsst/L3/jack/scratch/img3"] |
I5 | GET** /image/v0/L2/DR7/coadd | Retrieve all coadd images for L2 DR7 | | Array of strings | ["/nfs/lsst/L2/coadds/coad001", "/nfs/lsst/L2/coadds/coad002", "/nfs/lsst/L2/coadds/coad003", "/nfs/lsst/L2/coadds/coad004"] |
I6 | GET** /image/v0/L2/DR1/coadd/12345?plane=mask | Retrieve "mask" plane of a full "coadd" image from L2 DR1, identified by imageId = 12345 | | Image | |
I7 | GET /image/v0/L2/DR1/coadd/12345?plane=data GET /image/v0/L2/DR1/coadd/12345?plane=mask | Retrieve a multi-extension FITS file containing coadd identified by imageId = 12345, and the corresponding mask. | | Image | |
I8 | GET** /image/v0/L2/DR1/coadd/12345/cutout?x=1&y=2&width=30&height=30 | Retrieve a cutout of a "coadd" image identified by imageId = 12345. The cutout area: 30x30 pixels centered around (1,2) | | Image | |
I9 | GET** /image/v0/L2/DR1/calexp/12345/cutout?x1=1&y1=1&x2=2&y2=2 | Retrieve a cutout of an image identified by imageId. Corners of the cutout: (1,1), (2,2) | | Image | |
I10 | GET** /image/v0/L2/DR1/calexp/12345/cutout?plane=data&ra=1&dec=1&deltaRa=2&deltaDec=2 | Retrieve "data" plane of a cutout of an image identified by imageId centered around (ra,dec) = (1,1) with a box size 2x2 arcmin. | | Image | |
I11 | GET /image/v0/L2/DR1/calexp/12345/cutout?ra=1&dec=1&widthAng=10&heightAng=10 | Retrieve a cutout of a "calexp" image identified by imageId = 12345. The heightAng and widthAng are in arc seconds. | | Image | |
I12 | GET /image/v0/L2/DR1/calexp/12345/cutout?ra=1&dec=1&widthPix=30&heightPix=30 | Retrieve a cutout of a "calexp" image identified by imageId = 12345. The heightPix and widthPix are in pixels. | | Image | |
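As an illustration of how a client might compose the cutout requests in rows I8-I12 together with the Accept header described earlier, here is a minimal sketch. The host name is a placeholder and `cutout_request` is a hypothetical helper, not part of any existing client library. It only builds the synchronous GET form and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://dax.example.org"  # placeholder host, not a real service


def cutout_request(level, collection, kind, image_id,
                   accept="image/fits", **params):
    """Build (but do not send) a GET request for an image cutout."""
    path = "/image/v0/{}/{}/{}/{}/cutout".format(
        level, collection, kind, image_id)
    # Keyword arguments become "&"-separated query parameters, in the
    # order they were given.
    url = BASE + path + "?" + urlencode(params)
    return Request(url, headers={"Accept": accept}, method="GET")


# Row I12: cutout sized in pixels around (ra, dec).
req = cutout_request("L2", "DR1", "calexp", 12345,
                     ra=1, dec=1, widthPix=30, heightPix=30)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the requested format through `accept` rather than through the URL keeps the URI identical for every representation of the same cutout.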
28 Comments
Andy Salnikov
Getting URI design right is arguably the hardest part of RESTful service implementation
There is a lot of debate on whether API version string should be a part of URI or not. I'm not going to tell you which way is better, but we have to think about stability and what our clients are supposed to do when API version changes, what changes are allowed, and for how long we need to support old versions.
I think for URIs that return short lists of items we should not require supporting itemsPerPage and page parameters, that will simplify both client and server side.
For error status ReST typically uses HTTP return codes with optional message in the response body (text/plain or any structured format).
Regarding image types - if image ID includes its type then we probably do not want to expose type as a separate resource. Then "GET /image/v0/full" would return image IDs of all types (and ID information will have type explicitly or implicitly).
"GET /image/v0/cutout/calexp/id=12345&x1=1&y1=1&x2=2&y2=2" - this does not look quite correct. If we call "/image/v0/cutout/calexp" a resource then URI should look like "GET /image/v0/cutout/calexp?id=12345&x1=1&y1=1&x2=2&y2=2", but if an image in the cutout service is a resource then "GET /image/v0/cutout/calexp/12345?x1=1&y1=1&x2=2&y2=2" is better. Can cutout service merge multiple images? If yes then single image ID probably does not make much sense.
Are we going to support multiple representations of returned data like XML/JSON/whatever?
Kian-Tat Lim
I think all optional/keyword parameters should be query parameters, after a "?" and separated by "&". We should not use ";" as a separator. For the "plane=" parameter in particular, I think it's better to retrieve all planes together (no "plane=") or retrieve the planes individually ("plane=X") rather than allow combinations. (Note that a fragment identifier, which might otherwise seem more logical, should not be used here as fragments are a client-side concept only and are not sent to the server unless something like JavaScript is used.)
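The point about fragments can be seen with Python's standard URL parser. In this small sketch the host and the "#variance" fragment are made up for illustration, and `request_target` is a hypothetical helper showing which part of a URL an HTTP client places on the request line.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://dax.example.org/image/v0/L2/DR1/coadd/12345?plane=mask#variance"


def request_target(url: str) -> str:
    """Return the part of the URL that goes on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays on the client.
    return target


parts = urlsplit(url)
print(parse_qs(parts.query))   # query parameters, visible to the server
print(parts.fragment)          # fragment, kept by the client
print(request_target(url))
```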
Jacek Becla
K-T, I implemented your comment about planes. I am not sure why you added the comment about query parameters, what you suggested is exactly how I envisioned and designed it (but perhaps I mis-documented it?).
Gregory Dubois-Felsmann
I think K-T is saying that he does not want to see path parameters (separated from the path by ";") used at all in the API.
Note that this is a separate issue from the relatively common acceptance of ";" as a separator in a list of query parameters, which I was not suggesting we adopt.
Jacek Becla
Right, I think I got rid of it completely. Did I miss any place?
Unknown User (xiuqin)
HDF5 file format was discussed a lot at 2014 ADASS and IVOA. Shall we consider it as one of the supported data formats?
Gregory Dubois-Felsmann
Data release selection in queries:
I see that the /db/... queries take a "?" query parameter "db" with an example value of "DR1", i.e., a data release selector. A couple of remarks:
Jacek Becla
I'm moving discussion about this to DM-1916
Gregory Dubois-Felsmann
Distinguishing L1 and L2 versions of reprocessed data products
Since most or all of the L1 data products will be regenerated in each data release, the catalog and image APIs should presumably allow the user to distinguish between the two. I see how this could be done for catalogs - the "?db=" parameter presumably allows selecting something like "L1" (for the actively updated Level 1 database) in addition to the above-documented "DR1", "DR2", etc. Will the L2 table names for the reprocessed L1 data products be generally expected to be the same as for L1? (Barring the discovery of a serious issue that requires revision of the schema for the reprocessing.)
How will the L1 and reprocessed-L1 image data products be distinguished?
Jacek Becla
I'm moving discussion about this to DM-1916
Gregory Dubois-Felsmann
DM-1916
Brian Van Klaveren
For DB3: Are you planning on using a special connection to the database which enforces limits to SELECT statements only?
I've mentioned this before, but I haven't really elaborated on why it's important: GET requests should have no side effects. HTTP clients, browsers, and proxies are written with this assumption. The implication of this assumption is that it allows clients and proxies to perform an "optimization" where they submit a request twice without explicitly notifying you. Chrome is especially notorious for this: if a server doesn't finish the GET request, and 5 minutes have passed, Chrome will resend the same request again. Or, if a response is unable to be delivered, which especially seems to happen over wifi, a client might send the request again even though the server believes it has sent a response. There is one way of mitigating this, by recording the time of the last GET request and making sure to use an If-Unmodified-Since header in every request, but that's messy and still a violation of the HTTP spec.
From RFC 2616, Section 8.1.4
A real world example of this happening:
That said, I think there are reasons why DB3 could be used, namely user queries, but the preferred method should be something similar to the following:
Kian-Tat Lim
I agree that it's essential that any GET be read-only, and thus likely limited to SELECT queries only. Even CREATE TABLE AS SELECT is problematic.
But I'm hoping that Qserv (or the Web service around it) will be smart enough in terms of query/result caching to be able to deal with multiple repeated queries as occur in the scenarios above with no loss of performance.
Gregory Dubois-Felsmann
Other coadds; coverage maps
The DPDD (LSE-163) says (p. 52)
regarding a number of coadd types that will not be preserved in full (e.g., short-period coadds, best-seeing coadds, PSF-matched coadds).
Will the "smaller sections" of these coadds be made available through the API?
How will coverage maps, for these and the standard coadds, be represented and made available through the API?
Kian-Tat Lim
Yes, those coadds need to be available through the API. I'm expecting coverage maps to be represented as a separate image type and possibly as a separate plane of the normal coadd as well (e.g. shortPeriodCoadd would return an MEF with image, mask, variance, and coverage; shortPeriodCoadd_coverage could also be requested separately).
That's for coverage as depth maps, obviously. I'm expecting that coverage in terms of which specific visits went into computing any particular coadd pixel will be represented only in the database. It's possible that the CoaddPsf datasets that contain this information could also be made available as non-image files.
Gregory Dubois-Felsmann
What layer of the system will be responsible for generating low-resolution rollups of coverage maps (a question that could be generalized to all images)? Will the API generically support rescaling? What about zooming all the way out to all-sky maps? Will the API serve Aitoff (or something similar) projections of all-sky images (whether dynamically generated and cached, or statically generated)?
Kian-Tat Lim
I was expecting the image service to do cutouts and small mosaics at full resolution, not rollups or all-sky images. There may be specialized data products for those produced for EPO that would also be available via the image service. Otherwise, I would expect the SUI to produce them (dynamically or statically).
Gregory Dubois-Felsmann
Re: "... I would expect the SUI to produce them (dynamically or statically)."
Presumably by invoking well-established DM stack code, though. Will/does the stack have code for all-sky map projections?
Kian-Tat Lim
As David says below, I'm not sure this presumption is correct. As far as I know, afw does not currently have code for downscaling in general or all-sky map projections, nor are there specific plans for such. I would worry that there are many ways to do these, depending on the desired application, so they'd best be handled as extensions or plugins if they're part of the Stack at all.
Unknown User (ciardi)
Kian-Tat Lim, Gregory Dubois-Felsmann, Unknown User (xiuqin), Trey Roby: we should decide which image products and displays of this kind we want, and where they should be produced. It was my understanding/view that the data products produced and listed in the data products document are the data products from which we had to work. If there were another set of images (e.g., lower resolution images), we would need to produce those in a manner consistent with the needs of the UI - for example, we may decide to use WWT to handle the all-sky images for zooming/panning etc. and then use a firefly tool to handle a specific cutout or mosaic.
Unknown User (ciardi)
Kian-Tat Lim: Hi KT - I agree; what I am really advocating for is a definition of what we (the royal we) want. As we are identifying the design for the SUI and the needs for the displays/functions etc, we should identify as a whole what we want as a project -
Gregory Dubois-Felsmann
The text above mentions "application/VOtable". Strictly speaking the recommended MIME type is apparently "application/x-votable+xml". See for instance the "MIME Type" section in the VOTable standard.
Gregory Dubois-Felsmann
With reference to DB3 - the ability to submit a piece of ADQL/SQL as a query - we think we need an interface for validating a query without actually running it, if only to support the required SUIT function of letting users compose ADQL/SQL queries by hand. Validation at least in the sense of checking ADQL (or SQL) syntax, ensuring that references to tables and table columns are valid, etc.
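To make "validate without running" concrete, here is a minimal sketch that uses SQLite purely as a stand-in; it is not Qserv, dbserv, or an ADQL parser, and the `Object` table and `validate_query` helper are made up for the example. Prefixing a statement with EXPLAIN makes SQLite compile it, which checks syntax and table/column references, and return a listing of the compiled program instead of executing the query.

```python
import sqlite3
from typing import Optional, Tuple


def validate_query(conn: sqlite3.Connection,
                   query: str) -> Tuple[bool, Optional[str]]:
    """Check syntax and table/column references without running the query."""
    try:
        # EXPLAIN compiles the statement and lists its bytecode; the
        # SELECT itself is not executed.
        conn.execute("EXPLAIN " + query).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, None


conn = sqlite3.connect(":memory:")
# Hypothetical schema, loosely modelled on the DB2 example query.
conn.execute("CREATE TABLE Object (id INTEGER, ra REAL, decl REAL, flux REAL)")

print(validate_query(conn, "SELECT id, ra, decl FROM Object WHERE flux = 3.2"))
print(validate_query(conn, "SELECT nope FROM Object"))   # unknown column
print(validate_query(conn, 'SELECT "a FROM x'))          # malformed query
```

How informative the error text is depends entirely on the backend, so a real service would probably need its own error reporting on top of a check like this.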
Brian Van Klaveren
In order to richly support VOTable by including UCDs, when available and possible, it's on the radar to do this sort of thing in the dbserv ADQL parser (i.e. my python PoC) by also leveraging the metaserv database. So, we will get semantic validation for free and it'd be quite easy to eventually expose a validation/explain endpoint in the API which checks that the implied schema in the query is valid.
However, you might not get a good error message for a query like 'SELECT "a FROM x', for example, because implementing good error messages for syntax errors takes quite a bit more work than validating tables/columns on well-formed queries.
Gregory Dubois-Felsmann
If we forgot about ADQL for the moment, just in the underlying qserv system, when thinking about an SQL query being submitted to the proxy/master, is the master able to fully validate the query or does it effectively pass some of it untested to the shard servers? I.e., could the validation be done by asking the master to do all the work short of actually sending out the shard queries?
Brian Van Klaveren
The czar already does that sort of validation before it sends out queries, but right now it does dispatch all valid queries. The czar would need to effectively implement EXPLAIN, which would execute the planning stage and validate tables/columns, then return an error or some information about how it will execute the queries (i.e. how many chunks are to be queried against). I don't think it's in any plans to have the czar deduce UCDs, however.
Gregory Dubois-Felsmann
I don't think the discussion in DM-1916 about permitting coadd cutout requests without an imageID was ultimately reflected in this page.
Gregory Dubois-Felsmann
We need to understand how we will be referring to Calibration Data Products in the API.