1.1.1.1.1. Services
The current implementation of the system offers two services for allocating (or determining locations of existing) chunks:
- Single chunk allocation.
- Batch mode (multiple chunks) allocation.
The choice depends on the requirements of a workflow. However, the batch service is recommended because it is more efficient when allocating large numbers of chunks.
Note that once a chunk is assigned (allocated) to a particular worker node, all subsequent requests for that chunk are guaranteed to return the same worker name as the chunk's location, so making multiple requests for the same chunk is safe. Chunk allocation requests require a valid super-transaction in the STARTED state.
The following service is meant to be used for a single chunk allocation/location:
method | resource name |
---|---|
POST | /ingest/chunk |
The request object has the following schema, in which the client provides the name of a database:
{"database":<string>, "chunk":<number>, "auth_key":<string> }
The service also supports an alternative method that accepts a transaction identifier (transactions are always associated with the corresponding databases):
{"transaction_id":<number>, "chunk":<number>, "auth_key":<string> }
If the operation succeeds (see Error reporting in the REST API), the system responds with the following JSON object:
{... "location":{ "worker":<string>, "host":<string>, "host_name":<string>, "port":<number>, "http_host":<string>, "http_host_name":<string>, "http_port":<number> } }
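As an illustration of the two request forms, the following Python sketch builds a request body for `/ingest/chunk` and extracts connection parameters from a reply. The helper names (`build_chunk_request`, `post_json`, `parse_chunk_location`), the `base_url` parameter, and all sample values are hypothetical. The sketch also assumes that a request carries either `database` or `transaction_id` but not both, which the text suggests but does not state. Error handling is omitted; see Error reporting in the REST API.

```python
import json
import urllib.request
from typing import Optional


def build_chunk_request(chunk: int, auth_key: str,
                        database: Optional[str] = None,
                        transaction_id: Optional[int] = None) -> dict:
    """Build the JSON body for POST /ingest/chunk."""
    # Assumption: exactly one of the two identifiers is supplied.
    if (database is None) == (transaction_id is None):
        raise ValueError("provide exactly one of 'database' or 'transaction_id'")
    body = {"chunk": chunk, "auth_key": auth_key}
    if database is not None:
        body["database"] = database
    else:
        body["transaction_id"] = transaction_id
    return body


def post_json(base_url: str, resource: str, body: dict) -> dict:
    """POST a JSON body and decode the JSON reply (base_url is hypothetical)."""
    request = urllib.request.Request(
        base_url + resource,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def parse_chunk_location(reply: dict) -> tuple:
    """Return (worker, http_host, http_port) from a successful reply."""
    location = reply["location"]
    return location["worker"], location["http_host"], location["http_port"]


# Made-up reply in the documented shape, used here instead of a live call.
sample_reply = {"location": {
    "worker": "worker-1", "host": "10.0.0.1", "host_name": "w1", "port": 25002,
    "http_host": "10.0.0.1", "http_host_name": "w1", "http_port": 25004}}
```

With a live service, a call would look like `post_json(base_url, "/ingest/chunk", build_chunk_request(123, key, transaction_id=7))`, where `base_url` points at wherever the REST API is served.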
To allocate multiple chunks, use the following service:
method | resource name |
---|---|
POST | /ingest/chunks |
The request object has the following schema, in which the client provides the name of a database:
{"database":<string>, "chunks":[<number>,<number>,...<number>], "auth_key":<string> }
Like the single chunk allocation service, this one also supports an alternative method that accepts a transaction identifier (transactions are always associated with the corresponding databases):
{"transaction_id":<number>, "chunks":[<number>,<number>,...<number>], "auth_key":<string> }
Note the difference in the object schema: unlike the single-chunk allocator, this service expects an array of chunk numbers.
The resulting object has the following schema:
{... "location":[ {"chunk":<number>, "worker":<string>, "host":<string>, "host_name":<string>, "port":<number>, "http_host":<string>, "http_port":<number>}, ... ] }
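A client that receives a batch reply typically needs to know which chunks go to which worker. The sketch below, with hypothetical helper names and made-up sample values, builds a batch request body and groups the returned chunk numbers by worker name. It relies only on the `chunk` and `worker` fields of each entry in the `location` array.

```python
from collections import defaultdict
from typing import Iterable


def build_chunks_request(chunks: Iterable[int], auth_key: str,
                         transaction_id: int) -> dict:
    """Build the JSON body for POST /ingest/chunks (transaction form)."""
    return {"transaction_id": transaction_id,
            "chunks": list(chunks),
            "auth_key": auth_key}


def group_chunks_by_worker(reply: dict) -> dict:
    """Map each worker name to the list of chunk numbers placed on it."""
    by_worker = defaultdict(list)
    for entry in reply["location"]:
        by_worker[entry["worker"]].append(entry["chunk"])
    return dict(by_worker)


# Made-up reply in the documented shape, used here instead of a live call.
sample_batch_reply = {"location": [
    {"chunk": 10, "worker": "worker-1", "host": "10.0.0.1", "host_name": "w1",
     "port": 25002, "http_host": "10.0.0.1", "http_port": 25004},
    {"chunk": 11, "worker": "worker-2", "host": "10.0.0.2", "host_name": "w2",
     "port": 25002, "http_host": "10.0.0.2", "http_port": 25004},
    {"chunk": 12, "worker": "worker-1", "host": "10.0.0.1", "host_name": "w1",
     "port": 25002, "http_host": "10.0.0.1", "http_port": 25004},
]}

chunks_per_worker = group_chunks_by_worker(sample_batch_reply)
```

Grouping by worker lets a workflow send all files for one worker over a single connection, which is one reason a batch request can be cheaper than many single-chunk requests.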
1.1.1.1.2. Notes on the connection parameters returned by the services
The table below explains the connection parameters returned by the services:
attr | description |
---|---|
host | The IP address of the worker's Ingest service that supports the proprietary binary protocol. |
host_name | The DNS name of the worker's Ingest service that supports the proprietary binary protocol. |
port | The port number of the worker's Ingest service that supports the proprietary binary protocol. This service requires the content of an input file to be sent directly to the service by the client. The Replication/Ingest system provides a ready-to-use application qserv-replica-file INGEST that is based on this protocol. |
http_host | The IP address of the worker's Ingest service that supports the HTTP protocol. |
http_host_name | The DNS name of the worker's Ingest service that supports the HTTP protocol. |
http_port | The port number of the worker's Ingest service that supports the HTTP protocol. The REST server placed in front of the service allows ingesting a single file from a variety of external sources, such as a filesystem mounted locally at the worker's host, or a remote object store. |
Note that in the current implementation of the Ingest system, the values of the hostname attributes host_name and http_host_name are captured by the services themselves. The names may not be in the FQDN format. Therefore, this information has to be used with caution and only in those contexts where the reported names can be reliably mapped to the external FQDN or IP addresses of the corresponding hosts/pods.
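One cautious way to follow this advice is to use a reported host name only when the caller already knows how to resolve it, and to fall back to the reported IP address otherwise. The sketch below is an assumption about client-side policy, not part of the Ingest system; `http_endpoint` and the `known_names` mapping are hypothetical, and whether the reported IP address is reachable from the client depends on the deployment.

```python
from typing import Optional


def http_endpoint(location: dict, known_names: Optional[dict] = None) -> tuple:
    """Pick (host, port) for a worker's HTTP ingest service.

    known_names is a caller-maintained map from reported host names to
    external FQDNs or IP addresses that are known to be reachable.
    """
    name = location.get("http_host_name")
    if known_names and name in known_names:
        # The reported name can be mapped reliably, so use the mapping.
        return known_names[name], location["http_port"]
    # Otherwise ignore the name and use the reported IP address.
    return location["http_host"], location["http_port"]


# Made-up location entry in the documented shape.
location = {"worker": "worker-1", "http_host": "10.0.0.1",
            "http_host_name": "w1", "http_port": 25004}
```

For example, `http_endpoint(location, {"w1": "w1.example.org"})` returns the mapped name, while `http_endpoint(location)` returns the IP address.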