1.1.1.1.1. Services

The current implementation of the system offers two services for allocating chunks (or determining the locations of existing ones):

  • Single chunk allocation.
  • Batch mode (multiple chunks) allocation. 

The choice of service depends on the requirements of the workflow. However, the second (batch) service is recommended because it is more efficient when allocating large numbers of chunks.

Note also that once a chunk is assigned (allocated) to a particular worker node, all subsequent requests for the chunk are guaranteed to return the same worker name as the location of the chunk. Making multiple requests for the same chunk is therefore safe. Chunk allocation requests require a valid super-transaction in the STARTED state.
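
Because the assignment of a chunk to a worker never changes, a client may safely remember the answers it has already received. The following is a minimal client-side sketch of that idea, not part of the system itself. The class name WorkerCache and the callable allocate (assumed to wrap one allocation request and return the worker name) are hypothetical. The sketch also assumes that one cache is kept per database.

```python
from typing import Callable, Dict


class WorkerCache:
    """Memoize chunk -> worker assignments for one database (hypothetical helper).

    Relies on the guarantee that once a chunk is allocated, every later
    request for it returns the same worker name.
    """

    def __init__(self, allocate: Callable[[int], str]) -> None:
        # `allocate` is a hypothetical callable that sends one allocation
        # request for a chunk and returns the worker name from the reply.
        self._allocate = allocate
        self._workers: Dict[int, str] = {}

    def worker_for(self, chunk: int) -> str:
        # Only the first lookup of a chunk reaches the service.
        if chunk not in self._workers:
            self._workers[chunk] = self._allocate(chunk)
        return self._workers[chunk]
```

Since repeated requests are safe anyway, the cache is only an optimization. Skipping it never produces a different answer.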

The following service allocates (or locates) a single chunk:

method    resource name
POST      /ingest/chunk

The request object has the following schema, in which the client provides the name of the database:

{"database":<string>,
 "chunk":<number>,
 "auth_key":<string>
}

The service also supports an alternative form that accepts a transaction identifier instead (each transaction is always associated with a database):

{"transaction_id":<number>,
 "chunk":<number>,
 "auth_key":<string>
}
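
The two request forms above can be built and sent with a small helper. This is a sketch using only the Python standard library. The base URL is a hypothetical placeholder, and the helper names chunk_request and post_json are illustrative. The sketch assumes the two forms are mutually exclusive. Sending a Content-Type header is a harmless choice of the sketch, not a documented requirement.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL of the REST server that hosts the Ingest services;
# replace it with the address used by your deployment.
BASE_URL = "http://replication-controller.example:8080"


def chunk_request(chunk: int, auth_key: str, *,
                  database: Optional[str] = None,
                  transaction_id: Optional[int] = None) -> dict:
    """Build the request object for POST /ingest/chunk.

    Assumption: the two documented forms are mutually exclusive, so exactly
    one of `database` or `transaction_id` must be given.
    """
    if (database is None) == (transaction_id is None):
        raise ValueError("specify exactly one of database or transaction_id")
    body = {"chunk": chunk, "auth_key": auth_key}
    if database is not None:
        body["database"] = database
    else:
        body["transaction_id"] = transaction_id
    return body


def post_json(path: str, body: dict) -> dict:
    """POST a JSON object to BASE_URL + path and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call could then look like `post_json("/ingest/chunk", chunk_request(123, "<auth-key>", transaction_id=45))`, where the chunk number, key and transaction identifier are placeholders.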

If the operation succeeds (see Error reporting in the REST API), the system responds with the following JSON object:

{...
 "location":{
   "worker":<string>,
   "host":<string>,
   "host_name":<string>,
   "port":<number>,
   "http_host":<string>,
   "http_host_name":<string>, 
   "http_port":<number>
 }
}
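
On the client side, the location object can be turned into a typed record. This minimal sketch reads only the attributes listed in the schema above and ignores the other fields of the reply. The names ChunkLocation and parse_chunk_location are hypothetical. The sketch does not check the error-reporting fields, which are described in the Error reporting section.

```python
from dataclasses import dataclass


@dataclass
class ChunkLocation:
    """Connection parameters of the worker a chunk was allocated to."""
    worker: str
    host: str
    host_name: str
    port: int
    http_host: str
    http_host_name: str
    http_port: int


def parse_chunk_location(reply: dict) -> ChunkLocation:
    """Extract the "location" object from a successful /ingest/chunk reply."""
    loc = reply["location"]
    return ChunkLocation(
        worker=loc["worker"],
        host=loc["host"],
        host_name=loc["host_name"],
        port=int(loc["port"]),
        http_host=loc["http_host"],
        http_host_name=loc["http_host_name"],
        http_port=int(loc["http_port"]),
    )
```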

To allocate multiple chunks, use the following service:

method    resource name
POST      /ingest/chunks

The request object has the following schema, in which the client provides the name of the database:

{"database":<string>,
 "chunks":[<number>,<number>,...<number>],
 "auth_key":<string>
}

Like the single-chunk allocation service described above, this service also supports an alternative form that accepts a transaction identifier instead (each transaction is always associated with a database):

{"transaction_id":<number>,
 "chunks":[<number>,<number>,...<number>],
 "auth_key":<string>
}

(warning) Note the difference in the object schema: unlike the single-chunk allocator, this service expects an array of chunk numbers in the attribute chunks.
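
A minimal sketch of building the batch request object, mirroring the single-chunk form. The helper name chunks_request is hypothetical, and the sketch again assumes the database and transaction_id forms are mutually exclusive. Dropping duplicate chunk numbers is a client-side choice of this sketch. Because repeated requests for a chunk are safe, removing duplicates is not required for correctness.

```python
from typing import Iterable, Optional


def chunks_request(chunks: Iterable[int], auth_key: str, *,
                   database: Optional[str] = None,
                   transaction_id: Optional[int] = None) -> dict:
    """Build the request object for POST /ingest/chunks (hypothetical helper)."""
    if (database is None) == (transaction_id is None):
        raise ValueError("specify exactly one of database or transaction_id")
    # The attribute is "chunks" (plural) and its value is a JSON array.
    # Duplicates are dropped while preserving the original order.
    unique = list(dict.fromkeys(int(c) for c in chunks))
    body = {"chunks": unique, "auth_key": auth_key}
    if database is not None:
        body["database"] = database
    else:
        body["transaction_id"] = transaction_id
    return body
```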

The resulting object has the following schema:

{...
 "location":[
   {"chunk":<number>,
    "worker":<string>,
    "host":<string>,
    "host_name":<string>,
    "port":<number>,
    "http_host":<string>,
    "http_port":<number>},
   ...
 ]
}
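
Each entry of the location array names the worker to which the corresponding chunk is allocated. A loader can therefore group the chunks by worker before sending the data. The function name chunks_by_worker is hypothetical, and this grouping is one possible client-side strategy rather than something the service requires.

```python
from collections import defaultdict
from typing import Dict, List


def chunks_by_worker(reply: dict) -> Dict[str, List[int]]:
    """Group the chunks of a successful /ingest/chunks reply by worker name."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for entry in reply["location"]:
        groups[entry["worker"]].append(int(entry["chunk"]))
    return dict(groups)
```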

1.1.1.1.2. Notes on the connection parameters returned by the services

The table below explains the connection parameters returned by the services:

attribute         description
host              The IP address of the worker's Ingest service that supports the proprietary binary protocol.
host_name         The DNS name of the worker's Ingest service that supports the proprietary binary protocol.
port              The port number of the worker's Ingest service that supports the proprietary binary protocol. This service requires the client to send the content of an input file directly to the service. The Replication/Ingest system provides a ready-to-use application, qserv-replica-file INGEST, that is based on this protocol.
http_host         The IP address of the worker's Ingest service that supports the HTTP protocol.
http_host_name    The DNS name of the worker's Ingest service that supports the HTTP protocol.
http_port         The port number of the worker's Ingest service that supports the HTTP protocol. The REST server placed in front of this service allows ingesting a single file from a variety of sources, such as a filesystem mounted locally on the worker's host or a remote object store.

(warning) Note that in the current implementation of the Ingest system, the values of the host name attributes host_name and http_host_name are captured by the services themselves, and the names may not be in FQDN format. Therefore, this information must be used with caution, and only in contexts where the reported names can be reliably mapped to the external FQDNs or IP addresses of the corresponding hosts/pods.
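
Following the caution above, a client can trust a reported host name only when it has its own mapping for that name, and otherwise use the reported IP address. This sketch follows that approach. The table name_map is a hypothetical, client-maintained mapping from reported names to externally reachable FQDNs or IP addresses, and the function name http_endpoint is hypothetical. The fallback assumes the reported IP address is reachable from the client. The function also accepts entries that lack http_host_name.

```python
from typing import Mapping, Optional


def http_endpoint(location: dict,
                  name_map: Optional[Mapping[str, str]] = None) -> str:
    """Return "host:port" of a worker's HTTP-based Ingest service.

    A reported host name is used only if it appears in the (hypothetical)
    client-maintained name_map; otherwise the reported IP address is used.
    """
    name = location.get("http_host_name")
    if name_map is not None and name in name_map:
        host = name_map[name]
    else:
        host = str(location["http_host"])
    # IPv6 literals must be bracketed when combined with a port number.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"{host}:{int(location['http_port'])}"
```

The same approach applies to the binary-protocol attributes host, host_name and port.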



