1. Introduction

This document keeps various notes on design decisions and implementation-specific details for the new Ingest system.

2. The proposed public interface to the system

This interface would include a series of operations to be performed in the order outlined in the subsections below.

2.1. Registering a new database

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the database must not be known to the system

Parameters:

  • the name of the database
  • the number of stripes  
  • the number of sub-stripes 

Result:

  • completion status (SUCCESS or an explanation of a failure)
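
For illustration only, a minimal sketch of what a client call for this operation might look like. The endpoint path, port number, and field names below are assumptions, since the actual REST interface is not specified in this document:

  # Hypothetical example: registering a new database via the REST service of
  # the Master Replication Controller. All names below are placeholders.
  import requests

  response = requests.post("http://controller-host:8080/ingest/database", json={
      "database": "my_catalog",     # must not be known to the system yet
      "num_stripes": 340,
      "num_sub_stripes": 12,
  })
  result = response.json()
  if not result.get("success"):
      raise RuntimeError(result.get("error"))   # explanation of the failure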

2.2. Registering a table 

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the database must be known to the system
  • the database must be in the UNPUBLISHED  state
  • the table must not be known to the system
  • there must not be any SUPER-TRANSACTION open in the scope of the database

Parameters:

  • the name of a database
  • the name of the table
  • the kind of the table (PARTITIONED vs REGULAR)
  • the schema of the table (in a structured format like JSON, YAML, etc.)

Result:

  • completion status (SUCCESS or an explanation of a failure)
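
A similar illustrative sketch for registering a table, with the schema passed as a JSON-like list of column definitions (the endpoint and the payload layout are, again, assumptions):

  # Hypothetical example: registering a PARTITIONED table and its schema.
  import requests

  response = requests.post("http://controller-host:8080/ingest/table", json={
      "database": "my_catalog",
      "table": "Object",
      "is_partitioned": True,                    # PARTITIONED vs REGULAR
      "schema": [                                # structured (JSON) schema
          {"name": "objectId", "type": "BIGINT"},
          {"name": "ra",       "type": "DOUBLE"},
          {"name": "decl",     "type": "DOUBLE"},
      ],
  })
  assert response.json().get("success")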

2.3. Beginning a super-transaction

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the database must be known to the system
  • the database must be in the UNPUBLISHED  state

Parameters:

  • the name of a database

Result:

  • completion status (SUCCESS or an explanation of a failure)
  • a unique identifier of the transaction
  • performance/scalability hints for loading workflows:
    • the number of workers available for loading
    • the number of loading threads at each worker

Notes:

  • the hints returned by the operation could be used by the loading workflows to optimize operations and to avoid overloading the Ingest System.
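
A hypothetical sketch of how a loading workflow might begin a super-transaction and use the returned hints to limit its own parallelism (the endpoint and the response field names are assumptions):

  # Hypothetical example: starting a super-transaction and reading the hints.
  import requests

  result = requests.post("http://controller-host:8080/ingest/trans", json={
      "database": "my_catalog",
  }).json()

  transaction_id = result["id"]                  # unique transaction identifier
  num_workers    = result["num_workers"]         # workers available for loading
  num_threads    = result["num_loader_threads"]  # loading threads at each worker

  # A workflow could cap its own parallelism to avoid overloading the system:
  max_parallel_streams = num_workers * num_threads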

2.4. Requesting a placement hint for a chunk

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the SUPER-TRANSACTION must already exist in the scope of the database
  • the transaction must be in the STARTED  state
  • the chunk must have a valid number as defined by the corresponding PARTITIONING PARAMETERS  of the database (family)

Parameters:

  • a unique identifier of a SUPER-TRANSACTION
  • the chunk number

Result:

  • completion status (SUCCESS or an explanation of a failure)
  • the name (or an IP address) of a worker host where the new chunk will be placed (or where it's already located)
  • a port number for sending data
  • any additional data if needed
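
A sketch of how a placement hint could be requested and used (the endpoint and field names are, as before, assumptions):

  # Hypothetical example: asking where chunk 187107 should be sent.
  import requests

  transaction_id = 123   # as returned when beginning the super-transaction (2.3)

  result = requests.post("http://controller-host:8080/ingest/chunk", json={
      "transaction_id": transaction_id,
      "chunk": 187107,
  }).json()

  worker_host = result["host"]   # where the chunk is (or will be) located
  worker_port = result["port"]   # port for sending the data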

2.5. Loading data of a chunk

This operation may be repeated as many times as needed to load data associated with the chunk.

Via:

  • the INGEST service of the Replication System's workers

Prerequisites:

  • the SUPER-TRANSACTION must already exist in the scope of the database
  • the transaction must be in the STARTED  state
  • the chunk must have a valid number as defined by the corresponding PARTITIONING PARAMETERS  of the database (family)
  • the table must be known to the system, and it must be associated with the corresponding database (to be extracted from the SUPER-TRANSACTION)

Parameters:

  • a unique identifier of a SUPER-TRANSACTION
  • the chunk number
  • the name of a table
  • a data blob (the format or structure of the blob will be determined later) and its length

Result:

  • completion status (SUCCESS or an explanation of a failure)
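
Since the format of the blob and the wire protocol of the INGEST service are still to be determined, the sketch below only illustrates which parameters would accompany the data; the length-prefixed framing shown here is an assumption, not the actual protocol:

  # Hypothetical example: pushing one data blob of a chunk to the INGEST
  # service of a worker. The framing (a JSON header followed by the raw
  # blob, both length-prefixed) is purely illustrative.
  import json
  import socket
  import struct

  transaction_id = 123                           # from the 'begin' call (2.3)
  worker_host, worker_port = "worker-1", 25002   # from the placement hint (2.4)

  header = json.dumps({
      "transaction_id": transaction_id,
      "chunk": 187107,
      "table": "Object",
  }).encode()

  with open("chunk_187107.csv", "rb") as f:
      blob = f.read()

  with socket.create_connection((worker_host, worker_port)) as sock:
      sock.sendall(struct.pack(">I", len(header)) + header)
      sock.sendall(struct.pack(">Q", len(blob)) + blob)
      # The real service would reply with a completion status at this point.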

2.6. Ending (normally or abnormally) a super-transaction

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the transaction must be known to the system
  • the transaction must be in the STARTED  state

Parameters:

  • a unique identifier of a SUPER-TRANSACTION
  • a flag indicating whether the transaction is going to be finished NORMALLY  or ABORTED 

Result:

  • completion status (SUCCESS or an explanation of a failure)
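
A sketch of the commit/abort call (the endpoint and the name of the flag are assumptions):

  # Hypothetical example: ending a super-transaction normally (or aborting it).
  import requests

  transaction_id = 123   # from the 'begin' call (2.3)

  result = requests.put("http://controller-host:8080/ingest/trans", json={
      "transaction_id": transaction_id,
      "abort": False,        # False: finish NORMALLY, True: ABORT
  }).json()
  assert result.get("success")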

2.7. Publishing a database

Via:

  • the REST service of the Master Replication Controller

Prerequisites:

  • the database must be known to the system
  • the database must be in the UNPUBLISHED  state
  • all SUPER-TRANSACTIONS  associated with the database must be ended

Parameters:

  • the name of a database

Result:

  • completion status (SUCCESS or an explanation of a failure)
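
A sketch of the final publishing call (the endpoint name is an assumption):

  # Hypothetical example: publishing the database once all super-transactions
  # in its scope have been ended.
  import requests

  result = requests.put("http://controller-host:8080/ingest/database/publish", json={
      "database": "my_catalog",
  }).json()
  assert result.get("success")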

3. Managing databases

This section covers design aspects of the Ingest System which are affected by the following requirements for the system:

  • databases which are being ingested need to be treated differently by the Replication system. In particular, the desired replication level for these databases may differ from that of the fully loaded ones.
  • for a variety of reasons, there must be a way of differentiating between these types of databases. Hence we need a marker to indicate whether a particular database is PUBLISHED  or not.
  • a chunk placement algorithm for the new chunks (of databases which are still being ingested) must be aware of the location of the same chunks within the PUBLISHED  databases. This would allow the algorithm to collocate new chunks with the existing ones, thus reducing (or completely eliminating) the need for a subsequent replica FIX-UP  procedure.

Presently, the collocation between chunks of two databases is determined through a relationship of these databases to the same database FAMILY. Therefore, the simplest solution would be to have the new databases included into the appropriate families (as determined by the number of stripes  and the number of sub-stripes  attributes of the families). This decision leads to the following practical consequences:

  • a persistent schema of the Replication system needs to be extended with a boolean attribute added to the database definitions. This attribute will indicate whether the database is PUBLISHED  or not.
  • the transient API to the DatabaseServices  would be extended to expose this flag to the applications
  • (at the initial stage of the development) all standard operations of the Replication system would ignore databases, tables or chunks of databases which are not yet PUBLISHED. This will be done to allow developing and testing the Ingest system without interfering with the normal operations of the Replication system. At some later stage of the development, support for the new class of databases will be gradually added to the Replication system's algorithms and reports.

4. Implementing the chunk placement

The chunk placement logic will be based on an existing infrastructure of the Replication System, which maintains a database table chunk  for recording various info on the chunks. These are some of the problems to be solved in this context:

  • how to serialize the chunk placement requests for the same chunk (of the same database) to guarantee that all such requests would get the same result?
  • which worker should be chosen in case more than one replica of a chunk exists?

The first problem can be solved by implementing the (yet to be developed)  ChunksAllocation service, which would serialize the placement requests via synchronized  methods. The service would read and update the persistent structure (the above-mentioned table chunks, or others).

The second problem may require extending the persistent state of the Replication system with a notion of the PRIMARY location (the name of a worker) of a chunk. The primary location will be the one where the chunks would be appended (within SUPER-TRANSACTIONS) and then  be BACKED up at other locations (workers). The easiest solution to this problem would be to add an additional boolean attribute to the table schema of chunks. Another option to be considered would be a numeric VERSION of a chunk. Though, the potential benefits and complications of the latter may still require some further analysis.
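
To illustrate the serialization idea only (not the actual C++ implementation of the future ChunksAllocation service), here is a minimal sketch in which requests for the same (database, chunk) pair are processed under a lock, so repeated requests always resolve to the same location:

  # Minimal sketch of the serialization idea behind the (yet to be developed)
  # ChunksAllocation service. Illustrative only: the real service would live
  # in the C++ Replication system and persist its state in the chunk table.
  import threading

  class ChunksAllocation:
      def __init__(self, workers):
          self._workers = list(workers)    # candidate worker names
          self._lock = threading.Lock()    # serializes placement requests
          self._placements = {}            # (database, chunk) -> worker (PRIMARY)

      def allocate(self, database, chunk):
          with self._lock:
              key = (database, chunk)
              if key not in self._placements:
                  # Naive round-robin choice. The real algorithm would first
                  # look up existing replicas of the chunk within the same
                  # database family and collocate the new chunk with them.
                  worker = self._workers[len(self._placements) % len(self._workers)]
                  self._placements[key] = worker
              return self._placements[key]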

5. What happens when a super-transaction is ending?

TBC...

6. What happens when a database is being published?

TBC...

7. Implementation roadmap

  • (tick) persistent support for super-transactions
    • table schema
    • DatabaseServices  
    • REST  services for managing super-transactions
  • (tick) database attribute to differentiate PUBLISHED databases from the ones which are being ingested
    • table schema
    • Configuration
    • REST  services for managing the database state (Configuration)
    • extended selectors in DatabaseServices  to differentiate operations with databases of different states. The default parameters of the selectors would always include the PUBLISHED databases
  • (tick) test the above added functions
    • first, merge DM-19226 into master, then rebase DM-19156 against master. This is needed because of the schema change, which ...
  • (minus)(tick)  REST  services for Ingest
    • (tick) Adding UNPUBLISHED databases
    • (tick)(minus) Adding tables
      • (question) in which format to store table schema for the ingested catalogs? CSS, MySQL INFORMATION_SCHEMA, ?
        • (tick) added Configuration table to the Replication system to store column definitions
      • (tick) creating prototype tables at workers
      • (tick) add the director  table attribute to the Configuration as requested by a client. Also add the name of the objectId  column if the director  table attribute is provided when adding a table. Also add these attributes to class Configuration::DatabaseInfo.
        • (question) what about the overlap  radius? Should it be also stored in the Configuration?
      • (question) what about chunkId  and subChunkId  columns of the partitioned tables? Should the input schema be tested for the presence of those? Or should they be added automatically as the very last columns? (warning) look at how it's done now.
    • (tick) chunk allocator
      • (warning) still needs to be fully tested. Probably after putting in place a simple loader server
    • (tick) (super-)transaction management
      • (question) should the secondary index be also extended at this stage by harvesting triplets  from Object  (Qserv director  table)?
    • (minus) publishing databases
      • (minus) build the secondary index.
        • (question) Should this be a separate REST call and a job? Note that it may be a lengthy operation.
        • (question) Should this be done when committing super-transactions? This may be much quicker compared to doing all at once.

        • (warning) from the implementation perspective this would require adding a special kind of Controller and Worker requests to harvest triplets  from the Object  tables of the workers and loading them into the secondary index  table in Qserv czar's database.
      • (minus) create database & tables entries in the Qserv master database

        • (question) should this be rather done when creating UNPUBLISHED  databases and adding tables?
      • (minus) grant SELECT authorizations for the new databases to Qserv MySQL account(s) at all workers and the master(s)

      • (minus) register the databases, tables and partitioning parameters at CSS

      • (minus)(tick) enable databases at Qserv workers by adding an entry to table 'qservw_worker.Dbs'

        • (warning) the implementation of this operation is ready. It just needs to be wired into the REST service

      • (minus)(tick) remove MySQL partitions
        • (warning) the implementation of this operation is ready. It just needs to be wired into the REST service
      • (minus) ask Replication workers to reload their Configurations so that they recognize the new database as a published one. This step should probably be done after publishing the database.

  • (tick) (minus) loader server at workers
    • (tick) extend Configuration to support loader services
      • extended attributes in WorkerInfo  
      • extended Configuration  schema
      • extended Configuration  API
    • (tick) support  both CSV and TSV files
    • (tick) manage MySQL partitions
      • (warning) the prototype tables created by the REST services have the default partition p0  as required by MySQL DDL. The loading service would have to ensure a new partition (based on numbers/values of the provided super-transactions) is added to the chunk-specific tables 
    • (tick) implement a command-line tool for sending CSV/TSV files to the loader services
    • (tick) Added a REST service for generating the empty chunks list and placing it at the usual location on the Qserv master  nodes
    • (minus) add support for the regular  (non-chunked) tables
      • (warning) this can be done via an optional switch to the loading application. The protocol definition may also need to be extended.
  • (minus) add support for replicating semi-ingested tables when committing super-transactions 
  • (minus) add support for translating partitioned MySQL tables into the monolithic ones when publishing databases
  • (minus) Extend the Web Dashboard to display various aspects of the Ingest System (status of the UNPUBLISHED  databases, super-transactions, the number of chunks loaded, the amount of data loaded, etc.)