There are two approaches to building the director (formerly known as secondary) index during catalog ingests:

  • let the ingest system do it automatically
  • do it manually via a dedicated REST service

By default (unless instructed otherwise, as explained later in this section), the system builds the index automatically. Here is what happens in this case during a typical workflow (assuming the current implementation of the index as an InnoDB table):

While the automated build process is handy for most workflows, some users may still benefit from deferring this operation until a later stage. Deferring may have the following benefits:

  • The commit time of the super-transactions may become noticeably shorter.
  • The transaction commits may become more reliable, as building the index introduces (potentially) many additional failure modes. These include:
    • Failures during the additional operations that harvest input data for the index from multiple (or all) chunks of a catalog spread across multiple (or all) workers of a setup.
    • Running out of space on the filesystem where the database service stores the index.
    • Incorrect values of the object identifiers (such as duplicated values) ingested as a chunk contribution during a super-transaction.
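
The last of these failure modes can sometimes be caught before the data is ingested. The following is a minimal sketch, assuming the contributions destined for the director table are CSV files without a header row and with the object identifier in the first column. The function name find_duplicate_ids and the file layout are illustrative assumptions, not part of the ingest system.

```shell
# Hypothetical pre-ingest check: print every object identifier that occurs
# more than once across the given CSV files.
# Assumption: the identifier is the first comma-separated column and the
# files have no header row.
find_duplicate_ids() {
    cut -d, -f1 "$@" | sort | uniq -d
}
```

The function takes one or more file names as arguments. An empty output means that no identifier is repeated in the files that were checked.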

Building the index manually

There are two basic steps to take when choosing the manual build. First, one has to tell the ingest system that the index will be built manually. This is done at the database registration step by setting the following parameter in the database description:

{...
 "auto_build_secondary_index":0,
 ...
}

For example:

curl http://localhost:25080/ingest/database \
  -X POST -H "Content-Type: application/json" \
  -d '{"database":"test101","auto_build_secondary_index":0,"num_stripes":340,"num_sub_stripes":3,"overlap":0.01667,"auth_key":"SECURED"}'

That is enough to prevent any actions the system would otherwise undertake with respect to the index.

The second step is to make a request to the index build service. This request can be made at any time in a catalog's lifecycle (normally when no super-transactions are open). The service also allows rebuilding the index at any stage if needed. Keep in mind that the content of the index is based on the objects existing in the catalog at the time the request is made. Here is an example:

curl http://localhost:25080/ingest/index/secondary \
  -X POST -H "Content-Type: application/json" \
  -d '{"database":"test101","director_table":"Object","allow_for_published":0,"rebuild":0,"local":1,"auth_key":"SECURED"}'

There are three optional attributes (allow_for_published, rebuild and local) in the request. The attributes are explained in the documentation for the index build service; please read that document for further information on this subject. These parameters are important for most workflows.
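
For workflows that issue this request repeatedly, it may help to wrap it in a small script. The sketch below only reuses the URL and the attributes shown in the example request above. The function names and the environment variables INGEST_URL, AUTH_KEY and DRY_RUN are hypothetical conveniences introduced for illustration, not features of the service.

```shell
# Hypothetical helper: compose the JSON body for the index build request.
# Arguments: database, director table, rebuild flag (0 or 1; defaults to 0).
# The remaining attribute values mirror the example request above.
index_build_body() {
    printf '{"database":"%s","director_table":"%s","allow_for_published":0,"rebuild":%d,"local":1,"auth_key":"%s"}' \
        "$1" "$2" "${3:-0}" "${AUTH_KEY:-SECURED}"
}

# Send the request to the service URL used in the example above.
# Set DRY_RUN=1 to print the body without contacting the service.
build_index() {
    body=$(index_build_body "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$body"
    else
        curl "${INGEST_URL:-http://localhost:25080}/ingest/index/secondary" \
            -X POST -H "Content-Type: application/json" -d "$body"
    fi
}
```

With DRY_RUN=1 set, calling build_index test101 Object prints the same body as the example request, which allows the body to be inspected before anything is sent.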


