There are two approaches to building the director index (formerly known as the secondary index) during catalog ingests:
- let the ingest system do it automatically
- do it manually via a dedicated REST service
By default (unless instructed otherwise, as explained later in this section) the System builds the index automatically. Here is what happens in this case during a typical workflow (assuming the current implementation of the index, which is based on an InnoDB table):
- The secondary index table gets created by the table registration service when a director table of a catalog is being registered.
- A MySQL table partition gets created in the secondary index table by a service starting a new super-transaction.
- When a super-transaction is committed, the index gets populated with a slice of objects based on the rows that were inserted into the director table of the catalog during that transaction, if any such contributions were made.
- The index may get consolidated by eliminating MySQL partitions from the table (if requested by a workflow) when publishing a catalog.
While the automated build process is handy for most workflows, some users may still benefit from deferring this operation until a later stage. Deferring may have the following benefits:
- The commit time of the super-transactions may become noticeably shorter.
- The transaction commits may become more reliable, as building the secondary index (potentially) adds many more failure modes. These include:
  - Failures of the additional operations that harvest input data for the index from multiple (or all) chunks of a catalog spread across multiple (or all) workers of a setup.
  - Running out of space on the filesystem where the database service stores the indexes.
  - Incorrect values of the object identifiers (such as duplicated values) ingested as a chunk contribution during a super-transaction.
Building the index manually
There are two basic steps to take when choosing the manual build. First, one has to tell the Ingest system that the index will be built manually. This is done at the database registration step by setting the following parameter in the database description:
{... "auto_build_secondary_index":0, ... }
For example:
curl http://localhost:25080/ingest/database \
  -X POST -H "Content-Type: application/json" \
  -d '{"database":"test101","auto_build_secondary_index":0,"num_stripes":340,"num_sub_stripes":3,"overlap":0.01667,"auth_key":"SECURED"}'
That is enough to prevent any further actions that the System would otherwise undertake with respect to the index.
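As a minimal sketch, the request body can be generated by a small shell function so that the choice between the automatic and the manual build becomes an explicit argument. The function name `registration_payload` is hypothetical and not part of the Ingest system; the partitioning parameters and the `auth_key` placeholder are simply copied from the example above.

```shell
# Hypothetical helper (not part of the Ingest system): prints the JSON body
# of the database registration request.
#   $1  database name
#   $2  value of auto_build_secondary_index (0 selects the manual build)
# The remaining values are copied verbatim from the example above.
registration_payload() {
    printf '{"database":"%s","auto_build_secondary_index":%d,"num_stripes":340,"num_sub_stripes":3,"overlap":0.01667,"auth_key":"SECURED"}\n' "$1" "$2"
}

# Print the body for a registration with the manual build; nothing is sent.
registration_payload test101 0
```

The printed body could then be piped into the same request with `curl ... -d @-`, which makes curl read the request data from standard input.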
The second step is to make a request to the index build service. This request can be made at any time during the catalog lifecycle (normally when no super-transactions are open). The service also allows rebuilding the index at any stage if needed. Just keep in mind that the content of the index will be based on the objects existing in the catalog at the time the request is made. Here is an example:
curl http://localhost:25080/ingest/index/secondary \
  -X POST -H "Content-Type: application/json" \
  -d '{"database":"test101","director_table":"Object","allow_for_published":0,"rebuild":0,"local":1,"auth_key":"SECURED"}'
There are three optional attributes (allow_for_published, rebuild and local) in the request. The attributes are explained in the documentation for the index build service. Please read that document for further information on this subject. These parameters are important for most workflows.
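Since the three attributes are optional, a workflow script may want to set them explicitly in one place. The sketch below assumes a hypothetical helper named `index_build_payload`. When a flag is omitted, the helper falls back to the value used in the example above, which is an assumption made for illustration and not necessarily the default applied by the service.

```shell
# Hypothetical helper (not part of the Ingest system): prints the JSON body
# of the index build request.
#   $1  database name
#   $2  director table name
#   $3  allow_for_published (falls back to 0, the value in the example above)
#   $4  rebuild             (falls back to 0, the value in the example above)
#   $5  local               (falls back to 1, the value in the example above)
# The fallbacks are illustrative assumptions, not the service's documented defaults.
index_build_payload() {
    printf '{"database":"%s","director_table":"%s","allow_for_published":%d,"rebuild":%d,"local":%d,"auth_key":"SECURED"}\n' \
        "$1" "$2" "${3:-0}" "${4:-0}" "${5:-1}"
}

# Same body as in the example above.
index_build_payload test101 Object

# Same request with the rebuild attribute set to 1.
index_build_payload test101 Object 0 1
```

Keeping the flags as positional arguments makes it visible in the workflow script which values were chosen for each request; their exact meaning is defined in the documentation for the index build service.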