Lextek International Onix Full Text Indexing and Retrieval Toolkit

ixOptimizeIndex

NAME

ixOptimizeIndex -- Optimize the index for optimum speed and memory usage.

SYNOPSIS

void ixOptimizeIndex(OnixIndexManagerT IndexManager, size_t OptimizationLevel, StatusCodeT *Status);

ARGUMENTS

IndexManager -- An index manager which was created by a call to ixCreateIndexManager(). The index manager must have an open index associated with it (via a call to ixOpenIndex).

OptimizationLevel -- How much optimization is to be performed. This is a number between 1 and 256.

*Status -- A pointer to a StatusCodeT.  (This is where any errors will be reported.)
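Because OptimizationLevel is documented as a number between 1 and 256, a caller may want to check the value before passing it in. The following is a minimal sketch only: is_valid_optimization_level is a hypothetical helper, not an Onix function, and it assumes that both ends of the range are inclusive.

```c
#include <stddef.h>

/* Range taken from the ARGUMENTS text; treating both ends as
 * inclusive is an assumption, not something the manual states. */
#define OPT_LEVEL_MIN ((size_t)1)
#define OPT_LEVEL_MAX ((size_t)256)

/* Hypothetical helper (not part of Onix).
 * Returns 1 if level lies in [OPT_LEVEL_MIN, OPT_LEVEL_MAX], else 0. */
static int is_valid_optimization_level(size_t level)
{
    return level >= OPT_LEVEL_MIN && level <= OPT_LEVEL_MAX;
}
```

A caller could run this check first and only then call ixOptimizeIndex, inspecting *Status afterwards for any error the library reports.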

RETURNS

Nothing.

DESCRIPTION

ixOptimizeIndex tunes an index that has been added to multiple times. This is especially important if the index is used dynamically to any degree (i.e., there have been numerous additions and/or deletions to the index). The index can absorb quite a few additions before the internal limits set by Onix force it to be optimized. Long before that happens, however, you will find it advantageous for speed reasons to optimize the index.

Every time an indexing session is started and completed, a new partition is added to the index. This partition contains information about the new data added to the index and must be checked every time a search is conducted. ixOptimizeIndex processes and combines the data contained in the unoptimized partitions into larger partitions which can then be further optimized or used as is.

A fully optimized index contains only a single partition. (However, some indexes, especially extremely large distributed indexes, can be fully optimized and still have multiple partitions.) You can find out how many partitions are in the index by calling ixGetNumberOfUnoptimizedInsertions, which returns the number of unoptimized partitions. Internally, this function returns the number of partitions in the index minus one.

Optimizing a partition runs at about twice the speed of indexing raw text with a fast text extraction algorithm. (If your text parser is slower than this, or you are using translation filters for various word processing, database, and other formats, odds are you will find the optimization process much faster than indexing by comparison.)

During the optimization process, record numbers can change. All records which have been deleted will have all their information removed from the index. Part of this process is to shift all the record numbers down so that deleted records may have their record numbers reused. This helps prevent the problem of eternally growing record numbers. Since it is possible for record numbers to change during an optimization session, this can complicate matters if you are coordinating the index with an external database system or flat file. There are two approaches to handling this issue. These are as follows:

ixOptimizeIndex loads all the required portions of the index into memory if they haven't already been loaded by a call to ixStartRetrievalSession(). It then processes all the information obtained from past indexing sessions to create a new optimized index partition. This is all done without making any modifications to the index itself, which means that while an optimization is taking place, other queries may be conducted by separate threads and processes. The new optimized data is not added to the index until a call to ixEndOptimization is made. During the optimization, the most recent material added to the index is optimized first. For example, suppose there are 10 partitions in the index and they are optimized with an optimization level of 5. In this case, the data added during the 5 most recent indexing sessions will be optimized and integrated into a new optimized partition (for a total of 6 partitions).
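The arithmetic in the 10-partition example can be sketched as follows. This is an illustration only: partitions_after_optimization is a hypothetical helper, not an Onix function, and it assumes that an optimization level of N always merges the N most recent partitions (or all of them, if fewer exist) into one new partition.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Onix.
 * Assumption: an optimization level of N merges the N most recent
 * partitions into a single new partition, as in the example above.
 */
static size_t partitions_after_optimization(size_t partitions, size_t level)
{
    if (partitions == 0 || level == 0) {
        return partitions;              /* nothing to merge */
    }
    /* Cannot merge more partitions than the index contains. */
    size_t merged = level < partitions ? level : partitions;
    return partitions - merged + 1;     /* merged partitions collapse into one */
}
```

Under that assumption, partitions_after_optimization(10, 5) gives 6, matching the example, and any level at or above the current partition count leaves a single partition.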

Sometimes it makes sense to optimize the index even if only a single partition is present. In this case, any deleted records will have their information removed and their record numbers reused. (Other record numbers still in use will be shifted down.) This typically doesn't need to happen very often, because deleted records do not slow down the query process significantly. You can check how many records are deleted in the index by a call to ixGetNumberOfRecordsDeleted. A rough rule of thumb is to remove the deleted records from the index after 10% or more of the records have been deleted. However, depending on your needs, you may optimize the index more or less often.
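The 10% rule of thumb can be expressed as a small check. This is a sketch under stated assumptions: should_purge_deleted is a hypothetical helper, not an Onix function, and it assumes that the total record count passed in includes the deleted records. The deleted count would come from ixGetNumberOfRecordsDeleted; how the total is obtained is left to the application.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Onix.
 * Returns 1 when at least 10% of the records are deleted, else 0.
 * Assumption: total counts every record, deleted ones included.
 */
static int should_purge_deleted(size_t deleted, size_t total)
{
    if (total == 0) {
        return 0;                       /* empty index: nothing to purge */
    }
    /* deleted / total >= 0.10 is equivalent to deleted >= ceil(total / 10);
     * written this way to avoid overflow in deleted * 10. */
    size_t threshold = total / 10 + (total % 10 != 0);
    return deleted >= threshold;
}
```

The 10% figure is only the rule of thumb given above; an application with different needs could substitute its own threshold.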

If you are using a distributed index, ixOptimizeIndex will combine the various partitions created by past indexing sessions (starting from the most recent) as they are optimized. If your partitions are very large, it is often advantageous to optimize them individually to remove deleted records, and then to optimize a number of partitions as a group to combine their information. Shrinking each partition first helps keep the combined data from overflowing the 2GB or 4GB file size limit set by many filesystems.
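Before combining several large partitions, an application could estimate whether the result is likely to stay under the filesystem limit. The sketch below is illustrative only: combined_size_fits is a hypothetical helper, not an Onix function, and it assumes that the sum of the individual partition sizes is a reasonable upper bound on the size of the combined partition, which the manual does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest file sizes under the common 2GB and 4GB limits. */
#define LIMIT_2GB ((uint64_t)0x7FFFFFFF)   /* 2^31 - 1 bytes */
#define LIMIT_4GB ((uint64_t)0xFFFFFFFF)   /* 2^32 - 1 bytes */

/*
 * Hypothetical helper, not part of Onix.
 * Returns 1 if the sum of the given partition sizes (in bytes) does not
 * exceed limit, else 0. Assumption: the combined partition is no larger
 * than the sum of its inputs.
 */
static int combined_size_fits(const uint64_t *sizes, size_t count, uint64_t limit)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] > limit - total) {
            return 0;                   /* adding this partition would exceed the limit */
        }
        total += sizes[i];
    }
    return 1;
}
```

If the check fails, optimizing the partitions individually first (to remove deleted records) and measuring again is the approach the paragraph above recommends.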

 

SEE ALSO

ixGetNumberOfUnoptimizedInsertions, ixIsRecordDeleted, ixEndOptimization, ixGetNumberOfRecordsDeleted