5.3. DataparkSearch performance issues

Cache mode is the fastest DataparkSearch storage mode. Use it if you need maximum search speed.

If your /var directory has not changed since indexing finished, you may disable file locking with the "ColdVar yes" command placed in search.htm (or in searchd.conf, if searchd is used). This saves the time otherwise spent on file locking.
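As a minimal sketch of the setting described above, the line below would go into search.htm, or into searchd.conf when searchd is used. The comment line is illustrative and assumes the usual "#" comment syntax of DataparkSearch configuration files.

```
# Only safe while the var directory is not modified, i.e. no indexing is running.
ColdVar yes
```

Remove the line (or set it back to "no") before the next indexing run, since locking is what protects the files while they are being written.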

The UseCRC32URLId yes command (see Section 3.10.12) speeds up indexing, but a small number of collisions is possible, especially on a large database.

5.3.1. searchd usage recommendation

If you plan to use ispell data, synonym lists or stopword lists, it is recommended to set up the searchd daemon to speed up searches (see Section 5.4). The searchd daemon preloads all these data and lists and holds them in memory. This reduces the average search query execution time.

searchd can also preload URL info data (20 bytes per indexed URL) and cache mode limits (4 or 8 bytes per URL, depending on the limit type). This reduces the average search time as well.
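To judge whether this preloading fits into RAM, the per-URL figures above can be turned into a rough estimate. The Python sketch below is not part of DataparkSearch; the function name and parameters are hypothetical, and it assumes that each preloaded limit costs its 4 or 8 bytes for every indexed URL and that no other overhead is counted.

```python
URL_INFO_BYTES = 20  # URL info data per indexed URL, as stated above


def searchd_preload_bytes(url_count, limit_sizes=()):
    """Rough lower bound on memory used by searchd preloading.

    url_count   -- number of indexed URLs
    limit_sizes -- per-URL size in bytes (4 or 8) of each preloaded limit
    """
    for size in limit_sizes:
        if size not in (4, 8):
            raise ValueError("a limit takes 4 or 8 bytes per URL")
    return url_count * (URL_INFO_BYTES + sum(limit_sizes))


if __name__ == "__main__":
    # Example: 1,000,000 URLs, one 4-byte limit and one 8-byte limit.
    total = searchd_preload_bytes(1_000_000, [4, 8])
    print(f"{total} bytes, about {total / 2**20:.1f} MiB")
```

Under these assumptions, one million URLs with one 4-byte and one 8-byte limit need about 32,000,000 bytes (roughly 30.5 MiB), in addition to the ispell, synonym and stopword data that searchd holds.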

5.3.2. Search results caching

Use the "Cache yes" command in your search.htm template (or in the searchd.conf file, if searchd is used) to enable the search results cache. This significantly reduces the response time for repeated queries.
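A minimal sketch of the corresponding line, to be placed in search.htm or in searchd.conf as described above. The comment is illustrative and assumes the usual "#" comment syntax.

```
# Store the results of each query so that repeated queries are answered from var/cache.
Cache yes
```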

If you use search results caching, note that you need to empty the var/cache directory after each indexing or reindexing run; otherwise cached results may no longer match the index.

5.3.3. Memory based filesystem (mfs) usage recommendation

If you use cache storage mode and have enough RAM on your machine, you may place the /usr/local/dpsearch/var directory on a memory-based filesystem (mfs). This speeds up both indexing and searching.

If you do not have enough RAM to hold all of /usr/local/dpsearch/var, you may instead place any of the /usr/local/dpsearch/var/tree, /usr/local/dpsearch/var/url or /usr/local/dpsearch/var/store directories on a memory filesystem.

5.3.4. URLInfoSQL command

For dbmode cache, you may use the URLInfoSQL no command to disable storing URL info in the SQL database. With this command, however, you will be unable to use limits by language and by Content-Type.

5.3.5. SRVInfoSQL command

With the SRVInfoSQL no command you can switch off storing auxiliary data in the "srvinfo" SQL table. In this case, the table cannot be used to load configuration with the LoadServerTable command (see Section 3.8.1).

5.3.6. MarkForIndex command

By default, DataparkSearch marks all URLs selected for indexing as indexed for 4 hours. This prevents the same URL from being indexed simultaneously by different running indexer instances. On huge installations, however, this marking can take a noticeable amount of processing time. You may switch it off with "MarkForIndex no" in your indexer.conf file.
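A minimal sketch of the indexer.conf line described above. The comment is illustrative; it reflects the trade-off stated in this section rather than any additional documented behaviour.

```
# indexer.conf
# Skip the 4-hour "indexed" marking. With several indexer instances running,
# the same URL may then be fetched by more than one of them.
MarkForIndex no
```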

5.3.7. CheckInsertSQL command

By default, DataparkSearch tries to insert data into the SQL database regardless of whether it is already present there. On some systems this produces error messages in the logs. To avoid such errors, you may enable an additional check of whether the data being inserted is new by specifying the CheckInsertSQL yes command in your indexer.conf.
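A minimal sketch of the indexer.conf line described above, with an illustrative comment.

```
# indexer.conf
# Check whether a record already exists before inserting it,
# so the database does not log errors for duplicate inserts.
CheckInsertSQL yes
```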

5.3.8. MySQL performance

MySQL users may declare DataparkSearch tables with the DELAY_KEY_WRITE=1 option. This makes index updates faster, because the indexes are not flushed to disk until the table is closed. With DELAY_KEY_WRITE, indexes are not updated on disk at all during normal operation.

With this option, indexes are processed only in memory and written to disk only as a last resort: on a FLUSH TABLES command or at mysqld shutdown. This can take several minutes, and an impatient user may kill -9 the MySQL server and corrupt the index files. Another downside is that you should run myisamchk on these tables before starting mysqld, to make sure they are intact in case something killed mysqld in the middle of a write.

For this reason, we did not include this table option in the default table structure. However, since the key information can always be regenerated from the data, you should not lose anything by using DELAY_KEY_WRITE. Use this option at your own risk.

5.3.9. Asynchronous resolver library

Using c-ares, an asynchronous resolver library (dns/c-ares in the FreeBSD ports collection), allows each indexing thread to perform DNS queries without blocking. Please note that this also increases the number of concurrent queries to your DNS server.