The full database schema used by DataparkSearch is defined in the SQL scripts for database creation, located under the create subdirectory.
Table 9-1. server table schema
Field | Description |
---|---|
rec_id | Unique record identifier. |
enabled | A flag to enable/disable record for indexer. |
url | URL or pattern. |
tag | Tag value. |
category | Categories table rec_id. |
command | =S - this record is a =F - this record is a |
ordre | Sorting key; it defines the order in which records are loaded from the server table. |
parent | If not null, this record was added automatically by the indexer, and the url field contains a server name accepted by the record that this field's value points to. |
weight | This record's weight for PopRank calculation. |
pop_weight | Weight of one link from pages of this server. Calculated automatically; manual changes have no effect. |
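
To illustrate how the enabled and ordre fields might work together, here is a minimal sketch that loads records from a simplified stand-in for the server table. It uses Python's sqlite3 module with an in-memory database. The column types, the convention that enabled = 1 means "enabled", the sample rows, and the helper name load_enabled_servers are all assumptions made for this example; the authoritative definitions are in the SQL scripts under the create subdirectory.

```python
import sqlite3

# Simplified, hypothetical subset of the server table.
# Column types and defaults are assumptions, not the real schema.
SCHEMA = """
CREATE TABLE server (
    rec_id  INTEGER PRIMARY KEY,
    enabled INTEGER NOT NULL DEFAULT 0,
    url     TEXT NOT NULL,
    ordre   INTEGER NOT NULL DEFAULT 0,
    parent  INTEGER,
    weight  REAL NOT NULL DEFAULT 1
)
"""


def load_enabled_servers(conn):
    """Return (rec_id, url) for enabled records, sorted by the ordre key."""
    cur = conn.execute(
        "SELECT rec_id, url FROM server WHERE enabled = 1 ORDER BY ordre, rec_id"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO server (rec_id, enabled, url, ordre) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "http://example.org/", 20),
        (2, 0, "http://disabled.example.org/", 10),  # skipped: not enabled
        (3, 1, "http://example.com/docs/", 5),
    ],
)
servers = load_enabled_servers(conn)
```

In this sketch the disabled record is left out and the remaining records come back in ascending ordre, so record 3 is returned before record 1.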
Other server parameters are stored in the srvinfo table. Possible values for several of these parameters are given in the table below.
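
Because srvinfo holds parameters as name/value pairs, reading the settings for one server amounts to collecting its rows into a mapping. The sketch below shows this with Python's sqlite3 and an in-memory table. The sname and sval column names follow this section, while the srv_id linking column, the column types, the sample values, and the helper name server_params are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: srv_id is an assumed column pointing at server.rec_id.
conn.execute("CREATE TABLE srvinfo (srv_id INTEGER, sname TEXT, sval TEXT)")
conn.executemany(
    "INSERT INTO srvinfo (srv_id, sname, sval) VALUES (?, ?, ?)",
    [
        (1, "Period", "604800"),  # reindexing period in seconds
        (1, "MaxHops", "3"),
        (2, "Period", "86400"),
    ],
)


def server_params(conn, srv_id):
    """Collect all sname/sval pairs stored for one server record into a dict."""
    rows = conn.execute(
        "SELECT sname, sval FROM srvinfo WHERE srv_id = ?", (srv_id,)
    )
    return dict(rows.fetchall())


params = server_params(conn, 1)
```

Values are kept as strings here, as stored; converting them (for example Period to an integer number of seconds) would be up to the caller.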
Table 9-2. Values of several server parameters in the srvinfo table
sname value | Possible sval values. |
---|---|
Alias | Alias used for the url. |
Period | Reindexing period in seconds. |
DeleteOlder | How much time to hold URLs before deleting them from the database. |
RemoteCharset | Default charset value. |
DefaultLang | Default language value. |
Request.Authorization | For basic authorization. |
Request.Proxy | Proxy server to access documents from this resource. |
Request.Proxy-Authorization | Proxy server authorization. |
MaxHops | Maximum depth, counted in "mouse" clicks from the start URL. |
Index | A flag to enable/disable documents indexing. |
Follow | =0, "page" =1, "path" =2, "site" =3, "world" |
Robots | A flag to enable/disable use of the robots.txt file. |
DetectClones | A flag to enable/disable "clones" detection. |
MaxNetErrors | Maximum network errors for this server. |
NetDelayTime | Indexing delay time after a network error occurs. |
ReadTimeout | Network timeout value. |
match_type | =0, DPS_MATCH_FULL - full match. =1, DPS_MATCH_BEGIN - pattern is a URL prefix. =2, DPS_MATCH_SUBSTR - pattern is a URL substring. =3, DPS_MATCH_END - pattern is a URL suffix. =4, DPS_MATCH_REGEX - pattern is a regular expression. =5, DPS_MATCH_WILD - pattern is a wildcard pattern (* and ? wildcards may be used). =6, DPS_MATCH_SUBNET - < not yet supported >. |
case_sense | =1, - case insensitive comparison. =0, - case sensitive comparison. |
nomatch | =1, - URLs that do not match this record are accepted. =0, - URLs that match this record are accepted. |
Method | Specifies the document action for this command. =Allow, - all corresponding documents will be indexed and scanned for new links. =Disallow, - all corresponding documents will be ignored and deleted from the database. =HrefOnly, - all corresponding documents will only be scanned for new links (not indexed). =CheckOnly, - all corresponding documents will be requested with an HTTP HEAD request instead of HTTP GET, i.e. only brief information about the documents (size, last modified, content type) will be fetched. =Skip, - all corresponding documents will be skipped while indexing. =CheckMP3, - all corresponding documents will be checked for MP3 tags along if its Content-Type is equal to audio/mpeg. =CheckMP3Only, - is equal to CheckMP3, but if the MP3 tag is not present, processing on Content-Type will not be taken. =TagIf, - all documents will be marked with the tag specified. =CategoryIf, - all documents will be marked with the category specified. =IndexIf, - all documents will be indexed if the value of the section specified matches the pattern given. =NoIndexIf, - all documents will be ignored and deleted from the database if the value of the section specified matches the pattern given. |
Section | Section name used in pattern matching for the IndexIf and NoIndexIf methods. |
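
The match_type and nomatch values in the table above can be read as a small decision procedure. The following Python sketch is one possible interpretation, not the DataparkSearch implementation: the function names pattern_matches and record_accepts are hypothetical, regular expressions are assumed to use search (unanchored) semantics, and wildcard patterns are approximated with Python's fnmatch, which also accepts [seq] classes in addition to * and ?. Case handling is passed as an explicit ignore_case flag, which the caller would derive from case_sense as described in the table. DPS_MATCH_SUBNET is omitted because it is listed as not yet supported.

```python
import fnmatch
import re

# Numeric codes as listed for match_type in the table above.
DPS_MATCH_FULL, DPS_MATCH_BEGIN, DPS_MATCH_SUBSTR = 0, 1, 2
DPS_MATCH_END, DPS_MATCH_REGEX, DPS_MATCH_WILD = 3, 4, 5


def pattern_matches(url, pattern, match_type, ignore_case=False):
    """Return True if url matches pattern under the given match_type."""
    if match_type == DPS_MATCH_REGEX:
        # Assumption: unanchored search, as with POSIX regexec.
        flags = re.IGNORECASE if ignore_case else 0
        return re.search(pattern, url, flags) is not None
    if ignore_case:
        url, pattern = url.lower(), pattern.lower()
    if match_type == DPS_MATCH_FULL:
        return url == pattern
    if match_type == DPS_MATCH_BEGIN:
        return url.startswith(pattern)
    if match_type == DPS_MATCH_SUBSTR:
        return pattern in url
    if match_type == DPS_MATCH_END:
        return url.endswith(pattern)
    if match_type == DPS_MATCH_WILD:
        # fnmatchcase compares without platform-specific case folding.
        return fnmatch.fnmatchcase(url, pattern)
    raise ValueError(f"unsupported match_type: {match_type}")


def record_accepts(url, pattern, match_type, ignore_case=False, nomatch=0):
    """Apply nomatch: with nomatch=1, URLs that do NOT match are accepted."""
    matched = pattern_matches(url, pattern, match_type, ignore_case)
    return (not matched) if nomatch == 1 else matched
```

For example, under these assumptions record_accepts("http://example.org/a.html", r"\.pdf$", DPS_MATCH_REGEX, nomatch=1) accepts the URL because it does not end in .pdf, while the same call with a .pdf URL rejects it.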