ixCreateRobotsTxtParser

NAME

ixCreateRobotsTxtParser -- create a parser for robots.txt

SYNOPSIS

RobotsTxtParserT ixCreateRobotsTxtParser(StatusCodeT *Status)

ARGUMENTS

Status -- If an error occurs in the creation of the robots.txt parser, it will be reported here.

RETURNS

RobotsTxtParserT -- A parser for robots.txt files, which can then be queried for permission to access given URLs.

DESCRIPTION

Robots.txt is a standard file which webmasters use to instruct web crawlers (web "search engines") which files and directories to exclude.  A full description of the standard may be found at http://info.webcrawler.com/mak/projects/robots/norobots.html.  By providing the parser with the robots.txt file from a given site, you can then test URLs from that site against the parser to see whether you have permission to download and index them.  The location of the robots.txt file for a given site is standard: it is sitename/robots.txt.  For example, the robots.txt file for Webcrawler may be found at http://www.webcrawler.com/robots.txt.

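EXAMPLE

The sketch below shows one way the parser might be used to check a URL against a site's robots.txt.  Only the signature of ixCreateRobotsTxtParser is documented on this page; the header name and the argument lists used for ixSetRobotName, ixParseRobotsTxt, ixRobotsPermissionGranted, and ixDeleteRobotsTxtParser are assumptions here and should be verified against their own manual pages.

    #include <stdio.h>
    #include "Onix.h"                 /* assumed header name for the toolkit */

    void CheckUrl(const char *RobotsTxtBuffer, const char *Url)
    {
        StatusCodeT Status = 0;

        /* Create the robots.txt parser. */
        RobotsTxtParserT Parser = ixCreateRobotsTxtParser(&Status);
        if (Status != 0) {            /* assumed: nonzero Status indicates an error */
            fprintf(stderr, "Could not create robots.txt parser\n");
            return;
        }

        /* Identify the crawler so the matching User-agent section of
           robots.txt is applied (assumed signature). */
        ixSetRobotName(Parser, "MyCrawler");

        /* RobotsTxtBuffer holds the text previously downloaded from
           sitename/robots.txt (assumed signature). */
        ixParseRobotsTxt(Parser, RobotsTxtBuffer);

        /* Ask whether the URL may be downloaded and indexed (assumed signature). */
        if (ixRobotsPermissionGranted(Parser, Url))
            printf("Permission granted for %s\n", Url);
        else
            printf("Excluded by robots.txt: %s\n", Url);

        ixDeleteRobotsTxtParser(Parser);
    }
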
SEE ALSO

ixDeleteRobotsTxtParser, ixSetRobotName, ixParseRobotsTxt, ixRobotsPermissionGranted, ixRobotsTxtLength