public interface CrawlerSetting
CrawlerSetting defines callback methods that determine how a web search algorithm traverses the net and which URLs it reports as results. A CrawlerSetting can be used with an org.jscience.net.Spider.
See Also:
Spider, Spider.crawlWeb(CrawlerSetting,int,Logger)

| Method Summary | |
|---|---|
| `boolean` | `followLinks(java.net.URL url, java.net.URL referer, int depth, java.util.List<java.net.URL> resultURLList, java.util.List<java.net.URL> closedURLList, java.util.List<Spider.URLWrapper> searchURLWrapperList)` Determines whether the page at the given URL should be searched for links, which are then examined at the next level. |
| `boolean` | `matchesCriteria(java.net.URL url, java.net.URL referer, int depth, java.util.List<java.net.URL> resultURLList, java.util.List<java.net.URL> closedURLList)` Decides whether the URL itself or its content qualifies as what this CrawlerSetting is searching for. Because this method is called for every URL encountered, it is also the place for any custom parsing the CrawlerSetting needs to do. |

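A minimal sketch of an implementation, assuming Java 9 or later. To show the callback logic without JScience on the classpath, it declares local, hypothetical copies of the interface and an empty placeholder for Spider.URLWrapper; the class PdfFinderSetting, its maxDepth parameter, and the url helper are illustrative names, not part of JScience. matchesCriteria accepts URLs whose path ends in .pdf, and followLinks expands other pages only while their links stay within maxDepth.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins so the sketch compiles without JScience on the classpath;
// the real types are org.jscience.net.CrawlerSetting and Spider.URLWrapper.
// No URLWrapper members are used here, so an empty placeholder is enough.
final class URLWrapper {}

interface CrawlerSetting {
    boolean matchesCriteria(URL url, URL referer, int depth,
                            List<URL> resultURLList, List<URL> closedURLList);

    boolean followLinks(URL url, URL referer, int depth,
                        List<URL> resultURLList, List<URL> closedURLList,
                        List<URLWrapper> searchURLWrapperList);
}

// Hypothetical setting: report PDF links, expanding non-PDF pages up to maxDepth.
class PdfFinderSetting implements CrawlerSetting {
    private final int maxDepth;

    PdfFinderSetting(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    private static boolean isPdf(URL url) {
        return url.getPath().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    @Override
    public boolean matchesCriteria(URL url, URL referer, int depth,
                                   List<URL> resultURLList, List<URL> closedURLList) {
        // Decided from the URL alone. The lists are not modified here
        // (assumption: the Spider maintains them).
        return isPdf(url);
    }

    @Override
    public boolean followLinks(URL url, URL referer, int depth,
                               List<URL> resultURLList, List<URL> closedURLList,
                               List<URLWrapper> searchURLWrapperList) {
        // A PDF has no links worth following. Other pages are expanded only while
        // their links (at depth + 1) would still lie within maxDepth.
        return !isPdf(url) && depth < maxDepth;
    }

    // Helper: builds a URL via URI.toURL(), avoiding the URL(String)
    // constructor that is deprecated since Java 20.
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(s, e);
        }
    }
}
```

With new PdfFinderSetting(2), a page at depth 1 is still expanded, so links two hops from the root are checked, while pages at depth 2 are not expanded.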
| Method Detail |
|---|
boolean matchesCriteria(java.net.URL url,
java.net.URL referer,
int depth,
java.util.List<java.net.URL> resultURLList,
java.util.List<java.net.URL> closedURLList)
Parameters:
url - the URL to test against the criteria
referer - the referer URL of url (the page that linked to it)
depth - link distance from the original root URL where the search began
resultURLList - list of URLs already found to match this CrawlerSetting's criteria
closedURLList - list of URLs already found not to match this CrawlerSetting's criteria
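Since matchesCriteria sees every URL, it can also carry content-based matching and custom parsing. A sketch under stated assumptions: the content fetcher is injected as a java.util.function.Function<URL, String> (an assumption of this example, not a JScience API) so the logic runs without network access, and KeywordSetting, titles, and url are hypothetical names. CrawlerSetting and Spider.URLWrapper are again local stand-ins.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for org.jscience.net.CrawlerSetting and Spider.URLWrapper.
final class URLWrapper {}

interface CrawlerSetting {
    boolean matchesCriteria(URL url, URL referer, int depth,
                            List<URL> resultURLList, List<URL> closedURLList);

    boolean followLinks(URL url, URL referer, int depth,
                        List<URL> resultURLList, List<URL> closedURLList,
                        List<URLWrapper> searchURLWrapperList);
}

// Hypothetical setting: a page matches when its text contains a keyword.
// The fetcher is injected (assumption) so no network access is needed;
// a real one might read from url.openStream().
class KeywordSetting implements CrawlerSetting {
    private static final Pattern TITLE = Pattern.compile(
            "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final String keyword;
    private final Function<URL, String> fetcher;

    // By-product of the custom parsing: page titles keyed by URL string.
    final Map<String, String> titles = new HashMap<>();

    KeywordSetting(String keyword, Function<URL, String> fetcher) {
        this.keyword = keyword.toLowerCase(Locale.ROOT);
        this.fetcher = fetcher;
    }

    @Override
    public boolean matchesCriteria(URL url, URL referer, int depth,
                                   List<URL> resultURLList, List<URL> closedURLList) {
        String content = fetcher.apply(url);
        if (content == null) {
            return false; // unreadable page: treat as not matching
        }
        // Every encountered URL passes through here, so parse while the content is at hand.
        Matcher m = TITLE.matcher(content);
        if (m.find()) {
            titles.put(url.toExternalForm(), m.group(1).trim());
        }
        return content.toLowerCase(Locale.ROOT).contains(keyword);
    }

    @Override
    public boolean followLinks(URL url, URL referer, int depth,
                               List<URL> resultURLList, List<URL> closedURLList,
                               List<URLWrapper> searchURLWrapperList) {
        return depth == 0; // kept trivial: expand the root page only
    }

    // Helper: builds a URL via URI.toURL(), wrapping the checked exception.
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(s, e);
        }
    }
}
```

Keying titles by toExternalForm() rather than by URL is deliberate: URL.equals and URL.hashCode may resolve host names over the network.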
boolean followLinks(java.net.URL url,
java.net.URL referer,
int depth,
java.util.List<java.net.URL> resultURLList,
java.util.List<java.net.URL> closedURLList,
java.util.List<Spider.URLWrapper> searchURLWrapperList)
Parameters:
url - the URL whose page is to be examined for links
referer - the referer URL of url (the page that linked to it)
depth - link distance from the original root URL where the search began
resultURLList - list of URLs already found to match this CrawlerSetting's criteria
closedURLList - list of URLs already found not to match this CrawlerSetting's criteria
searchURLWrapperList - list of Spider.URLWrapper objects already identified for examination at the next level
See Also:
Spider.URLWrapper
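A sketch of a more selective followLinks, assuming the same kind of local stand-ins for CrawlerSetting and Spider.URLWrapper. BoundedFollowSetting, its suffix list, and the maxFrontier cap are hypothetical; the cap uses only searchURLWrapperList.size(), so nothing is assumed about the members of URLWrapper.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for org.jscience.net.CrawlerSetting and Spider.URLWrapper.
final class URLWrapper {}

interface CrawlerSetting {
    boolean matchesCriteria(URL url, URL referer, int depth,
                            List<URL> resultURLList, List<URL> closedURLList);

    boolean followLinks(URL url, URL referer, int depth,
                        List<URL> resultURLList, List<URL> closedURLList,
                        List<URLWrapper> searchURLWrapperList);
}

// Hypothetical setting focused on followLinks.
class BoundedFollowSetting implements CrawlerSetting {
    // Illustrative suffixes for resources that carry no HTML links.
    private static final List<String> BINARY_SUFFIXES =
            List.of(".pdf", ".zip", ".png", ".jpg", ".gif");

    private final String host;
    private final int maxFrontier;

    BoundedFollowSetting(URL root, int maxFrontier) {
        this.host = root.getHost().toLowerCase(Locale.ROOT);
        this.maxFrontier = maxFrontier;
    }

    @Override
    public boolean matchesCriteria(URL url, URL referer, int depth,
                                   List<URL> resultURLList, List<URL> closedURLList) {
        return true; // placeholder: this sketch is about followLinks
    }

    @Override
    public boolean followLinks(URL url, URL referer, int depth,
                               List<URL> resultURLList, List<URL> closedURLList,
                               List<URLWrapper> searchURLWrapperList) {
        // Stay on the starting host (compared as strings, without DNS lookups).
        if (!url.getHost().toLowerCase(Locale.ROOT).equals(host)) {
            return false;
        }
        // Do not try to harvest links from binary resources.
        String path = url.getPath().toLowerCase(Locale.ROOT);
        for (String suffix : BINARY_SUFFIXES) {
            if (path.endsWith(suffix)) {
                return false;
            }
        }
        // Soft cap on the next level: expansion stops once the list already holds
        // maxFrontier entries, although one expansion may still overshoot it.
        return searchURLWrapperList.size() < maxFrontier;
    }

    // Helper: builds a URL via URI.toURL(), wrapping the checked exception.
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(s, e);
        }
    }
}
```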