java.lang.Object
  java.lang.Thread
    org.jscience.net.MediaCrawler
public class MediaCrawler extends java.lang.Thread implements CrawlerSetting
MediaCrawler is a single thread that searches the web for files that are of a given type.
See Also: Spider

| Nested Class Summary | |
|---|---|
| static interface | MediaCrawler.Handler: used to handle the media files found during the MediaCrawler's search |

| Nested classes/interfaces inherited from class java.lang.Thread |
|---|
| java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler |

| Field Summary |
|---|

| Fields inherited from class java.lang.Thread |
|---|
| MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY |

| Constructor Summary |
|---|
| MediaCrawler(java.net.URL rootURL, int depth, java.lang.String mediaExtension, boolean currentSiteOnly, MediaCrawler.Handler handler, java.lang.String[] pattern) |
| MediaCrawler(java.net.URL rootURL, int depth, java.lang.String mediaExtension, boolean currentSiteOnly, java.lang.String[] pattern) |

| Method Summary | |
|---|---|
| void | addHandler(MediaCrawler.Handler handler) |
| boolean | followLinks(java.net.URL url, java.net.URL referer, int depth, java.util.List<java.net.URL> resultURLList, java.util.List<java.net.URL> closedURLList, java.util.List<Spider.URLWrapper> searchURLWrapperList): Determines whether the given URL should be searched for links to examine at the next level. |
| URLCache[] | getFilesFound() |
| boolean | matchesCriteria(java.net.URL url, java.net.URL referer, int depth, java.util.List<java.net.URL> resultURLList, java.util.List<java.net.URL> closedURLList): Decides whether the URL itself or its content qualifies for what this CrawlerSetting searches for. Because this method is called on every URL encountered, it is also the place for any custom parsing this CrawlerSetting wants to do. |
| void | run() |

| Methods inherited from class java.lang.Thread |
|---|
| activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Constructor Detail |
|---|

public MediaCrawler(java.net.URL rootURL,
                    int depth,
                    java.lang.String mediaExtension,
                    boolean currentSiteOnly,
                    java.lang.String[] pattern)

public MediaCrawler(java.net.URL rootURL,
                    int depth,
                    java.lang.String mediaExtension,
                    boolean currentSiteOnly,
                    MediaCrawler.Handler handler,
                    java.lang.String[] pattern)

| Method Detail |
|---|

public void addHandler(MediaCrawler.Handler handler)

public void run()
Specified by: run in interface java.lang.Runnable
Overrides: run in class java.lang.Thread

public URLCache[] getFilesFound()
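
Because MediaCrawler extends java.lang.Thread, a caller would presumably start it with start() and wait for it with join() before reading the results. The sketch below shows that pattern with a hypothetical stand-in class, CrawlerThreadSketch, whose run() only records two placeholder URLs. The class, its String[] result type, and the crawlAndWait() helper are assumptions made for illustration and are not part of the jscience API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a crawler thread; it does not touch the network. */
class CrawlerThreadSketch extends Thread {
    // Synchronized list so results written by the crawler thread are visible to callers.
    private final List<String> found = Collections.synchronizedList(new ArrayList<>());

    @Override
    public void run() {
        // Placeholder for the real search: record two made-up results.
        found.add("http://example.org/a.mp3");
        found.add("http://example.org/b.mp3");
    }

    /** Returns a snapshot of the results collected so far. */
    String[] getFilesFound() {
        return found.toArray(new String[0]);
    }

    /** Starts a crawler, waits for it to finish, and returns what it found. */
    static String[] crawlAndWait() {
        CrawlerThreadSketch crawler = new CrawlerThreadSketch();
        crawler.start();
        try {
            crawler.join();
        } catch (InterruptedException e) {
            // Restore the interrupt flag and return whatever was collected.
            Thread.currentThread().interrupt();
        }
        return crawler.getFilesFound();
    }

    public static void main(String[] args) {
        for (String file : crawlAndWait()) {
            System.out.println(file);
        }
    }
}
```

Calling start() rather than run() is what moves the search onto its own thread; calling run() directly would execute it on the caller's thread.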

public boolean followLinks(java.net.URL url,
                           java.net.URL referer,
                           int depth,
                           java.util.List<java.net.URL> resultURLList,
                           java.util.List<java.net.URL> closedURLList,
                           java.util.List<Spider.URLWrapper> searchURLWrapperList)
Description copied from interface: CrawlerSetting
Determines whether the given URL should be searched for links to examine at the next level.
Specified by: followLinks in interface CrawlerSetting
Parameters:
url - the URL that is to be examined for its links
referer - url's referer URL
depth - link distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match this CrawlerSetting's criteria
searchURLWrapperList - List of Spider.URLWrapper objects already identified to be examined in the next level
See Also: Spider.URLWrapper
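
The kind of decision followLinks makes can be sketched as follows. This is a minimal illustration, not the jscience implementation: the class FollowLinksSketch, its reduced parameter list, and the three rules it applies (a depth limit, a same-host check when currentSiteOnly is set, and skipping URLs already in closedURLList) are all assumptions chosen to mirror the constructor and method parameters documented above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical followLinks-style decision; the rules below are assumed, not documented. */
class FollowLinksSketch {
    private final URL rootURL;
    private final int maxDepth;
    private final boolean currentSiteOnly;

    FollowLinksSketch(URL rootURL, int maxDepth, boolean currentSiteOnly) {
        this.rootURL = rootURL;
        this.maxDepth = maxDepth;
        this.currentSiteOnly = currentSiteOnly;
    }

    /** Returns true if the page at url should be scanned for further links. */
    boolean followLinks(URL url, int depth, List<URL> closedURLList) {
        // Assumed rule 1: stop descending once the depth limit is reached.
        if (depth >= maxDepth) {
            return false;
        }
        // Assumed rule 2: optionally stay on the host of the root URL.
        if (currentSiteOnly && !rootURL.getHost().equalsIgnoreCase(url.getHost())) {
            return false;
        }
        // Assumed rule 3: skip URLs already rejected. Compare external forms,
        // since URL.equals may resolve host names over the network.
        String form = url.toExternalForm();
        for (URL closed : closedURLList) {
            if (closed.toExternalForm().equals(form)) {
                return false;
            }
        }
        return true;
    }

    /** Helper that converts a string to a URL without a checked exception. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        FollowLinksSketch sketch = new FollowLinksSketch(url("http://example.org/"), 2, true);
        List<URL> closed = new ArrayList<>();
        System.out.println(sketch.followLinks(url("http://example.org/a.html"), 1, closed));
        System.out.println(sketch.followLinks(url("http://other.example.com/"), 1, closed));
    }
}
```

The referer, resultURLList, and searchURLWrapperList parameters are left out of the sketch because the source does not say how they influence the decision.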
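
public boolean matchesCriteria(java.net.URL url,
                               java.net.URL referer,
                               int depth,
                               java.util.List<java.net.URL> resultURLList,
                               java.util.List<java.net.URL> closedURLList)
Description copied from interface: CrawlerSetting
Decides whether the URL itself or its content qualifies for what this CrawlerSetting searches for. Because this method is called on every URL encountered, it is also the place for any custom parsing this CrawlerSetting wants to do.
Specified by: matchesCriteria in interface CrawlerSetting
Parameters:
url - the URL to test against the criteria
referer - url's referer URL
depth - link distance from the original root URL where the search began
resultURLList - List of URLs that have already been found to match this CrawlerSetting's criteria
closedURLList - List of URLs that have already been found not to match this CrawlerSetting's criteria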
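
For a crawler that looks for files of a given type, a matchesCriteria-style test could plausibly come down to checking the file extension of the URL path. The sketch below assumes exactly that, and further assumes that the pattern array holds substrings of which at least one must appear in the URL. Neither behavior is confirmed by the source; MatchesCriteriaSketch and its simplified signature are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

/** Hypothetical extension and substring check; not the jscience implementation. */
class MatchesCriteriaSketch {

    /**
     * Returns true if the URL path ends with the given extension and, when
     * patterns are supplied, at least one of them occurs in the URL
     * (assumed semantics for the pattern parameter).
     */
    static boolean matchesCriteria(URL url, String mediaExtension, String[] pattern) {
        // getPath() excludes the query string, so "song.mp3?x=1" still matches.
        String path = url.getPath().toLowerCase(Locale.ROOT);
        String ext = mediaExtension.toLowerCase(Locale.ROOT);
        if (!ext.startsWith(".")) {
            ext = "." + ext;
        }
        if (!path.endsWith(ext)) {
            return false;
        }
        // No patterns supplied: the extension alone decides.
        if (pattern == null || pattern.length == 0) {
            return true;
        }
        String full = url.toExternalForm().toLowerCase(Locale.ROOT);
        for (String p : pattern) {
            if (full.contains(p.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Helper that converts a string to a URL without a checked exception. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesCriteria(url("http://example.org/jazz/a.mp3"), "mp3", new String[] {"jazz"}));
        System.out.println(matchesCriteria(url("http://example.org/index.html"), "mp3", null));
    }
}
```

The sketch inspects only the URL; the documented method may also examine the content behind it, which is not modeled here.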