org.jscience.linguistics.search
Class ShiftOr
java.lang.Object
org.jscience.linguistics.search.StringSearch
org.jscience.linguistics.search.ShiftOr
- Direct Known Subclasses:
- ShiftOrClasses, ShiftOrWildcards
public class ShiftOr
- extends StringSearch
An implementation of the Shift-Or algorithm by Ricardo Baeza-Yates and
Gaston Gonnet as outlined in "A New Approach to Text Searching" (appeared
in Proceedings of the 12th International Conference on Research and
Development in Datum Retrieval). The Shift-Or algorithm is a
bit-parallel algorithm.
The Shift-Or algorithm is not the fastest and by itself slower than
String's indexOf method. It's usefulness comes from it's
ability to support character classes and searching with errors at the same
speed.
It's searchChars(char[],int,int,char[],Object) method is extremely
slow in the Sun Java Virtual Machines. If possible, the searchBytes(byte[],int,int,byte[],Object) methods should be preferred.
Because the main loop is also used by the ShiftOrWildcards class, the implementation
cannot skip forward until the first character matches (as in the original
algorithm).
This implementation currently limited to at most 31 characters because Java
has no unsigned int type. An implementation that used
long has proved to take twice the amount of time.
Preprocessing: O(n + ∑) time
Searching : O(mn / log n) (worst case and average)
- See Also:
- ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/papers/CACM92.ps.gz
,
http://citeseer.nj.nec.com/50265.html
,
ShiftOrWildcards,
ShiftOrClasses,
ShiftOrMismatches
|
Constructor Summary |
ShiftOr()
Constructor for ShiftOr. |
|
Method Summary |
java.lang.Object |
processBytes(byte[] pattern)
Pre-processing of the pattern. |
java.lang.Object |
processChars(char[] pattern)
Pre-processing of the pattern. |
int |
searchBytes(byte[] text,
int textStart,
int textEnd,
byte[] pattern,
java.lang.Object processed)
Returns the position in the text at which the pattern was found. |
int |
searchChars(char[] text,
int textStart,
int textEnd,
char[] pattern,
java.lang.Object processed)
Returns the index of the pattern in the text using the pre-processed
Object. |
| Methods inherited from class org.jscience.linguistics.search.StringSearch |
createCharIntMap, createCharIntMap, equals, hashCode, index, processString, searchBytes, searchBytes, searchBytes, searchBytes, searchBytes, searchChars, searchChars, searchChars, searchChars, searchChars, searchString, searchString, searchString, searchString, searchString, searchString, toString, toStringBuffer, usesNative, usesReflection |
| Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ShiftOr
public ShiftOr()
- Constructor for ShiftOr. Note that it is not required to create multiple
instances.
processBytes
public java.lang.Object processBytes(byte[] pattern)
- Pre-processing of the pattern. The pattern may not exceed 31
bytes in length. If it does, only it's first 31 bytes are
processed which might lead to unexpected results. Returns an
int array.
- Specified by:
processBytes in class StringSearch
- Parameters:
pattern - the byte array containing the pattern, may
not be null
- Returns:
- an Object
- See Also:
StringSearch.processBytes(byte[])
processChars
public java.lang.Object processChars(char[] pattern)
- Pre-processing of the pattern. The pattern may not exceed 31
characters in length. If it does, only it's first 31 characters
are processed which might lead to unexpected results. Returns a
CharIntMap.
- Specified by:
processChars in class StringSearch
- Parameters:
pattern - the pattern
- Returns:
- an Object
- See Also:
StringSearch.processChars(char[])
searchBytes
public int searchBytes(byte[] text,
int textStart,
int textEnd,
byte[] pattern,
java.lang.Object processed)
- Description copied from class:
StringSearch
- Returns the position in the text at which the pattern was found. Returns
-1 if the pattern was not found.
- Specified by:
searchBytes in class StringSearch
- Parameters:
text - text the byte array containing the text, may
not be nulltextStart - at which position in the text the comparing should
starttextEnd - at which position in the text comparing should stoppattern - the pattern to search for, may not be nullprocessed - an Object as returned from StringSearch.processBytes(byte[]), may not be null
- Returns:
- the position in the text or -1 if the pattern was not found
- See Also:
StringSearch.searchBytes(byte[],
int,int,byte[],java.lang.Object)
searchChars
public int searchChars(char[] text,
int textStart,
int textEnd,
char[] pattern,
java.lang.Object processed)
- Description copied from class:
StringSearch
- Returns the index of the pattern in the text using the pre-processed
Object. Returns -1 if the pattern was not found.
- Specified by:
searchChars in class StringSearch
- Parameters:
text - the String containing the text, may not be nulltextStart - at which position in the text the comparing should
starttextEnd - at which position in the text comparing should stoppattern - the pattern to search for, may not be nullprocessed - an Object as returned from StringSearch.processChars(char[]) or StringSearch.processString(String), may
not be null
- Returns:
- the position in the text or -1 if the pattern was not found
- See Also:
StringSearch.searchChars(char[],
int,int,char[],Object)