Class RegexTokenizer
java.lang.Object
org.apache.commons.text.similarity.RegexTokenizer
- All Implemented Interfaces:
Function<CharSequence, CharSequence[]>, CharSequenceTokenizer<CharSequence>, Tokenizer<CharSequence, CharSequence>
A simple word Tokenizer that uses a regular expression to find words. It applies the regex (\w)+ to the input text to extract words from the given character sequence.
Instances of this class are immutable and are safe for use by multiple concurrent threads.
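The constructor listed on this page has no public modifier and INSTANCE is package-private, so callers outside the package likely cannot use this class directly. The following is a minimal sketch of the tokenization the description names, written against java.util.regex only. The class name WordTokenizerSketch and the method tokenize are hypothetical and are not part of Commons Text; only the regex (\w)+ comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the library's implementation.
final class WordTokenizerSketch {

    // The regex named in the class description: one or more word characters.
    private static final Pattern WORDS = Pattern.compile("(\\w)+");

    // Collects every full match of the pattern, in order of appearance.
    static CharSequence[] tokenize(CharSequence text) {
        Matcher matcher = WORDS.matcher(text);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            // group(0) is the whole run of word characters, not just the last one.
            tokens.add(matcher.group(0));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (CharSequence token : tokenize("Hello, world! It's 2024.")) {
            System.out.println(token);
        }
    }
}
```

Because \w matches only letters, digits, and the underscore, punctuation splits tokens: the sample input yields Hello, world, It, s, and 2024. The compiled Pattern is held in a static final field and the sketch keeps no mutable state, which is one common way to get the immutability and thread safety described above.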
- Since:
- 1.0
Field Summary
Fields
Modifier and Type                                Field     Description
(package private) static final RegexTokenizer    INSTANCE  The singleton instance.
private static final Pattern                     PATTERN   The whitespace pattern.
Constructor Summary
Constructors
Method Summary
Field Details
PATTERN
The whitespace pattern.
INSTANCE
The singleton instance.
Constructor Details
RegexTokenizer
RegexTokenizer()
Method Details
apply
- Specified by:
apply in interface Function<CharSequence, CharSequence[]>
- Throws:
IllegalArgumentException - if the input text is blank.
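The Throws clause can be illustrated with a small guard. This is a sketch under stated assumptions: the page does not define "blank", so the check below assumes it means null, empty, or whitespace-only input. The names BlankInputSketch, isBlank, and requireNotBlank are hypothetical, and so is the exception message.

```java
// Hypothetical illustration of the documented contract, not the library's code.
final class BlankInputSketch {

    // Assumption: "blank" means null, empty, or consisting only of whitespace.
    static boolean isBlank(CharSequence text) {
        if (text == null) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Rejects blank input up front, before any tokenization would run.
    static CharSequence requireNotBlank(CharSequence text) {
        if (isBlank(text)) {
            throw new IllegalArgumentException("Invalid text");
        }
        return text;
    }

    public static void main(String[] args) {
        try {
            requireNotBlank("   ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected blank input");
        }
    }
}
```

Callers that may receive empty or whitespace-only text would need to filter it out or catch IllegalArgumentException rather than expect an empty token array.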