Class LevenshteinDetailedDistance
- All Implemented Interfaces:
BiFunction<CharSequence, CharSequence, LevenshteinResults>, EditDistance<LevenshteinResults>, ObjectSimilarityScore<CharSequence, LevenshteinResults>, SimilarityScore<LevenshteinResults>
This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).
- Since:
- 1.0
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final LevenshteinDetailedDistanceThe singleton instance.private final IntegerThreshold. -
Constructor Summary
ConstructorsConstructorDescriptionDeprecated.LevenshteinDetailedDistance(Integer threshold) Constructs a new instance for a threshold. -
Method Summary
Modifier and TypeMethodDescriptionapply(CharSequence left, CharSequence right) Computes the Levenshtein distance between two Strings.apply(SimilarityInput<E> left, SimilarityInput<E> right) Computes the Levenshtein distance between two Strings.private static <E> LevenshteinResultsfindDetailedResults(SimilarityInput<E> left, SimilarityInput<E> right, int[][] matrix, boolean swapped) Finds count for each of the three [insert, delete, substitute] operations needed.static LevenshteinDetailedDistanceGets the default instance.Gets the distance threshold.private static <E> LevenshteinResultslimitedCompare(SimilarityInput<E> left, SimilarityInput<E> right, int threshold) Finds the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.private static <E> LevenshteinResultsunlimitedCompare(SimilarityInput<E> left, SimilarityInput<E> right) Finds the Levenshtein distance between two Strings.Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface BiFunction
andThen
-
Field Details
-
INSTANCE
The singleton instance. -
threshold
Threshold.
-
-
Constructor Details
-
LevenshteinDetailedDistance
Deprecated.UsegetDefaultInstance().Constructs a new instance that uses a version of the algorithm that does not use a threshold parameter.- See Also:
-
LevenshteinDetailedDistance
Constructs a new instance for a threshold.If the threshold is not null, distance calculations will be limited to a maximum length.
If the threshold is null, the unlimited version of the algorithm will be used.
- Parameters:
threshold- If this is null then distances calculations will not be limited. This may not be negative.
-
-
Method Details
-
findDetailedResults
private static <E> LevenshteinResults findDetailedResults(SimilarityInput<E> left, SimilarityInput<E> right, int[][] matrix, boolean swapped) Finds count for each of the three [insert, delete, substitute] operations needed. This is based on the matrix formed based on the two character sequence.- Type Parameters:
E- The type of similarity score unit.- Parameters:
left- character sequence which need to be converted from.right- character sequence which need to be converted to.matrix- two dimensional array containing.swapped- tells whether the value for left character sequence and right character sequence were swapped to save memory.- Returns:
- result object containing the count of insert, delete and substitute and total count needed.
-
getDefaultInstance
Gets the default instance.- Returns:
- The default instace
-
limitedCompare
private static <E> LevenshteinResults limitedCompare(SimilarityInput<E> left, SimilarityInput<E> right, int threshold) Finds the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.This implementation follows from Algorithms on Strings, Trees and Sequences by Dan Gusfield and Chas Emerick's implementation of the Levenshtein distance algorithm from http://www.merriampark.com/ld.htm
limitedCompare(null, *, *) = Throws
IllegalArgumentExceptionlimitedCompare(*, null, *) = ThrowsIllegalArgumentExceptionlimitedCompare(*, *, -1) = ThrowsIllegalArgumentExceptionlimitedCompare("","", 0) = 0 limitedCompare("aaapppp", "", 8) = 7 limitedCompare("aaapppp", "", 7) = 7 limitedCompare("aaapppp", "", 6)) = -1 limitedCompare("elephant", "hippo", 7) = 7 limitedCompare("elephant", "hippo", 6) = -1 limitedCompare("hippo", "elephant", 7) = 7 limitedCompare("hippo", "elephant", 6) = -1- Type Parameters:
E- The type of similarity score unit.- Parameters:
left- the first CharSequence, must not be null.right- the second CharSequence, must not be null.threshold- the target threshold, must not be negative.- Returns:
- result distance, or -1.
-
unlimitedCompare
private static <E> LevenshteinResults unlimitedCompare(SimilarityInput<E> left, SimilarityInput<E> right) Finds the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmunlimitedCompare(null, *) = Throws
IllegalArgumentExceptionunlimitedCompare(*, null) = ThrowsIllegalArgumentExceptionunlimitedCompare("","") = 0 unlimitedCompare("","a") = 1 unlimitedCompare("aaapppp", "") = 7 unlimitedCompare("frog", "fog") = 1 unlimitedCompare("fly", "ant") = 3 unlimitedCompare("elephant", "hippo") = 7 unlimitedCompare("hippo", "elephant") = 7 unlimitedCompare("hippo", "zzzzzzzz") = 8 unlimitedCompare("hello", "hallo") = 1- Type Parameters:
E- The type of similarity score unit.- Parameters:
left- the first CharSequence, must not be null.right- the second CharSequence, must not be null.- Returns:
- result distance, or -1.
- Throws:
IllegalArgumentException- if either CharSequence input isnull.
-
apply
Computes the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmdistance.apply(null, *) = Throws
IllegalArgumentExceptiondistance.apply(*, null) = ThrowsIllegalArgumentExceptiondistance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1- Specified by:
applyin interfaceBiFunction<CharSequence, CharSequence, LevenshteinResults>- Specified by:
applyin interfaceObjectSimilarityScore<CharSequence, LevenshteinResults>- Specified by:
applyin interfaceSimilarityScore<LevenshteinResults>- Parameters:
left- the first input, must not be null.right- the second input, must not be null.- Returns:
- result distance, or -1.
- Throws:
IllegalArgumentException- if either String inputnull.
-
apply
Computes the Levenshtein distance between two Strings.A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmdistance.apply(null, *) = Throws
IllegalArgumentExceptiondistance.apply(*, null) = ThrowsIllegalArgumentExceptiondistance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1- Type Parameters:
E- The type of similarity score unit.- Parameters:
left- the first input, must not be null.right- the second input, must not be null.- Returns:
- result distance, or -1.
- Throws:
IllegalArgumentException- if either String inputnull.- Since:
- 1.13.0
-
getThreshold
-
getDefaultInstance().