Edit Distance
Levenshtein distance
This is a well known problem.
Illustrating this problem using wikipedia's example
Making a table like Interleaving String
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | | | | | | | |
+---+---+---+---+---+---+---+---+
| s | | | | | | | |
+---+---+---+---+---+---+---+---+
| i | | | | | | | |
+---+---+---+---+---+---+---+---+
| t | | | | | | | |
+---+---+---+---+---+---+---+---+
| t | | | | | | | |
+---+---+---+---+---+---+---+---+
| i | | | | | | | |
+---+---+---+---+---+---+---+---+
| n | | | | | | | |
+---+---+---+---+---+---+---+---+
| g | | | | | | | |
+---+---+---+---+---+---+---+---+
*
here is representing an empty string
each gird is the edit distance
of string[0..x]
and string[0..y]
First row and column girds are easy to be filled:
edit distance
of an empty string to an empty string is0
edit distance
of an empty string to any string is the length of that string
So the table should be filled like:
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | | | | | | |
+---+---+---+---+---+---+---+---+
| i | 2 | | | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | | | | | | |
+---+---+---+---+---+---+---+---+
what's about k
and s
in k
's view
- Insert: cant transform to
s
- Delete: cant transform to
s
- Replace: replace
k
tos
use1
edit distance
in s
's view
- Insert: cant transform to
k
- Delete: cant transform to
k
- Replace: replace
s
tok
use1
edit distance
so the number in the gird k
and s
should be 1
what's about ki
and s
(in ki
's view)
- Insert: cant transform to
s
- Delete: delete
i
then, transform to the problems
andk
we solved before. 1 +edit distance
ofs
andk
. - Replace and Delete: replace
k
tos
use1
edit distance
, then deletek
use1
edit distance
.1 + 1 = 2
Delete
and Replace then Delete
are the same thing at last, Replace
is just the 1
edit distance
costed by edit distance
of s
and k
.
In this case, we can transform the problem by deleting one char: 1 + edit distance
of string deleted one
and k
We can fill up the second column and row now:
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| i | 2 | 2 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | 3 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | 4 | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | 5 | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | 6 | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | 7 | | | | | |
+---+---+---+---+---+---+---+---+
What if si
and ki
Obviously, si
and ki
equals to the edit distance
of s
and k
, because i
are both in two strings.
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| i | 2 | 2 | 1 | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | 3 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | 4 | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | 5 | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | 6 | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | 7 | | | | | |
+---+---+---+---+---+---+---+---+
then we can fill up the table using technology we used before.
More common case
P[word1][word2]
is the the table,
so P[i][j]
have 4 choices:
P[i - 1][j] + 1
: by deletingword1[i]
fromword1
and convert to previous problem.P[i][j - 1] + 1
: by deletingword2[j]
fromword2
and convert to previous problem.P[i - 1][j - 1]
: only ifword1[i] == word2[j]
P[i - 1][j - 1] + 1
: only ifword1[i] != word2[j]
, then replace one charword1[i]
toword2[j]
, then this problem is converted toP[i - 1][j - 1]
.
Dont worry about Insertion
, it is just the oppsite to Deletion
. Just different view.
So the work remaing is easy, find the MIN of those four is ok.
Source code Read on Github
1 public class Solution {
2 public int minDistance(String word1, String word2) {
3
4 // http://en.wikipedia.org/wiki/Levenshtein_distance
5
6 char[] w1 = word1.toCharArray();
7 char[] w2 = word2.toCharArray();
8
9
10
11 int[][] P = new int[w1.length + 1][w2.length + 1];
12
13 int i, j;
14
15 for(i = 1; i <= w1.length; i++){
16 P[i][0] = i;
17 }
18
19 for(j = 1; j <= w2.length; j++){
20 P[0][j] = j;
21 }
22
23 for(i = 1; i <= w1.length; i++){
24 for(j = 1; j <= w2.length; j++){
25
26 P[i][j] = Math.min(P[i - 1][j], P[i][j - 1]) + 1;
27
28 if(w1[i - 1] == w2[j - 1]){
29 P[i][j] = Math.min(P[i - 1][j - 1], P[i][j]);
30 }else {
31 P[i][j] = Math.min(P[i - 1][j - 1] + 1, P[i][j]);
32 }
33
34 }
35 }
36
37 return P[w1.length][w2.length];
38 }
39 }