# Edit Distance

## Levenshtein distance

This is a well known problem.

Illustrating this problem using wikipedia's example

Making a table like Interleaving String

```
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | | | | | | | |
+---+---+---+---+---+---+---+---+
| s | | | | | | | |
+---+---+---+---+---+---+---+---+
| i | | | | | | | |
+---+---+---+---+---+---+---+---+
| t | | | | | | | |
+---+---+---+---+---+---+---+---+
| t | | | | | | | |
+---+---+---+---+---+---+---+---+
| i | | | | | | | |
+---+---+---+---+---+---+---+---+
| n | | | | | | | |
+---+---+---+---+---+---+---+---+
| g | | | | | | | |
+---+---+---+---+---+---+---+---+
```

`*`

here is representing an empty string

each gird is the `edit distance`

of `string[0..x]`

and `string[0..y]`

First row and column girds are easy to be filled:

`edit distance`

of an empty string to an empty string is`0`

`edit distance`

of an empty string to any string is the length of that string

So the table should be filled like:

```
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | | | | | | |
+---+---+---+---+---+---+---+---+
| i | 2 | | | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | | | | | | |
+---+---+---+---+---+---+---+---+
```

### what's about `k`

and `s`

in `k`

's view

- Insert: cant transform to
`s`

- Delete: cant transform to
`s`

- Replace: replace
`k`

to`s`

use`1`

`edit distance`

in `s`

's view

- Insert: cant transform to
`k`

- Delete: cant transform to
`k`

- Replace: replace
`s`

to`k`

use`1`

`edit distance`

so the number in the gird `k`

and `s`

should be `1`

### what's about `ki`

and `s`

(in `ki`

's view)

- Insert: cant transform to
`s`

- Delete: delete
`i`

then, transform to the problem`s`

and`k`

we solved before. 1 +`edit distance`

of`s`

and`k`

. - Replace and Delete: replace
`k`

to`s`

use`1`

`edit distance`

, then delete`k`

use`1`

`edit distance`

.`1 + 1 = 2`

`Delete`

and `Replace then Delete`

are the same thing at last, `Replace`

is just the `1`

`edit distance`

costed by `edit distance`

of `s`

and `k`

.

In this case, we can transform the problem by deleting one char: 1 + `edit distance`

of `string deleted one`

and `k`

We can fill up the second column and row now:

```
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| i | 2 | 2 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | 3 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | 4 | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | 5 | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | 6 | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | 7 | | | | | |
+---+---+---+---+---+---+---+---+
```

### What if `si`

and `ki`

Obviously, `si`

and `ki`

equals to the `edit distance`

of `s`

and `k`

, because `i`

are both in two strings.

```
+---+---+---+---+---+---+---+---+
| | * | k | i | t | t | e | n |
+---+---+---+---+---+---+---+---+
| * | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| s | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+---+
| i | 2 | 2 | 1 | | | | |
+---+---+---+---+---+---+---+---+
| t | 3 | 3 | | | | | |
+---+---+---+---+---+---+---+---+
| t | 4 | 4 | | | | | |
+---+---+---+---+---+---+---+---+
| i | 5 | 5 | | | | | |
+---+---+---+---+---+---+---+---+
| n | 6 | 6 | | | | | |
+---+---+---+---+---+---+---+---+
| g | 7 | 7 | | | | | |
+---+---+---+---+---+---+---+---+
```

then we can fill up the table using technology we used before.

### More common case

`P[word1][word2]`

is the the table,

so `P[i][j]`

have 4 choices:

`P[i - 1][j] + 1`

: by deleting`word1[i]`

from`word1`

and convert to previous problem.`P[i][j - 1] + 1`

: by deleting`word2[j]`

from`word2`

and convert to previous problem.`P[i - 1][j - 1]`

: only if`word1[i] == word2[j]`

`P[i - 1][j - 1] + 1`

: only if`word1[i] != word2[j]`

, then replace one char`word1[i]`

to`word2[j]`

, then this problem is converted to`P[i - 1][j - 1]`

.

Dont worry about `Insertion`

, it is just the oppsite to `Deletion`

. Just different view.

So the work remaing is easy, find the MIN of those four is ok.

### Source code *Read on Github*

```
1 public class Solution {
2 public int minDistance(String word1, String word2) {
3
4 // http://en.wikipedia.org/wiki/Levenshtein_distance
5
6 char[] w1 = word1.toCharArray();
7 char[] w2 = word2.toCharArray();
8
9
10
11 int[][] P = new int[w1.length + 1][w2.length + 1];
12
13 int i, j;
14
15 for(i = 1; i <= w1.length; i++){
16 P[i][0] = i;
17 }
18
19 for(j = 1; j <= w2.length; j++){
20 P[0][j] = j;
21 }
22
23 for(i = 1; i <= w1.length; i++){
24 for(j = 1; j <= w2.length; j++){
25
26 P[i][j] = Math.min(P[i - 1][j], P[i][j - 1]) + 1;
27
28 if(w1[i - 1] == w2[j - 1]){
29 P[i][j] = Math.min(P[i - 1][j - 1], P[i][j]);
30 }else {
31 P[i][j] = Math.min(P[i - 1][j - 1] + 1, P[i][j]);
32 }
33
34 }
35 }
36
37 return P[w1.length][w2.length];
38 }
39 }
```