# PSC.Search: a new implementation of Levenshtein distance algorithm

This article describes a simple implementation of the string search. It can be used for approximate string matching (for more information, see http://en.wikipedia.org/wiki/Fuzzy_string_searching).

Other algorithms for approximate string searching exist (e.g., Soundex), but those aren’t as easy to implement. The algorithm in this article is easy to implement, and can be used for tasks where approximate string searching is used in an easy way.

The algorithm used the Levenshtein-distance for determining how exact a string from a word list matches the word to be found. Information about the Levenshtein-distance can be found at http://en.wikipedia.org/wiki/Levenshtein_distance.

## C# implementation

```using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace PSC.Search.Demo
{
class Program
{
static void Main(string[] args)
{
string word = "Pure Source Code";
List<string> wordList = new List<string>
{
"Code Project",
"Pure SourceCOde",
"sourcecode",
"puresourcecode",
"Source Kode",
"Kode Project",
"Other Source"
};

List<wordrank> foundWords =
FuzzySearch.Search(word, wordList, 0.30);

foundWords.ForEach(i => Console.WriteLine(i.Word +
" (" + i.Rank.ToString() +
")"));
}
}
}

```

Output:

A basic approach is shown. Instead of the Levenshtein-distance, a more optimized algorithm could be used – but here, a quite simple implementation is given for clarity reasons.

```public static int LevenshteinDistance(string src, string dest)
{
int[,] d = new int[src.Length + 1, dest.Length + 1];
int i, j, cost;
char[] str1 = src.ToCharArray();
char[] str2 = dest.ToCharArray();

for (i = 0; i <= str1.Length; i++)
{
d[i, 0] = i;
}

for (j = 0; j <= str2.Length; j++)
{
d[0, j] = j;
}

for (i = 1; i <= str1.Length; i++)
{
for (j = 1; j <= str2.Length; j++)
{
if (str1[i - 1] == str2[j - 1])
cost = 0;
else
cost = 1;

d[i, j] =
Math.Min(
d[i - 1, j] + 1,                    // Deletion
Math.Min(
d[i, j - 1] + 1,                // Insertion
d[i - 1, j - 1] + cost));       // Substitution

if ((i > 1) && (j > 1) &&
(str1[i - 1] == str2[j - 2]) &&
(str1[i - 2] == str2[j - 1]))
{
d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost);
}
}
}

return d[str1.Length, str2.Length];
}

```

In the search process, for each word in the wordlist, the Levenshtein-distance is computed, and with this distance, a score. This score represents how good the strings match. The input argument fuzzyness determines how much the strings can differ.

```public static List<wordrank> Search(string word, List wordList,
double fuzzyness)
{
List<wordrank> foundWords = new List<wordrank>();

foreach (string s in wordList)
{
// Calculate the Levenshtein-distance:
int levenshteinDistance = LevenshteinDistance(word, s);

// Length of the longer string:
int length = Math.Max(word.Length, s.Length);

// Calculate the score:
double score = 1.0 - (double)levenshteinDistance / length;

// Match?
if (score > fuzzyness)
foundWords.Add(new WordRank() { Rank = score, Word = s });
}

return foundWords;
}```

## SQL implementation

Also I’ve implemented a version for SQL.

```CREATE FUNCTION edit_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
RETURNS int
AS
BEGIN
DECLARE @s1_len int, @s2_len int
DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int
DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)

SELECT
@s1_len = LEN(@s1),
@s2_len = LEN(@s2),
@cv1 = 0x0000,
@j = 1, @i = 1, @c = 0

WHILE @j <= @s2_len
SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1

WHILE @i <= @s1_len
BEGIN
SELECT
@s1_char = SUBSTRING(@s1, @i, 1),
@c = @i,
@cv0 = CAST(@i AS binary(2)),
@j = 1

WHILE @j <= @s2_len
BEGIN
SET @c = @c + 1
SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
IF @c > @c_temp SET @c = @c_temp
SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
IF @c > @c_temp SET @c = @c_temp
SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
END

SELECT @cv1 = @cv0, @i = @i + 1
END

RETURN @c
END```

Usage:

```select
dbo.edit_distance('Fuzzy String Match','fuzzy string match'),
dbo.edit_distance('fuzzy','fuzy'),
dbo.edit_distance('Fuzzy String Match','fuzy string match'),
dbo.edit_distance('levenshtein distance sql','levenshtein sql server'),
dbo.edit_distance('distance','server')```

Happy coding!

#### Share

This site uses Akismet to reduce spam. Learn how your comment data is processed.