Introduction
Finding the longest common substring (LCS) is one of the most interesting topics in computer algorithms. In total for a string with n characters, there are substrings. That is based on choosing the first and the end of array among (n+1) places in the string. For example, to get substrings of "abc
", you need to choose two places among the dashes in : _a_b_c_
which results in:
We wish to find a maximum length common subsequence of X and Y with length m and n in order. There are a variety of ways to find LCS in two strings X and Y:
- A brute-force solution is to have two pointers on each string and start to check if each character on each
string matches or not. If it matches continue if not, move pointer
one character a head and check the rest if they match or not. This is
the easiest but worst algorithm to find the LCS. The time complexity is
O(m^2*n^2) . This time is so big and for long strings, this solution is impractical.
- Find the substrings of x and check in y if it exists or not. We have
m^2 substrings in x and checking if the substring exist takes O(n) time (using Knuth-Morris-Pratt algorithm). In total, it takes O(n* m^2)
- If you know any other ways, please let me know! It is so much fun to find another way. :)
Recursive Solution
Based on the following fact that LCS(i, j) = LCS ( i - 1 , j - 1) + 1 , where X[i] = Y[j]
we can recursively find the length of the common string.
0 if i = 0 and j = 0
LCS(i, j) = LCS ( i - 1 , j - 1) + 1 if (X[i] = Y[j] )
Max ( LCS ( i , j - 1) , LCS ( i - 1 , j - 1)) else
By dynamic programming, we can implement the following code in a bottom up manner with time complexity of O(m*n) which is not comparable to other solutions.
void LongestCommenSubstring(std::string str1, std::string str2)
{
const int m = str1.length();
const int n = str2.length();
int lcs[m][n];
for(int i = 0; i < m; ++i)
{
for(int j = 0; j < n;++j)
{
lcs[i][j] = 0;
}
}
for(int i = 0; i < m; ++i)
{
for(int j = 0; j < n;++j)
{
if(str1[i] == str2[j])
{
if((i-1 >= 0 ) && ( j-1 >= 0))
{
if(str1[i-1] == str2[j-1])
{
lcs[i][j] = lcs[i-1][j-1]+1;
}
}
}
else
{
if((i-1 >= 0 ) &&( j-1 >= 0))
{
lcs[i][j] = std::max(lcs[i][j-1], lcs[i-1][j]);
}
}
}
}
std::cout << "longest commen substring" << lcs[m-1][n-1];}
}
Remember this code just outputs the length. In order to print the longest substring, you need to store it.
Coding Tip
Always initialize your class members, otherwise you'll get garbage in your output !! Which most of the time is not easy to catch while debugging.
More Challenge?
Using this method, you can find the LCS of more strings just by little modification in the code. :)