Friday, December 18, 2009

Java find longest matching part or substring of string

Finding largest common sub string from an array of strings is often very useful. Recently I had this requirement, where I was trying to exclude a list of strings by using regex patterns. For this I had to find the greatest matching substring among the strings, so that I put that substring as a regex, and save a lot of effort.
Unfortunately I didn't find any ready made code on the net, that does the same thing. So I wrote my own.

I tried to make the logic as optimum as possible.
If anyone has a better algorithm in mind, please share it with me.

I have used a custom string length comparator, so optimise the string traversing. Readers can write their own implementation of the comparator, or copy my custom string length comparator class from here.



 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.Map;
 import java.util.Set;
 import java.util.TreeSet;

 public class SubString {

  public static void main(String[] args) {
    String [] strs = {
     "this is a very big first string which should be processed at last",
     "this is a very small string" ,
     "this is a very boring string nothing special",
     "this is a very simple yet bigger string, second large"
    };

   //Arrays.sort(strs, new CustomStringLengthComparator());
   String sub = strs[0];
   Map subStrings = new HashMap();
   int j=strs[0].length();

   String comparingString =strs[1];
   while (j>=0) {
    for(int i=0;i< j;i++) {
     sub = strs[0].substring(i, j);
     if (comparingString.contains(sub)) {
      subStrings.put(sub.length(),sub);
     }
    }
    j--;
   }
   boolean flag =false;
   if(!subStrings.isEmpty()) {
    Set set = subStrings.keySet();
    TreeSet treeSet = new TreeSet(set);     
    String matchedString = subStrings.get(treeSet.last());   
    for(int k=2;k
     if (strs[k].contains(matchedString)) {
      flag=true;     
     }else {
      flag=false;     
      break;     
     }   
    }     
    if (flag) { 
     System.out.println("Matched : " + subStrings.get(treeSet.last()));
    }else {       
     System.out.println("No Match");     
    }   
   } 
  } 
 }

2 comments:

kdIT said...

Thanks for posting this code, but I think it has some "typos", it was easy to figure out though. :)

hirak said...

thanks and sorry