Assume I have a set of strings S and a query string q. I want to know if any member of S is a substring of q. (For the purpose of this question substring includes equality, e.g. "foo" is a substring of "foo".) For example assume the function that does what I want is called
S = ["foo", "baz"] q = "foobar" assert anySubstring(S, q) # "foo" is a substring of "foobar" S = ["waldo", "baz"] assert not anySubstring(S, q)
Is there any easy-to-implement algorithm to do this with time complexity sublinear in
len(S)? It's ok if S has to be processed into some clever data structure first because I will be querying each S with a lot of q strings, so the amortized cost of this preprocessing might be reasonable.
EDIT: To clarify, I don't care el cual member of S is a substring of q, only whether at least one is. In other words, I , solamente care about a boolean answer.
preguntado el 09 de marzo de 12 a las 15:03
So if the length of S is way less then the sum of the lengths of the potential substrings your best option would be to build a árbol de sufijo from S and then do a search in it. This is linear with respect to the length of S plus the summar length of the candidate substrings. Of course there can not be an algorithm with better complexity as you have to pass through all the input at least. If the case is opposite i.e. the length of s is more then the summar length of the substrings your best option would be aho-corasick.
Espero que esto ayude.
Create a regular expression
.*(S1|S2|...|Sn).* and construct its minimal DFA.
Run your query string through the DFA.