TADM2E 4.45
Let's use the example given, for words say <i>A, B, C</i>, in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:
<pre> A C B A A C * * B B * * * * C 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 </pre>
In the code example this is just a call to sort but since each word index list is sorted we could use <code>MergeSort</code> which means this portion of the code is <math>\mathcal{O}(n)</math> for <math>n</math> indexes.
Once we have the list we add push each element into a hash with a separate key for each word and update our estimate of the snippet <i>span</i> that includes all the words.
Here's python code for one simple implementation.
<pre> import sys
def smallest_snippet(*args):
args -- K lists of word positions for K words >>> smallest_snippet([1,10], [2,20], [3,30]) [1, 3] >>> smallest_snippet([1,9,27], [6,10,19], [8,12,14]) [8, 10] >>> smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14]) [3, 5] >>> smallest_snippet([1,4,5], [3,9,10], [2,6,15]) [1, 3]
master = [] for i in range(len(args)): master.extend(map(lambda j: (i,j), args[i])) # ith word, jth index master.sort(lambda x,y: cmp(x[1], y[1])) # TODO: mergesort tops = {} # { word i: index j } best = [master[0][1], master[-1][1]] minspan = best[-1] - best[0] + 1
# update span after each new index tuple for (i,j) in master: tops[i] = j if len(tops) == len(args): curr = [min(tops.values()), max(tops.values())] if curr[1] - curr[0] < minspan: minspan = curr[1] - curr[0] best = curr return best
if __name__ == "__main__":
import doctest doctest.testmod()
sys.exit()
</pre>