Difference between revisions of "TADM2E 4.45"
From Algorithm Wiki
m (added python source formatting) |
m (fix typo) |
||
Line 8: | Line 8: | ||
In the code example this is just a call to sort but since each word index list is sorted we could use <code>MergeSort</code> which means this portion of the code is <math>\mathcal{O}(n)</math> for <math>n</math> indexes. | In the code example this is just a call to sort but since each word index list is sorted we could use <code>MergeSort</code> which means this portion of the code is <math>\mathcal{O}(n)</math> for <math>n</math> indexes. | ||
− | Once we have the list we | + | Once we have the list we push each element into a hash with a separate key for each word and update our estimate of the snippet <i>span</i> that includes all the words. |
Here's python code for one simple implementation. | Here's python code for one simple implementation. |
Revision as of 12:19, 5 November 2015
Let's use the example given, for words say A, B, C, in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:
A C B A A C * * B B * * * * C 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
In the code example this is just a call to sort but since each word index list is sorted we could use MergeSort
which means this portion of the code is $ \mathcal{O}(n) $ for $ n $ indexes.
Once we have the list we push each element into a hash with a separate key for each word and update our estimate of the snippet span that includes all the words.
Here's python code for one simple implementation.
import sys def smallest_snippet(*args): ''' args -- K lists of word positions for K words >>> smallest_snippet([1,10], [2,20], [3,30]) [1, 3] >>> smallest_snippet([1,9,27], [6,10,19], [8,12,14]) [8, 10] >>> smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14]) [3, 5] >>> smallest_snippet([1,4,5], [3,9,10], [2,6,15]) [1, 3] ''' master = [] for i in range(len(args)): master.extend(map(lambda j: (i,j), args[i])) # ith word, jth index master.sort(lambda x,y: cmp(x[1], y[1])) # TODO: mergesort tops = {} # { word i: index j } best = [master[0][1], master[-1][1]] minspan = best[-1] - best[0] + 1 # update span after each new index tuple for (i,j) in master: tops[i] = j if len(tops) == len(args): curr = [min(tops.values()), max(tops.values())] if curr[1] - curr[0] < minspan: minspan = curr[1] - curr[0] best = curr return best if __name__ == "__main__": import doctest doctest.testmod() sys.exit()