Difference between revisions of "TADM2E 4.45"

From Algorithm Wiki
Jump to: navigation, search
(Recovering wiki)
 
(Recovering wiki)
Line 1: Line 1:
Let's use the example given, for words say <i>A, B, C</i>,  in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:
+
Let's use the example given, for words say <i>A, B, C</i>,  in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:
  
&lt;pre&gt;
+
<pre>
 
A  C  B  A  A  C  *  *  B  B  *  *  *  *  C
 
A  C  B  A  A  C  *  *  B  B  *  *  *  *  C
 
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
 
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
&lt;/pre&gt;
+
</pre>
  
In the code example this is just a call to sort but since each word index list is sorted we could use &lt;code&gt;MergeSort&lt;/code&gt; which means this portion of the code is &lt;math&gt;\mathcal{O}(n)&lt;/math&gt; for &lt;math&gt;n&lt;/math&gt; indexes.
+
In the code example this is just a call to sort but since each word index list is sorted we could use <code>MergeSort</code> which means this portion of the code is <math>\mathcal{O}(n)</math> for <math>n</math> indexes.
  
Once we have the list we add push each element into a hash with a separate key for each word and update our estimate of the snippet &lt;i&gt;span&lt;/i&gt; that includes all the words.
+
Once we have the list we add push each element into a hash with a separate key for each word and update our estimate of the snippet <i>span</i> that includes all the words.
  
 
Here's python code for one simple implementation.
 
Here's python code for one simple implementation.
  
&lt;pre&gt;
+
<pre>
 
import sys
 
import sys
  
Line 19: Line 19:
 
     args -- K lists of word positions for K words
 
     args -- K lists of word positions for K words
 
      
 
      
     &gt;&gt;&gt; smallest_snippet([1,10], [2,20], [3,30])
+
     >>> smallest_snippet([1,10], [2,20], [3,30])
 
     [1, 3]
 
     [1, 3]
     &gt;&gt;&gt; smallest_snippet([1,9,27], [6,10,19], [8,12,14])
+
     >>> smallest_snippet([1,9,27], [6,10,19], [8,12,14])
 
     [8, 10]
 
     [8, 10]
     &gt;&gt;&gt; smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14])
+
     >>> smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14])
 
     [3, 5]
 
     [3, 5]
     &gt;&gt;&gt; smallest_snippet([1,4,5], [3,9,10], [2,6,15])
+
     >>> smallest_snippet([1,4,5], [3,9,10], [2,6,15])
 
     [1, 3]
 
     [1, 3]
 
     '''
 
     '''
Line 42: Line 42:
 
         if len(tops) == len(args):
 
         if len(tops) == len(args):
 
             curr = [min(tops.values()), max(tops.values())]
 
             curr = [min(tops.values()), max(tops.values())]
             if curr[1] - curr[0] &lt; minspan:
+
             if curr[1] - curr[0] < minspan:
 
                 minspan = curr[1] - curr[0]
 
                 minspan = curr[1] - curr[0]
 
                 best = curr
 
                 best = curr
Line 48: Line 48:
 
     return best
 
     return best
  
if __name__ == &quot;__main__&quot;:
+
if __name__ == "__main__":
 
     import doctest
 
     import doctest
 
     doctest.testmod()
 
     doctest.testmod()
  
 
     sys.exit()
 
     sys.exit()
&lt;/pre&gt;
+
</pre>

Revision as of 18:24, 11 September 2014

Let's use the example given, for words say A, B, C, in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:

A  C  B  A  A  C  *  *  B  B  *  *  *  *  C
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15

In the code example this is just a call to sort but since each word index list is sorted we could use MergeSort which means this portion of the code is $ \mathcal{O}(n) $ for $ n $ indexes.

Once we have the list we add push each element into a hash with a separate key for each word and update our estimate of the snippet span that includes all the words.

Here's python code for one simple implementation.

import sys

def smallest_snippet(*args):
    '''
    args -- K lists of word positions for K words
    
    >>> smallest_snippet([1,10], [2,20], [3,30])
    [1, 3]
    >>> smallest_snippet([1,9,27], [6,10,19], [8,12,14])
    [8, 10]
    >>> smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14])
    [3, 5]
    >>> smallest_snippet([1,4,5], [3,9,10], [2,6,15])
    [1, 3]
    '''

    master = []
    for i in range(len(args)):
        master.extend(map(lambda j: (i,j), args[i])) # ith word, jth index
    master.sort(lambda x,y: cmp(x[1], y[1]))         # TODO: mergesort
    tops = {}                                        # { word i: index j }
    best = [master[0][1], master[-1][1]]
    minspan = best[-1] - best[0] + 1

    # update span after each new index tuple
    for (i,j) in master:
        tops[i] = j
        if len(tops) == len(args):
            curr = [min(tops.values()), max(tops.values())]
            if curr[1] - curr[0] < minspan:
                minspan = curr[1] - curr[0]
                best = curr
        
    return best

if __name__ == "__main__":
    import doctest
    doctest.testmod()

    sys.exit()