# Sorting

### Applications of Sorting: Numbers

4.1. The Grinch is given the job of partitioning $\displaystyle{ 2n }$ players into two teams of $\displaystyle{ n }$ players each. Each player has a numerical rating that measures how good he or she is at the game. The Grinch seeks to divide the players as unfairly as possible, so as to create the biggest possible talent imbalance between the teams. Show how the Grinch can do the job in $\displaystyle{ O(n \log n) }$ time.

4.2. For each of the following problems, give an algorithm that finds the desired numbers within the given amount of time. To keep your answers brief, feel free to use algorithms from the book as subroutines. For the example, $\displaystyle{ S = \{6, 13, 19, 3, 8\} }$, 19 - 3 maximizes the difference, while 8 - 6 minimizes the difference.
(a) Let $\displaystyle{ S }$ be an unsorted array of $\displaystyle{ n }$ integers. Give an algorithm that finds the pair $\displaystyle{ x, y \in S }$ that maximizes $\displaystyle{ |x-y| }$. Your algorithm must run in $\displaystyle{ O(n) }$ worst-case time.
(b) Let $\displaystyle{ S }$ be a sorted array of $\displaystyle{ n }$ integers. Give an algorithm that finds the pair $\displaystyle{ x, y \in S }$ that maximizes $\displaystyle{ |x - y| }$. Your algorithm must run in $\displaystyle{ O(1) }$ worst-case time.
(c) Let $\displaystyle{ S }$ be an unsorted array of $\displaystyle{ n }$ integers. Give an algorithm that finds the pair $\displaystyle{ x, y \in S }$ that minimizes $\displaystyle{ |x - y| }$, for $\displaystyle{ x \neq y }$. Your algorithm must run in $\displaystyle{ O(n \log n) }$ worst-case time.
(d) Let $\displaystyle{ S }$ be a sorted array of $\displaystyle{ n }$ integers. Give an algorithm that finds the pair $\displaystyle{ x, y \in S }$ that minimizes $\displaystyle{ |x - y| }$, for $\displaystyle{ x \neq y }$. Your algorithm must run in $\displaystyle{ O(n) }$ worst-case time.

4.3. Take a list of $\displaystyle{ 2n }$ real numbers as input. Design an $\displaystyle{ O(n \log n) }$ algorithm that partitions the numbers into $\displaystyle{ n }$ pairs, with the property that the partition minimizes the maximum sum of a pair. For example, say we are given the numbers (1,3,5,9). The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)). The pair sums for these partitions are (4,14), (6,12), and (10,8). Thus, the third partition has 10 as its maximum sum, which is the minimum over the three partitions.
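
As a way to check candidate answers against the worked example, here is a minimal Python sketch of one natural strategy: sort the numbers and pair the smallest remaining value with the largest. The function name `min_max_pairing` is hypothetical, and the sketch does not prove that this greedy pairing is optimal; that argument is still the point of the exercise.

```python
def min_max_pairing(values):
    """Pair the i-th smallest value with the i-th largest (assumed greedy strategy)."""
    s = sorted(values)              # O(n log n) dominates the running time
    n = len(s) // 2                 # assumes an even number of values
    return [(s[i], s[-1 - i]) for i in range(n)]


pairs = min_max_pairing([1, 3, 5, 9])
largest_sum = max(a + b for a, b in pairs)
```

On the numbers from the problem statement, the sketch produces the third partition listed above.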

4.4. Assume that we are given $\displaystyle{ n }$ pairs of items as input, where the first item is a number and the second item is one of three colors (red, blue, or yellow). Further assume that the items are sorted by number. Give an $\displaystyle{ O(n) }$ algorithm to sort the items by color (all reds before all blues before all yellows) such that the numbers for identical colors stay sorted.
For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should become (3,red), (9,red), (1,blue), (4,blue), (6,yellow).
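
One possible approach, sketched below in Python under the assumption that extra $\displaystyle{ O(n) }$ memory is allowed, is to append each item to a per-color bucket in input order and then concatenate the buckets. The name `sort_by_color` and its `order` parameter are illustrative, not part of the exercise.

```python
def sort_by_color(items, order=("red", "blue", "yellow")):
    """Stable bucket pass: items keep their input order within each color."""
    buckets = {color: [] for color in order}
    for number, color in items:     # one pass over the input, O(n)
        buckets[color].append((number, color))
    # Concatenate buckets in the required color order.
    return [item for color in order for item in buckets[color]]


example = [(1, "blue"), (3, "red"), (4, "blue"), (6, "yellow"), (9, "red")]
result = sort_by_color(example)
```

Because each bucket is filled left to right, numbers that share a color stay in their original sorted order.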

4.5. The mode of a bag of numbers is the number that occurs most frequently in the bag. The bag {4, 6, 2, 4, 3, 1} has a mode of 4. Give an efficient and correct algorithm to compute the mode of a bag of $\displaystyle{ n }$ numbers.

4.6. Given two sets $\displaystyle{ S_1 }$ and $\displaystyle{ S_2 }$ (each of size $\displaystyle{ n }$), and a number $\displaystyle{ x }$, describe an $\displaystyle{ O(n \log n) }$ algorithm for finding whether there exists a pair of elements, one from $\displaystyle{ S_1 }$ and one from $\displaystyle{ S_2 }$, that add up to $\displaystyle{ x }$. (For partial credit, give a $\displaystyle{ \Theta(n^2) }$ algorithm for this problem.)

4.7. Give an efficient algorithm to take the array of citation counts (each count is a non-negative integer) of a researcher’s papers, and compute the researcher’s $\displaystyle{ h }$-index. By definition, a scientist has index $\displaystyle{ h }$ if $\displaystyle{ h }$ of his or her $\displaystyle{ n }$ papers have been cited at least $\displaystyle{ h }$ times, while the other $\displaystyle{ n-h }$ papers each have no more than $\displaystyle{ h }$ citations.
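
To make the definition concrete, the following Python sketch computes the index by sorting the counts in decreasing order and finding the last position $\displaystyle{ i }$ whose count is at least $\displaystyle{ i }$. It is one straightforward sorting-based approach, not necessarily the fastest. The function name `h_index` and the sample counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank                # the top `rank` papers all have >= rank citations
        else:
            break                   # counts only decrease from here on
    return h


sample = [3, 0, 6, 1, 5]            # hypothetical citation counts
value = h_index(sample)
```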

4.8. Outline a reasonable method of solving each of the following problems. Give the order of the worst-case complexity of your methods.
(a) You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
(b) You are given a printed list containing the title, author, call number, and publisher of all the books in a school library and another list of thirty publishers. Find out how many of the books in the library were published by each company.
(c) You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

4.9. Given a set $\displaystyle{ S }$ of $\displaystyle{ n }$ integers and an integer $\displaystyle{ T }$, give an $\displaystyle{ O(n^{k-1} \log n) }$ algorithm to test whether $\displaystyle{ k }$ of the integers in $\displaystyle{ S }$ add up to $\displaystyle{ T }$.

4.10. We are given a set $\displaystyle{ S }$ containing $\displaystyle{ n }$ real numbers and a real number $\displaystyle{ x }$, and seek efficient algorithms to determine whether two elements of $\displaystyle{ S }$ exist whose sum is exactly $\displaystyle{ x }$.
(a) Assume that $\displaystyle{ S }$ is unsorted. Give an $\displaystyle{ O(n \log n) }$ algorithm for the problem.
(b) Assume that $\displaystyle{ S }$ is sorted. Give an $\displaystyle{ O(n) }$ algorithm for the problem.
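
For the sorted case, one common technique is to move two indices inward from both ends of the array. The Python sketch below illustrates it; the name `has_pair_with_sum` is hypothetical, and the code compares sums with exact equality, which is an assumption that may need a tolerance for floating-point inputs.

```python
def has_pair_with_sum(sorted_values, x):
    """Two-pointer scan over an ascending list; O(n) time, O(1) extra space."""
    i, j = 0, len(sorted_values) - 1
    while i < j:
        s = sorted_values[i] + sorted_values[j]
        if s == x:
            return True
        if s < x:
            i += 1                  # sum too small: advance the low end
        else:
            j -= 1                  # sum too large: retreat the high end
    return False


found = has_pair_with_sum([1, 3, 5, 9], 8)
```

Sorting the input first and then running the same scan gives one possible route to part (a).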

4.11. Design an $\displaystyle{ O(n) }$ algorithm that, given a list of $\displaystyle{ n }$ elements, finds all the elements that appear more than $\displaystyle{ n/2 }$ times in the list. Then, design an $\displaystyle{ O(n) }$ algorithm that, given a list of $\displaystyle{ n }$ elements, finds all the elements that appear more than $\displaystyle{ n/4 }$ times.

### Applications of Sorting: Intervals and Sets

4.12. Give an efficient algorithm to compute the union of sets $\displaystyle{ A }$ and $\displaystyle{ B }$, where $\displaystyle{ n = \max(|A|, |B|) }$. The output should be an array of distinct elements that form the union of the sets.
(a) Assume that $\displaystyle{ A }$ and $\displaystyle{ B }$ are unsorted arrays. Give an $\displaystyle{ O(n \log n) }$ algorithm for the problem.
(b) Assume that $\displaystyle{ A }$ and $\displaystyle{ B }$ are sorted arrays. Give an $\displaystyle{ O(n) }$ algorithm for the problem.

4.13. A camera at the door tracks the entry time $\displaystyle{ a_i }$ and exit time $\displaystyle{ b_i }$ (assume $\displaystyle{ b_i \gt a_i }$) for each of $\displaystyle{ n }$ persons $\displaystyle{ p_i }$ attending a party. Give an $\displaystyle{ O(n \log n) }$ algorithm that analyzes this data to determine the time when the most people were simultaneously present at the party. You may assume that all entry and exit times are distinct (no ties).

4.14. Given a list $\displaystyle{ I }$ of $\displaystyle{ n }$ intervals, specified as $\displaystyle{ (x_i, y_i) }$ pairs, return a list where the overlapping intervals are merged. For $\displaystyle{ I = \{(1, 3),(2, 6),(8, 10),(7, 18)\} }$ the output should be $\displaystyle{ \{(1, 6),(7, 18)\} }$. Your algorithm should run in worst-case $\displaystyle{ O(n \log n) }$ time complexity.
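
A minimal Python sketch of one standard approach follows: sort by left endpoint, then sweep once while extending the most recent merged interval. The name `merge_intervals` is illustrative, and the sketch assumes that intervals sharing only an endpoint count as overlapping, which the problem statement does not specify.

```python
def merge_intervals(intervals):
    """Sort by left endpoint, then merge in a single left-to-right pass."""
    merged = []
    for lo, hi in sorted(intervals):            # O(n log n) sort
        if merged and lo <= merged[-1][1]:      # overlaps (or touches) the last one
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


result = merge_intervals([(1, 3), (2, 6), (8, 10), (7, 18)])
```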

4.15. You are given a set $\displaystyle{ S }$ of $\displaystyle{ n }$ intervals on a line, with the $\displaystyle{ i }$th interval described by its left and right endpoints $\displaystyle{ (l_i, r_i) }$. Give an $\displaystyle{ O(n \log n) }$ algorithm to identify a point $\displaystyle{ p }$ on the line that is in the largest number of intervals.
As an example, for $\displaystyle{ S = \{(10, 40),(20, 60),(50, 90),(15, 70)\} }$ no point exists in all four intervals, but $\displaystyle{ p = 50 }$ is an example of a point in three intervals. You can assume an endpoint counts as being in its interval.
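
The example can be explored with an endpoint sweep, sketched below in Python. Each interval contributes an "open" event and a "close" event, and opens are processed before closes at the same coordinate so that endpoints count as inside. The name `busiest_point` and the event encoding are assumptions made for this sketch. It returns the first point reaching the maximum, which need not be the point named in the example.

```python
def busiest_point(intervals):
    """Return (point, count) for a point covered by the most closed intervals."""
    events = []
    for left, right in intervals:
        events.append((left, 0))    # 0 = open; sorts before a close at the same x
        events.append((right, 1))   # 1 = close
    events.sort()                   # O(n log n)

    best_point, best_count, active = None, 0, 0
    for x, kind in events:
        if kind == 0:
            active += 1
            if active > best_count:
                best_point, best_count = x, active
        else:
            active -= 1
    return best_point, best_count


point, count = busiest_point([(10, 40), (20, 60), (50, 90), (15, 70)])
```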

4.16. You are given a set $\displaystyle{ S }$ of $\displaystyle{ n }$ segments on the line, where segment $\displaystyle{ S_i }$ ranges from $\displaystyle{ l_i }$ to $\displaystyle{ r_i }$. Give an efficient algorithm to select the fewest number of segments whose union completely covers the interval from 0 to $\displaystyle{ m }$.

### Heaps

4.17. Devise an algorithm for finding the $\displaystyle{ k }$ smallest elements of an unsorted set of $\displaystyle{ n }$ integers in $\displaystyle{ O(n + k \log n) }$.

4.18. Give an $\displaystyle{ O(n \log k) }$-time algorithm that merges $\displaystyle{ k }$ sorted lists with a total of $\displaystyle{ n }$ elements into one sorted list. (Hint: use a heap to speed up the obvious $\displaystyle{ O(kn) }$-time algorithm.)

4.19. You wish to store a set of $\displaystyle{ n }$ numbers in either a max-heap or a sorted array. For each application below, state which data structure is better, or if it does not matter. Explain your answers.
(a) Find the maximum element quickly.
(b) Delete an element quickly.
(c) Form the structure quickly.
(d) Find the minimum element quickly.

4.20. (a) Give an efficient algorithm to find the second-largest key among $\displaystyle{ n }$ keys. You can do better than $\displaystyle{ 2n - 3 }$ comparisons.
(b) Then, give an efficient algorithm to find the third-largest key among $\displaystyle{ n }$ keys. How many key comparisons does your algorithm do in the worst case? Must your algorithm determine which key is largest and second-largest in the process?

### Quicksort

4.21. Use the partitioning idea of quicksort to give an algorithm that finds the median element of an array of $\displaystyle{ n }$ integers in expected $\displaystyle{ O(n) }$ time. (Hint: must you look at both sides of the partition?)

4.22. The median of a set of $\displaystyle{ n }$ values is the $\displaystyle{ \lceil n/2 \rceil }$th smallest value.
(a) Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would quicksort make then in the worst case?
(b) Suppose quicksort always pivoted on the $\displaystyle{ \lceil n/3 \rceil }$th smallest value of the current sub-array. How many comparisons would be made then in the worst case?

4.23. Suppose an array $\displaystyle{ A }$ consists of $\displaystyle{ n }$ elements, each of which is red, white, or blue. We seek to sort the elements so that all the reds come before all the whites, which come before all the blues. The only operations permitted on the keys are:
Examine($\displaystyle{ A,i }$) – report the color of the $\displaystyle{ i }$th element of $\displaystyle{ A }$.
Swap($\displaystyle{ A,i,j }$) – swap the $\displaystyle{ i }$th element of $\displaystyle{ A }$ with the $\displaystyle{ j }$th element.
Find a correct and efficient algorithm for red–white–blue sorting. There is a linear-time solution.

4.24. Give an efficient algorithm to rearrange an array of $\displaystyle{ n }$ keys so that all the negative keys precede all the non-negative keys. Your algorithm must be in-place, meaning you cannot allocate another array to temporarily hold the items. How fast is your algorithm?

4.25. Consider a given pair of different elements in an input array to be sorted, say $\displaystyle{ z_i }$ and $\displaystyle{ z_j }$. What is the maximum number of times $\displaystyle{ z_i }$ and $\displaystyle{ z_j }$ might be compared with each other during an execution of quicksort?

4.26. Define the recursion depth of quicksort as the maximum number of successive recursive calls it makes before hitting the base case. What are the minimum and maximum possible recursion depths for randomized quicksort?

4.27. Suppose you are given a permutation $\displaystyle{ p }$ of the integers 1 to $\displaystyle{ n }$, and seek to sort them to be in increasing order $\displaystyle{ [1, \ldots, n] }$. The only operation at your disposal is reverse$\displaystyle{ (p,i,j) }$, which reverses the elements of a subsequence $\displaystyle{ p_i, \ldots, p_j }$ in the permutation. For the permutation [1, 4, 3, 2, 5] one reversal (of the second through fourth elements) suffices to sort.
• Show that it is possible to sort any permutation using $\displaystyle{ O(n) }$ reversals.
• Now suppose that the cost of reverse$\displaystyle{ (p,i,j) }$ is equal to its length, the number of elements in the range, $\displaystyle{ |j - i| + 1 }$. Design an algorithm that sorts $\displaystyle{ p }$ in $\displaystyle{ O(n \log^2 n) }$ cost. Analyze the running time and cost of your algorithm and prove correctness.

### Mergesort

4.28. Consider the following modification to merge sort: divide the input array into thirds (rather than halves), recursively sort each third, and finally combine the results using a three-way merge subroutine. What is the worst-case running time of this modified merge sort?

4.29. Suppose you are given $\displaystyle{ k }$ sorted arrays, each with $\displaystyle{ n }$ elements, and you want to combine them into a single sorted array of $\displaystyle{ kn }$ elements. One approach is to use the merge subroutine repeatedly, merging the first two arrays, then merging the result with the third array, then with the fourth array, and so on until you merge in the $\displaystyle{ k }$th and final input array. What is the running time?

4.30. Consider again the problem of merging $\displaystyle{ k }$ sorted length-$\displaystyle{ n }$ arrays into a single sorted length-$\displaystyle{ kn }$ array. Consider the algorithm that first divides the $\displaystyle{ k }$ arrays into $\displaystyle{ k/2 }$ pairs of arrays, and uses the merge subroutine to combine each pair, resulting in $\displaystyle{ k/2 }$ sorted length-$\displaystyle{ 2n }$ arrays. The algorithm repeats this step until there is only one length-$\displaystyle{ kn }$ sorted array. What is the running time as a function of $\displaystyle{ n }$ and $\displaystyle{ k }$?

### Other Sorting Algorithms

4.31. Stable sorting algorithms leave equal-key items in the same relative order as in the original permutation. Explain what must be done to ensure that mergesort is a stable sorting algorithm.

4.32. Wiggle sort: Given an unsorted array $\displaystyle{ A }$, reorder it such that $\displaystyle{ A[0] \lt A[1] \gt A[2] \lt A[3] \ldots }$. For example, one possible answer for input [3, 1, 4, 2, 6, 5] is [1, 3, 2, 5, 4, 6]. Can you do it in $\displaystyle{ O(n) }$ time using only $\displaystyle{ O(1) }$ space?

4.33. Show that $\displaystyle{ n }$ positive integers in the range 1 to $\displaystyle{ k }$ can be sorted in $\displaystyle{ O(n \log k) }$ time. The interesting case is when $\displaystyle{ k \ll n }$.

4.34. Consider a sequence $\displaystyle{ S }$ of $\displaystyle{ n }$ integers with many duplications, such that the number of distinct integers in $\displaystyle{ S }$ is $\displaystyle{ O(\log n) }$. Give an $\displaystyle{ O(n \log \log n) }$ worst-case time algorithm to sort such sequences.

4.35. Let $\displaystyle{ A[1..n] }$ be an array such that the first $\displaystyle{ n - \sqrt n }$ elements are already sorted (though we know nothing about the remaining elements). Give an algorithm that sorts $\displaystyle{ A }$ in substantially better than $\displaystyle{ n \log n }$ steps.

4.36. Assume that the array $\displaystyle{ A[1..n] }$ only has numbers from $\displaystyle{ \{1, \ldots, n^2\} }$ but that at most $\displaystyle{ \log \log n }$ of these numbers ever appear. Devise an algorithm that sorts $\displaystyle{ A }$ in substantially less than $\displaystyle{ O(n \log n) }$.

4.37. Consider the problem of sorting a sequence of $\displaystyle{ n }$ 0’s and 1’s using comparisons. For each comparison of two values $\displaystyle{ x }$ and $\displaystyle{ y }$, the algorithm learns which of $\displaystyle{ x \lt y, x = y }$, or $\displaystyle{ x \gt y }$ holds.
(a) Give an algorithm to sort in $\displaystyle{ n - 1 }$ comparisons in the worst case. Show that your algorithm is optimal.
(b) Give an algorithm to sort in $\displaystyle{ 2n/3 }$ comparisons in the average case (assuming each of the $\displaystyle{ n }$ inputs is 0 or 1 with equal probability). Show that your algorithm is optimal.

4.38. Let $\displaystyle{ P }$ be a simple, but not necessarily convex, $\displaystyle{ n }$-sided polygon and $\displaystyle{ q }$ an arbitrary point not necessarily in $\displaystyle{ P }$. Design an efficient algorithm to find a line segment originating from $\displaystyle{ q }$ that intersects the maximum number of edges of $\displaystyle{ P }$.
In other words, if standing at point $\displaystyle{ q }$, in what direction should you aim a gun so the bullet will go through the largest number of walls. A bullet through a vertex of $\displaystyle{ P }$ gets credit for only one wall. An $\displaystyle{ O(n \log n) }$ algorithm is possible.

### Lower Bounds

4.39. In one of my research papers [Ski88], I discovered a comparison-based sorting algorithm that runs in $\displaystyle{ O(n \log(\sqrt n)) }$. Given the existence of an $\displaystyle{ \Omega(n \log n) }$ lower bound for sorting, how can this be possible?

4.40. Mr. B. C. Dull claims to have developed a new data structure for priority queues that supports the operations Insert, Maximum, and Extract-Max—all in $\displaystyle{ O(1) }$ worst-case time. Prove that he is mistaken. (Hint: the argument does not involve a lot of gory details—just think about what this would imply about the $\displaystyle{ \Omega (n \log n) }$ lower bound for sorting.)

### Searching

4.41. A company database consists of 10,000 sorted names, 40% of whom are known as good customers and who together account for 60% of the accesses to the database. There are two data structure options to consider for representing the database:
• Put all the names in a single array and use binary search.
• Put the good customers in one array and the rest of them in a second array. Only if we do not find the query name on a binary search of the first array do we do a binary search of the second array.
Demonstrate which option gives better expected performance. Does this change if linear search on an unsorted array is used instead of binary search for both options?

4.42. A Ramanujan number can be written two different ways as the sum of two cubes—meaning there exist distinct positive integers $\displaystyle{ a, b, c }$, and $\displaystyle{ d }$ such that $\displaystyle{ a^3 + b^3 = c^3 + d^3 }$. For example, 1729 is a Ramanujan number because $\displaystyle{ 1729 = 1^3 + 12^3 = 9^3 + 10^3 }$.
(a) Give an efficient algorithm to test whether a given single integer $\displaystyle{ n }$ is a Ramanujan number, with an analysis of the algorithm’s complexity.
(b) Now give an efficient algorithm to generate all the Ramanujan numbers between 1 and $\displaystyle{ n }$, with an analysis of its complexity.
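
To experiment with the definition, the Python sketch below tests a single integer by scanning candidate pairs $\displaystyle{ (a, b) }$ with $\displaystyle{ a \lt b }$ from both ends of the range $\displaystyle{ 1 }$ to $\displaystyle{ \lfloor n^{1/3} \rfloor }$ and counting the representations it finds. The name `is_ramanujan` is hypothetical, and this is only one possible approach to part (a). It uses roughly $\displaystyle{ O(n^{1/3}) }$ arithmetic operations.

```python
def is_ramanujan(n):
    """True if n = a^3 + b^3 for at least two distinct pairs with 0 < a < b."""
    # Integer cube root by counting up avoids floating-point rounding issues.
    b = 1
    while (b + 1) ** 3 <= n:
        b += 1

    a, count = 1, 0
    while a < b:
        s = a ** 3 + b ** 3
        if s == n:
            count += 1              # found one representation; look for another
            a += 1
            b -= 1
        elif s < n:
            a += 1
        else:
            b -= 1
    return count >= 2


check = is_ramanujan(1729)
```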

### Implementation Challenges

4.43. Consider an n×n array $\displaystyle{ A }$ containing integer elements (positive, negative, and zero). Assume that the elements in each row of $\displaystyle{ A }$ are in strictly increasing order, and the elements of each column of $\displaystyle{ A }$ are in strictly decreasing order. (Hence there cannot be two zeros in the same row or the same column.) Describe an efficient algorithm that counts the number of occurrences of the element 0 in $\displaystyle{ A }$. Analyze its running time.

4.44. Implement versions of several different sorting algorithms, such as selection sort, insertion sort, heapsort, mergesort, and quicksort. Conduct experiments to assess the relative performance of these algorithms in a simple application that reads a large text file and reports exactly one instance of each word that appears within it. This application can be efficiently implemented by sorting all the words that occur in the text and then passing through the sorted sequence to identify one instance of each distinct word. Write a brief report with your conclusions.

4.45. Implement an external sort, which uses intermediate files to sort files bigger than main memory. Mergesort is a good algorithm to base such an implementation on. Test your program both on files with small records and on files with large records.

4.46. Design and implement a parallel sorting algorithm that distributes data across several processors. An appropriate variation of mergesort is a likely candidate. Measure the speedup of this algorithm as the number of processors increases. Then compare the execution time to that of a purely sequential mergesort implementation. What are your experiences?

### Interview Problems

4.47. If you are given a million integers to sort, what algorithm would you use to sort them? How much time and memory would that consume?

4.49. Implement an algorithm that takes an input array and returns only the unique elements in it.

4.50. You have a computer with only 4 GB of main memory. How do you use it to sort a large file of 500 GB that is on disk?

4.51. Design a stack that supports push, pop, and retrieving the minimum element in constant time.

4.52. Given a search string of three words, find the smallest snippet of the document that contains all three of the search words—that is, the snippet with the smallest number of words in it. You are given the index positions where these words occur in the document, such as word1: $\displaystyle{ (1, 4, 5) }$, word2: $\displaystyle{ (3, 9, 10) }$, and word3: $\displaystyle{ (2, 6, 15) }$. Each of the lists is in sorted order, as above.

4.53. You are given twelve coins. One of them is heavier or lighter than the rest. Identify this coin in just three weighings with a balance scale.
