TADM2E 4.11
Note that there can be at most one element which appears more than n/2 (i.e. at least ceil(n/2) ) times in the list.
If we now consider the *sorted* version of the list, we'll see that if such an element exists, it covers a consecutive range in the list. The start of this range can be at various places with the extreme cases being:
- the range starts at element 1
- the range ends at element n
We can easily see that in both cases (and therefore also in the cases between the extremes) that if there is an element which appears more than n/2 times, the element at position ceil(n/2) (in the sorted list) must have this value.
This leads to the following algorithm:
Determine the ceil(n/2) smallest element of the (unsorted) set. The BFPRT algorithm can do that in O(n) running time in the worst case .
Go through the list element by element and count the number of occurrences of this ceil(n/2) smallest element. If it occurs at more than n/2 times, this is the element we were looking for, otherwise the list does not contain such an element.
Alternatively, successively remove pairs of distinct elements from the list. If there is an element that appears more than n/2 times, it won't have a pair eventually. The algorithm below removes pairs of distinct elements from the list until the residual list (maybe empty) contains only equal elements. If the residual list is non-empty, its head is a candidate for the element that appears more than n/2 times. Whether it is so can be checked in one linear sweep. If the residual list is empty, the list doesn't contain an element that appears more than n/2 times.
The algorithm works as follows. We traverse the list recursively and maintain an additional stack. The stack is initially empty. If both the list and stack are non-empty, then either the top of the stack is distinct from the head of the list, in which case they both can be removed, or they are equal, in which case we move the head of the list to the top of the stack, thus maintaining the invariant that the stack contains only equal elements. If the stack becomes empty, but there are still elements in the list, we move the head of the list onto the stack. If the list becomes empty, the stack contains elements for which there was no pair. We can implement this algorithm in Haskell as follows:
candidates :: Eq a => [a] -> [a] candidates = go [] where go ys [] = ys go [] (x:xs) = go [x] xs go (y:ys) (x:xs) | y == x = go (x:y:ys) xs | y /= x = go ys xs
Algorithm for any arbitrary number of occurences
For each element insert it into a dictionary. If the element is not in the dictionary add it with a count of one, if it is already in the dictionary increment its value. This can be done in linear time and space if using something like a hash function. Each time we change the count of an element we check if its frequency exceeds the number we are looking for. If it does we either save it or return it, depending on whether we want to find one or all such elements.