Claudio Cherubino's blog Life of a Googler

1Jul/0812

The missing number

There is an interesting series of programming job interview challenges proposed by Dev102.com, which is now at its tenth puzzle:

This week question is pretty easy. Your input is an unsorted list of n numbers ranging from 1 to n+1, all of the numbers are unique, meaning that a number can’t appear twice in that list. According to those rules, one of the numbers is missing and you are asked to provide the most efficient method to find that missing number. Notice that you can’t allocate another helper list, the amount of memory that you are allowed to allocate is O(1). Don’t forget to mention the complexity of your algorithm…

I'm not sure I understood correctly the constraint related to the memory allocation.

In my opinion, when they say we are limited to O(1), they mean that we can only allocate a single numeric variable and not any other data structure.

According to this interpretation, the solution is quite easy.

First of all, we take our only variable and store into it the sum of all the numbers between 1 and n + 1, which can be easily computed remembering that 1 + 2 + ... + n = n (n + 1) / 2.

Then, we subtract each element of the array from this value, eventually the result is actually our missing number.

The functional implementation of this imperative algorithm is straightforward:

#light

let sum n = ((n + 1) * (n + 2)) / 2

let answer nums = (nums |> List.length |> sum) - (List.reduce_left (+) nums)

Let's talk about the complexity of the algorithm.

If the length of the input list list is known, evaluating the sum of the first n natural numbers takes O(1), otherwise we have to scan the entire list, i.e. O(n).

Then we have to subtract each element of the list, and this takes another iteration, so another O(n).

Therefore we have O(n) + O(n), which is definitely O(n), and can be easily optimized a little by scanning the list only once.

The only problem left is: "am I following the instructions"?

Comments (12) Trackbacks (1)
  1. O(1) doesn’t mean you can only assign one variable, you could assign a hundred or a thousand if you needed to. It just means the amount of memory used can’t be proportional to the length of the list, so you wouldn’t be allowed to solve this by allocating a boolean array, looping through the input list setting each corresponding boolean to true, and then checking the boolean array to see which one is still false. That boolean array would be length n + 1 for input length n, so you’d be using memory proportional to the input list length

  2. Your solution actually uses O(n) memory.

    You’ve made the assumption that n (n + 1) / 2 is less than the maximum integer size. For an arbitrarily long list you would need an arbitrarily large sum variable which would require an arbitrarily large amount of storage space.

  3. Hi Chris,
    are you sure about your assertion?

    According to Michael’s comment, I understood that I did correctly, and in addition my solution was accepted by the authors.

  4. I’m fairly confident :)

    If you read through the comments on the post for this question you’ll see that Jesus DeLaTorre and Heiko Hatzfeld have made similar observations. I accept it’s a very pedantic point though.

    One way to do it with truly O(1) memory usage would be to sort the original array using an in-place sort like quicksort. Once the list was in order it would then be trivial to step through it and find the missing number.

    Of course, the computational complexity of my solution is O(n^2), which is far less efficient than yours.

  5. Eventually, I got yoour point! :)

    By the way, is there a solution that uses O(1) memory and has O(n) computational complexity?

  6. I think there is an O(n) solution. We should be able to sort the array in O(n) time because we can tell where a number should appear in the sorted list from its value. As I said in my last post, once we have a sorted list finding the missing number is easy.

    I’ll give it a go when I get a little free time.

  7. I have improved method_2 over method_1. I think in method_2 there is less chance of integer overflow.

    public static int method_1(int [] sequence)
    {
    // this can overflow
    int allSum = ((sequence.length + 1) * (sequence.length + 2)) / 2;

    int seqSum = 0;
    for (int i = 0; i < sequence.length; ++i)
    {
    // this can overflow
    seqSum += sequence[i];
    }

    return allSum – seqSum;
    }

    /**
    * @param sequence : Sequence of length N Contains integers 1 to (N + 1)
    * with one number missing. The array is unsorted.
    *
    * @return The missing number
    */
    public static int method_2(int [] sequence)
    {
    int missingNum = sequence.length + 1;
    for (int i = 0; i < sequence.length; ++i)
    {
    missingNum += ((i+1) – sequence[i]);
    }

    return missingNum;
    }

  8. Here’s a solution in Python that doesn’t have the overflow/linear space issue. It operates in linear time because the sorting works a bit like a keysort. It operates in constant space because by sorting the original list in-place.

    Once the list is sorted it’s easy to find the missing number.

    
    null = 0
    
    def findMissing( sequence ):
    
    	# Pad the list to fit the missing value
    	sequence.append( null )
    
    	# Sort the list
    	sequenceSort( sequence )
    
    	# Find 'null' index and hence missing value
    	return sequence.index( null ) + 1
    
    # Sorts a sequence (in linear time and constant space)
    # Assumes the list is large enough to fit all values
    def sequenceSort( sequence ):
    	index = 0
    	while( index < len(sequence) ):
    
    		# Pull out a value and set its spot to 'null'
    		value = sequence[index]
    		sequence[index] = null
    
    		# Insert the value into the list.
    		# Keep inserting the evicted values till we fill the slot we first got
    		# the value from.
    		while( True ):
    			value = insert( value, sequence )
    			if( (value == sequence[index]) or (value == null) ):
    				break
    
    		index = index + 1
    
    	return sequence
    
    # Inserts a value into the list, returning the value
    # that used to live at that index.
    def insert( value, sequence ):
    	index = value - 1
    	evicted = sequence[index]
    	sequence[index] = value
    	return evicted
    

    I’d be interested to see what this looks like in F#. Since my solution depends on the mutability of the original list, it’s hard to see how you could do it in a nice, clean, functional way.

  9. Just because there is a less chance doesnt make the method better, there is still a chance of an overflow. Hence I have provided a method which do NOT use any integer summing. Instead I use cumulative XOR to cancel out duplicates. Here is the third implementation:

    /**
    * @param sequence : Sequence of length N Contains integers 1 to (N + 1)
    * with one number missing. The array is unsorted.
    *
    * @return The missing number
    *
    * Algorithm: In this case we do not calculate any integer sum at all. We do cumulative
    * XOR of all elements with index. Please note that N XOR N == 0. So all elements eventually
    * cancel all index and last number standing after cumulative XOR is the missing number.
    * For example consider the sequence [1] [2] [3] [?].
    * So 1 XOR 1 == 0, 2 XOR 2 == 0, 3 XOR 3 == 0, so last number standing will be 4.
    * It does not matter if numbers are permuted instead of being sorted.
    */

    public static int method_3(int [] intSeq)
    {
    int missingNum = intSeq.length + 1;
    for (int i = 0; i < intSeq.length; ++i)
    {
    missingNum ^= ((i+1) ^ intSeq[i]);
    }

    return missingNum;
    }

  10. public static int findMissingNumber3(final int[] numbers)
    {
    int missing = 0;
    for (int i=0; i {
    missing += (i+1 – numbers[i]);
    }
    return missing;
    }
    [/c#]

  11. try this one, writtem in C# and using Linq:

    int missingNumber(IEnumerable list)
    {
    var k = new List(list);
    k.Sort();
    return k.Last() != k.Count() + 1 ? k.Count() + 1 : k.Where((elem, index) => (index + 1) != elem).First() – 1;
    }

  12. Better solution in c# using Linq:

    int missingNumber(IEnumerable myCol)
    {
    return (myCol.Count() + 1) * myCol.Count() / 2 – myCol.Sum();
    }


Leave a comment

(required)