Design and Analysis of Algorithms

You are given *n* coins. They all look identical.
They should all be the same weight, too -- but one is a
fake, made of a lighter metal.

Your neighbor has an old-fashioned balance scale that enables you to compare any two sets of coins. If it tips either to the left or to the right, you will know that one of the sets is heavier than the other. Sadly, you aren't on speaking terms with the neighbor, so he charges you each time you weigh anything.

Your task is this:

Design an algorithm to find the fake coin using the fewest weighings.

How many times must you use the scale?

We have been studying decrease-and-conquer, so it's not too surprising that a decrease-and-conquer algorithm works here.

You may well have realized that you can divide the pile in
half, weigh the halves, and narrow your focus to the pile
that is lighter. But that sounds a lot like binary search --
isn't binary search the prototypical *divide*-and-conquer
approach?

As some of you discovered while studying partitioning and
search earlier this semester, some people prefer not to call
binary search "divide-and-conquer" because *it doesn't solve
both sub-problems*. Instead, it discards half of the space
and solves only one of the sub-problems. From this perspective,
it is really a **decrease(-by-half)-and-conquer** algorithm.
That sort of approach works for finding the fake coin.

But we can do better than a factor of 2.

Suppose we divide the coins into *three* piles, where
at least two of them contain the same number of coins. After
weighing the equal-sized piles, we can eliminate ~2/3 of the
coins!

To design an algorithm, we need to be more precise.

- If *n* mod 3 = 0, we can divide the coins into three piles of
  exactly *n*/3 apiece.
- If *n* mod 3 = 1, then *n* = 3*k* + 1 for some *k*. We can divide
  the coins into three piles of *k*, *k*, and *k*+1. It will simplify
  our algorithm, though, if we split them into three piles of *k*+1,
  *k*+1, and *k*-1.
- If *n* mod 3 = 2, then *n* = 3*k* + 2 for some *k*. We can divide
  the coins into three piles of *k*+1, *k*+1, and *k*.

Here is an algorithm:

    INPUT: integer n

    if n = 1 then
        the coin is fake
    else
        divide the coins into piles of A = ceiling(n/3),
            B = ceiling(n/3), and C = n - 2*ceiling(n/3)
        weigh A and B
        if the scale balances then
            iterate with C
        else
            iterate with the lighter of A and B
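To make this concrete, here is one way to sketch the algorithm in runnable Python. The `weigh` helper and the list-of-weights representation are my own scaffolding for simulating the scale; they are not part of the puzzle.

```python
def weigh(a, b, weight):
    """Simulate the balance scale on two piles of coin indices.
    Returns -1 if pile a is lighter, 1 if pile b is lighter, 0 on balance."""
    wa = sum(weight[i] for i in a)
    wb = sum(weight[i] for i in b)
    return (wa > wb) - (wa < wb)

def find_fake(coins, weight):
    """Return (index of the lighter fake coin, number of weighings)."""
    weighings = 0
    while len(coins) > 1:
        third = -(-len(coins) // 3)        # ceiling(n/3), no math import needed
        a = coins[:third]
        b = coins[third:2*third]
        c = coins[2*third:]
        weighings += 1
        result = weigh(a, b, weight)
        if result == 0:
            coins = c                       # fake is in the leftover pile
        elif result == -1:
            coins = a                       # pile a is lighter
        else:
            coins = b                       # pile b is lighter
    return coins[0], weighings
```

Running it on ten coins where coin 6 is light finds the fake in three weighings, which matches ceiling(log_3 10) = 3.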

How many weighings does this require? Approximately
log_{3} *n*. But you don't have to settle
for an approximate answer...

**Quick Exercise**: Construct and solve the recurrence
relation for this algorithm. Simplify your work by assuming
*n* = 3^{k} for an integer *k*.

How much does this improve on a decrease-by-half approach, in which we split the coins into two piles?

    log_2 n
    -------
    log_3 n

In case you haven't worked with logarithms in a while, I'll drop some arithmetic on you:

    k = log_2 n  →  n = 2^k
    m = log_3 n  →  n = 3^m

            2^k = 3^m
      log_2 2^k = log_2 3^m
    k * log_2 2 = m * log_2 3
              k = m * log_2 3
            k/m = log_2 3

So:

    log_2 n
    ------- = log_2 3 = 1.584963...
    log_3 n

This means that, on top of the logarithmic speedup we already get
from a decrease-by-half approach, the three-pile split gives us
another ~1.58x speedup. Very nice.
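If you'd like to check that ratio numerically, a couple of lines of Python will do it. The choice of n = 3^12 here is arbitrary:

```python
# A quick numeric sanity check of the ratio above, using Python's
# standard math module.
import math

n = 3 ** 12
halving  = math.log2(n)       # weighings for decrease-by-half
thirding = math.log(n, 3)     # weighings for decrease-by-thirds

print(halving / thirding)     # the ratio, which is log2(3) ≈ 1.585
```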

This algorithm and binary search can be classified more
generally as **decrease-by-constant-factor** algorithms.
The larger the factor, generally the more efficient the
algorithm.

Speaking of decrease-and-conquer, how did you like my decrease-and-conquer algorithm for the Election puzzle? Sometimes there's gold hidden in them thar hills.

Back in Session 14, we looked at some
optimizations in divide-and-conquer-multiplication.
Suppose that *n* is even. Then *n* * *m* can be
rewritten as (*n*/2) * (*m* * 2).
This can be quite efficient at the machine level, because
doubling and halving can be implemented as **shifting**
the bits of the number left and right, respectively.

We do have to handle the case in which *n* is odd.
To do so, we can keep track of values that are lost when we
divide by 2. For example:

      n   *   m    LOST
     26   *  42
     13   *  84
      6   * 168      84
      3   * 336
      1   * 672     336

So, *n* * *m* = 672 + 336 + 84 = 1092.

In order to design a concise algorithm, we need to identify
a simple invariant. Take another look at our sequence of
halves-and-doubles... The values we add to find our product
are exactly the values of *m* when *n* is odd
-- including the final case, where *n* = 1.

Let's rewrite our table to make this invariant clearer:

      n   *   m     ADD
     26   *  42
     13   *  84      84
      6   * 168
      3   * 336     336
      1   * 672     672
                    ---
                   1092

This leads to a straightforward algorithm.

**Quick Exercise**: Write it.

Recursively, we might start with:

    multiply(n, m)
        if n = 1 then
            return m
        else if n is odd then
            return m + multiply(n/2, m*2)
        else
            return multiply(n/2, m*2)

... and make it better. But we can also write this with a loop in a straightforward way:

    multiply(n, m)
        sum = 0
        while true
            if n is odd then
                sum += m
            if n = 1 then
                return sum
            n = n / 2
            m = m * 2
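For reference, here is the loop version rendered in Python. Integer division (`//`) plays the role of the pseudocode's `n / 2` on odd values:

```python
def multiply(n, m):
    """Multiply two positive integers by repeatedly halving n and
    doubling m, adding in m whenever n is odd."""
    total = 0
    while True:
        if n % 2 == 1:        # n is odd: this m belongs in the product
            total += m
        if n == 1:
            return total
        n //= 2               # halve, discarding the remainder
        m *= 2                # double
```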

As I said when we saw this the first time...

Again: Beautiful.

Consider again binary search, the prototypical decrease-by-half algorithm:

    INPUT : k, a target value
            v[left..right], a sorted list of values
    OUTPUT: the index of k in v, or failure

    if right < left then
        fail
    middle ← (left+right)/2          *** the key line
    if k = v[middle] then
        return middle
    if k < v[middle] then
        return search(left, middle-1)
    else
        return search(middle+1, right)
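Here is a faithful Python rendering of that pseudocode, for experimentation. Python lists are 0-indexed, and I return -1 in place of "fail":

```python
def search(v, k, left, right):
    """Recursive binary search: index of k in sorted list v, or -1."""
    if right < left:
        return -1                          # k is not in v
    middle = (left + right) // 2           # the key line
    if k == v[middle]:
        return middle
    if k < v[middle]:
        return search(v, k, left, middle - 1)
    return search(v, k, middle + 1, right)
```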

As noted in
Session 12,
we can let `middle` be any index such that (`left` ≤
`middle` ≤ `right`),

... but we get our best performance with middle in the very middle:

We eliminate half of the list on each pass!

This is certainly true for arrays such as

3 14 27 31 39

... in which the values in the array are evenly distributed. But what about arrays such as this one:

1 2 3 4 5 ... 40 41 91 99

When we search for 3, the algorithm will partition the array into ...

1 2 3 4 5 ... 20 21

It will then work its way down to 3, looking in slots 11, 6, and 3. We can't do much better.

But what if we are looking for 91? The left half of every subarray along the way contains values that are much smaller than 91. Couldn't we take advantage of this to speed things up a bit?

Recurring theme alert: partition by position versus partition by value...

Instead of splitting the array in half each time using the
indices of the left and right values, why not let *the
values themselves* tell us where to go, allowing us to
jump farther into subarrays?

The idea is this: Compute the new value of `middle`
using a ratio based on the relative distance between our
target value *k* and the leftmost and rightmost
values of the array being searched. Let's replace the key
line above:

middle ← (left+right)/2

... with this:

    middle ← floor( (right-left) * (k - v[left]) / (v[right] - v[left]) ) + left

For example, if we are looking for 91 in the original array
above, then on our first pass we compute *middle* as:

                          91 - 1
    middle ← (43-1) * ------ + 1
                          99 - 1

           = (42 * 0.918...) + 1
           = 39.571
           = 39        (after taking the floor)

Because 91 > `v[39]` = 39, the algorithm focuses on
the subarray `v[40 .. 43]` -- an array of size 4!
That's much faster than the standard approach.
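Here is one way to sketch the idea in Python: an iterative version, 0-indexed, with a guard against dividing by zero when every value in range is equal (a corner case the formula above doesn't address).

```python
def interpolation_search(v, k, left, right):
    """Search sorted list v for k between indices left and right,
    estimating the probe position from the values themselves.
    Returns the index of k, or -1 on failure."""
    while left <= right and v[left] <= k <= v[right]:
        if v[left] == v[right]:            # avoid division by zero
            return left if v[left] == k else -1
        # let the values, not just the positions, choose the probe point
        middle = left + (right - left) * (k - v[left]) // (v[right] - v[left])
        if v[middle] == k:
            return middle
        if v[middle] < k:
            left = middle + 1
        else:
            right = middle - 1
    return -1
```

On the array above (1..41 followed by 91 and 99), searching for 91 lands in the small right-hand subarray after a single probe, just as the hand computation shows.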

This is often called **interpolation search**. To
in·ter·po·late
is to "insert (something) between fixed points" or "to estimate
values of (data or a function) between two known values".
Many of you may recognize the idea of
interpolation
from your high school math courses, where it is a useful
technique for simulating a continuous function when all you
have is a discrete set of ordered pairs.

We can call interpolation search a
**decrease-by-variable-amount** algorithm. The size of
the portion of the original problem discarded on each pass
depends on the values in the problem. This can make for an
efficient and concise algorithm. It also complicates
efficiency analysis, which now must include probabilities
in the computation.

**Quick Exercise**: For what inputs does this algorithm
perform better than straight binary search? In such cases,
how much better can it perform? What is the worst case input
for this algorithm? (In the worst case, interpolation search
performs *worse* than straight binary search!)

*Aside about the key line*: The computation of
`middle` in the straight binary search hides a nasty
potential error...

Small steps. Greedy in Ruby.

"Sorting columns". Surprised so few asked Qs. Start earlier? Work slower. Some possibilities in Python.

First, come up with an idea. Then find the syntax or library functions you need to implement it.

Oh, and be sure to *read the assignment*!

- Greedy: no backtracking. Failure expected.
- Times of 0. Implementation notes!
- Print-out. Read-me only,
*with data*.

See today's zip file for some illustrative code.

- Reading -- Here are a few pages to help you understand the
decrease-and-conquer algorithms we saw today.
- The Balance Puzzle
- Russian Peasant Multiplication at Wikipedia. This algorithm is also known as "multiplication à la russe" and ancient Egyptian multiplication.
- interpolation search at Wikipedia. This discussion includes some empirical analysis.

- Homework -- Homework 4 is available and due next session.
- Exam 2 -- Exam 2 is next session.