Design and Analysis of Algorithms

This is a classic problem called *Closest Pair*.
We have a set of points in a rectangle. For example:

Let's label the points p_{1}, p_{2}, ..., p_{n}, where each p_{i}
is an ordered pair (x_{i}, y_{i}).

**Your task**: Design an algorithm that finds the
closest *pair* of points.

Recall that the distance formula for two points in a plane is:

$$d = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2}}$$

How efficient is your solution?

We could do this in a **brute force** manner by computing
the distance between every pair of points and selecting the
smallest:

    INPUT: x[1..n] and y[1..n]

    minimum ← ∞
    for i ← 1 to n do
        for j ← i+1 to n do
            distance ← sqrt((x[i] - x[j])² + (y[i] - y[j])²)
            if distance < minimum then
                minimum ← distance
                first   ← i
                second  ← j
    return (first, second)

What is the complexity of this algorithm? O(n²), because we have to do (n² − n)/2 distance computations. These are expensive because of the square root operation. One way to improve the algorithm's performance is not to compute the square root at all! For non-negative x and y, sqrt(x) < sqrt(y) exactly when x < y, so we can compare squared distances instead. If we really need the distance of the closest pair, we can compute it once we know first and second. Still, though, the algorithm is O(n²).
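The brute-force idea, with the square root deferred until the end, might look like the following minimal Python sketch. The function name `closest_pair_brute_force` and the 0-based indexing are choices made for this illustration, not part of the pseudocode above.

```python
from math import inf, sqrt

def closest_pair_brute_force(x, y):
    """Return (first, second, distance) for the closest pair (0-based indices)."""
    minimum_sq = inf          # smallest squared distance seen so far
    first = second = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # Compare squared distances; no square root inside the loops.
            dist_sq = (x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2
            if dist_sq < minimum_sq:
                minimum_sq = dist_sq
                first, second = i, j
    # Take the square root once, only for the winning pair.
    return first, second, sqrt(minimum_sq)
```

The two nested loops still examine (n² − n)/2 pairs, so the saving is a constant factor only.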

Let's try a top-down approach: **divide and conquer**.
Split the grid in half. Solve the two halves, finding the
closest pair on the left and the closest pair on the right.
Choose the smaller of the two.

We don't need to go all the way. If a panel has k points or fewer, then we can solve it with the brute-force approach. For small enough k, O(k²) is acceptable. If not, though, divide and conquer. Let's use k = 3, as that means we don't need a nested loop...

With this algorithm, we end up with four smaller rectangles:

Solve left half, as two panels of size three. This gives us
two pairs, separated by d_{left} and d_{right},
respectively. Choose the smaller, d_{min}.

Complication... The closest pair in the left rectangle might
not lie within one of its halves -- it could straddle the
dividing line! We have to consider that possibility, but we
can narrow our search. How close would straddling points
have to be to the dividing line in order to be closer to each
other than our closest pair in each half? Within d_{min} of it.

Find the closest pair in the straddling region. (Use the same algorithm!!). In this case, a straddling pair is the closest pair, so choose it as the answer for the left half.

Now, solve the right half, including its sanity check. That gives us two points:

Then combine left and right, with *its* sanity check,
and we have the closest pair in the original set.
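The whole procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact procedure from lecture: the names `closest_pair`, `_solve`, and `brute_force` are made up here, points are `(x, y)` tuples, and the straddling region is checked with the standard scan of the strip sorted by y rather than by calling the same algorithm on it again. Because this sketch re-sorts the strip at every level, it runs in O(n log² n); keeping the points presorted by y is what brings the combine step down to linear time.

```python
from math import dist, inf

def brute_force(points):
    """Closest pair among a handful of points: (distance, p, q)."""
    best = (inf, None, None)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d < best[0]:
                best = (d, points[i], points[j])
    return best

def closest_pair(points):
    """Return (distance, p, q) for the closest pair of (x, y) tuples."""
    return _solve(sorted(points))          # sort by x once, up front

def _solve(pts):
    if len(pts) <= 3:                      # k = 3: small enough for brute force
        return brute_force(pts)
    mid = len(pts) // 2
    mid_x = pts[mid][0]                    # the dividing line
    # Solve each half and keep the smaller result (d_min).
    best = min(_solve(pts[:mid]), _solve(pts[mid:]), key=lambda t: t[0])
    # Only points within d_min of the dividing line can form a closer pair.
    strip = sorted((p for p in pts if abs(p[0] - mid_x) < best[0]),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:]:
            if q[1] - p[1] >= best[0]:     # too far apart vertically; stop early
                break
            d = dist(p, q)
            if d < best[0]:
                best = (d, p, q)
    return best
```

For example, `closest_pair([(0, 0), (5, 5), (1, 1), (9, 9), (5, 6), (20, 3)])` finds the pair `(5, 5)` and `(5, 6)`, which straddles the dividing line at x = 5.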

How efficient is this approach?

    T(1) = c              where c = cost of min(three computed distances)
    T(n) = 2T(n/2) + B    where B = cost to combine the two halves

The cost of combining the two halves is the cost of checking the straddling region. This is Θ(n). (Can you figure out why?)

The overall complexity is Θ(n log n), which improves on our brute-force solution in much the same way that quicksort and mergesort improve on the O(n²) sorts!
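To see where the n log n comes from, here is a sketch of the recurrence unrolled by repeated substitution. It assumes the combining cost is exactly B = bn for some constant b and that n is a power of 2; both are simplifying assumptions made for this derivation.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + bn \\
     &= 2\bigl(2\,T(n/4) + bn/2\bigr) + bn = 4\,T(n/4) + 2bn \\
     &= 4\bigl(2\,T(n/8) + bn/4\bigr) + 2bn = 8\,T(n/8) + 3bn \\
     &\;\;\vdots \\
     &= 2^{i}\,T(n/2^{i}) + i\,bn \\
     &= n\,T(1) + bn\log_2 n \qquad [\text{substitute } i = \log_2 n] \\
     &= cn + bn\log_2 n \;\in\; \Theta(n \log n)
\end{align*}
```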

Brute force is often a good way to start looking for an algorithm, if only to create a baseline for how well we can do. Then we can proceed as we learned in the first couple of weeks, with top-down and bottom-up, always being on the lookout for a useful invariant or bit of knowledge that might let us zoom in.

We can learn an interesting lesson here. Divide-and-conquer doesn't work here, strictly speaking. But "we made it fit" by accounting for the way it fails. The lesson... Use the basic techniques. Tweak as necessary.

Last time we formally analyzed an iterative algorithm and then started to analyze a recursive algorithm before running out of time. Let's pick up where we left off.

Our Algorithm Q(n) computes the sum of the first *n*
cubes. Its basic operation is the double multiplication
done on each call.

    if n = 1
       then return 1
       else return Q(n-1) + n * n * n

First, we set up a **recurrence relation** for the number
of multiplications:

    M(1) = 0
    M(n) = M(n-1) + 2

Then, we started to substitute previous values for M into
the equation until we saw a pattern for *n*−*i*:

    M(1) = 0
    M(n) = M(n-1) + 2
         = (M(n-2) + 2) + 2 = M(n-2) + 4
         = (M(n-3) + 2) + 4 = M(n-3) + 6
         = (M(n-4) + 2) + 6 = M(n-4) + 8
         ...
         = M(n-i) + 2i

Once we have that, we can substitute *n*−1 for
*i* to reach a solution for the problem of size
*n* in terms of a problem of size 1. We know the
cost of that problem, so we can simplify down to a value
in terms of *n* itself.

    M(1) = 0
    M(n) = M(n-i) + 2i
         ... [substitute i = n-1]
         = M(n-(n-1)) + 2(n-1)
         = M(1) + 2(n-1)
         = 0 + 2(n-1)
         = 2(n-1)

So this algorithm performs 2(*n*-1) multiplications and
is O(n).
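One way to check the closed form empirically is to have the algorithm count its own multiplications. The sketch below is an illustration only; the name `sum_of_cubes` and the returned `(value, multiplications)` pair are choices made for this example.

```python
def sum_of_cubes(n):
    """Return (Q(n), number of multiplications) for n >= 1."""
    if n == 1:
        return 1, 0                      # base case: M(1) = 0
    total, mults = sum_of_cubes(n - 1)
    # n * n * n costs two multiplications: M(n) = M(n-1) + 2
    return total + n * n * n, mults + 2

for n in (1, 2, 5, 10):
    value, mults = sum_of_cubes(n)
    assert mults == 2 * (n - 1)          # matches the closed form
```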

When analyzing any recursive algorithm, we can use this same process:

- Write a recurrence relation for the basic operation.
- Starting with *n*, substitute recursively until you find a pattern for *n*−*i*.
- Substitute *n*−1 for *i* to reach a closed-form solution for the problem of size *n*.

Often, the arithmetic for solving a recurrence relation is simpler than that needed to solve the iterative sum. Loops may seem easier to understand than recursion, at least until you gain more experience, but recursion is often much better behaved mathematically!

Show Towers of Hanoi: interactive!

Algorithm:

- Move n-1 disks to the free peg.
- Move biggest disk to target peg.
- Move n-1 disks to the target peg.

**Analyze this**.

Apply the technique...

    M(1) = 1
    M(n) = 2*M(n-1) + 1
         = 2*(2*M(n-2) + 1) + 1 = 4*M(n-2) + 3
         = 4*(2*M(n-3) + 1) + 3 = 8*M(n-3) + 7
         ...
         = 2^{i}*M(n-i) + (2^{i} - 1)
         ... [substitute i = n-1]
         = 2^{n-1}*M(1) + (2^{n-1} - 1)
         = 2^{n-1} + 2^{n-1} - 1
         = 2^{n} - 1
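The three-step algorithm and the move count can be checked with a short Python sketch. The function name `hanoi` and the peg labels "A", "B", and "C" are assumptions made for this illustration; the base case is a single disk, matching M(1) = 1.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves for n >= 1 disks."""
    if n == 1:
        return [(source, target)]                    # M(1) = 1
    return (hanoi(n - 1, source, spare, target)      # n-1 disks to the free peg
            + [(source, target)]                     # biggest disk to the target
            + hanoi(n - 1, spare, target, source))   # n-1 disks to the target

for n in range(1, 11):
    assert len(hanoi(n)) == 2 ** n - 1               # matches the closed form
```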

*Quick question*. What would it mean to start with
M(0)? Would that work?

So, this algorithm is O(2^{n}). That is **much**
worse than O(n^{2}), even O(n^{k}) for
k > 2. Check out this chart I found via
a University of Ottawa CS course site:

Complexity matters. For some algorithms, it matters only for large data sets. For others, it matters almost regardless of the size of the data.
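To get a feel for the gap between polynomial and exponential growth, a few values can be computed directly. This is a small illustrative sketch; the function name `growth_table` and the chosen sizes are arbitrary.

```python
def growth_table(sizes=(10, 20, 30, 40, 50)):
    """Return rows of (n, n^2, n^3, 2^n) for the given input sizes."""
    return [(n, n ** 2, n ** 3, 2 ** n) for n in sizes]

print(f"{'n':>3} {'n^2':>6} {'n^3':>8} {'2^n':>18}")
for n, quadratic, cubic, exponential in growth_table():
    print(f"{n:>3} {quadratic:>6} {cubic:>8} {exponential:>18}")
```

Even at these modest sizes, the 2^n column quickly dwarfs both polynomial columns.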

Ways other than mathematical analysis to understand the complexity of an algorithm:

- empirical analysis
- visualization

Example of visualization: Sergey's VisualSort application.

Not graded. Violates my goal, which is one-week turnaround.

- Not so bad this time, as the problems were either experiential (#1), mechanical (#2), or philosophical (#3).
- In general, homework in this course will be experiential or mechanical. For practice. For learning. Not for evaluation so much. So: I will grade generously if it is clear that you have made a good-faith effort (time put in, answers sought, ...).
- Only 25% of final grade.

Number 1. It's hard to come up with a new algorithm for some problems...

Number 2. Not in place -- two extra arrays! Not stable -- try it with two 13s and see where they end up.

Number 3. None of us are lawyers (are we?). But we all have intuitions, especially as creators of programs and algorithms. Think them through. Work them out.

The law is unsettled. As it stands, we cannot patent "scientific facts" or "mathematical expressions". We can patent other algorithms, if they are implemented as programs.

To me, this distinction betrays a fundamental misunderstanding of algorithms relative to "mathematical expression". I'm not comfortable with most software patents, especially those that are algorithmic in nature.

- Reading -- ... Study for the exam.
- Homework -- Homework 2 is available and due one week from today.
- Exam -- Exam 1 is next session.