CS 3530 Session 25

Session 25

Heaps and Problem Transformation

CS 3530
Design and Analysis of Algorithms

Game: Wire Connections

A Wire Connections game board consists of a list of n white dots and n black dots, interleaved in an arbitrary order. For example:

    W W B W B B

In the game, players take turns connecting a white dot to a black dot. When the game is over, every white dot on the board is connected to a black dot.

There are several different ways a game can proceed. For our example board, I see three immediately:

2-3, 4-5, 1-6
1-3, 2-5, 4-6
1-6, 2-5, 3-4

Note that moves are always written with the position of the leftmost dot first, giving smaller-larger pairs. The same notation applies whether we connect the dots in W-B order or in B-W order, as in the move 3-4 above.

Moves are scored based on the distance between the connected dots. The distance between adjacent dots is 1 "hop". So the move 2-3 scores one point, while 2-5 scores three.

After the board is completely wired, the winner is the player with the lowest total number of points. For this reason, game boards usually have an even n, so that each player makes the same number of moves.

Play four games with a partner. Develop a strategy that helps you win reliably -- or at least reliably not lose! Here are the game boards:

    W W W W B B B B
    W B W W B B W B
    W B W B B W W B
    W W B W B B W B

Later, we will compare results and algorithms.

Next, make up two boards each and play again. If you can create a game board that gives your algorithm an advantage, feel free to do so. You may outsmart your opponent!

Minimizing Wire Connections

This game may remind you of Dot Connections, a puzzle from a couple of weeks back. In that puzzle, the dots were already paired up in advance, and our job was to compute the total length of the connections as efficiently as possible.

The goal here is to select low-cost connections that force your opponent to select higher-cost connections, at least in aggregate.

How might we attack this using brute force? On a given board, there are n! possible wirings. We could iterate through them all, scoring each, and choose our move from the best wiring. This algorithm is O(n!), which is bounded above by O(nⁿ). Ouch.
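The enumerate-and-score step can be sketched as follows. This is a minimal sketch, assuming a board is written as a string like "W W B W B B" and positions are 1-based as in the notes; the name `brute_force_min_wiring` is hypothetical. It covers only scoring all n! wirings, not the turn-by-turn adversarial search.

```python
from itertools import permutations


def brute_force_min_wiring(board):
    """Score every wiring of a board such as "W W B W B B" and return
    (total_length, wiring) for one wiring of minimal total length."""
    dots = board.split()
    whites = [p for p, c in enumerate(dots, start=1) if c == "W"]
    blacks = [p for p, c in enumerate(dots, start=1) if c == "B"]
    if len(whites) != len(blacks):
        raise ValueError("board needs equally many W and B dots")
    best = None
    # Each ordering of the black dots pairs them, in turn, with the white
    # dots, so the loop visits all n! wirings.
    for order in permutations(blacks):
        wiring = sorted(tuple(sorted(pair)) for pair in zip(whites, order))
        total = sum(right - left for left, right in wiring)  # hops per move
        if best is None or total < best[0]:
            best = (total, wiring)
    return best
```

Even this single pass over the wirings takes O(n) work per wiring, so the total is O(n · n!).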

And it gets worse. Our opponent may not follow the wiring that is best for us, so we may have to repeat this process every time we make a move. Ouch^ouch.

Given the adversarial nature of the game, we need a way to score a board quickly, seeing what future moves we could be forced to make. Here is a greedy approach: Scan the line, making as many minimal-length connections as possible on each pass. Continue until all dots are connected.
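One possible reading of the greedy passes, sketched under the assumption that "minimal connections" means connecting neighboring unconnected dots of opposite color. The function name `greedy_passes` and the string board format are illustrative choices, not part of the notes.

```python
def greedy_passes(board):
    """Repeatedly scan the unconnected dots left to right, connecting each
    neighboring W-B or B-W pair and skipping past both dots. Returns the
    moves in the order they are made, using 1-based positions."""
    remaining = list(enumerate(board.split(), start=1))  # (position, color)
    moves = []
    while remaining:
        leftover = []
        i = 0
        while i < len(remaining):
            pos, color = remaining[i]
            if i + 1 < len(remaining) and remaining[i + 1][1] != color:
                # Adjacent among the unconnected dots and opposite colors.
                moves.append((pos, remaining[i + 1][0]))
                i += 2
            else:
                leftover.append(remaining[i])
                i += 1
        if len(leftover) == len(remaining):
            # No progress is possible only if the colors are unbalanced.
            raise ValueError("board needs equally many W and B dots")
        remaining = leftover
    return moves
```

On the board W W B B W W B B, `greedy_passes` makes 2-3, 4-5, and 6-7 on the first pass and 1-8 on the second.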

How well does the greedy approach work on this board?

     W W B B W W B B

It generates 2-3, 4-5, 6-7, 1-8. The total length of the connections is 1+1+1+7 = 10, and we would win the game, two to eight.

Our opponent might choose a different second move, though: say, 1-4. That would leave only the last four dots, at positions 5 through 8, unconnected:

     W W B B

Then we could play 6-7, forcing our opponent to play 5-8, and win two points to six. Notice that the total length of this wiring is only 8. If all we track is game wins and losses, the different point scores don't matter. But if we track points over time, the opponent has an incentive to find wirings with minimal total length, to minimize his loss on boards he doesn't control.
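The point tallies for both games can be checked mechanically. A small sketch, assuming players alternate and the first player makes the first move in the list; `score_game` is a hypothetical helper name.

```python
def score_game(moves):
    """Return (first_player_points, second_player_points) for a completed
    wiring, where moves alternate between the two players."""
    # Each move (left, right) scores right - left hops.
    first = sum(right - left for left, right in moves[0::2])
    second = sum(right - left for left, right in moves[1::2])
    return first, second
```

For the greedy line of play, `score_game([(2, 3), (4, 5), (6, 7), (1, 8)])` gives two points to eight; for the opponent's alternative, `score_game([(2, 3), (1, 4), (6, 7), (5, 8)])` gives two points to six.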

Is there a problem transformation approach available to us? Yes, though the inspiration comes not from Dot Connections but from Onions! Treat each white dot as a left parenthesis and each black dot as a right parenthesis. Then use standard parenthesis matching rules to make connections.

This approach "unties" crossed connections and decomposes the game board into subsequences. For example, the board:

    W W B B W W B B

... is divided into two subsequences: [W W B B] at the front and another [W W B B] on the end. The board:

    W B W W B B W B

... is divided into three subsequences: [W B], [W W B B], and another [W B].
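The decomposition can be sketched with an ordinary parenthesis-matching stack. This minimal sketch assumes every black dot closes some earlier white dot, so it rejects boards with the ")(" case; `decompose` is a hypothetical name.

```python
def decompose(board):
    """Treat W as '(' and B as ')'. Return the board's balanced
    subsequences, each as a list of (left, right) connections."""
    groups, current, stack = [], [], []
    for pos, color in enumerate(board.split(), start=1):
        if color == "W":
            stack.append(pos)          # an open parenthesis waits for a match
            continue
        if not stack:
            raise ValueError(f"B at position {pos} has no W to close")
        current.append((stack.pop(), pos))
        if not stack:                  # back to depth zero: a subsequence ends
            groups.append(current)
            current = []
    if stack:
        raise ValueError("some W dots were never closed")
    return groups
```

For W B W W B B W B this yields three groups: [(1, 2)], [(4, 5), (3, 6)], and [(7, 8)].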

This approach minimizes the total connection length because, by untying crossed connections, we match dots as closely as possible without leaving unmatched dots hanging across long spans. We might call this a "principled greed", I suppose.
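One way to see why the matching is minimal is a standard counting argument, not taken from the notes: a connection's length is the number of gaps it spans, and after the first k dots, |whites - blacks| of those dots cannot be matched among themselves, so at least that many connections must cross gap k. Parenthesis matching crosses each gap exactly that many times, since the dots still waiting on the stack are the only ones whose connections cross it. A sketch of the resulting lower bound, with `min_total_length` a hypothetical name:

```python
def min_total_length(board):
    """Lower bound on total wiring length: the sum over every gap of
    |whites - blacks| among the dots to the left of that gap."""
    total, balance = 0, 0
    dots = board.split()
    for color in dots[:-1]:            # a gap follows every dot but the last
        balance += 1 if color == "W" else -1
        total += abs(balance)          # connections forced across this gap
    return total
```

For W W B B W W B B the bound is 1+2+1+0+1+2+1 = 8, the total of the wiring 2-3, 1-4, 6-7, 5-8.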

There is one small glitch. What do we do when a closing parenthesis precedes its opening parenthesis? Consider this board from above:

    W B W B B W W B

If we convert this into an Onions input, it becomes ()())((), and three moves become immediately obvious: 1-2, 3-4, and 7-8. But that leaves a ")(" pair, which we must match as 5-6. Our algorithm must account for this.
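A small change to the stack handles the ")(" case. In this sketch (a common variant, assumed rather than taken from the notes), the stack holds unmatched dots of a single color; a dot of the opposite color connects to the top of the stack, and any other dot is pushed. The name `match_board` is hypothetical.

```python
def match_board(board):
    """Parenthesis-style matching that accepts both W-B and B-W pairs.
    Returns connections as (left, right) in the order they are made."""
    stack, moves = [], []              # stack of (position, color)
    for pos, color in enumerate(board.split(), start=1):
        if stack and stack[-1][1] != color:
            left, _ = stack.pop()      # opposite color: connect to the top
            moves.append((left, pos))
        else:
            stack.append((pos, color)) # same color (or empty): wait
    if stack:
        raise ValueError("board needs equally many W and B dots")
    return moves
```

On W B W B B W W B, `match_board` makes 1-2, 3-4, 5-6, and 7-8, treating the ")(" pair at 5-6 like any other match.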

This "principled greedy" algorithm for examining boards uses a representation change to lead us to a problem transformation, which inspires us to solve the problem in a different way than either the original representation or the second algorithm would indicate.

Quick exercise: How might we transform this problem into the Dot Connections problem? We could preprocess our input, labeling each dot with its position in the list... What value, if any, would there be in doing so?

A New Data Structure: The Heap

Last time, we encountered the heap data structure, which makes it easy to implement priority queues. Recall that a heap is a binary tree whose structure meets two requirements:

parental dominance, which requires that every parent is ≥ each of its children
near-completeness, which requires that the tree is complete through level h-1 and then filled from the left at level h, where h is the height of the tree

Heaps have a couple of convenient features. First, deletion of the maximum value is easy:

    swap the root with the last element in the heap
    delete the last element
    walk the new root down to its correct position,
       always choosing the larger child for a swap

Show an example. Ask them to do the next.
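The deletion steps above can be sketched in Python. This sketch assumes a max-heap stored in a 0-based Python list, so the children of index i are at 2i+1 and 2i+2 rather than the 1-based 2i and 2i+1; `delete_max` is a hypothetical name, and the heap must be non-empty.

```python
def delete_max(heap):
    """Remove and return the largest value of a non-empty max-heap
    stored in a 0-based list."""
    heap[0], heap[-1] = heap[-1], heap[0]   # swap root with last element
    largest = heap.pop()                    # delete the last element
    i, n = 0, len(heap)
    while True:                             # walk the new root down
        child = 2 * i + 1
        if child >= n:
            break                           # reached a leaf
        if child + 1 < n and heap[child + 1] > heap[child]:
            child += 1                      # always choose the larger child
        if heap[i] >= heap[child]:
            break                           # parental dominance restored
        heap[i], heap[child] = heap[child], heap[i]
        i = child
    return largest
```

For example, deleting from [9, 6, 1, 4, 5] returns 9 and leaves [6, 5, 1, 4].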

Insertion of any new value is easy, using a similar approach:

    drop the new element into the first open position
    walk the new value up to its correct position

Show an example. Ask them to do the next.
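Insertion can be sketched the same way, again assuming a 0-based Python list, where the parent of index i is at (i - 1) // 2; `heap_insert` is a hypothetical name.

```python
def heap_insert(heap, value):
    """Add value to a max-heap stored in a 0-based list."""
    heap.append(value)                      # drop into the first open slot
    i = len(heap) - 1
    while i > 0:                            # walk the new value up
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            break                           # parent already dominates
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
```

Inserting 8 into [6, 5, 1, 4] walks it up two levels, giving [8, 6, 1, 4, 5].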

The motivation for heaps goes back to the days of Fortran, a language with fast array operations but no dynamic memory. A heap has a straightforward representation as an array, created with a breadth-first visitation of the binary tree. The root goes in slot 1, and the children of any node i are in 2i and 2i+1. It's easy to find the last element in the heap and the first open slot (the position of the last element, plus one). The bubble operation, whether down or up, is O(log n).
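The slot arithmetic can be exercised with a parental-dominance check. A sketch, assuming a Python list whose slot 0 holds an unused placeholder so the indices match the 1-based layout above; `is_heap` is a hypothetical name. Near-completeness needs no check, because a contiguous array has no gaps.

```python
def is_heap(a):
    """Check parental dominance for a heap stored in slots 1..n of a
    list (slot 0 is ignored). Children of slot i are 2i and 2i+1."""
    n = len(a) - 1                          # last element is in slot n
    for i in range(1, n // 2 + 1):          # slots 1..n//2 have children
        for child in (2 * i, 2 * i + 1):
            if child <= n and a[i] < a[child]:
                return False
    return True
```

For example, `is_heap([None, 9, 6, 1, 4, 5])` holds, while `is_heap([None, 4, 6, 1, 9, 5])` fails at the root.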

We can create a new heap of values by repeatedly applying the insertion algorithm. We call this top-down construction. Is there a better way?
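Top-down construction is repeated insertion. A sketch, assuming a 0-based Python list as the heap; `build_heap_top_down` is a hypothetical name.

```python
def build_heap_top_down(values):
    """Build a max-heap (0-based list) by inserting values one at a time."""
    heap = []
    for value in values:
        heap.append(value)                  # first open slot
        i = len(heap) - 1
        # Walk the new value up while it beats its parent.
        while i > 0 and heap[(i - 1) // 2] < heap[i]:
            parent = (i - 1) // 2
            heap[parent], heap[i] = heap[i], heap[parent]
            i = parent
    return heap
```

For example, `build_heap_top_down([1, 2, 3])` returns [3, 1, 2].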

Bottom-Up Construction of a Heap

First, we need to know how efficient the top-down method for constructing a heap is. Each insertion requires O(log n) time, and there are n insertions. So, O(n log n).

We can do better, and not only in terms of the associated constants: bottom-up construction runs in O(n) time. The bottom-up construction algorithm works in this way:

    initialize the tree by placing the nodes in the order given
    for i ← last parent up to root
      if i does not dominate its children
         walk it down to its correct position

Consider the set of values [4 6 1 9 5]. Work through the example.

This algorithm is more efficient than the top-down approach for two reasons. First, it takes advantage of "accidental" dominances that occur in the initial configuration of the data; we can think of the initial array as a heap in need of repair. Second, and more important, about half of the nodes are leaves, which the loop never visits, and a node at height h can walk down at most h levels. So few nodes sit high in the tree that the total work sums to O(n), even though a single walk down can cost O(log n).
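The bottom-up pseudocode translates directly. A sketch, assuming a 0-based Python list, where the last parent is at index n // 2 - 1; `build_heap_bottom_up` is a hypothetical name.

```python
def build_heap_bottom_up(values):
    """Build a max-heap (0-based list) by repairing the array in place,
    from the last parent back up to the root."""
    heap = list(values)                     # nodes in the order given
    n = len(heap)
    for start in range(n // 2 - 1, -1, -1): # last parent up to root
        i = start
        while True:                         # walk it down if needed
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and heap[child + 1] > heap[child]:
                child += 1                  # larger child
            if heap[i] >= heap[child]:
                break                       # dominance already holds
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return heap
```

For the example above, `build_heap_bottom_up([4, 6, 1, 9, 5])` returns [9, 6, 1, 4, 5]. On [1, 2, 3] it returns [3, 2, 1].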

Two Questions

Construct a heap for the array [1 8 6 5 3 7 4]

using the top-down construction method
using the bottom-up construction method

Two Answers and More Questions

The heap generated top-down: fill in the blank.

The heap generated bottom-up: fill in the blank.

The top-down and bottom-up algorithms generate the same heap for this data set. Do they generate the same heap for all data sets? Why, or why not?

Here is a counter-example for the general case: [1 2 3]. Top-down construction produces [3 1 2], while bottom-up construction produces [3 2 1]. Accidental dominances stop the bottom-up algorithm from looking further at a sub-tree, and repeated insertions force moves down maximal branches.

The Heap Sort

A heap imposes a partial order on the data it stores and makes deleting the maximum value easy. So we could use a heap to implement a simple algorithm for sorting an array A[1..n]:

    construct a heap from the array
    perform n deletions from the heap

If we use an array implementation for heaps, we can sort in place.
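An in-place version can be sketched as follows, assuming a 0-based Python list and bottom-up construction; `heapsort` and its inner `walk_down` are hypothetical names. Each deletion swaps the maximum into the slot vacated at the end of the shrinking heap.

```python
def heapsort(a):
    """Sort list a in place, ascending, using a max-heap in a[0:end]."""
    def walk_down(i, end):
        # Restore parental dominance below index i, within a[0:end].
        while True:
            child = 2 * i + 1
            if child >= end:
                return
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1
            if a[i] >= a[child]:
                return
            a[i], a[child] = a[child], a[i]
            i = child

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # construct the heap bottom-up
        walk_down(i, n)
    for end in range(n - 1, 0, -1):         # delete the maximum n-1 times;
        a[0], a[end] = a[end], a[0]         # the last element is already in
        walk_down(0, end)                   # place when one value remains
```

For example, after `a = [1, 8, 6, 5, 3, 7, 4]` and `heapsort(a)`, the list holds [1, 3, 4, 5, 6, 7, 8], using no storage beyond a few local variables.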

What is the efficiency class of this algorithm? How does it compare to quicksort or mergesort?

Heap construction is at most O(n log n), and n deletions are also O(n log n), so the algorithm as a whole is the familiar O(n log n). Its constants are a bit higher than those for quicksort and mergesort, because it makes two passes over the data: one to build the heap and one to take it apart.

But: Heapsort is O(n log n) even in its worst case, which is better than quicksort's worst case.

But: Heapsort works in place and so requires only O(1) extra space, while mergesort requires O(n) extra space.

The heapsort gives us choices that can be quite valuable. Even so, these days it is mostly of historic significance.

The Final Question

Why introduce heaps and heapsort when discussing problem transformation as a design technique?

The Final Answer: It depends.

If we think of the heap as a tree, then heapsort is a representation change. We convert the array to a different form -- a binary tree, with a couple of important conditions -- and then use removals as selections.

If we think of the heap as an array, then heapsort is a form of instance simplification. The first step puts the array into a special form (a kind of representation change), and the second repeatedly turns a problem of size k into a problem of size k-1.

Wrap Up

Reading -- Read these lecture notes. Review the material since our last exam, including Bloom filters, the sliding delta pattern, and transform-and-conquer strategies.

Homework -- None for now.

Exam -- Exam 3 is next session.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... April 15, 2014