CS 3530 Session 21

Session 21

Colored Dots and Onions

CS 3530
Design and Analysis of Algorithms

Puzzle: Dot Connections

We are given a set of 2n dots spaced evenly along a long line. n of the dots are blue, and n are red. The dots of each color are numbered 1..n.

The dots can appear in any order, with only one requirement. For any i in [1..n], the red-i dot appears to the left of the blue-i dot. For any other value of j in [1..n], though, the red-i and blue-i dots can appear left or right of the red-j and blue-j dots.

For example, this is a legal input:

    r4 r2 b2 r3 b4 r1 b3 b1

Notice that every red dot is to the left of its corresponding blue dot, but otherwise we have a jumble.

Suppose that each red dot is connected to its corresponding blue dot.

We want to know the total length of the n connections along the straight line. For this configuration, the total is 4 + 1 + 3 + 2 = 10.

Your first task is this:

Write an algorithm to compute the total length of the connections for a given input.

Some candidate algorithms are O(n²). Can you achieve O(n)? Or better?

Generate a legal input with n = 5.

Try to make a challenging sequence, one that exploits the hardest part of the problem.

Finally:

Trade sequences with someone else. Apply your algorithm to the sequence you receive.

How well does your algorithm work?

Debriefing the Dots

What are some of the possible ways to attack this problem?

My first candidate uses brute force. I call it Single Array, Compute After:

    INPUT: 2n values of form color:number

    store the input in an array slots of size 2n
    total ← 0
    for i = 1 to n do
        red_slot  = slot of red-i
        blue_slot = slot of blue-i
        distance  = blue_slot - red_slot
        total = total + distance
    return total

How well does Candidate 1 perform?

Space. The array is size 2n plus one running total, so O(n).

Time. The outer loop executes n times. Finding the red/blue dot pairs can be done one a single pass for each i, but that is also an O(n) operation. So, O(n²).

A variation of this approach creates two arrays, one for the color of the dot in position i, and one for the number on the dot. This behaves pretty much like the single array approach.

But once we strike upon the idea of multiple arrays, we can use it to create a better approach. Why not store the red dots in one array, and the blue dots in another? Then we can store their positions in the original input.

I call my second candidate Parallel Arrays, Compute After. It preprocesses the input.

    INPUT: 2n values of form color:number

    create two arrays of size n, red and blue
    for each position in the input
        store position in color[number]

    total ← 0
    for i = 1 to n do
        distance  = blue[i] - red[i]
        total = total + distance
    return total

How well does Candidate 2 perform?

Space. Two arrays of size n and one running counter, so O(n).

Time. The preprocess step executes 2n times. The computation step executes n times. So, O(n).

Better! Can we do mo' better? Well, we know that we always see blue-i after red-i, so do we need to store the blue dots at all? We could compute the distance right then.

I call this candidate Single Array, Compute on the Fly. It store only the red dots.

    INPUT: 2n values of form color:number

    create one array of size n, red

    total ← 0
    for i = 1 to 2n do
        if the dot is a red-j then
           red[j] = i
        else it is a blue-j
           distance  = i - red[j]
           total = total + distance
    return total

What improvements does Candidate 3 offer?

Space. One array of size n plus the running total, so O(n).

Time. Now a single pass over the input executes 2n times, so, O(n).

This improves in both space and time, but only in the constants. The complexity classes are the same. How can we do even better?

It sounds like we need a good invariant...

Notice that ...

    Σ [ position(blue-i) - position(red-i) ]

... is equal to ...

    [ Σ position(blue-i) ] - [ Σ position(red-i) ]

So we don't need to store any dots at all! I call the algorithm that takes advantage of this little trick One Good Invariant. It store no dots.

    INPUT: 2n values of form color:number

    total ← 0
    for i = 1 to 2n do
        if the dot is a red-j then
           total = total - i
        else it is a blue-j
           total = total + i
    return total

Now we see an improvement in the order of the algorithm:

Space. One counter for the running total, independent of n. So O(1)!

Time. Of course, we still need a single pass over the input. There is no avoiding this. It executes 2n times, which is O(n).

Nice.

This is about as far as most of us would go with this problem. We have a solution that is O(1) in space and O(n) in time. But a few creative computer scientists have come up with a variant of One Good Invariant that can often be implemented slightly more efficiently. It frames the algorithm in terms of the differences rather than the positions.

We might call this version Explicit Delta.

    INPUT: 2n values of form color:number

    delta ← 0
    total ← 0
    for i = 1 to 2n do
        if the dot is red then
           delta = delta + 1
        else it is blue
           delta = delta - 1
        total = total + delta
    return total

This algorithm uses two counters, one for the delta and one for the running total, independent of n. This is still O(1) in space. It still makes a single pass over 2n items, which is O(n) in time. But the arithmetic may well be simpler, shaving off some of the constant overhead cost of the computation.

You may well look at this algorithm and say,

That can't possibly work. I don't believe you.

Fair enough. I didn't quite believe it myself. So do what I did.

Quick Exercise: Trace the Explicit Delta algorithm on our example input above:

    r4 r2 b2 r3 b4 r1 b3 b1

Trust me. I won't lead you astray...

The Sliding Delta Pattern

The last two algorithms above may seem a bit clever for us to create on our own. But they use a common idea that computer scientists have re-discovered in many domains. It's an idea that, once we know about it, we can use to solve many sequence processing problems efficiently.

Context

We are counting the differences between values of complementary 'types'.

Forces

If there are multiple complementary values, then we might process each value separately. But then we have to process the list multiple times for each of the values. That's O(n²) in time.

If we store "history" information as we process the list, in order to iterate through it only once, then we need O(n) space to accumulate information for each value.

The differences are more important than the actual values or the absolute counts.

Solution

Maintain a counter that records the difference among items.

Increment or decrement the counter on each input.

If you are summing over multiple pairs, use the counter to update the running sums.

This is called the Sliding Delta pattern. The counter maintains the delta.

Examples

A really simple example of using a sliding delta is for determining the winner of a two-person election.
If we care only about knowing who the winner is, not the final vote counts, we can use a sliding delta. If we know the size of the input, we can even recover the vote tallies from the delta. This requires O(n) time and minimal O(1) space.

Dot connections is another example. This problem is an abstraction of many circuit layout and design problems. We need to know the deltas between all pairs, which we accumulate into an overall tally. For such a problem, we can use the delta at each blue dot to update another counter that is the running sum.

Another example: sliding windows in communication protocols. Some of you may have seen this in your Networking course.

Mathematical Basis

After processing the kth item, the delta records the "balance" accumulated through the first k values. As a relative value, it eliminates the need for absolute positions or counts for each item.

Mathematical Characteristics

O(1) in space

O(n) in time

Whenever you encounter a problem that fits the profile of a sliding delta, this pattern can be handy.

Puzzle: Onions

We are given a sequence of n parentheses, left and right, in any order.

For example, (( is a legal sequence. So is ()). As are ((())) and ((())))(()())((()).

An onion is a subsequence of 2k parentheses where the first k are left parentheses and the second k are right parentheses, (((...))). Notice that, when k > 1, the onion contains smaller onions.

We are interested in the number of onions in the input.

For the examples above, the number of onions are 0, 1, 3, and 7, respectively.

Write an algorithm to compute the number of onions in a given input.

Doing this in O(n) space isn't too difficult. Can you do it in O(1)?

Peeling the Onions

The first idea to come to mind is one you may have encountered when studying data structures. It uses a stack.

    stack ← empty
    total ← 0
    while there is input do
      if input is ( then
         push stack
      else if stack not empty
         pop stack
         total ← total + 1
    return total

The worst-case scenario for this algorithm is a large input of size n = 2k, consisting of k left parentheses followed by k right parentheses. In this case, we store k items on the stack, and space usage is O(n).

There's a subtle difficulty lying in this approach. How can we ensure that (()()) counts as only two onions, not three? The outer pair of parentheses is not an onion! But the stack loses context, and we get the wrong answer

Perhaps we can fix the stack-based algorithm to make it work. But there is a simpler candidate 2, based on the Sliding Delta pattern.

Whenever you recognize the beginning of an onion -- either at the beginning of the input or when a left parenthesis follows a right parenthesis, set delta to 1.

For each left parenthesis that follows, increment delta.

If you see a right parenthesis while delta > 0, then increment the onion counter and decrement delta.

This is O(n) in time, of course, but uses minimal space -- only two counters and one character for the input.

What, again you don't believe me? That's fine. Trace this algorithm on all four examples above. Does it work?

Wrap Up

Reading -- Study these lecture notes and trace through each of the algorithms.
Identify another "algorithmic pattern" from our previous study this semester: a common idea that, like sliding delta, cuts across multiple types of problems and appears in different algorithms -- even if in a different form.

Homework -- Homework 5 is available and due next session.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... April 3, 2014