CS 3530 Session 22

Green Crocodiles, Green Onions

CS 3530
Design and Analysis of Algorithms

Our Puzzle: Crocodiles

We are given a list of 2n integers, k. Each pair k_2i, k_2i+1 represents the year of birth and year of death, respectively, of a crocodile. Crocodiles have been on the earth for a long, long time, so the range of possible numbers in the list is quite large, say, [-100000 .. 2014].

For example, this is a legal input:

    -90000 -89950 10 60 12 50 48 55 1961 2014

These crocodiles are in order of their birth year, but they need not be. They could be in any order.

We need to know the largest number of crocodiles that were ever alive in a single year. For our sample list, the answer is 3. There were three crocodiles alive in the years 48, 49, and 50.

Your first task is this:

Write an algorithm to compute this answer for any legal input.

Some candidate algorithms use a lot of space. What's the smallest amount of space you can use?

Now, modify your algorithm to return a year in which the largest number of crocodiles were alive.

... or, even better, all such years.

Debriefing the Crocodiles

What are some of the ways we can attack this problem?

My first candidate uses brute force: simple but costly...

Create an array of counters for all possible years. For each (k_2i, k_2i+1) pair, increment the slots of all corresponding years. After processing the entire input list, find the maximum value in the array.
    INPUT: 2n values, as birth, death pairs

    1.  declare c[minYear..maxYear]
    2.  for i ← minYear to maxYear do
          c[i] ← 0

    3.  for all pairs (birth, death) do
          for i ← birth to death do
            c[i] ← c[i] + 1

    4.  max ← minYear
    5.  for i ← minYear+1 to maxYear do
          if c[i] > c[max]
             max ← i

    6.  return c[max]
This already handles Task 2: to find a maximal year, return max instead of c[max]. Finding all maximal years is a simple tweak to steps 4 and 5.
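
As a rough illustration, Candidate 1 might look like this in Python. This is a sketch, not a definitive implementation: the function name max_alive_brute and the min_year / max_year parameters are assumptions made for the example, and the input is taken to be a flat, non-empty list of birth, death values as described above.

```python
def max_alive_brute(years, min_year, max_year):
    """Candidate 1 sketch: one counter per possible year (hypothetical helper)."""
    # Steps 1-2: an array of zeroed counters covering the whole range.
    counts = [0] * (max_year - min_year + 1)

    # Step 3: bump every year in each crocodile's lifetime, death year included.
    for i in range(0, len(years), 2):
        birth, death = years[i], years[i + 1]
        for year in range(birth, death + 1):
            counts[year - min_year] += 1

    # Steps 4-6, extended to collect all maximal years.
    best = max(counts)
    best_years = [min_year + k for k, c in enumerate(counts) if c == best]
    return best, best_years
```

Calling max_alive_brute([10, 60, 12, 50, 48, 55], 0, 100) on the three overlapping crocodiles from the sample gives a count of 3 for the years 48, 49, and 50.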

How well does Candidate 1 perform?

Space. The array c[] is O(range), where range >> n.

Time. Step 2 is O(range). Step 3 has an O(n) loop wrapped around a loop that runs over the lifetime of each crocodile. The length of a lifetime does not depend on n or on the range, so it contributes only a constant factor to the O(n) loop. Step 5 is again O(range). So, overall, this approach is O(range + n), which is O(range) because range >> n.

We can implement a more space-efficient form of brute force by being smarter about how we store the lifetime of a crocodile...

Do the same thing as Candidate 1, but use a linked list that contains only the relevant years. The linked list will contain cells containing (year, count) pairs.
    INPUT: 2n values, as birth, death pairs

    1. head ← null
    2. for all pairs (birth, death) do
         for i ← birth to death do
           if no such cell exists with year = i
              insert (i, 1) in list in proper spot
           else
              increment count in cell with year = i

    3. ptr ← head
    4. max ← ptr
    5. while ptr != null do
         if ptr->count > max->count
            max ← ptr
         advance ptr

    6. return max->count
Task 2 is still just a change of return value, max->year instead of max->count. Finding all maximal years requires the same tweak to steps 4 and 5 as before.
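
Here is one way the sparse idea could be sketched in Python. Note the substitution: instead of a hand-built linked list, this version keeps two parallel sorted lists and uses the standard bisect module to find the "proper spot". The function name max_alive_sparse is an assumption for illustration, and the input is assumed to be non-empty.

```python
import bisect


def max_alive_sparse(years):
    """Candidate 2 sketch: store (year, count) cells only for years that occur."""
    cell_years = []   # sorted years that have at least one crocodile
    cell_counts = []  # cell_counts[k] is the count for cell_years[k]

    for i in range(0, len(years), 2):
        birth, death = years[i], years[i + 1]
        for year in range(birth, death + 1):
            pos = bisect.bisect_left(cell_years, year)
            if pos < len(cell_years) and cell_years[pos] == year:
                cell_counts[pos] += 1          # cell exists: increment it
            else:
                cell_years.insert(pos, year)   # no such cell: insert in order
                cell_counts.insert(pos, 1)

    best = max(cell_counts)
    return best, [y for y, c in zip(cell_years, cell_counts) if c == best]
```

No 0-count cells are ever created, which is the point of Candidate 2, but the structure can still grow to cover the whole range in the worst case.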

How well does Candidate 2 perform?

Space. The worst case is still O(range). We never have to create or store any 0-count cells, but we do have the overhead associated with the linked list.

Time. This is still O(range).

After our topic last session, you may be able to imagine a much more efficient solution using a sliding delta...

Sort the list, marking births and deaths. Then increment a delta on births, and decrement it on deaths.
    INPUT: 2n values, as birth, death pairs

    1. annotate and sort the list

    2. max   ← 0
    3. delta ← 0
    4. for all i in sorted list
         if i is a birth year
            delta ← delta + 1
            if delta > max
               max ← delta
         else
            delta ← delta - 1

    5. return max
To find a maximal year, we just need to maintain a maxYear variable alongside max. Finding all maximal years requires maxYear to be a collection.
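
A minimal Python sketch of Candidate 3 follows, tracking one maximal year alongside the maximum. Two things here are assumptions rather than part of the pseudocode: the name max_alive, and the tie rule that a birth sorts before a death in the same year. The tie rule follows from the sample, where a crocodile that dies in year 50 still counts as alive in year 50. For simplicity the sketch tags each year with a 0 or 1 rather than using a separate bit string.

```python
def max_alive(years):
    """Candidate 3 sketch: presort the events, then slide a delta over them."""
    BIRTH, DEATH = 0, 1

    # Step 1: annotate each year, then sort. Tuples compare by year first,
    # then by tag, so a birth (0) precedes a death (1) in the same year.
    events = []
    for i in range(0, len(years), 2):
        events.append((years[i], BIRTH))
        events.append((years[i + 1], DEATH))
    events.sort()

    # Steps 2-4: increment on births, decrement on deaths.
    best = 0
    best_year = None
    delta = 0
    for year, kind in events:
        if kind == BIRTH:
            delta += 1
            if delta > best:
                best, best_year = delta, year
        else:
            delta -= 1

    return best, best_year
```

For the three overlapping crocodiles in the sample, max_alive([10, 60, 12, 50, 48, 55]) returns (3, 48): the maximum is first reached when the third crocodile is born.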

How well does Candidate 3 perform?

Space. The sliding delta process requires only two variables! If we can sort in place, we don't need any new space to store the years from the input list.

What about the birth|death annotation for each year? We could use a bit string, with one bit per item in the input, with the bit set to 0 for a birth and 1 for a death. The input has 2n items, so this requires 2n bits, or n/32 64-bit words. That's O(n), which is much less than we needed for the brute-force approaches.

Time. The sort is O(n log n). The sliding delta process is only O(n). For both, working with the bit string adds a little code overhead. Overall, though, this algorithm is O(n log n).

The final solution uses three techniques that we have now seen multiple times. First, we use a bit string to conserve space. Second, the sliding delta pattern is the keystone of the algorithm. But we can use this pattern only with the years in order, so we first use presorting to prepare the input. Presorting is an example of a general technique known as transform-and-conquer, which is the topic of our next unit.

Sliding Delta as an Algorithmic Pattern

What makes the sliding delta idea so useful? It applies not only to many problems but also to many kinds of problems:

Dot connections requires finding a sum over interleaved pairs.
Onions requires counting over possibly nested pairs.
Crocodiles requires finding a maximum among pairs, with no sense of position at all.

What is the common core of these problems? Counting and pairs. A sliding delta may work for you if your problem shares this core.

The solutions don't look alike, necessarily, at least not at a detailed level. The number and kind of loops and variables used differ from one algorithm to the other. We even saw two distinct versions of sliding delta applied to the dot connections problem last time.

But there is a shared core in these solutions, too: a counter is initialized to 0 and then incremented and decremented in a particular way when we cross boundaries -- that is, when we recognize the beginnings and endings of pairs.

This is what software people call a pattern: a stereotypical solution idea applied to any problem that has a general set of features. The particular solution is individualized to the particular features of the problem.

One of the nice things about Sliding Delta is that it produces a certain kind of efficiency whenever it is used: O(1) space and O(n) time for the counting of pairs. When you need to improve on an algorithm's resource usage, sometimes a Sliding Delta can help -- if you have the right kind of problem.

Note that the whole algorithm may not be O(1) space and O(n) time, just the counting-on-pairs part. When we applied it to the crocodiles problem, we ended up with O(n) space to hold the marker bits and O(n log n) time in order to sort the input before counting. Still, both were improvements on the alternatives.

Other Algorithmic Patterns

Quick Exercise: What other algorithmic patterns have we seen this semester?

The one that comes to my mind almost immediately is partitioning.

Consider binary search. This algorithm takes a sorted list of values and finds the location of a target value efficiently by repeatedly halving the list and focusing the search in a smaller list that must contain the target, if it exists at all.

Computationally, we implement this as a loop or a recursive procedure that examines the position in the middle of the list and then, if necessary, repeats the process on either the "smaller" half or the "larger" half. The mathematical result: a rather small number of repeated steps, log n for a list of size n.
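
The loop form of that description might be sketched in Python as follows. The function name and the convention of returning -1 for a missing target are assumptions made for the example.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # examine the middle position
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            lo = mid + 1                  # continue in the "larger" half
        else:
            hi = mid - 1                  # continue in the "smaller" half
    return -1
```

Each pass through the loop discards half of the remaining positions, which is where the log n bound comes from.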

But then consider mergesort... It partitions the problem into equal-sized subproblems, solves them, and then brings the sorted sublists back together into sorted order.
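
For comparison, here is a short Python sketch of mergesort in the same spirit. It returns a new list rather than sorting in place, which is a simplification chosen for readability; the name merge_sort is an assumption.

```python
def merge_sort(values):
    """Return a new sorted list: split in half, sort each half, merge."""
    if len(values) <= 1:
        return list(values)

    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])

    # Bring the two sorted sublists back together in sorted order.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```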

These problems aren't alike at all; indeed, binary search presumes the postcondition of mergesort. Nor do the solutions look alike.

But these algorithms do share a common idea in their solutions: a sequence repeatedly partitioned in half, with the same process applied to one or both halves as is applied to the list as a whole. The common mathematical notion is O(log n) processing time. (Mergesort has to do this process n times.)

We might call this the Binary Partition pattern. (In general, partitioning into more sections cannot improve our time efficiency class.) It underlies not only binary search and mergesort but also a host of other important algorithms, for example,

efficiently computing the max and min of a list in parallel, via recursion and merging
efficiently computing both max and second max, via a "tennis tournament" approach
many algorithms involving trees, heaps, and similar data structures
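
To make the first item in that list concrete, here is a small sequential Python sketch of computing min and max by recursion and merging. It does not attempt any actual parallelism; the two recursive calls are simply independent, which is what would let them run in parallel. The name min_max and its lo / hi parameters are assumptions, and the list is assumed to be non-empty.

```python
def min_max(values, lo=0, hi=None):
    """Return (min, max) of values[lo:hi] using the Binary Partition idea."""
    if hi is None:
        hi = len(values)
    if hi - lo == 1:                      # one item is both min and max
        return values[lo], values[lo]

    mid = (lo + hi) // 2
    left_min, left_max = min_max(values, lo, mid)    # independent subproblem
    right_min, right_max = min_max(values, mid, hi)  # independent subproblem

    # Merge step: combine the answers from the two halves.
    return min(left_min, right_min), max(left_max, right_max)
```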

Learning patterns such as Sliding Delta and Binary Partition can help you become a better problem solver because they crystallize recurrent, powerful, and broadly applicable ideas, ones that you can apply in many different ways. You can also use them as a vocabulary for talking -- and thinking -- about problems. These patterns play a role analogous to design patterns when designing software. I can hardly imagine attacking hard design problems in Java without being able to talk about Strategy and Null Object and Composite and Decorator and Observer and Adapter and ...

Maybe by discovering and learning more algorithmic patterns we will be able to shorten the time between first facing a problem and having the a-ha! experience that happens when we see the right and beautiful solution. Sometimes, knowing the right pattern helps us to have the a-ha! experience in the first place.

Puzzle: Making Crocodiles from Dots

We have fine algorithms for solving the Dot Connection and Crocodiles problems.

Now suppose that we are given an input sequence for the dot connection problem and are in the process of designing a physical solution -- a layout of wires connecting the pairs of dots. Our supply technician asks, "What is the widest junction box we need?" That is, what is the largest number of pairs that are "open" at any given position on the grid?

We could implement a brand-new solution to this problem, applying the idea from solving Crocodiles with a sliding delta to the inputs of the Dot problem.

Let's consider an alternative. Rather than writing a new algorithm, we can take advantage of the fact that the Crocodiles algorithm can already answer this question for us -- if only the input were in the proper format.

Your first task is this:

Write an algorithm to translate a Dot Connection input sequence into a Crocodiles input sequence.

For example, this Dot Connections problem:

    r4 r2 b2 r3 b4 r1 b3 b1

... might be translated into this Crocodiles problem:

    1 5 2 3 4 7 6 8

The answer to the newly-created Crocodiles problem is 2. That is exactly the answer to our question: what is the widest junction box we need to hold the connections in our Dots problem?

Because the Crocodiles problem doesn't care which order its crocodiles occur in, there are 4! = 24 different outputs that our transformation algorithm can legally produce. Your algorithm need produce only one.

A Solution

If we approach this with brute force the way we solved the Dots problem itself, we end up with a nested loop:

    create an array endpoints of size 2n
    for i = 1 to 2n by 2 do
        pos_red = find_red(i)
        endpoints[i] = pos_red
        pos_blue = find_blue(i)
        endpoints[i+1] = pos_blue

The outer loop executes n times, once per pair. I say "outer loop" because the find_red()/find_blue() sequence is another O(n) loop. The result is O(n²). The resulting output preserves the order in which the red dots appear in the original input, but not their dot numbers. The algorithm works in O(n) space, of course.
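
One possible Python reading of the brute-force translation is sketched below. The input format (a list of strings such as "r4") and the function name are assumptions for illustration, as is the premise that every red dot appears before its blue partner, which holds in the example above. The call to list.index plays the role of the inner O(n) search.

```python
def dots_to_crocs_brute(dots):
    """Brute-force sketch: for each red dot, search for its blue partner."""
    endpoints = []
    for pos_red, token in enumerate(dots, start=1):   # positions are 1-based
        if token[0] != "r":
            continue
        # Inner O(n) scan for the blue dot with the same number.
        pos_blue = dots.index("b" + token[1:]) + 1
        endpoints.append(pos_red)
        endpoints.append(pos_blue)
    return endpoints
```

On the example input r4 r2 b2 r3 b4 r1 b3 b1, this produces 1 5 2 3 4 7 6 8, the translation shown earlier.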

But we can do better. We don't have to process the red/blue pairs together; we only need to know their positions. (This is one way we solved the Dots problem more efficiently, too.)

    create an array endpoints of size 2n
    for i = 1 to 2n do
        color, number ← input[i]
        if color = red
           endpoints[2*number-1] = i
        else
           endpoints[2*number] = i

This algorithm still requires O(n) space, but now we make a single O(n) pass over the input, storing red dots in odd slots and blue dots in even slots, determined by the dot numbers. The result is O(n) and preserves the order of the dot numbers.
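
The single-pass version might look like this in Python. As before, the list-of-strings input format and the function name are assumptions; the only other change from the pseudocode is the shift to 0-based list indices.

```python
def dots_to_crocs(dots):
    """Single-pass sketch: slot each position by dot number and color."""
    endpoints = [0] * len(dots)
    for i, token in enumerate(dots, start=1):   # i is the 1-based position
        color, number = token[0], int(token[1:])
        if color == "r":
            endpoints[2 * number - 2] = i   # pseudocode slot 2*number-1
        else:
            endpoints[2 * number - 1] = i   # pseudocode slot 2*number
    return endpoints
```

On the example input this yields 6 8 2 3 4 7 1 5, which is another of the 24 legal outputs: the same four pairs, now ordered by dot number.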

Can we go in the other direction and translate a Crocodiles problem into a Dot Connections problem? There seems to be a meaningful relationship between these two problems, so translating between them makes sense.

What about Onions → Dots, or Dots → Onions? Or Onions → Crocodiles and Crocodiles → Onions? Can we make the translation at all? If so, is there a meaningful relationship between the Onions puzzle and either of the others?

Problem Transformations

In this case, we could adapt one algorithm to work on inputs of the other form, but it is easier to write a translator of one input sequence into an equivalent one of the other form.

This is a trivial example of representation change, which is a common tactic in transform-and-conquer algorithms. Two other transform-and-conquer tactics are:

instance simplification, in which we change the problem into a simpler form of the same, à la binary search.
problem reduction, in which we turn a problem of one type into a seemingly very different sort of problem.

Instance simplification sounds a bit like decrease-and-conquer. Problem reduction is a key technique in the study of computational complexity and plays a big part in the whole P=NP? thing.

As an example, consider the problem of determining whether every item in a list is unique.

We can create a simple brute-force algorithm that operates in O(n²) time by comparing all pairs of values.

We can convert the problem into a simple linear search using a "sort and scan" strategy, which operates in O(n log n) + O(n) = O(n log n) time.
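
A sort-and-scan check could be sketched in Python like this. The function name is an assumption, and the sketch sorts a copy so that the caller's list is left untouched.

```python
def all_unique_sorted(items):
    """Sort-and-scan sketch: after sorting, duplicates must be neighbors."""
    ordered = sorted(items)                       # O(n log n) presort
    for prev, cur in zip(ordered, ordered[1:]):   # O(n) scan of adjacent pairs
        if prev == cur:
            return False
    return True
```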

In Session 19, we saw another approach: use hashing to turn the problem essentially into an insertion problem! This is O(n) in the best case, O(n log n) in most other cases, and O(n²) in the worst case.
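
In Python, the hashing idea can be sketched with the built-in set type, which is a hash table under the hood. This is only one way to realize the approach and is not necessarily the hash scheme used in Session 19; the function name is an assumption.

```python
def all_unique_hashed(items):
    """Hashing sketch: insert each item, failing on the first repeat."""
    seen = set()
    for item in items:
        if item in seen:      # the item is already in the table: duplicate
            return False
        seen.add(item)
    return True
```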

Problem transformation is a fun and powerful idea. We'll begin studying it in detail next week.

Wrap Up

Reading -- Read this short chapter on graphs. It reviews the idea of a graph and walks you through a couple of graph algorithms. Study the notes, work the exercises, and ask any questions you have.

Homework -- Homework 5 was due today. Homework 6 comes out later. Enjoy a break.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... April 3, 2014