In Session 2 and Session 3, we played a couple of games with numbers as a way to explore three high-level approaches to designing an algorithm:
(Note that top-down and bottom-up are also common ways to approach the writing of programs...)
Often it is wise to begin by trying to create a top-down solution, which exposes the relevant sub-problems. You may then be able to create a bottom-up approach that combines solutions to those sub-problems in a more efficient way. Occasionally, our experience from these attempts helps us to see some feature of the problem that zooms in on an immediate solution.
Consider the Difference Game, a simple two-player game played with sets of positive numbers. Ordinarily, the game starts with a set of size two. The players take turns adding a positive number to the set. The only requirements are that:
- the new number is the difference of two numbers already in the set, and
- the new number is not already in the set.
The first player who cannot move loses the game.
For example, consider this starting position:
Player 1's opening move is 2. But then Player 2 has no moves to make, so Player 1 wins.
Then consider this starting position:
Player 1's opening move is 12. Player 2 moves 8. Player 1 has no move to make, so Player 2 wins.
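To make the rules concrete, here is a small Python sketch of the legal-moves computation. The starting set {4, 6} is a hypothetical example of mine, not one of the positions above, and the rule assumed is that each move adds the difference of two numbers already in the set, provided it is not already present:

```python
def legal_moves(position):
    """Return the set of numbers that may be added to the position.

    Assumed rule: a move adds the positive difference of two numbers
    already in the set, provided that difference is not yet present.
    """
    return {abs(a - b) for a in position for b in position
            if a != b and abs(a - b) not in position}

# A game from {4, 6}: 6 - 4 = 2 is the only move, and then none remain.
print(legal_moves({4, 6}))      # {2}
print(legal_moves({4, 6, 2}))   # set()
```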
Your opening exercise: Play this game three times against a classmate. Take turns going first.
As you play, try to find some pattern in how the game proceeds, maybe even a strategy for winning. Don't share your ideas with your opponent just yet!
... much fun ensues ...
Your next exercise: Play this game three more times against another classmate. Take turns choosing who goes first.
If you have a way of determining whether you want to go first or second, great -- but don't share it with your opponent just yet!
... more fun ensues ...
Now, tell us what you noticed...
Digression: pragmatics of the game. If you can't find a move, you lose -- even if a move exists. (Eventually, none does.) For complexity and variety, try larger numbers, or an initial set with more than two numbers. "Knowing the truth" doesn't make the game any less fun, or any less of a challenge.
Did you notice that this game depends on the factors that the two numbers share, and that relatively prime numbers lead to a particular sort of game?
It turns out that this game is a variation on the theme of Euclid's algorithm for finding the greatest common divisor (GCD) of two numbers. It is straightforward to show that the set of numbers which can be added to the Difference Game's starting pair is identical to the set of numbers generated by Euclid's original GCD algorithm, which uses repeated subtraction in place of division. Only the order is changed.
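A sketch of that subtraction-based version in Python, collecting every number it generates along the way (the function name is mine):

```python
def subtractive_gcd_numbers(m, n):
    """Numbers generated by Euclid's original subtraction-based GCD.

    Repeatedly replace the larger number with the difference of the
    pair; collect every value that appears along the way.
    """
    seen = {m, n}
    while m != n:
        m, n = max(m, n) - min(m, n), min(m, n)
        seen.add(m)
    return seen

# From the pair (4, 16) it generates {4, 8, 12, 16}: the additions
# 12 and 8 are exactly the moves available in the Difference Game.
print(subtractive_gcd_numbers(4, 16))
```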
The Big Question: If given the choice, would you choose to move first or second in this game? Why?
Answer: It depends. The number of numbers generated by Euclid's algorithm is equal to m/gcd(m, n), where m is the larger of the pair, including the original pair. So the number of moves available for any pair is equal to m/gcd(m, n) - 2.
If m/gcd(m, n) is odd, then you want to go first. If it is even, you want to go second.
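As a Python sketch (the function name is mine), with max taking care of treating m as the larger of the pair:

```python
from math import gcd

def should_go_first(m, n):
    """Decide whether to move first, given starting pair (m, n).

    The final set holds max(m, n) // gcd(m, n) numbers, so the game
    lasts max(m, n) // gcd(m, n) - 2 moves; go first iff that is odd.
    """
    moves = max(m, n) // gcd(m, n) - 2
    return moves % 2 == 1

print(should_go_first(4, 6))    # True: one move, so go first
print(should_go_first(4, 16))   # False: two moves, so go second
```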
When the starting positions contain larger numbers and a large number of moves, this can be non-trivial. What sort of algorithm can you use to find moves? Is your algorithm top-down, bottom-up, or zoom-in? Why? How expensive is it -- O(n²)?
The idea of "bottom up" doesn't seem to apply to the problem of finding moves. (Maybe a linear search from 1 up?) The key here is in recognizing the invariant that lets us zoom in on a "choosing to go first" algorithm...
Here are three algorithms for computing gcd(m, n), the greatest common divisor of two positive integers m and n. We will give an example of each operating on the case of m=70 and n=32.
Euclid's Modified Algorithm
while n != 0
    r := m mod n
    m := n
    n := r
return m
Question: What happens if m < n?
Answer: It swaps them on the first pass!
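For reference, a runnable Python version of the algorithm above, tried on the m=70, n=32 case:

```python
def euclid_gcd(m, n):
    """Euclid's algorithm, using mod in place of repeated subtraction."""
    while n != 0:
        m, n = n, m % n   # r := m mod n; m := n; n := r
    return m

print(euclid_gcd(70, 32))   # 2
print(euclid_gcd(32, 70))   # also 2: the swap happens on the first pass
```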
An Informed Brute-Force Search

t := min(m, n)
loop
    if m mod t = 0 and n mod t = 0
        return t
    t := t - 1
We might call this an informed brute-force algorithm. It is "brute-force" because it simply tries all possible answers in order. It is "informed" because it is smart enough not to start at 2.
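The same search in runnable Python (the function name is mine):

```python
def brute_force_gcd(m, n):
    """Count down from min(m, n), returning the first common divisor.

    Always terminates for positive integers, since t eventually
    reaches 1, which divides everything.
    """
    t = min(m, n)
    while True:
        if m % t == 0 and n % t == 0:
            return t
        t -= 1

print(brute_force_gcd(70, 32))   # 2
```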
The "Middle-School Procedure"
fm := sorted list of prime factors of m
fn := sorted list of prime factors of n
c := common-factors(fm, fn)
return product of c
This algorithm is often derided as a way of finding the GCD. Why? But remember the context in which we learn it... The goal of our middle school math class isn't computational efficiency, but understanding what GCD is and what it means!
Look how different these algorithms are. There are always many ways to solve a problem, express an idea, and even implement the same algorithm in code. We encounter options and face trade-offs.
Question: What is the big-Oh run-time efficiency for each of these algorithms in terms of the number of candidates considered?
For Euclid's algorithm:
- Best case: m mod n = 0, so it considers 1 candidate.
- Usual case: within every two steps the remainder drops below half its divisor, so it considers O(log n) candidates.
- Worst case: consecutive Fibonacci numbers shrink the slowest, but even they yield only O(log n) candidates.
For linear search:
- Same best case.
- Otherwise, O(n).
For the middle-school procedure... ?
Question: Is this a reasonable way to compare these algorithms? Why not?
The middle-school procedure doesn't search through candidates. It computes an answer in a very different way. But it turns out to be quite expensive computationally, due to the nature of its steps.
Its expense is in finding prime factors. But remember: middle-school students work with small numbers and thus have the set of prime numbers they need in hand.
Exercise: Write an algorithm to find the common elements in two sorted lists. You may assume the existence of standard list operations, such as add, remove, contains, etc.
Solution: Here is one. It takes as arguments two ordered lists, lst1 and lst2.
lst1 = (2, 5, 7)         lst2 = (2, 2, 3, 5, 11)
lst1 = factors(2100)     lst2 = (11)
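One possible sketch in Python, using the classic two-index walk over both sorted lists (this is my sketch, not necessarily the session's solution):

```python
def common_factors(lst1, lst2):
    """Common elements of two sorted lists, with multiplicity.

    Walk both lists once: advance past the smaller head, or record a
    match and advance both indices when the heads are equal.
    """
    i, j, common = 0, 0, []
    while i < len(lst1) and j < len(lst2):
        if lst1[i] < lst2[j]:
            i += 1
        elif lst1[i] > lst2[j]:
            j += 1
        else:
            common.append(lst1[i])
            i += 1
            j += 1
    return common

print(common_factors([2, 5, 7], [2, 2, 3, 5, 11]))   # [2, 5]
```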
What is this algorithm's complexity? It is O(n) in the size of the longer list. But what are the best-, average-, and worst-case scenarios for the sizes of the lists, given particular m and n?
Putting it all together, if we have a list of prime numbers in hand, the complexity of the "middle-school procedure" is:
Question: What is a fairer way to compare the complexity of these algorithms? Count basic operations. Use m and n as ways to normalize the results.