CS 3530 Session 1

Session 1

A Gentle Introduction to Algorithms

CS 3530
Design and Analysis of Algorithms

I'll be updating this document in the next day or so, to fill in some details and to correct some errors. I'm posting this now so that you can see the reading and homework and start following the course.

An Opening Exercise

I have seven pieces of paper. Keep that in mind. It will be important.

Now, the task ...

This is the 14th day of the 14th year of the millennium. Let's find the person in class whose birthday comes 14th in the year among everyone in the class.

Everyone stand up. Pick a pivot. Write the pivot's birthday on the board.

If your birthday comes before the pivot's, raise your hand. (Count, record, choose half.)

Everyone in the other half, sit down. Repeat: Pick a pivot. Write the pivot's birthday on the board.

If your birthday comes before the pivot's, raise your hand. (Count, record, choose half.)

Repeat until one person is standing, or we otherwise know the answer.

E-mail me your birthday -- MM/DD, only. I'll check to see if we found the correct answer.

Welcome to the Course

Welcome to CS 3530, Design and Analysis of Algorithms. I am Eugene Wallingford, and I'll be your instructor for this course. Most of you have received e-mail from me before...

I have passed out a short sheet of "vital statistics" that contains basic contact information for me and the course. The most important piece of data on that sheet is the URL of the course web page:

http://www.cs.uni.edu/~wallingf/teaching/cs3530/

The web site there includes pointers to a full syllabus, all the materials you'll need for the course (including lecture notes, homework assignments, and quizzes), and links to programming material and other resources for the course. Keep this sheet with you at all times, and set a bookmark to the web page in your favorite web browser. You never know when the urge to study algorithms may strike you!

Study the course syllabus carefully, especially if you have never had me for a course. It lists the policies by which we will run this course. You will need to know these policies and when they apply. You will also find a rough schedule for the semester on the last page, including very tentative dates for three to five quizzes.

Some points that you should pay special attention to include:

You can contact me via my web page, via e-mail, and during my office hours.

We have a course mailing list, through which we can discuss course topics, ask and answer questions, and so on. The e-mail address of this list is cs3530@cs.uni.edu. By default, you are subscribed to this listserv from your @uni.edu e-mail address. If you intend to send mail from some other address regularly, you will want to subscribe from that other address. Let me know.

We do not have a required text. The text I've used in the past is pretty good, but it's not substantially different from what it was ten years ago, and it is more expensive than I like. If you would like to buy a copy -- new, used, or an older edition -- it will serve as a useful reference and expand on what we do in class. I will also supply URLs and PDFs on occasion.

This course is tough. I assign challenging work and set a high standard for achievement. But I think that you will find that your effort is adequately reflected in your grade. Warning: You must complete all of the assignments in order to pass the course. To earn a desirable grade, you will want to understand them all as well as possible.

Late work is not accepted for grading.

I encourage discussion of class material and homework problems. I also take very seriously the issue of academic honesty. You may share ideas, but you may never share code or answers.

Now, back to the fun stuff.

What We Did

What did our opening exercise do? Did my actions remind you of anything?

We selected the 14th item out of a set of 50-some items. One simple approach to this task uses knowledge that you already have: sort the birthdays in an array, then go to slot 14. This is simple but seems like a lot of unnecessary work. Sorting is expensive, and we don't need that much information. Are there other ways to approach the problem?

Our approach borrows ideas from quicksort to do a binary search in an unsorted set!

The algorithm, informally:

We are given a set of items and an integer n between 1 and the size of the set.

While true do:
1. If set size is one, return that item.
2. Select a pivot.
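3. Partition the other items into two subsets: items whose value is ≤ the pivot's value and items whose value is > the pivot's value.
4. If the lower subset has exactly n-1 items, return the pivot. Otherwise select the subset that contains the nth item; if that is the upper subset, first subtract (size of the lower subset + 1) from n.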
5. Goto 1.
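The informal steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the name `quickselect` is our own, the pivot is simply the first item (an assumption; in class we picked a person), the pivot is kept out of both subsets so the set shrinks on every pass, and n is adjusted whenever we keep the upper subset, which is the "count, record" step from the exercise. For clarity it copies sublists on each pass.

```python
def quickselect(items, n):
    """Return the nth smallest item (1-based) of a non-empty sequence.

    Sketch of the informal algorithm: pick a pivot, partition the other
    items around it, keep only the subset that must hold the nth item.
    Pivot choice (first item) is an assumption made for simplicity.
    """
    if not 1 <= n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    current = list(items)
    while True:
        if len(current) == 1:                    # step 1: one item left
            return current[0]
        pivot, rest = current[0], current[1:]    # step 2: choose a pivot
        lower = [x for x in rest if x <= pivot]  # step 3: partition
        upper = [x for x in rest if x > pivot]
        if n == len(lower) + 1:                  # step 4: pivot is the answer
            return pivot
        if n <= len(lower):                      # answer is in the lower subset
            current = lower
        else:                                    # answer is in the upper subset
            n -= len(lower) + 1                  # skip lower items and the pivot
            current = upper
```

For example, `quickselect(["03/14", "01/02", "12/25", "07/04"], 2)` returns `"03/14"`; zero-padded MM/DD strings compare in calendar order, so birthdays can be used directly.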

This uses the partition and select ideas from quicksort. It uses the target focus and termination ideas from binary search.

Finding the median of a set, or more generally the "kth value", is a common problem in many real-world domains. Data analytics is one.

Quicksort and binary search are common algorithms. They are also exemplars of common themes in algorithm design. When we know instructive themes and instructive algorithms, we are able to use their ideas to solve new problems.

More About The Course

This course is different than any other I teach. It isn't about a particular programming language or style. It's not "object-oriented" or "functional". It is theoretical, in a sense, not applied -- not about software practice or programs themselves.

In some ways, though, you will find much that is familiar here, too. This course is about design, the act of creating something. And the algorithms we design ultimately have to be implemented in real programs that run on real computers, and it turns out to be not so easy to implement some algorithms well. The course will be driven by problems, and we will seek to find and understand patterns in the solutions we design.

I will look for cool stories, puzzles, and games to illustrate all of our topics, but I make no promises that I'll find one for every day or that, when I do, we will all think they are cool. I do promise that I'll try to select puzzles and games that are easy enough for you to solve or play, at least to some degree of success.

The problem we worked on today would have been a bear to solve on your own... Yet the solution seems obvious once you have seen it. Algorithmic problems and solutions can be like that. You've probably experienced that before in other CS courses.

Some details:

I expect that you know how to write programs in some modern language. Python, Ada, and Java will suffice, as will many others.
(If you know Scheme, Racket, or another functional language, you may consider using it. These languages can give elegant solutions.)

I expect that you know and understand basic data structures. I expect that you are able to implement basic structures and use them as elements in larger programs.

You may program in any language you wish, so long as we have tools for compiling and running them on the CS servers and computers.

Now, back to the fun stuff.

Time, Space, and Code

There was an obvious way to solve this problem using what you already know: sort the set, then select the item you desire. How long will that take?

In 1520, you learned that sorting is generally O(n log n) time. Selecting the kth item from an array is O(1). Doing these in sequence gives O(n log n).
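For comparison, the sort-then-select approach is nearly a one-liner in Python. This is a minimal sketch; `select_by_sorting` is a hypothetical name. Python's built-in `sorted` returns a new sorted list in O(n log n) time, and indexing a Python list is O(1), so the total is O(n log n).

```python
def select_by_sorting(items, k):
    """Return the kth smallest item (1-based) by sorting everything first.

    sorted() costs O(n log n); indexing the resulting list costs O(1).
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    return sorted(items)[k - 1]
```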

(Quick review of Big O, constants.)

But under what conditions does this hold? Is sorting always O(n log n)? If you are using a linked list, what is the complexity of selecting the kth item?
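One way to explore the linked-list question is to write the selection out. The sketch below assumes a minimal hand-rolled singly linked list; the `Node` class and `kth_node_value` function are hypothetical, not from any library. With no direct indexing, reaching the kth node means following k - 1 links from the head.

```python
class Node:
    """A minimal singly linked list node (hypothetical, for illustration)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def kth_node_value(head, k):
    """Return the value stored in the kth node (1-based) of a linked list.

    There is no direct indexing, so we follow k - 1 links from the head:
    O(k) time, which is O(n) when k is near the end of the list.
    """
    if k < 1:
        raise IndexError("k must be at least 1")
    node = head
    for _ in range(k - 1):
        if node is None:
            raise IndexError("list has fewer than k nodes")
        node = node.next_node
    if node is None:
        raise IndexError("list has fewer than k nodes")
    return node.value
```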

Environmental conditions matter.

... the relationship of algorithms to data structures. We need to know instructive data structures, too.

How much time does our algorithm take? Quicksort is O(n log n) on average, but we don't have to sort the partitions -- only create them!

If our pivots are good ones, they split the set roughly in half on each pass. The first time, we process ~ n items. On the second, we process ~ n/2 items, then ~ n/4 items, then ~ n/8 items, and so on, down to 1.

The sum (n + n/2 + n/4 + n/8 + ... 1) is less than 2*n. So our algorithm is O(n). (!)
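Under the simplifying assumptions that n is a power of two, say n = 2^m, and that every pivot splits the set exactly in half, the sum can be computed exactly:

```latex
\sum_{i=0}^{m} \frac{n}{2^i}
  \;=\; n \left( 2 - \frac{1}{2^m} \right)
  \;=\; 2n - 1
  \;<\; 2n
```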

O(n) is the algorithm's best case performance, as well as its average case. What is the worst case?

In the worst case, we decrease the set's size by exactly one element on each pass. This can result from bad pivots, regularities in the data, or a combination of the two. For example, consider the situation where the set is already sorted, k = n, and we use the first element as the pivot each time.

On the first pass, we process ~ n items. On the second, we process ~ n-1 items, then ~ n-2 items, then ~ n-3 items, and so on, down to 1. The sum (n + n-1 + n-2 + n-3 + ... + 1) is n(n+1)/2, approximately n^2 / 2. So, in the worst case, our algorithm is O(n^2). (!)
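A small experiment can confirm this count. The sketch below is a hypothetical instrumented version of the algorithm (the name `passes_work` is ours): it uses the first element as the pivot, as in the example above, and adds up the size of the set on every pass. On the sorted list 1..100 with k = 100 it returns `(100, 5050)`, and 5050 = 100 · 101 / 2.

```python
def passes_work(items, k):
    """Run first-element-pivot selection and count the items processed.

    Returns (answer, work), where work is the total size of the set
    summed over all passes -- the quantity added up in the text.
    """
    current, work = list(items), 0
    while True:
        work += len(current)                     # this pass touches every item
        if len(current) == 1:
            return current[0], work
        pivot, rest = current[0], current[1:]
        lower = [x for x in rest if x <= pivot]
        upper = [x for x in rest if x > pivot]
        if k == len(lower) + 1:
            return pivot, work
        if k <= len(lower):
            current = lower
        else:
            k -= len(lower) + 1
            current = upper


print(passes_work(list(range(1, 101)), 100))
```

Running it on a shuffled copy of the same list typically gives a much smaller count, though the exact number depends on the shuffle.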

What about space? If we use an array, we can implement our algorithm "in place", meaning that we don't need to make copies of the array on each pass. So the space we need to store the set is O(n).

What about the space needed to run the algorithm?

If we implement the algorithm using a loop, then the only overhead we need is the set of local variables, which is O(1).
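Here is a sketch of the array version that works in place and uses a loop, so apart from the array itself it needs only a few index variables. It assumes Lomuto partitioning with the last element of the current range as the pivot; that is one common choice among several, and Hoare's own partition scheme differs. The name `quickselect_in_place` is ours.

```python
def quickselect_in_place(a, k):
    """Return the kth smallest item (1-based) of list a, rearranging a.

    Iterative, so the only extra storage is a few index variables: O(1).
    Partitioning is Lomuto-style, with a[hi] as the pivot.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and len(a)")
    target = k - 1                       # 0-based index of the answer
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):          # move items < pivot to the front
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands in its final slot
        if target == store:
            return a[store]
        elif target < store:             # answer is left of the pivot
            hi = store - 1
        else:                            # answer is right of the pivot
            lo = store + 1
```

The function rearranges its argument, so pass `list(a)` if the original order matters. With a last-element pivot, an already sorted input can again produce the O(n^2) worst case, for example when k = 1.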

If we implement the algorithm recursively, though, it appears that we need a new activation record on the run-time stack for each recursive call. With good pivots there are ~ log n calls, so we may need O(log n) overhead for the stack (O(n) in the worst case). This is true in some languages.
Other languages take advantage of the fact that our algorithm is tail recursive, a special kind of recursion in which there is no computation left to do after the recursive call. These languages compile such an algorithm into code that looks a lot like a loop, and as a result the overhead required by our algorithm would still be O(1).
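For comparison, here is a recursive sketch of the same in-place algorithm, in which both recursive calls are in tail position. CPython, the standard Python interpreter, does not perform tail-call elimination, so Python is one of the languages where each call really does cost a stack frame. The name `quickselect_rec` and its default parameters are our own choices.

```python
def quickselect_rec(a, k, lo=0, hi=None):
    """Return the kth smallest item (1-based) of list a, rearranging a.

    Tail recursive: each recursive call is the last thing the function
    does. CPython still pushes a new frame per call, so stack use grows
    with the number of passes (about log n with good pivots).
    """
    if hi is None:                       # first call: validate and set bounds
        if not 1 <= k <= len(a):
            raise ValueError("k must be between 1 and len(a)")
        hi = len(a) - 1
    if lo == hi:
        return a[lo]
    pivot, store = a[hi], lo
    for i in range(lo, hi):              # Lomuto partition around a[hi]
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi] = a[hi], a[store]
    target = k - 1
    if target == store:
        return a[store]
    if target < store:
        return quickselect_rec(a, k, lo, store - 1)   # tail call
    return quickselect_rec(a, k, store + 1, hi)       # tail call
```

CPython's default recursion limit is 1000 frames (see `sys.getrecursionlimit()`), so a bad-pivot run on a large sorted list can raise `RecursionError`, a concrete case of the language shaping how an algorithm behaves.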

This is an example of what I said earlier about theory coming into contact with real programs, and thus programming languages, operating systems, and human beings....

Ideas to expand:

Some algorithms are hard to implement correctly ... even basic ones, such as binary search ... especially when all cases are considered.
... We may have the wrong person as #14 due to any number of possible programming errors, including the quality of my instructions and run-time mistakes by my actors!

The language you use can matter. ... tail-call elimination. ... but also basic primitives, the number and quality of built-in data structures, level of abstraction. The languages we know matter.

Being able to write correct programs is as important as the theory that drives the design of our algorithms.
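Binary search, mentioned above as a basic algorithm that is easy to get wrong, is a good place to practice care. The sketch below is one careful version, not the only correct one: it searches the half-open range [lo, hi) and shrinks that range on every iteration. The name `binary_search` and the convention of returning -1 for a missing target are our own assumptions; Python's standard `bisect` module offers tested building blocks for the same job.

```python
def binary_search(a, target):
    """Return an index i with a[i] == target in sorted list a, or -1.

    Invariant: if target is in a, its index lies in the half-open
    range [lo, hi). Every iteration shrinks that range, so the loop ends.
    """
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2        # avoids overflow in fixed-width languages
        if a[mid] < target:
            lo = mid + 1                 # target can only be right of mid
        elif a[mid] > target:
            hi = mid                     # target can only be left of mid
        else:
            return mid
    return -1
```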

A Summary of the Course

Our opening exercise and running example offer a microcosm of the entire course. We will encounter a problem and design one or more algorithms for solving it. We will analyze our algorithms to determine their performance. We will also implement our algorithms in programs and use them, sometimes to run experiments. The experiments will help us learn about our algorithms' behavior under real conditions, and also to verify our analysis.

Hero Worship

"Our" algorithm is called quickselect. It was created by Anthony Hoare, the creator of quicksort. You'll see Hoare's name show up frequently in your study of algorithms, along with a few others: Dikjstra, Knuth, Euler, .... You may grow fond of their work.

The solution to the opening exercise I used the last time I taught this course was first worked out by Robert Floyd. Floyd is one of my favorite computer scientists. I'd even call myself a fan of his.

I am a big sports fan and have always had my favorite sports stars: Larry Bird, George McGinnis, Johnny Bench, Dan Marino... As a child, I had posters of Bench and McGinnis and Walt Frazier on my bedroom walls. But a fan of computer scientists? I've found myself drawn to the work of several: Alan Kay, Ward Cunningham, Herb Simon, B. Chandrasekaran -- and Robert Floyd.

Why Floyd?

He was a master at designing solutions to real problems, across the breadth of computing. We see his influences today in cryptography, programming languages, compilers, operating systems and algorithms.

He always sought to generalize his results and to understand the principles that underlay them. He is a great example of reflective practice and clear writing to teach the rest of us.

He thought and wrote about ideas that foreshadowed modern practices such as patterns and refactoring more than 25 years ago!

Don't be afraid to have computer science heroes. It can enrich your study, and drive you to learn more.

References

The idea for our opening exercise comes from Quickselect FTW!, a blog entry by Chris Okasaki. Okasaki is best known for work on algorithms that work on immutable data structures.

You can read more about Quickselect on Wikipedia. The Wikipedia page has much more detail than you may be ready for at this point, but it is still worth reading now. You will learn a few things and begin to see some of the things we will study this semester.

Wrap Up

Reading -- Read the two web pages linked in the References section above.

Homework -- There is no "real" homework yet. But to get us started on the course, please send me an e-mail message containing at least two things:
- your birthdate (MM/DD is all I need)
- at least one question you have about the reading
Send me this message by 10 AM on Thursday.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... January 15, 2014