I'll be updating this document in the next day or so, to fill in some details and to correct some errors. I'm posting this now so that you can see the reading and homework and start following the course.
I have seven pieces of paper. Keep that in mind. It will be important.
Now, the task ...
This is the 14th day of the 14th year of the millennium. Let's find the person whose birthday comes 14th in the year among the entire class.
Everyone stand up. Pick a pivot. Write birthday on board.
If birthday is before, raise hand. (Count, record, choose half.)
All in other half, sit down. Repeat: Pick a pivot. Write birthday on board.
If birthday is before, raise hand. (Count, record, choose half.)
Repeat until one person is standing, or we otherwise know the answer.
E-mail me your birthday -- MM/DD, only. I'll check to see if we found the correct answer.
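The procedure we just acted out can be sketched as code. This is a rough sketch, not the polished algorithm we'll see shortly: it assumes birthdays are "MM/DD" strings (which sort correctly as text) and that all birthdays are distinct.

```python
def find_kth_birthday(birthdays, k):
    """Return the k-th earliest birthday (1-indexed), assuming distinct
       "MM/DD" strings, by the repeated pivoting we did in class."""
    standing = list(birthdays)           # everyone stands up
    while len(standing) > 1:
        pivot = standing[0]              # write a birthday on the board
        before = [b for b in standing if b < pivot]   # raise hands, count
        if k <= len(before):
            standing = before            # the answer is in the early half
        elif k == len(before) + 1:
            return pivot                 # the pivot is person #k
        else:
            k -= len(before) + 1         # discard pivot and the early half
            standing = [b for b in standing if b > pivot]
    return standing[0]                   # one person left standing
```

For example, `find_kth_birthday(["03/05", "01/20", "12/01", "06/15", "02/14"], 2)` returns `"02/14"`, the second-earliest birthday in that set.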
Welcome to CS 3530, Design and Analysis of Algorithms. I am Eugene Wallingford, and I'll be your instructor for this course. Most of you have received e-mail from me before...
I have passed out a short sheet of "vital statistics" that contains basic contact information for me and the course. The most important piece of data on that sheet is the URL of the course web page:
http://www.cs.uni.edu/~wallingf/teaching/cs3530/
The web site there includes pointers to a full syllabus, all the materials you'll need for the course (including lecture notes, homework assignments, and quizzes), and links to programming material and other resources for the course. Keep this sheet with you at all times, and set a bookmark to the web page in your favorite web browser. You never know when the urge to study algorithms may strike you!
Study the course syllabus carefully, especially if you have never had me for a course. It lists the policies by which we will run this course. You will need to know these policies and when they apply. You will also find a rough schedule for the semester on the last page, including very tentative dates for three to five quizzes.
Some points that you should pay special attention to include:
Now, back to the fun stuff.
What did our opening exercise do? Did my actions remind you of anything?
We selected the 14th item out of a set of 50-some items. One simple approach to this task uses knowledge that you already have: Sort the birthdays into an array, then go to Slot 14. This is simple but seems like a lot of unnecessary work. Sorting is expensive, and we don't need that much information. Are there other ways to approach the problem?
Our approach borrows ideas from quicksort to do a binary search in an unsorted set!
The algorithm, informally:
This uses the partition and select ideas from quicksort. It uses the target focus and termination ideas from binary search.
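One way to write down the informal algorithm, with the pivot chosen at random as in quicksort. A three-way partition handles duplicate values, which our distinct-birthdays sketch above glossed over.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest item (1-indexed) without fully sorting."""
    pivot = random.choice(items)                      # quicksort's pivot idea
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k <= len(less):
        return quickselect(less, k)                   # target is in the low side
    elif k <= len(less) + len(equal):
        return pivot                                  # the pivot is the answer
    else:                                             # target is in the high side
        greater = [x for x in items if x > pivot]
        return quickselect(greater, k - len(less) - len(equal))
```

Note the binary-search flavor: after each partition we recurse into only one side, focusing on the side that must contain item #k and terminating when the pivot itself is the answer.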
Finding the median in a set, or more generally the "kth value", is a common problem in many real-world domains... Data analytics is one.
Quicksort and binary search are common algorithms. They are also exemplars of common themes in algorithm design. When we know instructive themes and instructive algorithms, we are able to use their ideas to solve new problems.
This course is different from any other I teach. It isn't about a particular programming language or style. It's not "object-oriented" or "functional". It is theoretical, in a sense, not applied -- not about software practice or programs themselves.
In some ways, though, you will find much similar here, too. This course is about design, the act of creating something. And the algorithms we design ultimately have to be implemented in real programs that run on real computers, and it turns out to be not so easy to implement some algorithms well. The course will be driven by problems, and we will seek to find and understand patterns in the solutions we design.
I will look for cool stories, puzzles, and games to illustrate all of our topics, but I make no promises that I'll find one for every day or that, when I do, we will all think they are cool. I do promise that I'll try to select puzzles and games that are easy enough for you to solve or play, at least to some degree of success.
The one we worked on today would have been a bear to solve on your own... Yet it seems obvious after it is solved. Algorithmic problems and solutions can be like that. You've probably experienced this before in other CS courses.
Some details:
(If you know Scheme, Racket, or another functional language, you may consider using it. These languages can give elegant solutions.)
Now, back to the fun stuff.
There was an obvious way to solve this problem using what you already know: sort the set, then select the item you desire. How long will that take?
In CS 1520, you learned that sorting generally takes O(n log n) time. Selecting the kth item from an array is O(1). Doing these in sequence gives O(n log n).
(Quick review of Big O, constants.)
But under what conditions does this hold? Is sorting always O(n log n)? If you are using a linked list, what is the complexity of selecting the kth item?
Environmental conditions matter.
... the relationship of algorithms to data structures. We need to know instructive data structures, too.
How much time does our algorithm take? Quicksort is O(n log n), but we don't have to sort the partitions -- only create them!
If our pivots are good ones, they split the set roughly in half on each pass. The first time, we process ~ n items. On the second, we process ~ n/2 items, then ~ n/4 items, then ~ n/8 items, and so on, down to 1.
The sum (n + n/2 + n/4 + n/8 + ... + 1) is less than 2*n. So our algorithm is O(n). (!)
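We can check that bound directly with a few lines of arithmetic, using integer halving as a stand-in for the shrinking partitions:

```python
def halving_sum(n):
    """Sum n + n/2 + n/4 + ... + 1: the items examined across all passes
       when each pivot splits the set roughly in half."""
    total = 0
    while n >= 1:
        total += n
        n //= 2
    return total

# For n = 1000: 1000 + 500 + 250 + 125 + 62 + 31 + 15 + 7 + 3 + 1 = 1994 < 2000.
```

The total stays under 2n no matter how large n grows, which is why the constant disappears into the O(n).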
O(n) is the algorithm's best case performance, as well as its average case. What is the worst case?
In the worst case, we decrease the set's size by exactly one element on each pass. This can result from bad pivots, regularities in the data, or a combination of the two. For example, consider the situation where the set is already sorted, k = n, and we use the first element as the pivot each time.
On the first pass, we process ~ n items. On the second, we process ~ n-1 items, then ~ n-2 items, then ~ n-3 items, and so on, down to 1. The sum (n + (n-1) + (n-2) + (n-3) + ... + 1) is approximately n^2 / 2. So, in the worst case, our algorithm is O(n^2). (!)
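We can watch the worst case happen by instrumenting a version that always uses the first element as its pivot and counting the items it examines. (The counting wrapper is mine, for illustration; it assumes distinct items.)

```python
def count_examined(items, k):
    """Quickselect with the first element as pivot, returning the answer
       and the total number of items examined across all passes."""
    examined = 0
    while True:
        pivot, rest = items[0], items[1:]
        examined += len(rest)                    # every pass scans the rest
        before = [x for x in rest if x < pivot]
        if k == len(before) + 1:
            return pivot, examined
        elif k <= len(before):
            items = before
        else:
            k -= len(before) + 1
            items = [x for x in rest if x > pivot]

# On a sorted set of 100 items with k = 100, each pass discards only the
# pivot, so we examine 99 + 98 + ... + 1 = 4950 items: about n^2 / 2.
```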
What about space? If we use an array, we can implement our algorithm "in place", meaning that we don't need to make copies of the array on each pass. So the space we need to store the set is O(n).
What about the space needed to run the algorithm?
Other languages take advantage of the fact that our algorithm is tail recursive, a special kind of recursion in which there is no computation left to do after the recursive call. These languages compile such an algorithm into code that looks a lot like a loop, and as a result the overhead required by our algorithm would still be O(1).
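Here is the tail-recursive shape next to its mechanical loop rewrite, assuming distinct items. (Python, unlike Scheme, does not perform this rewrite for you, so the recursive version can blow the stack on bad pivots while the loop version always uses O(1) stack space.)

```python
def select_rec(items, k):
    """Tail-recursive form: nothing remains to do after each recursive call."""
    pivot = items[0]
    before = [x for x in items if x < pivot]
    if k == len(before) + 1:
        return pivot
    elif k <= len(before):
        return select_rec(before, k)                          # tail call
    else:
        return select_rec([x for x in items if x > pivot],
                          k - len(before) - 1)                # tail call

def select_iter(items, k):
    """The same algorithm after the tail-call-to-loop rewrite: each
       'recursive call' becomes a reassignment and another trip around."""
    while True:
        pivot = items[0]
        before = [x for x in items if x < pivot]
        if k == len(before) + 1:
            return pivot
        elif k <= len(before):
            items = before
        else:
            items, k = [x for x in items if x > pivot], k - len(before) - 1
```

Because the recursive calls are the last action taken, the rewrite changes nothing about which items are examined or in what order; it only changes how the bookkeeping is stored.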
This is an example of what I said earlier about theory coming into contact with real programs, and thus programming languages, operating systems, and human beings....
Ideas to expand:
... We may have the wrong person as #14 due to any number of possible programming errors, including the quality of my instructions and run-time mistakes by my actors!
Being able to write correct programs is as important as the theory that drives the design of our algorithms.
Our opening exercise and running example offer a microcosm of the entire course. We will encounter a problem and design one or more algorithms for solving it. We will analyze our algorithms to determine their performance. We will also implement our algorithms in programs and use them, sometimes to run experiments. The experiments will help us learn about our algorithms' behavior under real conditions and verify our analysis.
"Our" algorithm is called quickselect. It was created by Tony Hoare, the creator of quicksort. You'll see Hoare's name show up frequently in your study of algorithms, along with a few others: Dijkstra, Knuth, Euler, .... You may grow fond of their work.
The solution to the opening exercise I used the last time I taught this course was first worked out by Robert Floyd. Floyd is one of my favorite computer scientists. I'd even call myself a fan of his.
I am a big sports fan and have always had my favorite sports stars: Larry Bird, George McGinnis, Johnny Bench, Dan Marino... As a child, I had posters of Bench and McGinnis and Walt Frazier on my bedroom walls. But a fan of computer scientists? I've found myself drawn to the work of several: Alan Kay, Ward Cunningham, Herb Simon, B. Chandrasekaran -- and Robert Floyd.
Why Floyd?
Don't be afraid to have computer science heroes. It can enrich your study, and drive you to learn more.
The idea for our opening exercise comes from Quickselect FTW!, a blog entry by Chris Okasaki. Okasaki is best known for his work on algorithms that operate on immutable data structures.
You can read more about Quickselect on Wikipedia. The Wikipedia page has much more detail than you may be ready for at this point, but it is still worth reading now. You will learn a few things and begin to see some of the things we will study this semester.
Send me this message by 10 AM on Thursday.