This reading considers brute-force algorithms for some classic computer science problems. There are two goals: to use these algorithms to understand more about brute force in general, and to introduce classic problems that we will try to solve more efficiently in coming weeks.
String matching is an instance of search, a ubiquitous problem. The brute-force way to search a list is a sequential search: look at each item in turn until we find the target. Such an approach is linear in the size of the list.
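As a point of reference, sequential search might be sketched in Python as follows. The function name and the choice to return -1 for a missing target are assumptions made for illustration, not part of the reading.

```python
def sequential_search(items, target):
    """Return the index of the first item equal to target, or -1 if absent."""
    # Look at each item in turn; in the worst case we inspect all of them.
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


print(sequential_search([7, 3, 9, 3], 3))   # 1
print(sequential_search([7, 3, 9, 3], 5))   # -1
```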
How easy is it to think of Θ(n) as inefficient?
Many interesting problems demand more. For example, many problems in bioinformatics and text processing systems involve string matching, but the size of the strings involved makes brute-force sequential search unacceptable.
Consider this brute-force string matching algorithm:
input : T[0..n-1], a text
        P[0..m-1], a pattern
output: position of first occurrence of P in T, or -1

for i ← 0 to n-m
    j ← 0
    while j < m and P[j] = T[i+j]
        increment j
    if j = m
        return i
return -1
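For readers who want to run the algorithm while tracing it, here is one possible Python transcription of the pseudocode above. The function name and the sample strings are assumptions chosen for illustration.

```python
def brute_force_match(text, pattern):
    """Return the position of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment i of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Advance j while the characters at this alignment agree.
        while j < m and pattern[j] == text[i + j]:
            j += 1
        if j == m:          # all m characters matched
            return i
    return -1


print(brute_force_match("abracadabra", "cad"))   # 4
print(brute_force_match("abracadabra", "xyz"))   # -1
```

When the pattern is longer than the text, the outer loop runs zero times and the function returns -1.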
Trace this algorithm for these inputs:
Quick Exercises:
Hints: Use a simple "alphabet" as the set of possible characters. Construct a simple concrete example, and then generalize it to strings of size m and n.
What would a non-brute force algorithm for this problem look like?
In the Knapsack Problem, you have n items that you would like to pack in a container of size W. Each item i has a value, v_i, and a size, w_i. We sometimes refer to 'size' as cost or weight. The goal is to fill the container with the most valuable set of items possible.
Suppose you have a knapsack capable of holding 10 units of weight, and the following four items:
What is the most valuable knapsack we can fill?
A brute-force algorithm would consider all possible combinations:
input : n, the number of items
        W, capacity of the knapsack
        v[0..n-1], item values
        w[0..n-1], item weights
output: m, a subset of the n items with the maximum value
        and total weight ≤ W

m ← empty set
for ( s : non-empty subsets of the n items )
    if value(s) > value(m) and weight(s) ≤ W
        m ← s
return m
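The brute-force idea can be sketched in Python using `itertools.combinations` to enumerate subsets. This is a minimal sketch, not a definitive implementation: the function name is an assumption, and the four items in the usage example are hypothetical values invented for illustration (they are not the items from the exercise above). The sketch accepts a subset whose weight exactly equals the capacity, in line with the stated output condition of total weight ≤ capacity.

```python
from itertools import combinations


def brute_force_knapsack(values, weights, capacity):
    """Return (best_subset, best_value), where best_subset is a tuple of item indices."""
    n = len(values)
    best_subset, best_value = (), 0      # the empty set has value 0
    # Enumerate every non-empty subset, grouped by size.
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            weight = sum(weights[i] for i in subset)
            value = sum(values[i] for i in subset)
            # Keep the subset only if it fits and beats the best so far.
            if weight <= capacity and value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value


# Hypothetical data, for illustration only.
values = [10, 40, 30, 50]
weights = [5, 4, 6, 3]
print(brute_force_knapsack(values, weights, 10))   # ((1, 3), 90)
```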
How many non-empty subsets of our items might this algorithm have to consider?
Quick Exercises:
This is a problem with surprisingly many real-world applications. In the Agile Software Development course, we will encounter it as the release commitment task. Here's another instance of the problem:
Suppose you have n widgets to sell at an Internet auction. Each prospective buyer i can bid on a lot s_i of widgets for a price p_i. Each lot is fixed; that is, buyer i wants s_i widgets or none. Maximize the amount to be made at auction.
Take out the "Internet" reference, and you have an instance of maximizing the value of a corporation's initial public offering, or IPO! (Google took a different approach that simplified the algorithm required...)
What would a non-brute force algorithm for the Knapsack Problem look like?
In the Assignment Problem, we have n people and n jobs to be done. Each person is able to do each job, but at different costs, C[i,j]. Assign each person a unique job such that the total cost is minimized.
Suppose we have this matrix of people and tasks:
         W   X   Y   Z
Alice    9   2   7   8
Bob      6   4   3   7
Carl     5   8   1   8
Diane    7   6   9   4
Greed probably doesn't help here. There is no reason to believe that assigning Carl to Task Y -- the cheapest assignment -- will be part of an optimal assignment.
Quick Exercise: Construct an example of the assignment problem in which the optimal solution does not contain the smallest value in C.
Hint: a small example suffices...
What would a "brute force" solution to this problem be like? This isn't a subset problem... We could consider all possible permutations of the jobs, with the cost of job[i] being C[i, job[i]]. Here is one attempt:
input : C[0..n-1, 0..n-1], costs of person i doing job j
output: {(i, j) : each i and each j occurs once and
                  the sum of all C[i,j] is minimized }

min ← infinity
for ( p : permutations of [0..n-1] )
    cost ← Σ( C[i, p[i]] )
    if cost < min
        min ← cost
        answer ← p
return answer
Applying this algorithm to our cost matrix above, we might consider assignments in order:
assignment to     total
Alice et al.      cost
WXYZ              18     best so far
WXZY              30
WYXZ              24
WYZX              26
...
XWYZ              13     best so far
...
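The permutation search can be sketched in Python with `itertools.permutations`, applied to the cost matrix given earlier. The function and variable names are assumptions for illustration. Rows are Alice, Bob, Carl, Diane and columns are W, X, Y, Z; the sketch records the running minimum each time it finds a cheaper permutation.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Return (best_perm, best_cost); best_perm[i] is the job index given to person i."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    # Each permutation of the job indices is one complete assignment.
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


# Cost matrix from the text: rows Alice, Bob, Carl, Diane; columns W, X, Y, Z.
COST = [
    [9, 2, 7, 8],
    [6, 4, 3, 7],
    [5, 8, 1, 8],
    [7, 6, 9, 4],
]

perm, total = brute_force_assignment(COST)
print("".join("WXYZ"[j] for j in perm), total)   # XWYZ 13
```

For n people the loop body runs n! times, which is 24 here.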
What would a non-brute force algorithm for the Assignment Problem look like?
String Matching.
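Let P = 0^(m-1)1 (that is, m-1 zeros followed by a single 1) ... and T = 0^n (n zeros) ... Each of the n-m+1 alignments matches m-1 characters before failing on the last one, so the algorithm will make m(n-m+1) comparisons!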
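To check the count empirically, the matcher can be instrumented to tally character comparisons. This is a sketch under assumptions: the function name and the particular sizes m = 4 and n = 10 are chosen only for illustration.

```python
def count_comparisons(text, pattern):
    """Run brute-force matching; return (position, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1            # one comparison of pattern[j] with text[i + j]
            if pattern[j] != text[i + j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


m, n = 4, 10
pattern = "0" * (m - 1) + "1"           # m-1 zeros followed by a 1
text = "0" * n                          # n zeros
print(count_comparisons(text, pattern)) # (-1, 28)
print(m * (n - m + 1))                  # 28
```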
Knapsack Problem.
Items 1 and 2 fit in the bag perfectly and together have a value of $54. Items 3 and 4 fit with room to spare and have a value of $65. None of the subsets of size three or four fit in the sack, so {3, 4} is the best we can do.

The basic operation in this algorithm is the comparison value(s) > value(m). The loop is O(m), where m is the number of non-empty subsets of n items.
But... how many subsets are there? There are 2^n of them, or 2^n - 1 if we leave out the empty set. Remember the power set? Because we have to generate them, the overall complexity of the algorithm is O(2^n).
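The count is easy to confirm for small n by generating the subsets and counting them. The helper name below is an assumption for illustration.

```python
from itertools import combinations


def non_empty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# Compare the generated count with 2^n - 1 for a few small n.
for n in range(1, 6):
    count = sum(1 for _ in non_empty_subsets(range(n)))
    print(n, count, 2**n - 1)
```

For the four items in the example, this gives 15 non-empty subsets.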
Assignment Problem.
Here's a simple case, two people and two tasks:

         Y     Z
Alice    1     2
Bob      2   100

Get greedy, and Bob will gouge you!
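The gap between the two strategies on this 2×2 case can be checked with a short sketch. "Greedy" is interpreted here as repeatedly taking the cheapest remaining person-task cell; that interpretation, like the function names, is an assumption made for illustration.

```python
from itertools import permutations


def greedy_total(cost):
    """Total cost when we repeatedly take the cheapest remaining (person, job) cell."""
    n = len(cost)
    free_people = list(range(n))
    free_jobs = list(range(n))
    total = 0
    while free_people:
        i, j = min(
            ((p, q) for p in free_people for q in free_jobs),
            key=lambda cell: cost[cell[0]][cell[1]],
        )
        total += cost[i][j]
        free_people.remove(i)
        free_jobs.remove(j)
    return total


def optimal_total(cost):
    """Minimum total cost over all permutations (brute force)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n)) for p in permutations(range(n)))


# Rows: Alice, Bob. Columns: Y, Z.
cost = [[1, 2], [2, 100]]
print(greedy_total(cost))    # 101
print(optimal_total(cost))   # 4
```

Greedy grabs the 1 for Alice and is then forced to pay Bob 100; the optimal assignment skips the smallest entry entirely.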