Design and Analysis of Algorithms

Remember graphs...

- G = (V, E)
- V is a set of vertices
- E is a set of edges, pairs of vertices
- such networks used to model all sorts of problems

Most of you studied trees in a data structures course. A tree is a special kind of graph: the edges are directed, there are no cycles, and there are no "tangles". The vocabulary of graphs should sound familiar. The basic search algorithms will, too.

This reading will only scratch the surface of graph algorithms. We will do more in later sessions this semester.

*Develop a divide-and-conquer algorithm to count the number
of leaves in a binary tree.*

Written in the textbook style, this algorithm seems unnatural to me:

    algorithm leafCount(T)
      output: number of leaves in T
      if T = ∅
        return 0
      else if T_{L} = ∅ and T_{R} = ∅
        return 1
      else
        return leafCount(T_{L}) + leafCount(T_{R})
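As a minimal Python sketch of the same algorithm (the nested-tuple tree representation is an assumption, not from the notes):

```python
# A tree is None (empty) or a pair (left, right).
# Leaves are nodes whose two subtrees are both empty.

def leaf_count(t):
    if t is None:                       # T = empty tree
        return 0
    left, right = t
    if left is None and right is None:  # both subtrees empty: a leaf
        return 1
    return leaf_count(left) + leaf_count(right)

# A tree with three leaves:
#        *
#       / \
#      *   leaf
#     / \
#  leaf  leaf
t = (((None, None), (None, None)), (None, None))
print(leaf_count(t))  # → 3
```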

What is the basic operation for this algorithm's efficiency?
**The comparison**. How many comparisons does this algorithm
make?

    C(∅) = 1
    C(T) = 3 + C(T_{L}) + C(T_{R})

We can solve this kind of recurrence relation as we solve any other. The complication is that the two subtrees can have vastly different sizes and shapes. Let's save solving them for later.

Notice how an OOP implementation of trees changes the nature of the algorithm's run-time efficiency. With polymorphic objects, even empty trees are objects. This shifts the decision of whether a tree is empty to object-construction time... which makes algorithms such as these run faster because we don't need to make any comparisons at all! In that case, what is the basic operation?

Divide/decrease-and-conquer are natural ways to work on
graphs, as the edges out of a vertex provide a natural way to
split a graph into subgraphs. Processing all vertices --
as the standard pre-, in-, and post-order traversal algorithms
do -- can be implemented with divide-and-conquer. Searches
tend to be decrease-and-conquer. In particular, breadth- and
depth-first search are decrease-by-one algorithms. Search in
a binary search tree is decrease-by-half. Search in a
*k*-ary search tree is
decrease-by-(*k*-1)/*k*.
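A small Python sketch of decrease-by-half search in a binary search tree (the `(left, key, right)` tuple representation is an assumption):

```python
# Each step compares the key to the root and discards one subtree,
# roughly halving the remaining search space in a balanced tree.

def bst_contains(t, key):
    if t is None:
        return False
    left, k, right = t
    if key == k:
        return True
    return bst_contains(left if key < k else right, key)

t = ((None, 2, None), 5, ((None, 7, None), 8, None))
print(bst_contains(t, 7))   # → True
print(bst_contains(t, 4))   # → False
```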

Consider this depth-first search algorithm for traversing a graph:

    DFS(G)
      1. for each v in V
           mark(v) ← 0
      2. count ← 0
      3. for each v in V
           if mark(v) = 0
             dfs(v)

    dfs(v)
      1. count ← count + 1
      2. mark(v) ← count
      3. for each w ∈ { vertices adjacent to v }
           if mark(w) = 0
             dfs(w)
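The pseudocode above translates directly to Python. This is a sketch that assumes a dict-of-adjacency-lists representation for the graph:

```python
# DFS numbers each vertex in the order it is reached.
# The outer procedure restarts the search on every unvisited
# vertex, so disconnected graphs are covered too.

def DFS(G):
    mark = {v: 0 for v in G}       # step 1: unmark every vertex
    count = 0

    def dfs(v):
        nonlocal count
        count += 1
        mark[v] = count            # record the order v is reached
        for w in sorted(G[v]):     # break ties alphabetically
            if mark[w] == 0:
                dfs(w)

    for v in sorted(G):
        if mark[v] == 0:
            dfs(v)
    return mark

G = {'a': ['b'], 'b': ['a'], 'c': []}   # two components
print(DFS(G))   # → {'a': 1, 'b': 2, 'c': 3}
```

Note that `dfs` alone would never reach `c`; the wrapper loop is what makes the traversal complete.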

This is a decrease-by-one algorithm. It selects one vertex
and then visits the remaining *n*-1 vertices in the
same way.

**Quick Exercise**: Why do we need two procedures?
Why won't `dfs(v)` suffice?

... (Hint: Not all graphs are connected!)

*Use the DFS algorithm above to traverse this graph:*

    f -- b   c -- g
     \  / \ /    /
      d -- a -- e

Start at (a) and break ties in alphabetical order. Draw the DFS tree, and label each node with its order reached (pushed on stack) and order done (popped off stack).

A complex graph can give a simple depth-first traversal. The higher the connectivity of the graph (E/V), the more nodes we can "pick off" on a single DFS descent into the graph.

          a 1 7
         /     \
     b 2 3     c 5 6
       |         |
     d 3 2     g 6 5
       |         |
     f 4 1     e 7 4

In a directed graph, or **digraph**, the edges are "one-way
streets". Each edge is an **ordered** pair of vertices.
We can now speak of the **in-degree** and **out-degree**
of a vertex, the number of edges pointing to and from a vertex,
respectively. The digraph is also an incredibly useful tool
for modeling many problem domains.

How can we "sort" a digraph? We can sort the vertices or
edges according to their values, but that usually isn't all
that useful. What can be useful is to sort the graph in
*topological* order, that is, according to the order its
edges impose on its vertices.

The edges of a graph create an ordering on the vertices that
works a bit like **<**:

    v_{i} < v_{j} iff there exists a path of vertices from v_{i} to v_{j}

Visually, this means v_{i} < v_{j}
if and only if there is a path such as:

    v_{i} → a → b → ... → v_{j}

A topological sorting of a graph lists its vertices in such a way
that any time there is an edge (v_{i}, v_{j})
∈ E, v_{i} appears before v_{j} in our
list.

Here is an example graph and its topological sorting:

    V = { c1, c2, c3, c4, c5 }
    E = { (c1, c3), (c2, c3), (c3, c4), (c3, c5), (c4, c5) }

    sorted list: c1 c2 c3 c4 c5

Not all digraphs can be sorted in this way, though. Any cycle among the vertices will create a problem. Consider this graph, which has just one more edge than the example above:

    V = { c1, c2, c3, c4, c5 }
    E = { (c1, c3), (c2, c3), (c3, c4), (c3, c5), (c4, c5), (c5, c2) }

    sorted list: c1 c2 c3 c4 <c5>   -- fails: c2, c3, c4, and c5 lie on a cycle

Fortunately, many computing applications involve **directed
acyclic graphs** (DAGs)...

What might an algorithm that sorts a graph look like? One simple way is to use DFS to record the order in which vertices are popped from the traversal stack, then reverse that list to get the sorted list. To handle graphs with cycles, our algorithm should fail any time it encounters an edge leading back to a vertex that is still on the traversal stack (called a "back" edge).
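This DFS-based sort can be sketched in Python (dict-of-adjacency-lists representation assumed):

```python
# Record the order vertices are popped, then reverse it.
# An edge to a vertex still on the recursion stack is a back
# edge, which signals a cycle, so the sort fails.

def topo_sort(G):
    ACTIVE, DONE = 1, 2
    state, order = {}, []

    def dfs(v):
        state[v] = ACTIVE
        for w in G.get(v, []):
            if state.get(w) == ACTIVE:       # back edge: a cycle
                raise ValueError("graph has a cycle")
            if w not in state:
                dfs(w)
        state[v] = DONE
        order.append(v)                      # record pop order

    for v in G:
        if v not in state:
            dfs(v)
    return order[::-1]                       # reverse the pop order

G = {'c1': ['c3'], 'c2': ['c3'],
     'c3': ['c4', 'c5'], 'c4': ['c5'], 'c5': []}
print(topo_sort(G))   # → ['c2', 'c1', 'c3', 'c4', 'c5'], a valid order
```

A graph can have several valid topological orders; this one differs from the hand-worked list but still respects every edge.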

... *try it on the graphs above* ...

A nice decrease-by-one solution is to select vertices that have an in-degree of 0:

    repeat until V is empty
      v ← any vertex in V with degree_{IN} = 0
      if none, then fail
      V ← V - { v }
      E ← E - { e | e = (v, v_{k}) for any v_{k} in V }
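A direct (if inefficient) Python sketch of this source-removal algorithm, using explicit vertex and edge sets as in the examples:

```python
# Repeatedly remove a vertex with in-degree 0 along with its
# outgoing edges; fail if no such vertex exists (a cycle).

def source_removal_sort(V, E):
    V, E = set(V), set(E)
    order = []
    while V:
        sources = {v for v in V if not any(e[1] == v for e in E)}
        if not sources:
            raise ValueError("graph has a cycle")
        v = min(sources)                  # break ties alphabetically
        order.append(v)
        V.remove(v)
        E = {e for e in E if e[0] != v}   # drop v's outgoing edges
    return order

V = {'c1', 'c2', 'c3', 'c4', 'c5'}
E = {('c1', 'c3'), ('c2', 'c3'), ('c3', 'c4'),
     ('c3', 'c5'), ('c4', 'c5')}
print(source_removal_sort(V, E))   # → ['c1', 'c2', 'c3', 'c4', 'c5']
```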

Will there always be such a vertex in a DAG? Yes. (Try to prove it by induction... It's actually quite fun!) And each time we remove such a vertex, the remaining DAG must have at least one, too.

... *try it on the graphs above* ...