Consider these two tables:
RELATION: Course || RELATION: Student course student uid || uid name yearBorn ------ ------- ------ || ------ ------- ----------- cs1800 Frank 123464 || 123456 Felix 2001 cs1520 Frank 123464 || 123459 Claire 1998 cs1520 Angela 123480 || 123464 Frank 1997 cs1800 Mona 123471 || 123471 Mona 2002 cs1520 Felix 123456 || 123480 Angela 1996 || 123494 Papa 1980
Answer these questions:
Bonus exercise: Can your express the relational algebra queries in Numbers 1, 3, and 4 as simple questions in words?
Solutions for the problems are at the bottom of the page. No peeking until you have your own answers! People learn better when they are invested in the answer that is shown to them.
In class this week, we explored the basic operators of relational algebra: three set operations, two 'removal' operators (projection and selection), and one 'combination' operator (product). We also looked at ways that we can express the algebra's operators in SQL. Relational algebra is the theory that underlies SQL, but SQL does not mimic the algebra directly.
In this session, we explore relational algebra's renaming operator and learn a second, even more useful combination operator. Then we look at ways to combine queries into compound expressions of the sort that reflect the kinds of questions we want to be able to ask of our data.
The session deliverable is described at the bottom of the session. It's another check-in response, not some answer you produce.
When we apply relational operators, we create new relations. There are times when it would be convenient to specify the name of the resulting relation or the resulting relation's attributes.
Consider the example from class this week in which we computed R ⨯ S:
A B C C D ----- R --- S 1 2 3 7 8 4 5 6 3 9 6 0 A B R.C S.C D --------------------- R ⨯ S 1 2 3 7 8 1 2 3 3 9 1 2 3 6 0 4 5 6 7 8 4 5 6 3 9 4 5 6 6 0
The resulting relation has two attributes that are named C in the source relations R and S. To disambiguate the clash, the operator has to invent new names for the attributes in the new relation. We did so using member-style dot notation.
Another option is to give a name to the new attributes. Relational algebra include a renaming operator, ρ, that takes a relation and returns a new relation.
ρ_{R'(A1,...)}(R)The names used for the new relation, R', and its attributes, (A1,...), are given with the operator when it is applied.
In our product example, we can use ρ to rename the result:
ρ_{T(A,B,C1,C2,D)}(R ⨯ S)The operation names the relation that results from the product T, with attributes named A, B, C1, C2, and D.
If we only want to rename one part of a relation, we can omit a new relation name:
ρ_{(A,B,C1,C2,D)}(R)or a new attribute list:
ρ_{name}(R)
Renaming can usually be done at different stages of a compound expression. In our example above, we renamed the result of the cross product:
ρ_{T(A,B,C1,C2,D)}(R ⨯ S)We could just as easily rename the columns of S before computing the product:
R ⨯ ρ_{(E,D)}(S)This renames S's C columns as D, and then computes the product. The result is:
A B C E D --------------------- R ⨯ S 1 2 3 7 8 1 2 3 3 9 1 2 3 6 0 4 5 6 7 8 4 5 6 3 9 4 5 6 6 0
We will look at compound expressions after looking at one more operator, the natural join. In general, there will be multiple ways to construct any compound expression to compute any given result.
Last session, we learned how to combine tables using the product operator, which creates a new relation with all combinations of the tuples in its two operands. More frequently, though, we want only the combinations that match up in some way.
For example, if two tables both have a student number column, then we might want to see combinations where the student numbers match. This would be a way to create a new table with combined information about the student.
Or think back to multi-table query in Online Session 5-2. We had one table of customer data, keyed on social security number, and we had another table of address data that had a social security number column to indicate who lived at each address. Customer ⨯ Address would contain many irrelevant combinations... What we wanted was a table that contained all the tuples with matching social security numbers combined.
Relational algebra provides an operator that does just this, ⋈, which computes the natural join of two relations. R ⋈ S, pairs only the tuples that have the same values for the attributes they have in common.
Let a1, a2, ... an be the attributes that occur in both R and S. The natural join finds tuples r from R and s from S that have identical the values for all of a1, a2, ... an. It then creates a joined tuple containing a1, a2, ... an as well as the other attributes from r and s.
This is a little hard to explain with getting all mathematical, but an example can make the idea clear.
Consider again the example from class in which we computed R ⨯ S:
A B C C D ----- R --- S 1 2 3 7 8 4 5 6 3 9 6 0
The product of these relations contains six tuples. However, both relations have an attribute named C. There are tuples in the two relations with matching values for C (in bold). If this attribute means the same thing in both relations, then we may be interested only in the combinations that match these tuples.
This is the natural join of R and S. The expression R ⋈ S produces this relation:
A B C D --------------------- R ⋈ S 1 2 3 9 4 5 6 0It combines the tuples with matching values for C and thus needs only one column for C in the result.
There is one tuple in S that does not pair with any tuple in R: (7 8). This dangling tuple does not occur in R ⋈ S.
This example is so small that it never pairs a tuple from R with more than one tuple from S, but that is certainly possible in a natural join. Consider the Customer and Address relations from our example database in Week 5:
ssNo name CUSTOMER ----------- ----- 123-45-6789 Eugene Wallingford 321-45-6789 John Smith areaCode number owner PHONE -------- -------- ----------- 319 273-5919 123-45-6789 517 452-9159 321-45-6789 319 273-2618 123-45-6789 street city state resident ADDRESS -------- ----------- ----- ----------- 339 WRT Cedar Falls IA 321-45-6789 1 Elm St Plainfield IN 321-45-6789 305 WRT Cedar Falls IA 123-45-6789Joining Customer with Phone would pair Eugene with two phone numbers. Joining Customer with Address would pair John with two addresses.
One last note: All of our examples thus far have joined relations with only one common attribute. Natural join also handles cases where the overlap between R's attributes and S's attributes contains two or more attributes. In these cases, paired tuples will match on the values of all the common attributes. We'll see examples of this later.
Using relational operators to compute new relations is interesting enough, but the real power of the algebra lies in being able to write compound expressions that compute answers to more challenging queries. As in any algebra, we can use the relation computed by one operator as an argument to another operator.
Conider the movie relations from our opening exercise in class this week.
name year length genre -------------------------------------------- Alice a1 The Big Chill 1983 105 drama a2 Blade Runner 1982 117 scifi a3 The Princess Bride 2020 98 fantasy a4 Lethal Weapon 1987 109 action a5 Black Panther 2018 134 action a6 The Princess Bride 1987 98 fantasy a7 The Replacements 2000 118 comedy a8 Star Trek 1979 132 scifi -------------------------------------------- Kurt k1 Black Panther 2018 134 action k2 The Princess Bride 1987 98 fantasy k3 The Replacements 2000 118 comedy k4 Star Trek 1979 132 scifi k5 Lethal Weapon 1987 109 action k6 Return of the Jedi 1983 131 scifi k7 48 Hrs. 1982 96 comedy k8 The Big Chill 1983 105 drama
Suppose that we wanted to know the title and year of all the scifi movies Alice and Kurt like that are less than two hours long. Let's define the Movies relation as the union of Akices and Kurt's tables: (Alice ⋃ Kurt). Then we can build a query to answer this question using our relational operators in this way:
Each step builds a solution out of the solutions to previous steps:
1. σ_{length < 120}(Movies) 2. σ_{genre = 'scifi'}(Movies) 3. (σ_{length < 120}(Movies)) ⋂ (σ_{genre = 'scifi'}(Movies)) 4. π_{title,year} ((σ_{length < 120}(Movies)) ⋂ (σ_{genre = 'scifi'}(Movies)))
This sequence of expressions forms a tree of expressions and sub-expressions:
Movies Movies | | | | | | σ_{length < 120} σ_{genre = 'scifi'} \ / \ / \ / ⋂ | | | π_{title,year}
Relational algebra builds expressions out of other expressions, without variables or sequences of statements. In this sense, it resembles functional programming, a style of programming you learn in Programming Languages. We could add variables and sequences of statements to our own notation, but that wouldn't be relational algebra -- and, worse, that would undermine many of the advantages of using relational algebra in the first place.
One of those advantages is being able to manipulate expressions using identity rules to create different queries that compute the same result. For example, consider the query we just constructed:
π_{title,year} ((σ_{length < 120}(Movies)) ⋂ (σ_{genre = 'scifi'}(Movies)))We could write a functionally equivalent query using a single selection with a compound query
π_{title,year} (σ_{length < 120 AND genre = 'scifi'}(Movies))The second query describes the same result as the first.
Which one is better? That really depends. As SQL programmers, we often delegate that question to our DBMS and let it decide which query is better given its implementation. Every relational DBMS performs query optimization that translates queries into equivalent expressions that can be evaluated more efficiently. We as programmers can use the same translation ideas to help us write clearer and, yes, more efficient queries of our own.
Do you want more practice? Use SQLite to compute the answers to these queries.
Return to the rest of the session.
If you know the name of the book from which I took the students' names in the opening exercise, let me know! If you've read that book, good for you. We have something in common. (It's one of my favorites; I have read it many times...)
When you finish working through this session, email me to let me know you are done. Let me know which part of the session is least clear. Perhaps that is something you don't think you understand well yet, or it is something you understand but feel like the notes could do a better job explaining. Feel free to include as many questions or comments as you'd like!