Debugged instances of the Movie relation from Exam 1, Problem 8 are tables containing the favorite movies of Alice and Kurt:
name year length genre -------------------------------------------- Alice a1 The Big Chill 1983 105 drama a2 Blade Runner 1982 117 scifi a3 The Princess Bride 2020 98 fantasy a4 Lethal Weapon 1987 109 action a5 Black Panther 2018 134 action a6 The Princess Bride 1987 98 fantasy a7 The Replacements 2000 118 comedy a8 Star Trek 1979 132 scifi -------------------------------------------- Kurt k1 Black Panther 2018 134 action k2 The Princess Bride 1987 98 fantasy k3 The Replacements 2000 118 comedy k4 Star Trek 1979 132 scifi k5 Lethal Weapon 1987 109 action k6 Return of the Jedi 1983 131 scifi k7 48 Hrs. 1982 96 comedy k8 The Big Chill 1983 105 drama
Identify...
Could we use SQLite to help us find our answers?
To find out, I created a database containing the two tables, using this SQL script.
Even with our limited experience, we can write queries to solve the third and fourth problems. For #4:
sqlite> SELECT * FROM Kurt WHERE length > 120; Black Panther|2018|134|action Star Trek|1979|132|scifi Return of the Jedi|1983|131|scifiIf we want only the names, we can select only the names:
sqlite> SELECT name FROM Kurt WHERE length > 120; Black Panther Star Trek Return of the Jedi
For #3, we can select the genre from Alice's set without a condition:
sqlite> SELECT genre from Alice; drama scifi fantasy action action fantasy comedy scifiThat doesn't seem right... Relations are sets. When we select the genres for Alice, we should get a set, but we don't. This is one of those places where SQL does not implement the relational algebra correctly. We need to use a new modifier, DISTINCT:
sqlite> SELECT DISTINCT genre from Alice; drama scifi fantasy action comedy
We'll occasionally learn a little new SQL because SQL does not behave the way we expect it to. Welcome to programming.
Last week, we reviewed the basics of SQL and SQLite, which are the tools we will use to implement our database models in executable code. Then we learned the basic terms of relational algebra, a special-purpose algebra that enables use to construct new relations from existing relations. It is the theory that underlies SQL, the language we will use to query our relational databases.
Today we go a bit deeper with relational algebra, exploring its operators and ways we can express them in SQL.
Relational algebra is limited yet surprisingly powerful language for expressing relational queries. Its limitations both increase programmer productivity the run-time performance of our database management systems. It consists of a set of set operations over the domain of relations, which are a particular kind of set. We will use these operations to construct simple and compound expressions known as queries.
Last time, we saw that there are four broad classes of relational operators:
Let's consider each in more detail and begin to think about how the help us express queries.
We learn three basic set operations in grade school math:
Consider these two small instances of the Movie relation:
name year length genre -------------------------------------------- R The Big Chill 1983 105 drama Blade Runner 1982 117 scifi -------------------------------------------- S Blade Runner 1982 117 scifi The Princess Bride 2020 98 fantasy
What are (R ⋃ S), (R ⋂ S), (R – S), (S – R)?
The first two questions on the opening exercise involve these basic operations: intersection and difference, respectively. What question would a union answer?
Perhaps: the movies liked by either Alice or Kurt. This is the home library containing everyone's favorites.
Relations are two-dimensional tables. Rows are tuples, and columns are attributes. The two basic ways to remove information from a relation are to eliminate rows or eliminate columns.
In relational algebra, projection takes one relation and returns a new relation containing a subset of its columns. We use lowercase pi, π, as the projection operator:
πa1,...(R)
For example, using the relations R and S above, πname,length(R) produces:
name length --------------------------- The Big Chill 105 Blade Runner 117
The third question on the opening exercise asked for a projection: πgenre(Alice). In the relational algebra, all relations are sets. The result is {drama, scifi, fantasy, action, comedy}.
Selection is an operation that takes a relation R and returns a new relation containing a subset of its rows. The tuples in the new relation satisfy some condition C involving R's attributes.
We use lowercase sigma, σ, as the selection operator:
σC(R)
C is a conditional expression of the sort you use in traditional programming languages. It returns true or false when applied to a tuple. In relational algebra, a condition can refer only to constants or to attributes in the relation.
For example, using the relations R and S above, σgenre='scifi'(S) produces:
name year length genre -------------------------------------------- Blade Runner 1982 117 scifiThe same selection applied to the Alice relation from our exercise returns a relation with two tuples:
name year length genre -------------------------------------------- Blade Runner 1982 117 scifi Star Trek 1979 132 scifi
The fourth question on the opening exercise asked for a selection: σlength>120(Kurt).
We can also write compound conditions. To find the set of scifi movies longer then two hours that Kurt likes, we could say σlength>120 AND genre='scifi'(Kurt).
The little bit of SQL we learned last week gave us tools for implementing simple projections and selections:
Back in school, you may have learned another set operation that combines two sets: R ⨯ S, the product of two sets. This is also called the "Cartesian product" or the "cross product". R ⨯ S creates a set of all possible pairs where the first item in the pair is in R and the second item is in S. For example, the product of {a, b} and {c, d, e} is {(a,c), (a,d), (a,e), (b,c), (b,d), (b,e)}.
In the relational algebra, the sets are relations and work much the same. Rather than create an ordered pair, though, the product creates a longer tuple, containing all the attributes of R and all the attributes of S.
To see how this works, consider these relations:
A B C C D ----- R --- S 1 2 3 7 8 4 5 6 3 9 6 0The result of R ⨯ S will be this relation:
A B R.C S.C D --------------------- R ⨯ S 1 2 3 7 8 1 2 3 3 9 1 2 3 6 0 4 5 6 7 8 4 5 6 3 9 4 5 6 6 0Note that this relation pairs each tuple from R with every tuple from S. Both R and S have an attribute named C, so the relation has to invent new names. We will usually disambiguate conflicts of this sort with member-style dot notation.
Note also that this relation has six tuples and five attributes. There are two tuples in R paired with each of the three tuples in S, for |A| * |B| = 6 rows. R has three attributes and S two, which gives us |A attributes| + |B attributes| columns.
We saw a free-range example of the product at the end of Week 5's second session. Here is a simpler version:
sqlite> SELECT * FROM R, S; 1 2 3 7 8 1 2 3 3 9 1 2 3 6 0 4 5 6 7 8 4 5 6 3 9 4 5 6 6 0We'll have to create SQL tables for R and S to see this result!
We could do the same thing with Alice and Kurt's favorite movies:
sqlite> SELECT * FROM Alice, Kurt; The Big Chill|1983|105|drama|Black Panther|2018|134|action The Big Chill|1983|105|drama|The Princess Bride|1987|98|fantasy ... Star Trek|1979|132|scifi|48 Hrs.|1982|96|comedy Star Trek|1979|132|scifi|The Big Chill|1983|105|dramaThat's a long list. How many tuples does the new relation contain? How many attributes?
We will learn about other ways to combine relations in the next few sessions. You will learn about an expecially valuable one in this week's online session: the natural join.
At this point, we have seen simple "atoms" of SQL for expressing three of the relational algebra operators we learned today:
πa1,...(R) ~= SELECT a1,... from R; σC(R) ~= SELECT * FROM R WHERE C; R ⨯ S ~= SELECT * FROM R, S;
What about those basic set operators, union, intersection, and difference? SQL has specific operators for all three!
It would be nice if it were as simple as Alice ⋂ Kurt, but SQL doesn't allow us to refer to relations that way. We have to use SELECT expressions to create a relation to operate on:
sqlite> SELECT * FROM Alice UNION SELECT * FROM Kurt; 48 Hrs.|1982|96|comedy Black Panther|2018|134|action Blade Runner|1982|117|scifi Lethal Weapon|1987|109|action Return of the Jedi|1983|131|scifi Star Trek|1979|132|scifi The Big Chill|1983|105|drama The Princess Bride|1987|98|fantasy The Princess Bride|2020|98|fantasy The Replacements|2000|118|comedy sqlite> SELECT * FROM Alice INTERSECT SELECT * FROM Kurt; Black Panther|2018|134|action Lethal Weapon|1987|109|action Star Trek|1979|132|scifi The Big Chill|1983|105|drama The Princess Bride|1987|98|fantasy The Replacements|2000|118|comedyThe operator for set difference draws more on natural language mathematical terminology:
sqlite> SELECT * FROM Alice EXCEPT SELECT * FROM Kurt; Blade Runner|1982|117|scifi The Princess Bride|2020|98|fantasy sqlite> SELECT * FROM Kurt EXCEPT SELECT * FROM Alice; 48 Hrs.|1982|96|comedy Return of the Jedi|1983|131|scifi
So:
πa1,...(R) ~= SELECT a1,... from R; σC(R) ~= SELECT * FROM R WHERE C; R ⨯ S ~= SELECT * FROM R, S; R ⋃ S ~= SELECT * FROM R UNION SELECT * FROM S; R ⋂ S ~= SELECT * FROM R INTERSECT SELECT * FROM S; R – S ~= SELECT * FROM R EXCEPT SELECT * FROM S;
We won't use some of these expressions often, but perhaps they will help you as you learn and use relational algebra.