## Session 7

### Opening Exercise

Debugged instances of the Movie relation from Exam 1, Problem 8 are tables containing the favorite movies of Alice and Kurt:

name                 year   length   genre
--------------------------------------------     Alice
a1 The Big Chill        1983   105      drama
a2 Blade Runner         1982   117      scifi
a3 The Princess Bride   2020   98       fantasy
a4 Lethal Weapon        1987   109      action
a5 Black Panther        2018   134      action
a6 The Princess Bride   1987   98       fantasy
a7 The Replacements     2000   118      comedy
a8 Star Trek            1979   132      scifi
--------------------------------------------     Kurt
k1 Black Panther        2018   134      action
k2 The Princess Bride   1987   98       fantasy
k3 The Replacements     2000   118      comedy
k4 Star Trek            1979   132      scifi
k5 Lethal Weapon        1987   109      action
k6 Return of the Jedi   1983   131      scifi
k7 48 Hrs.              1982   96       comedy
k8 The Big Chill        1983   105      drama

Identify...

1. the movies that both Alice and Kurt like
2. the movies that Kurt likes, but not Alice
3. the genres that Alice likes
4. the movies longer than two hours that Kurt likes
You can use the tuple names for 1, 2, and 4.

### Opening Solution

1. The movies that both Alice and Kurt like.
a1 and a4 through a8 are on both lists.

2. The movies that Kurt likes, but not Alice.
With so many favorites in common, there are only two that each likes but not the other. k6 and k7 are unique to Kurt.

3. The genres that Alice likes.
There are five: drama, scifi, fantasy, action, and comedy.

4. The movies longer than two hours that Kurt likes.
Only three: k1, k4, and k6.

Could we use SQLite to help us find our answers?

To find out, I created a database containing the two tables, using this SQL script.

Even with our limited experience, we can write queries to solve the third and fourth problems. For #4:

sqlite> SELECT * FROM Kurt WHERE length > 120;
Black Panther|2018|134|action
Star Trek|1979|132|scifi
Return of the Jedi|1983|131|scifi
If we want only the names, we can select only the names:
sqlite> SELECT name FROM Kurt WHERE length > 120;
Black Panther
Star Trek
Return of the Jedi

For #3, we can select the genre from Alice's set without a condition:

sqlite> SELECT genre from Alice;
drama
scifi
fantasy
action
action
fantasy
comedy
scifi
That doesn't seem right... Relations are sets. When we select the genres for Alice, we should get a set, but we don't. This is one of those places where SQL does not implement the relational algebra correctly. We need to use a new modifier, DISTINCT:
sqlite> SELECT DISTINCT genre from Alice;
drama
scifi
fantasy
action
comedy

We'll occasionally learn a little new SQL because SQL does not behave the way we expect it to. Welcome to programming.

### Where Are We?

Last week, we reviewed the basics of SQL and SQLite, which are the tools we will use to implement our database models in executable code. Then we learned the basic terms of relational algebra, a special-purpose algebra that enables use to construct new relations from existing relations. It is the theory that underlies SQL, the language we will use to query our relational databases.

Today we go a bit deeper with relational algebra, exploring its operators and ways we can express them in SQL.

### Review: The Operators of Relational Algebra

Relational algebra is limited yet surprisingly powerful language for expressing relational queries. Its limitations both increase programmer productivity the run-time performance of our database management systems. It consists of a set of set operations over the domain of relations, which are a particular kind of set. We will use these operations to construct simple and compound expressions known as queries.

Last time, we saw that there are four broad classes of relational operators:

• basic set operators
• operators that remove parts of a relation
• operators that combine two relations
• an operator that renames objects

Let's consider each in more detail and begin to think about how the help us express queries.

### Basic Set Operators

We learn three basic set operations in grade school math:

• R ⋃ S : union, the set of all items that are in R, in S, or both
• R ⋂ S : intersection, the set of all items that are both in R and in S
• R – S : difference, the set of items that are in R but not in S

Consider these two small instances of the Movie relation:

name                 year   length   genre
--------------------------------------------     R
The Big Chill        1983   105      drama
--------------------------------------------     S
The Princess Bride   2020   98       fantasy

What are (R ⋃ S), (R ⋂ S), (R – S), (S – R)?

The first two questions on the opening exercise involve these basic operations: intersection and difference, respectively. What question would a union answer?

Perhaps: the movies liked by either Alice or Kurt. This is the home library containing everyone's favorites.

### Operations that Remove Information

Relations are two-dimensional tables. Rows are tuples, and columns are attributes. The two basic ways to remove information from a relation are to eliminate rows or eliminate columns.

In relational algebra, projection takes one relation and returns a new relation containing a subset of its columns. We use lowercase pi, π, as the projection operator:

πa1,...(R)

For example, using the relations R and S above, πname,length(R) produces:

name                 length
---------------------------
The Big Chill        105

The third question on the opening exercise asked for a projection: πgenre(Alice). In the relational algebra, all relations are sets. The result is {drama, scifi, fantasy, action, comedy}.

Selection is an operation that takes a relation R and returns a new relation containing a subset of its rows. The tuples in the new relation satisfy some condition C involving R's attributes.

We use lowercase sigma, σ, as the selection operator:

σC(R)

C is a conditional expression of the sort you use in traditional programming languages. It returns true or false when applied to a tuple. In relational algebra, a condition can refer only to constants or to attributes in the relation.

For example, using the relations R and S above, σgenre='scifi'(S) produces:

name                 year   length   genre
--------------------------------------------
The same selection applied to the Alice relation from our exercise returns a relation with two tuples:
name                 year   length   genre
--------------------------------------------
Star Trek            1979   132      scifi

The fourth question on the opening exercise asked for a selection: σlength>120(Kurt).

We can also write compound conditions. To find the set of scifi movies longer then two hours that Kurt likes, we could say σlength>120 AND genre='scifi'(Kurt).

The little bit of SQL we learned last week gave us tools for implementing simple projections and selections:

• To project, we can write SELECT attribute with no WHERE clause.
• To select, we can write SELECT * WHERE and express the condition C as the condition on the WHERE clause.
We will be extending our knowledge of how to express relational algebra using SQL in the coming sessions. We already extended it once today: to ensure that a projection returns a set, we need to use the DISTINCT clause.

### Operations that Combine Relations

Back in school, you may have learned another set operation that combines two sets: R ⨯ S, the product of two sets. This is also called the "Cartesian product" or the "cross product". R ⨯ S creates a set of all possible pairs where the first item in the pair is in R and the second item is in S. For example, the product of {a, b} and {c, d, e} is {(a,c), (a,d), (a,e), (b,c), (b,d), (b,e)}.

In the relational algebra, the sets are relations and work much the same. Rather than create an ordered pair, though, the product creates a longer tuple, containing all the attributes of R and all the attributes of S.

To see how this works, consider these relations:

A B C           C D
-----  R        ---  S
1 2 3           7 8
4 5 6           3 9
6 0
The result of R ⨯ S will be this relation:
A    B   R.C  S.C   D
---------------------  R ⨯ S
1    2    3    7    8
1    2    3    3    9
1    2    3    6    0
4    5    6    7    8
4    5    6    3    9
4    5    6    6    0
Note that this relation pairs each tuple from R with every tuple from S. Both R and S have an attribute named C, so the relation has to invent new names. We will usually disambiguate conflicts of this sort with member-style dot notation.

Note also that this relation has six tuples and five attributes. There are two tuples in R paired with each of the three tuples in S, for |A| * |B| = 6 rows. R has three attributes and S two, which gives us |A attributes| + |B attributes| columns.

We saw a free-range example of the product at the end of Week 5's second session. Here is a simpler version:

sqlite> SELECT * FROM R, S;
1    2    3    7    8
1    2    3    3    9
1    2    3    6    0
4    5    6    7    8
4    5    6    3    9
4    5    6    6    0
We'll have to create SQL tables for R and S to see this result!

We could do the same thing with Alice and Kurt's favorite movies:

sqlite> SELECT * FROM Alice, Kurt;
The Big Chill|1983|105|drama|Black Panther|2018|134|action
The Big Chill|1983|105|drama|The Princess Bride|1987|98|fantasy
...
Star Trek|1979|132|scifi|48 Hrs.|1982|96|comedy
Star Trek|1979|132|scifi|The Big Chill|1983|105|drama
That's a long list. How many tuples does the new relation contain? How many attributes?

We will learn about other ways to combine relations in the next few sessions. You will learn about an expecially valuable one in this week's online session: the natural join.

### Summary: Expressing Relational Algebra Operators in SQL

At this point, we have seen simple "atoms" of SQL for expressing three of the relational algebra operators we learned today:

πa1,...(R)   ~= SELECT a1,... from R;
σC(R)       ~= SELECT * FROM R WHERE C;
R ⨯ S       ~= SELECT * FROM R, S;

What about those basic set operators, union, intersection, and difference? SQL has specific operators for all three!

It would be nice if it were as simple as Alice ⋂ Kurt, but SQL doesn't allow us to refer to relations that way. We have to use SELECT expressions to create a relation to operate on:

sqlite> SELECT * FROM Alice
UNION
SELECT * FROM Kurt;
48 Hrs.|1982|96|comedy
Black Panther|2018|134|action
Lethal Weapon|1987|109|action
Return of the Jedi|1983|131|scifi
Star Trek|1979|132|scifi
The Big Chill|1983|105|drama
The Princess Bride|1987|98|fantasy
The Princess Bride|2020|98|fantasy
The Replacements|2000|118|comedy

sqlite> SELECT * FROM Alice
INTERSECT
SELECT * FROM Kurt;
Black Panther|2018|134|action
Lethal Weapon|1987|109|action
Star Trek|1979|132|scifi
The Big Chill|1983|105|drama
The Princess Bride|1987|98|fantasy
The Replacements|2000|118|comedy
The operator for set difference draws more on natural language mathematical terminology:
sqlite> SELECT * FROM Alice
EXCEPT
SELECT * FROM Kurt;
The Princess Bride|2020|98|fantasy

sqlite> SELECT * FROM Kurt
EXCEPT
SELECT * FROM Alice;
48 Hrs.|1982|96|comedy
Return of the Jedi|1983|131|scifi

So:

πa1,...(R)   ~= SELECT a1,... from R;
σC(R)       ~= SELECT * FROM R WHERE C;
R ⨯ S       ~= SELECT * FROM R, S;

R ⋃ S       ~= SELECT * FROM R  UNION      SELECT * FROM S;
R ⋂ S       ~= SELECT * FROM R  INTERSECT  SELECT * FROM S;
R – S       ~= SELECT * FROM R  EXCEPT     SELECT * FROM S;

We won't use some of these expressions often, but perhaps they will help you as you learn and use relational algebra.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... September 29, 2020