Online Session 11

Modifying a Database in SQL


CS 3140
Database Systems


We will use this database for the opening exercise and the rest of this session. Fire up SQLite on homework03.db and follow along!


Opening Exercise

Recall our database of movie information that consists of four relations:

    Movie(title, director, year, rating)
    Actor(actor, yearBorn)
    Director(director, yearBorn)
    StarsIn(actor, movie)

Write SQL queries to handle these requests:



Opening Solution

Did the first query give you a sense of déjà vu? Last week, we wrote a nearly identical query. We might try to join Actor and Movie naturally, but that leaves no trace of Tom Hanks, because we have none of his movies in the Movie relation. But a left outer join of the relations, Actor ⟕ StarsIn, does the trick:

    sqlite> SELECT actor, movie
       ...> FROM Actor NATURAL LEFT OUTER JOIN StarsIn
       ...> ORDER BY actor;
    actor      movie          
    ---------  ---------------
    Cage       Raising Arizona
    Hanks                     
    Maguire    Spiderman      
    Maguire    Wonder Boys    
    McDormand  Fargo          
    McDormand  Raising Arizona
    McDormand  Wonder Boys 

The second query would have a similar shape. The Movie relation relates movies and their directors, so it can play the role that StarsIn places in our first query. But there is one more complication: we only care about movies starring McDormand. We can find those movies with:

    sqlite> SELECT movie FROM StarsIn
       ...> WHERE actor='McDormand';
    movie          
    ---------------
    Fargo
    Raising Arizona
    Wonder Boys

This statement can be a subquery in a compound statement that finds the subset of the Movie relation with movies starring McDormand:

    sqlite> SELECT director, title FROM Movie
       ...> WHERE title IN (SELECT movie FROM StarsIn
       ...>                 WHERE actor='McDormand');
    title            director  year  rating
    ---------------  --------  ----  ------
    Fargo            Coen      1996  8.2   
    Raising Arizona  Coen      1987  7.6   
    Wonder Boys      Hanson    2000  7.6   
This query creates something like the StarsIn relation, a set of director/title pairs. Because it contains only McDormand's movies, we might call it DirectsMcDormand.

And that is the relation to which we can NATURAL LEFT OUTER JOIN the Director relation to get our answer:

    sqlite> SELECT director, title
       ...> FROM Director
       ...>        NATURAL LEFT OUTER JOIN
       ...>      (SELECT director, title FROM Movie
       ...>       WHERE title IN (SELECT movie FROM StarsIn
       ...>                     WHERE actor='McDormand'))
       ...> ORDER BY director;
    director  title          
    --------  ---------------
    Coen      Fargo          
    Coen      Raising Arizona
    Hanson    Wonder Boys    
    Raimi                    
If it were okay to omit Raimi from our answer, this would have been a simpler problem to solve. Sometimes, a small change in a request can require us to do a large amount of work.

Here is an SQL source file containing these two queries. Feel free to download it and experiment.

As we saw in class this week, you often know all you need to solve a problem such as this. We just need to look for ways to break the problem down into to pieces we understand better, and then look for ways to put the pieces back together. Having some memory for problems you have solved before can help you start to see patterns.

But what of poor Tom Hanks? We need to add movies to our database that reflect his contributions. Raimi has directed one McDormand film, so we can fill in the second gap if we want. Now for how.



Where Are We?

After several weeks of learning how to query relational databases, we are taking a step back to make sure we have relational schemas that can be used effectively.

Last time, we saw that a sub-optimal schema can result in a number of difficulties working with a database, including update anomalies, deletion anomalies, and insertion anomalies.

However, we have never written SQL statements that update, delete, or insert tuples! All of our statements have been queries: pure functions that compute values but do not change the database.

Next time we will return to the ideas of functional dependency, decomposition of relations, and normal forms on the way to a technique for ensuring that our databases behave well in the face of update, deletion, and insertions. First, let's learn how to perform these actions in SQL, so that we can modify an existing database.



Inserting New Tuples

We've never modified an existing relation, but back in Online Session 5-1, we used the INSERT statement to build a relation. INSERT can be used to add tuples to an existing relation, too. The basic form of the statement is

    INSERT INTO R(A1, A2, ..., An)
    VALUES(V1, V2, ..., Vn);

R is the name of the relation to which we are adding a tuple. A1, A2, ..., An name the attributes whose values we are inserting. V1, V2, ..., Vn are the values we are giving those attributes.

So, to add to our database the fact that Tom Hanks starred in "The Terminal", we would write:

    sqlite> INSERT INTO StarsIn(actor, movie)
       ...> VALUES('Hanks', 'The Terminal');
    sqlite> SELECT * FROM StarsIn;
    actor      movie          
    ---------  ---------------
    Cage       Raising Arizona
    ...
    Hanks      The Terminal

To add a tuple for "The Terminal" to the Movie relation, we might write:

    sqlite> INSERT INTO Movie(title, director, year, rating)
       ...> VALUES('The Terminal', 'Spielberg', 2004, 7.4);
    sqlite> SELECT * FROM Movie;
    title            director   year  rating
    ---------------  ---------  ----  ------
    Fargo            Coen       1996  8.2   
    Raising Arizona  Coen       1987  7.6   
    Spiderman        Raimi      2002  7.4   
    Wonder Boys      Hanson     2000  7.6   
    The Terminal     Spielberg  2004  7.4 

The INSERT statement allows several variations, some of which can be quite handy. Here are three.

We will see one more variation on INSERT at the end of the session...



Deleting Tuples

We delete existing tuples using the DELETE statement, which has a form similar to SELECT:

    DELETE FROM R WHERE condition;
This statement deletes from relation R every tuple that matches the condition.

To delete a specific tuple from the database, we have to define a condition that identifies that tuple. If we know the key for the tuple, we can use that to define our condition. If not, then we have to specify the entire tuple.

For example, if we wanted to remove "Spiderman" from the Movie relation, we could use either of these statements:

    DELETE FROM Movie
    WHERE title='Spiderman' AND year=2002;

    DELETE FROM Movie
    WHERE title='Spiderman' AND
          year=2002 AND
          director='Raimi' AND
          rating=7.4;

Notice that, unlike with INSERT, we can't simply list a tuple in (V1, V2, ..., Vn) form.

We can also remove several rows at once with a more general condition. For example, if we decided to become pickier about the films we watch, we might decide to delete all movies in an entire class:

    sqlite> DELETE FROM Movie WHERE rating < 8.0;
    title            director   year  rating
    ---------------  ---------  ----  ------
    Fargo            Coen       1996  8.2   
    8 Mile           Hanson     2002        

This introduces us to something about NULL values that we have not seen before:

Whenever NULL is used in a comparison expression, the result is the boolean value UNKNOWN.
Because UNKNOWN is neither true nor false, it never satisfies the stated condition. This means that "8 Mile", which does not have a rating, fails the (rating < 8.0) and appears in the result.

I don't really want to delete all but two movies from the relation, so I went back to a version of the database with all seven movies. Always remember: INSERT and DELETE statements change the database. Their effects remain as we move forward in time. This is why SQL calls them modifications.



Updating Values

The third kind of modification statement is UPDATE, which enables us to change the value of one or more tuples in a relation. Its form is:

    UPDATE R
    SET assignment
    WHERE condition;
This statement applies the assignment statement to every tuple in R that matches the condition. The assignment consists of an attribute name, an equal sign, and a value for the attribute;

For example, we can assign rating to "8 Mile" with UPDATE:

    sqlite> UPDATE Movie
       ...> SET rating = 7.1
       ...> WHERE title = '8 Mile';
    title            director   year  rating
    ---------------  ---------  ----  ------
    Fargo            Coen       1996  8.2   
    Raising Arizona  Coen       1987  7.6   
    Spiderman        Raimi      2002  7.4   
    Wonder Boys      Hanson     2000  7.6   
    The Terminal     Spielberg  2004  7.4   
    8 Mile           Hanson     2002  7.1   
    The Natural      Levinson   1984  7.5

We can update several columns at once by separating multiple assignments with commas:

    sqlite> UPDATE Movie
       ...> SET rating = 7.1,
       ...>     director= 'C. Hanson'
       ...> WHERE title = '8 Mile';
    title            director   year  rating
    ---------------  ---------  ----  ------
    Fargo            Coen       1996  8.2   
    Raising Arizona  Coen       1987  7.6   
    Spiderman        Raimi      2002  7.4   
    Wonder Boys      Hanson     2000  7.6   
    The Terminal     Spielberg  2004  7.4   
    8 Mile           C. Hanson  2002  7.1   
    The Natural      Levinson   1984  7.5

We can also update several rows at once with a more general WHERE condition. Suppose that we decided to identify Joel Cohen as the director of the Coen movies. (Joel is usually listed as director, with brother Ethan uncredited.) We can update all the Coen movies with:

    sqlite> UPDATE Movie
       ...> SET director = 'J. Coen'
       ...> WHERE director = 'Coen';
    title            director   year  rating
    ---------------  ---------  ----  ------
    Fargo            J. Coen    1996  8.2   
    Raising Arizona  J. Coen    1987  7.6   
    Spiderman        Raimi      2002  7.4   
    ...



Computing a Set of Tuples to Insert

There is one more variation of the INSERT statement that can be quite useful. We can compute a set of tuples to insert using a subquery.

Consider the state of our current database. We added "The Natural" to the Movie relation, introducing a new director who is not included in the Director relation. Maybe there are other directors lurking in Movie that should be added to Director.

We can find all those directors with this query:

    sqlite> SELECT DISTINCT director
       ...> FROM Movie
       ...> WHERE director NOT IN
       ...>    (SELECT director FROM Director);
    Levinson
This query could produce a relation with more than one tuple. If we had not added Spielberg to Director explicitly earlier after adding "The Terminal" to Movie, Spielberg would be in this relation, too.

SQL allows us to replace the VALUES clause in the INSERT statement with a subquery that computes a set of tuples to be insert to build a relation. The form of this statement is:

    INSERT INTO R(A1, A2, ..., An)
       subquery;

So, we can add all directors from Movie who aren't already in Director with:

    sqlite> INSERT INTO Director(director)
       ...>    SELECT DISTINCT director
       ...>    FROM Movie
       ...>    WHERE director NOT IN
       ...>       (SELECT director FROM Director);
    sqlite> SELECT * FROM Director;
    director   yearBorn
    ---------  --------
    Coen       1954    
    Hanson     1945    
    Raimi      1959    
    Spielberg  1946    
    Levinson           

We couldn't specify a year born for one or more unknown directors, so we settled for NULL values. We can use an UPDATE statement to complete Levinson's tuple:

    sqlite> UPDATE Director
       ...> SET yearBorn = 1942
    sqlite> WHERE director = 'Levinson';
    director   yearBorn
    ---------  --------
    Coen       1954    
    Hanson     1945    
    Raimi      1959    
    Spielberg  1946    
    Levinson   1942    

Quick Exercise: Do the same thing for actors in StarsIn who do not appear in Actor: Write a query to add them all, then update the actors' tuples. (There's only one, Basinger, who was born in 1953.)

These examples show how we can use SELECT, INSERT, and UPDATE together to extend and update a database. We will see more examples in the coming weeks.



The Current State of Our Database

We started this session with homework03.db but, unlike the previous times we used it, we end with a different database. We added one new actor, two new directors, three new movies, and three new actor/movie relationships. This is no longer the database from Homework 3! Nor is homework03.db an appropriate name for this database.

To reflect this, before I quit SQLite, I saved the database with a new name:

    sqlite> .save movie-at-11.db
    sqlite> .quit
This more complete movie database gives us more data to work with in the future. If you didn't follow along and create your updated database, feel free to download movie-at-11.db now and add it to your collection.

I also deleted homework03.db from my directory. If I want to work with the database from from Homework 3 again, I'll download it using the link at the top of this session.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... November 10, 2020