Session 13

Working with Large Queries and Relations


CS 3140
Database Systems


We will use the Homework 7 database for the opening exercise and the rest of this session. Fire up SQLite on homework07.db and follow along!


Opening Exercise

Recall the computer product database from Homeworks 4, 6, and 7. It consists of four relations:

    Product(maker, model, type)
    PC(model, speed, ram, hd, price)
    Laptop(model, speed, ram, hd, screen, price)
    Printer(model, color, type, price)

Write the SQL statements needed to...

This requires more than a simple DELETE statement, because we don't even have a relation containing manufacturers of printers, let alone a relation containing manufacturers who don't make printers.

You might approach this problem either from the bottom up:

Or from the top down:



Opening Solution

Let's try this from the bottom up. To figure out which manufacturers don't make printers, we need to know which ones do. That is simple enough to find:

    sqlite> SELECT DISTINCT maker
       ...> FROM   Product
       ...> WHERE  type = 'printer';
    D
    E
    H

We can use this as a subquery to find the manufacturers that don't make printers:

    sqlite> SELECT DISTINCT maker         How might you use a
       ...> FROM   Product                set minus to do this?
       ...> WHERE  maker NOT IN
       ...>        (SELECT DISTINCT maker
       ...>         FROM   Product
       ...>         WHERE  type = 'printer');
    A
    B
    C
    F
    G

We can use this as a subquery to find all the products made by the manufacturers that don't make printers:

    sqlite> SELECT model
       ...> FROM   Product
       ...> WHERE  maker IN
       ...>        (SELECT DISTINCT maker
       ...>         FROM   Product
       ...>         WHERE  maker NOT IN
       ...>                (SELECT DISTINCT maker
       ...>                 FROM   Product
       ...>                 WHERE  type = 'printer'));
    1001
    1002
    ...
    2009
    2010

And now we are ready to delete the laptops that are in that list of products:

    sqlite> DELETE
       ...> FROM  Laptop
       ...> WHERE model IN
       ...>       (SELECT model
       ...>        FROM   Product
       ...>        WHERE  maker IN
       ...>               (SELECT DISTINCT maker
       ...>                FROM   Product
       ...>                WHERE  maker NOT IN
       ...>                      (SELECT DISTINCT maker
       ...>                       FROM   Product
       ...>                       WHERE  type = 'printer')));
    sqlite> SELECT * FROM Laptop;
    model  speed  ram   hd   screen  price
    -----  -----  ----  ---  ------  -----
    2001   2      2048  240  20.1    3673 
    2002   1.73   1024  80   17      949  
    2003   1.8    512   60   15.4    549  

This deletes all the targeted laptops from the Laptop relation. But those laptops also appear in the Product relation. We can use our three-level subquery, which finds the products made by the manufacturers that don't make printers, as a subquery here, too:

    sqlite> DELETE
       ...> FROM  Product             -- changed line
       ...> WHERE type = 'laptop'     -- new line
       ...>   AND model IN            -- changed line
       ...>       (SELECT model
       ...>        FROM   Product
       ...>        WHERE  maker IN
       ...>               (SELECT DISTINCT maker
       ...>                FROM   Product
       ...>                WHERE  maker NOT IN
       ...>                      (SELECT DISTINCT maker
       ...>                       FROM   Product
       ...>                       WHERE  type = 'printer')));
    sqlite> SELECT * FROM Product;
    maker  model  type   
    -----  -----  -------
    A      1001   pc     
    A      1002   pc     
    ...
    E      2001   laptop 
    E      2002   laptop 
    E      2003   laptop
    ...
    H      3006   printer
    H      3007   printer

E makes printers, so its laptops survived.

Here is an SQL source file containing these two queries. Download it, study it, and experiment. You can also download the queries individually: [ Laptop | Product ].

If you'd like to see the evolution of the solution from the top down, check out this page. -- COMING SOON.

As we have seen several times now, you often know all you need to solve a problem such as this. The challenge is to find ways to break the problem down into to pieces we understand. The bottom-up and top-down strategies can be helpful. In the end, we need to know well the relations we are working with and know basic SQL statements and patterns.

I'm a little concerned with the massive repetition of code between the two modifications we wrote. Are you? Duplicated code is a error trap in SQL as much as in other programming languages. Fortunately SQL gives us a way to work more efficiently and more safely...



Where Are We?

For the last two weeks, we took a break from our study of working with relational databases to learn how we can ensure we have relational databases we can work with effectively. One of the big problems is anomalies that can occur when we modify a database by inserting, deleting, and updating tuples.

Last time, we learned a technique for using functional dependencies to decompose a relation into Boyce-Codd Normal Form. Relations in BCNF are not prone to those modification anomalies.

Now we can finish off the semester looking at how to work with bigger databases and how to write and manage bigger programs. These programs can safely include modifications.



For the rest of the session, I am using a fresh copy of the Homework 7 database with all its tuples still intact.


Views

To solve our opening exercise, we wrote two large queries that both contained the same large subquery. This repeated code presents a challenge and an opportunity, much as repeated code does in most any setting.

The challenge involves maintenance and debugging. If we find an error in the subquery, we have to remember to change it in both places. If our database evolves and we have to modify our statement to fit the new circumstances, we have to remember to change it in both places. Repeated code is an error trap.

The opportunity lies in code we might write. The fact that we used exactly the same three-level subquery in two places is our code telling us something... Our database may not have tables for manufacturers with or without laptops, but it may be helpful if we could act as if it did.

SQL provides us with a tool, the view, that helps us overcome the challenge and capitalize on the opportunity. A view is a relation computed by a query and assigned a name for further use. The basic form for creating views is

    CREATE VIEW name_of_view AS query;

Consider our solution above. We might create a view on the Product relation consisting of manufacturers that don't make laptops:

    CREATE VIEW DoesntMakePrinters AS
      SELECT DISTINCT maker
      FROM   Product
      WHERE  maker NOT IN
             (SELECT DISTINCT maker
              FROM   Product
              WHERE  type = 'printer');

We can query a view just as we query any other relation:

    sqlite> SELECT * FROM DoesntMakePrinters;
    maker
    -----
    A    
    B    
    C    
    F    
    G    

We can also use a view as part of a larger query. For example, this query finds all the products made by the manufacturers that don't make printers:

    sqlite> SELECT model
       ...> FROM   Product
       ...> WHERE  maker IN DoesntMakePrinters;
    1001
    1002
    ...
    2009
    2010

This query can be used as the only subquery in our solution to the opening exercise:

    DELETE
    FROM  Laptop
    WHERE model IN
          (SELECT model
           FROM   Product
           WHERE  maker IN DoesntMakePrinters);
I think that's a lot easier to understand than the large query we wrote. What do you think?

However, views differ in an important way from the tables in the database such as a Laptop and Product. Those tables are persistent in storage. When we leave our SQLite session, they remain on disk in our .db file. A view is a name for a query. Each time we use a view's name, its relation is computed anew.

For this reason, it's helpful to distinguish between views and the relations on which they are based. From now on, we will use the word "table" to refer to relations stored on disk as part of the persistent database and the term "view" to refer to relations stored in memory. We'll use "relation" in an SQL context only when either a table or a view could be used.



Exercise

Use the computer product database:

    Product(maker, model, type)
    PC(model, speed, ram, hd, price)
    Laptop(model, speed, ram, hd, screen, price)
    Printer(model, color, type, price)

to...

Create a view named ComboMaker consisting of the maker, PC model, and laptop model for all PC-laptop combinations made by the same manufacturer.

This requires a join of some sort...



A Solution

We need to join Product with itself so that we can find the PC-laptop combinations for each manufacturer. Joining a relation with itself requires aliases, so that we can refer to columns from the two relations unambiguously.

We could do this with a cross product. If so, we need to select only those tuples where the maker columns agree and where one model is a PC and the other is a laptop:

    CREATE VIEW ComboMaker AS
      SELECT P1.maker, P1.model, P2.model
      FROM Product AS P1, Product AS P2
      WHERE P1.maker = P2.maker
        AND P1.type = 'pc'
        AND P2.type = 'laptop';

We could also do this with a join using the maker attribute:

    CREATE VIEW ComboMaker AS
      SELECT maker, P1.model, P2.model
      FROM Product AS P1
             INNER JOIN
           Product AS P2
           USING (maker)
      WHERE P1.type = 'pc'
        AND P2.type = 'laptop';

Let's see what the view contains:

    sqlite> SELECT *
       ...> FROM   ComboMaker;
    maker  model  model
    -----  -----  -----
    A      1001   2004 
    A      1001   2005
    ...
    E      1013   2002 
    E      1013   2003 

Hmmm. The view relation has two attributes with the same name! This is, of course, a natural result of joining a table with itself. Unfortunately, this hides from the reader which column has the PCs and which column has the laptops, and it requires that all queries using the view qualify the name.

CREATE VIEW statement allows us to circumvent this problem by naming the attributes of the new view:

    CREATE VIEW ComboMaker(maker, pcModel, laptopModel) AS
      SELECT maker, P1.model, P2.model
      FROM Product AS P1
             INNER JOIN
           Product AS P2
           USING (maker)
      WHERE P1.type = 'pc'
        AND P2.type = 'laptop';

Now the resulting relation looks better:

    sqlite> SELECT *
       ...> FROM   ComboMaker;
    maker  pcModel  laptopModel
    -----  -------  -----------
    A      1001     2004       
    A      1001     2005       
    ...
    E      1013     2002       
    E      1013     2003       
and will be easier to work with.

If you'd like to experiment with ComboMaker, here is an SQL source file containing the final version of the CREATE statement.



Using Views

Our solution to the exercise highlights two advantages of using views.

First, a view allows us to encapsulate a complex query. The inner join of Product with itself on a given column and filtered on another is contained within the definition of the view. The view makes it easier to write future queries, and those queries will be easier to understand.

Second, the ability to name the attributes of a view provides an abstraction layer on top of the base tables in the database. The name of the new relation and attributes form an set of tuples that can be treated like any other relation in most settings.

There are some practical details that make working with views seem even more like working with tables.

Deleting Views

We can drop a view in the same way we drop a table, with DROP VIEW. Earlier, I had to drop my first ComboMaker view before creating the one with custom column names:

    sqlite> CREATE VIEW ComboMaker AS ...
    sqlite> DROP VIEW ComboMaker;
    sqlite> CREATE VIEW ComboMaker(maker, pcModel, laptopModel) AS ...

Here we see an advantage of views as an abstraction over tables: Dropping a view does not affect the underlying table and its tuples. Product remained as it was before we created the first view and could be used to create the second.

Note, though, that the relationship is not symmetrical! If we DROP TABLE Product;, not only is the Product table deleted from the database, but we also will no longer be able to use ComboMaker. Any query that refers to ComboMaker refers indirectly to Product.

    sqlite> DROP TABLE Product;
    sqlite> SELECT * FROM ComboMaker;
    Error: no such table: main.Product

Updatable Views

Under a limited set of conditions, SQL allows us to modify a view by inserting, deleting, or updating tuples. This would be a very handy feature, but complex view definitions create real challenges for our DBMS trying to translate the view modification into appropriate modifications to the underlying tables. (Think about ComboMaker, with its inner self join, or even DoesntMakePrinters, which uses SELECT DISTINCT to create sets.)

Roughly speaking, a view is updatable if it selects all tuples from a single relation and includes enough attributes that a partial tuple can include null values. There are a number of other specific restrictions that make updatable views a deep subject. They are beyond the scope of this course. Besides, SQLite does not support updatable views.

Materialized Views

Sometimes a view is expensive to compute or is used often enough that recomputing it every time creates a problem. SQL allows us to materialize a view and store its relation in the database with the physical tables. To store a view as computed relation rather than as a query, we use a new modifier: CREATE MATERIALIZED VIEW.

SQLite does not support materialized views. However, it does store our view in the database file, as a name for a query. We can leave our SQLite session, reopen the database later, and use the view. If you want the view-as-a-name to be temporary, declare it with CREATE TEMPORARY VIEW.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... November 12, 2020