Some of the material at the beginning of this activity is adapted from Chapter 5 of Database Design, in particular the section "Degrees of Data Abstraction". Feel free to read the chapter, but I recommend you so that later; it isn't necessary for this online session.
Do the short exercises in the activity as you come to them. Record your answers and submit your work as before. Note the submission date: by 11:59 PM on Friday, August 28. It is different from Online Session 1.
You may recall from our first session that one of the motivations for database systems was decoupling: the desire to decouple code that uses the database from the code that implements the database. We said that The data model decouples database users from the physical layer. It provides a logical layer to the system
That's not really enough separation. Designing the logical layer of a database application requires a lot of detail, and it typically uses the vocabulary of a particular kind of data model. In the beginning, though, we have a problem in the world, expressed in the vocabulary of a domain and set of users. We need to begin the process of database design closer to the problem.
Data modeling is the first step in the process of database design. It recognizes that even before we design the logical layer of our system, we need to first understand and model the domain. Thus, database design is much like the modeling we do for any kind of program: we start with the problem in the world, define a model for the problem, and then refine model toward an implementation.
The physical layer is the database management system. The logical layer is our data model. This week, we will begin our study of a powerful and popular data model, the relational data model. This session introduces you to conceptual modeling.
The goal of conceptual modeling is to describe:
The existence of constraints reminds us that the data stored in the database is composed of other data. Students have names and student numbers; courses have names and numbers. These are typically defined using constraints from the domain. For example, a UNI student number is six digits, and UNI course numbers consist of alphabetic strings of varying length, a space, and a four-digit number in 1000-7999 range.
The conceptual layer can be modeled in many ways. We will mostly be using the well-established entity relationship (ER) model. It is simple and defines only the minimal set of features we need to create a conceptual model:
The entity relationship model is not a formal language. An ER model of a domain is a semi-formal representation of the database structure. This result is easy to create, understand, and share, which makes it useful in the early stages of database design, when it can be used as reference to make sure that all the user's requirements are met. ER models are usually represented as diagrams:
Consider my Spotify example from our first session:
My flat file makes attributes and relationships look like the same kind of thing. That is an implementation detail! At the conceptual level, I might describe the database using this ER diagram:
Later, we will look at ER models in more detail, including a straightforward technique for converting an ER model into a relational data model. For now, let's get some practice working with the basic ideas of ER modeling and begin to explore some of the practical issues.
Draw an ER diagram for the entities, attributes, and relationships in my student information example above.
After drawing your diagram, check out this attempt. How does it differ from yours? What could we do? Or less?
The idea behind ER modeling is simple, which accounts in large part for its popularity. Designers can begin using it immediately. With experience, we develop the knowledge we need to design good models. This exercise will help you start down the road to experience.
We haven't many of the details of ER models yet, but we have learned enough to begin considering what makes a good design. In practice, the challenge is identifying:
Here are three simple design principles that we can use to evaluate a conceptual design.
Is the model faithful to the domain? The most important characteristic of a model is that it reflect reality. The entities and relationships in our model should express what we know about the domain. We can't represent everything about the domain in a model, but we should represent what is essential to the applications we hope to build.
Is the model parsimonious? That is an old-fashioned word that captures a key feature of good models: they do not contain any redundancies. In the the object-oriented design world, we sometimes say, Say it once and only once. If we express the same idea twice, then our database may end up wasting space. We also need to make sure that, when we change a fact in the database, we change the redundant fact, too.
Consider the three relationships in my Spotify model above. If and artist makes an album and an album contains songs, do we really need to represent the artist-makes-songs relationship? I can imagine a domain in which this relationship is redundant. I can also imagine domains in which it is essential. Can you think of an example where we would need that relationship?
Does the model use the right kind of element for each data item? Some attributes could be represented as entities. For example, we could convert a name attribute into an entity consisting of first name, last name, and suffix. Or we could convert an entity into an attributes or attributes of another entity. For example, if we don't ever need to reason about artists in our application, we could make artist name and genre to be attributes of, say, songs.
In general, attributes are simpler to implement than entities, but representing something as an attribute limits our ability to reason over the data. Eliminating the artist entity from my Spotify model would require me to duplicate information: albums are made by artists, too.
Designers walk a fine line between simplicity and expressiveness. With experience, we learn to make better choices in our initial designs and to recognize ways we can modify our designs to create simpler and more faithful models.
Let's explore these design principles with a real example: the data model your created in the main exercise.
By 11:59 PM on Friday, August 28, create a file containing
Submit your files electronically using the CS 3140 section of the department's electronic submission system, under "online02".
No hard copy is required.
Be sure to submit a plain text file. We will need to work with plain text files this semester, and you'll need to be comfortable creating and using them.
Be sure to use the names specified for the files you submit. This enables an autograder to find and process your submission.
If you need help or have any any questions, please ask promptly.