CS 3140 Online Activity -- Week 2

Introduction

Some of the material at the beginning of this activity is adapted from Chapter 5 of Database Design, in particular the section "Degrees of Data Abstraction". Feel free to read the chapter, but I recommend you so that later; it isn't necessary for this online session.

Do the short exercises in the activity as you come to them. Record your answers and submit your work as before. Note the submission date: by 11:59 PM on Friday, August 28. It is different from Online Session 1.

Data Modeling

You may recall from our first session that one of the motivations for database systems was decoupling: the desire to decouple code that uses the database from the code that implements the database. We said that The data model decouples database users from the physical layer. It provides a logical layer to the system

That's not really enough separation. Designing the logical layer of a database application requires a lot of detail, and it typically uses the vocabulary of a particular kind of data model. In the beginning, though, we have a problem in the world, expressed in the vocabulary of a domain and set of users. We need to begin the process of database design closer to the problem.

Data modeling is the first step in the process of database design. It recognizes that even before we design the logical layer of our system, we need to first understand and model the domain. Thus, database design is much like the modeling we do for any kind of program: we start with the problem in the world, define a model for the problem, and then refine model toward an implementation.

adding the conceptual layer to our system

The physical layer is the database management system. The logical layer is our data model. This week, we will begin our study of a powerful and popular data model, the relational data model. This session introduces you to conceptual modeling.

The goal of conceptual modeling is to describe:

the kinds of data stored in the database,
the relationships between data items, and
any constraints on the data in the domain.

For example, in a student information system, we might want to store students, professors, and courses. These data are related in several ways, such as professors teaching courses and students taking courses.

The existence of constraints reminds us that the data stored in the database is composed of other data. Students have names and student numbers; courses have names and numbers. These are typically defined using constraints from the domain. For example, a UNI student number is six digits, and UNI course numbers consist of alphabetic strings of varying length, a space, and a four-digit number in 1000-7999 range.

The conceptual layer can be modeled in many ways. We will mostly be using the well-established entity relationship (ER) model. It is simple and defines only the minimal set of features we need to create a conceptual model:

entities: the kinds of things stored in the database,
attributes: the properties associated with an entity,
relationships among the entities

In our ER model of a student information system, name and student number are attributes of a student, and name and course number are attributes of a course. Constraints are specified as a part of defining the attributes.

The entity relationship model is not a formal language. An ER model of a domain is a semi-formal representation of the database structure. This result is easy to create, understand, and share, which makes it useful in the early stages of database design, when it can be used as reference to make sure that all the user's requirements are met. ER models are usually represented as diagrams:

Consider my Spotify example from our first session:

artist: name, country, rating
album : name, year of release, rating, artist, songs it contains
song: name, year of release, length, rating, artist

My flat file makes attributes and relationships look like the same kind of thing. That is an implementation detail! At the conceptual level, I might describe the database using this ER diagram:

an ER diagram of my flat file music database

Later, we will look at ER models in more detail, including a straightforward technique for converting an ER model into a relational data model. For now, let's get some practice working with the basic ideas of ER modeling and begin to explore some of the practical issues.

Quick Exercise

Draw an ER diagram for the entities, attributes, and relationships in my student information example above.

After drawing your diagram, check out this attempt. How does it differ from yours? What could we do? Or less?

Main Exercise

The idea behind ER modeling is simple, which accounts in large part for its popularity. Designers can begin using it immediately. With experience, we develop the knowledge we need to design good models. This exercise will help you start down the road to experience.

Tasks

Choose a domain that you are interested in. It can be almost anything: sports, business, government, activism, .... We need databases almost everywhere! Think of a system you might like to use, or one you might like to build.

Create a data model for the domain. Be sure to define enough entities, attributes, relationships, and constraints that you explore all the ideas ER models support. Three or four entities, with three or so attributes each, and two or three relationships would be a good size.

Draw an ER diagram for your data model. Don't worry about making a perfect picture. We are more concerned about the ideas.

Design Principles

We haven't many of the details of ER models yet, but we have learned enough to begin considering what makes a good design. In practice, the challenge is identifying:

the entities and attributes that matter for a specific application
reltionships that capture the requirements of the application without duplicating information or creating ambiguity

The good news is that, with experience, we develop the knowledge we need to design good models.

Here are three simple design principles that we can use to evaluate a conceptual design.

Is the model faithful to the domain? The most important characteristic of a model is that it reflect reality. The entities and relationships in our model should express what we know about the domain. We can't represent everything about the domain in a model, but we should represent what is essential to the applications we hope to build.

Is the model parsimonious? That is an old-fashioned word that captures a key feature of good models: they do not contain any redundancies. In the the object-oriented design world, we sometimes say, Say it once and only once. If we express the same idea twice, then our database may end up wasting space. We also need to make sure that, when we change a fact in the database, we change the redundant fact, too.

Consider the three relationships in my Spotify model above. If and artist makes an album and an album contains songs, do we really need to represent the artist-makes-songs relationship? I can imagine a domain in which this relationship is redundant. I can also imagine domains in which it is essential. Can you think of an example where we would need that relationship?

Does the model use the right kind of element for each data item? Some attributes could be represented as entities. For example, we could convert a name attribute into an entity consisting of first name, last name, and suffix. Or we could convert an entity into an attributes or attributes of another entity. For example, if we don't ever need to reason about artists in our application, we could make artist name and genre to be attributes of, say, songs.

In general, attributes are simpler to implement than entities, but representing something as an attribute limits our ability to reason over the data. Eliminating the artist entity from my Spotify model would require me to duplicate information: albums are made by artists, too.

Designers walk a fine line between simplicity and expressiveness. With experience, we learn to make better choices in our initial designs and to recognize ways we can modify our designs to create simpler and more faithful models.

Follow-up Exercise

Let's explore these design principles with a real example: the data model your created in the main exercise.

Tasks

For each of the three design principles above, apply the principle to the data model you created.

Make a list of questions you have as you evaluate your design. Is there something you don't understand about one of the principles? After thinking about your model in light of the principles, is there something you're not sure about in your domain or in the idea of an ER model?

Submission

By 11:59 PM on Friday, August 28, create a file containing

your ER diagram from the main exercise
your evaluation of that model according the design principles
the questions recorded as you evaluated your design

Submit the ER diagram as a PNG, GIF, or JPG file named er-diagram.[png/gif/jpg].
Submit the other two items in a plain text file named online02.txt.

Submit your files electronically using the CS 3140 section of the department's electronic submission system, under "online02".

No hard copy is required.

Be sure to submit a plain text file. We will need to work with plain text files this semester, and you'll need to be comfortable creating and using them.

Be sure to use the names specified for the files you submit. This enables an autograder to find and process your submission.

If you need help or have any any questions, please ask promptly.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... August 25, 2020

CS 3140 Database Systems

Fall 2020

Online Activity 2

Conceptual Data Modeling

Introduction

Data Modeling

Quick Exercise

Main Exercise

Tasks

Design Principles

Follow-up Exercise

Tasks

Submission