## Session 4

### Opening Exercise

A movie studio is characterized by its name and address. A studio may have many film crews working for it. Each film crew at a studio is identified by its number: 1, 2, 3, and so on. It also has a crew chief.

Draw an ER diagram to model these entities.

State clearly any assumptions you make in your design.

### Review of Opening Exercise

How many entities are there? What are their keys?

How many relationships? What are their cardinalities?

Does the key for the Crew entity create any problems?

Here's what I see in the problem description:

But the description leads to a violation of one of the most important elements of a data model: A key uniquely identifies a tuple. In this case, though, the same crew numbers are used at multiple studios, so they are not unique.

Maybe we need an artificial key:

This works, but it is dissatisfying in another way. The model is no longer quite as faithful to the domain.

Can we do better?

This kind of situation arises more often in data modeling than might seem obvious at first. It is so common that there is a tweak to ER and relational models for recording and handling it: the weak entity.

### Where Are We?

We are still exploring conceptual modeling and logical modeling in parallel, in the form of the Entity Relationship model and the relational data model. This week, we will bring them together and prepare to focus our attention on the theory and practice of the relational data model.

### Weak Entities

A movie studio can have many film crews working for it. Each film crew at a studio is identified by its number: 1, 2, 3, and so on. But other studios use the same numbers to identify their crews, so the crew number attribute cannot be the key. It's not clear that crew number + crew chief can be a compound key, either.

To identify a crew uniquely, we need both the name of the studio and the crew number. Studio name is not an attribute of the crew, though, so it has to be passed through the WorksFor relationship.

It is legal for an entity's key to include attributes that belong to a related entity. This kind of entity is the called a weak entity.

The studio film crew illustrates a form of weak entity that occurs frequently in the world: one entity is part of another, in a hierarchical relationship. A crew is part of a studio. Rather than create a meaningless artificial key, we can uniquely identify a crew by identifying the studio of which it is a part.

In our ER models, we represent weak entities using a special notation:

We use a double rectangle to indicate a weak entity. A double diamond marks the many-to-one relationship that ties a weak entity to the source of the rest of its key.

The result is a model that is faithful to the domain.

Another example is hierarchical classification in biology. Each species takes its name from both the species name itself, say sapiens, and the genus that contains it, say Homo. This supports faithfully models the biological domain: a genus can contain multiple species, and species names can be reused across two genera.

An ER diagram of the relationship between species and genera might look like this:

Can you think of any other examples in domains you're familiar with?

### Another Weak Entity Example

In a university's information system, we will have Student and Course as entities:

We would like to record the fact that a student takes a course. This sounds like a natural place to create a relationship, but we have a problem:

We cannot use a "Takes" relationship, because it needs to have an attribute of its own: the student earns a grade in the course. We need another entity: an EnrollmentRecord:

What will be the key in the new entity? Will the grade be part? No, the key for an EnrollmentRecord will comprise the keys from the corresponding Student and Course.

If we decide to record more information in an EnrollmentRecord -- say, the semester in which the student takes the course -- then EnrollmentRecord may contribute a field to the key. (Our model would now enable us to record that students take a course multiple times.)

EnrollmentRecord is an example of a second source of weak entities: when we create entities that aren't a natural part of the domain. We also see this in the process that creates an artificial entity in order to convert a multi-way relationship into a set of binary relationships. We will see this idea in this week's online session.

### How Weak Entities and Their Relationships Work

After seeing a couple of examples, we can describe a bit more formally how weak entities and their relationships work. This can help us recognize and use weak entities effectively.

If E is a weak entity, then its key consists of:

• 0 or more of its attributes, plus
• the key attributes of its supporting entity F, through the supporting relationship R.

R is a supporting relationship for E from F if:

• R is a binary relationship,
• every E is related to exactly one F, and
• the relevant attributes in F are key attributes.

The bullets in the second lists seem pretty straightforward, except perhaps for the second one: why can every E be related to only one F? Think about it for a second... If one enrollment record could be related to two students, then our model would store the same grade for both students!

The theory of weak entities is bigger than this set of rules. (For example, a supporting entity F can be weak, too! In that case, it will have a supporting entity G that contributes to the key of E.) But this set of rules prepares us to work with most of the situations that we will see in data modeling.

### Converting an ER Model into Relations

You may have noticed that the entities in ER modeling and the relations in the relational data model are quite similar. The correspondence is so strong that, as a first approximation, converting an ER model into a set of relations is simple:

1. For each entity, create a relation that has the same attributes and key.
2. For each relationship, create a relation whose attributes are the keys of the related entities.
These two steps work well for many ER models. Thet are so simple that you are applying them to two basic ER models on Homework 1.

Consider the bank account we modeled in last session's closing exercise:

The first rule tells us to create relations named Customer and Account, with these schema:

```    Customer(ssNo, name, phone, address)
Account(number, type, balance)
```
These schema describe instances such as these:
```    ssNo          name         phone      address
-----------   -----        --------   -------
123-45-6789   Eugene W     273-5919   305 ITTC
321-45-6789   John Smith   452-9159   339 WRT

number   type       balance
------   --------   -------
12345    checking   100
96114    savings    25
54678    savings    1000
```

The second rule tells us to create a relation named Owns, with this schema:

```    Owns(ssNo, number)
```
This schema allows us represent account ownership:
```    ssNo          number
-----------   ------
123-45-6789   12345
321-45-6789   96114
123-45-6789   54678
```

And we are done. This approach works for a broad variety of ER models.

There are a few tweaks we can make to these rules to handle special cases. Let's look at one now and one later.

Combining two relations Sometimes, a relationship results in a table that can be folded into one of its entities' tables.

Consider the Owns relation. In a bank where the relationship between customers and accounts is m:n -- that is, each customer can own multiple accounts, and each account can have multiple owners -- then the Owns relation is necessary for us to record the possible combinations.

However, in our closing exercise, the relationship is 1:n. A customer may have one or more accounts, which have one owner each. This means that the Owns relation can contain many tuples with the same customer, but there will be only one tuple for each account.

```    ssNo          number
-----------   ------
123-45-6789   12345
321-45-6789   96114
123-45-6789   54678
```
The account number is all we need to identify a tuple!

This is a case of an entity E and an n:1 relationship R from E to F. In these cases, it can be clearer and more efficient to fold the relation for R into the relation for E, resulting in a schema that contains:

• all the attributes of the entity E, including its key, and
• the key attribute of entity F.

In our example, E is Account, F is Customer, and R is Owns, an n:1 relationship from Account to Customer. We can fold the Owns relation into the Account relation, resulting in this schema:

```    Customer(ssNo, name, phone, address)
Account(number, type, balance, owner)
```
The attribute owner will be the ssNo of the customer who owns the account.
```    number   type       balance   owner
------   --------   -------   ----------
12345    checking   100       123-45-6789
96114    savings    25        321-45-6789
54678    savings    1000      123-45-6789
```
"Owner" is a better name for the attribute than "ssNo". We will want to define the domain of the attribute to be the set of customer keys.

This conversion may cause a feeling of deja vu... Last time, we looked at some of your questions from an ER modeling exercise. Many of you were wondering about the conditions under which an attribute could be an entity or a relationship could be an attribute. Combining two relations into one when one of them adds no new information illustrates one such condition: a relationship in an ER model can become an attribute in a relational schema!

Handling Weak Entities. You might imagine that weak entities require a slightly different approach. You'd be right. Let's study that approach in this week's online session.

### Up Next

This week's online activity is available. Work through it before class next week. There is a deliverable for this one, so send it to me after you are done.

Homework 1 was due this morning. Homework 2 will be available on Friday. Exam 1 will not be next week.

Eugene Wallingford ..... wallingf@cs.uni.edu ..... September 10, 2020