Lab Exercise 14

Analyzing Movie Data with Dictionaries


CS 1510
Introduction to Computing


Introduction

This week, we use dictionaries and lists to analyze data from IMDB. You will also implement some handy operations for lists.

Create a directory on your USB device for this lab, say, lab14, and launch IDLE. Create a new program file named lab14.py in which to do all your work.

This week, you will not submit your shell window at the end of the session. Instead, you will submit a responses.txt file. Download this template file and use it to record any answers or predictions asked for in the exercises.



Movie Data

IMDB has built a successful business out of amassing information about popular entertainment such as movies, television shows, and games, and making it available to you and me. You know the model: If you search for a movie on the website, it brings up a web page showing information about the movie, including all of the actors who perform in it. If you click on an actor's name, it displays a web page showing information about the actor including all of the movies he or she has performed in. This assignment should give you some insight into how such websites work.

To do this lab, we need data. IMDB makes much of its data available for use, but those data sets are too large and complex for today's lab. Instead you will use a small data file with information about a few movies and actors. Each line in the file consists of a single actor and a subset of the movies he or she has appeared in. It is in the now-familiar CSV format:

    actor, movie1, movie2, movie3, ... 
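As a rough illustration of this format, one line can be split on commas and each field stripped of surrounding whitespace. The helper name parse_line and the sample line below are made up for this sketch and are not part of the lab's required functions. It also assumes that no name or title contains a comma.

```python
def parse_line(line):
    """Split one 'actor, movie1, movie2, ...' line into (actor, movies)."""
    fields = [field.strip() for field in line.split(',')]
    actor = fields[0]
    # Skip empty fields, such as the one left by a trailing comma.
    movies = [title for title in fields[1:] if title]
    return actor, movies

# Hypothetical sample line, not copied from movies.txt.
print(parse_line('Kevin Bacon, Sleepers, A Few Good Men\n'))
# prints ('Kevin Bacon', ['Sleepers', 'A Few Good Men'])
```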



Task 1: Build Your Data Model

First, we build a couple of data structures that we can use to explore the data.

Step 1.
Download movies.txt, our data file.


Step 2.
Write a function named build_actor_database(filename) that builds and returns a dictionary where each key is an actor's name and each value is a list of the movies that actor has appeared in. For example:
       KEYS                       VALUES

       'Brad Pitt'        →  [ 'Sleepers', ... ]
       'Anthony Hopkins'  →  [ 'Hannibal', ... ]
       ...
       'Bruce Willis'     →  [ 'Die Hard', ... ]
       'Kevin Bacon'      →  [ 'A Few Good Men', ... ]

Be sure to strip whitespace from the file's lines and to capitalize all the names.
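One possible shape for this function is sketched below. It is not the only reasonable solution. It assumes that each actor appears on only one line of the file and that "capitalize" means title case, which it applies with str.title(). Note that str.title() also capitalizes the letter after an apostrophe, so some titles may need different handling.

```python
def build_actor_database(filename):
    """Return a dictionary mapping each actor to a list of that actor's movies."""
    actor_db = {}
    with open(filename) as data_file:
        for line in data_file:
            line = line.strip()
            if not line:                      # skip blank lines
                continue
            # Strip each field, drop empty ones, and title-case the rest
            # (title case is an assumption about what "capitalize" means).
            fields = [field.strip().title()
                      for field in line.split(',') if field.strip()]
            actor, movies = fields[0], fields[1:]
            actor_db[actor] = movies          # assumes one line per actor
    return actor_db
```

Calling build_actor_database('movies.txt') in the directory that holds the data file would then return the actor dictionary.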


Step 3.
Write a function named build_movie_database(actor_db) that uses an actor database to build and return a dictionary where each key is a movie title and each value is a list of the actors who appear in that movie. For example:
       KEYS                     VALUES

       'Sleepers'         → [ 'Brad Pitt', ... ]
       'Hannibal'         → [ 'Anthony Hopkins', ... ]
       ...
       'Die Hard'         → [ 'Bruce Willis', ... ]
       'A Few Good Men'   → [ 'Kevin Bacon', ... ]

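Building this dictionary takes a bit more care. For each movie you encounter, check whether it is already a key in the dictionary: if it is, add the actor to the movie's existing list; if it is not, create a new entry for the movie.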
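The extra care comes from the fact that the same movie can show up under several actors. A minimal sketch of one way to handle that follows. The sample dictionary is a small hand-written stand-in, not the contents of movies.txt.

```python
def build_movie_database(actor_db):
    """Invert an actor database: map each movie to a list of its actors."""
    movie_db = {}
    for actor, movies in actor_db.items():
        for movie in movies:
            if movie not in movie_db:
                movie_db[movie] = []          # first time we see this movie
            movie_db[movie].append(actor)
    return movie_db

# Hypothetical sample data for illustration only.
sample_actor_db = {
    'Brad Pitt': ['Sleepers', 'Twelve Monkeys'],
    'Kevin Bacon': ['Sleepers', 'A Few Good Men'],
    'Bruce Willis': ['Die Hard', 'Twelve Monkeys'],
}
print(build_movie_database(sample_actor_db)['Sleepers'])
# prints ['Brad Pitt', 'Kevin Bacon']
```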



Task 2: Analyze Movies

Next, we use our data structures to find some relationships among movies.

Step 1.
Write a function named compare_movies(movie_1, movie_2), where the arguments are the names of two movies.

The function produces two lists of actors. The first list is the union of the two movies' actor lists, and the second is their intersection. So, write the two helper functions specified in Step 2 before finishing this step.
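A sketch of how these pieces might fit together is shown below. Several things here are assumptions: the function looks movies up in a module-level dictionary named movie_db, it returns the two lists as a pair rather than printing them, and the two helpers are compact stand-ins for the Step 2 functions so that the example runs on its own. The data is a hand-written sample.

```python
# Hypothetical movie database; in the lab this would be built from movies.txt.
movie_db = {
    'Sleepers': ['Brad Pitt', 'Kevin Bacon'],
    'A Few Good Men': ['Kevin Bacon', 'Tom Cruise'],
}

def list_union(list_1, list_2):
    """Compact stand-in for the Step 2 helper."""
    return list_1 + [item for item in list_2 if item not in list_1]

def list_intersect(list_1, list_2):
    """Compact stand-in for the Step 2 helper."""
    return [item for item in list_1 if item in list_2]

def compare_movies(movie_1, movie_2):
    """Return (actors in either movie, actors in both movies)."""
    actors_1 = movie_db[movie_1]
    actors_2 = movie_db[movie_2]
    return list_union(actors_1, actors_2), list_intersect(actors_1, actors_2)

either, both = compare_movies('Sleepers', 'A Few Good Men')
print(either)   # prints ['Brad Pitt', 'Kevin Bacon', 'Tom Cruise']
print(both)     # prints ['Kevin Bacon']
```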


Step 2.
Write functions named list_union(list_1, list_2) and list_intersect(list_1, list_2) that take two lists as arguments and return their union and intersection, respectively. For example:
       >>> list_1 = [1,2,3,4,5]
       >>> list_2 = [1,3,5,7,9]
       >>> list_union(list_1, list_2)
       [1, 2, 3, 4, 5, 7, 9]
       >>> list_intersect(list_1, list_2)
       [1, 3, 5]

To demonstrate that your functions work, run them for these test cases and two others. Copy the results of your interaction into your responses file.
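One straightforward approach, sketched here with plain loops, builds a new list and adds an item only if it is not already there. The choice to keep items in the order they first appear and to drop duplicates is an assumption. It matches the example above, but the lab may accept other orderings.

```python
def list_union(list_1, list_2):
    """Return the items that appear in either list, without duplicates."""
    result = []
    for item in list_1 + list_2:
        if item not in result:
            result.append(item)
    return result

def list_intersect(list_1, list_2):
    """Return the items that appear in both lists, without duplicates."""
    result = []
    for item in list_1:
        if item in list_2 and item not in result:
            result.append(item)
    return result

print(list_union([1, 2, 3, 4, 5], [1, 3, 5, 7, 9]))      # prints [1, 2, 3, 4, 5, 7, 9]
print(list_intersect([1, 2, 3, 4, 5], [1, 3, 5, 7, 9]))  # prints [1, 3, 5]
```

Neither function modifies its arguments, so the original lists can be reused in later calls.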


Step 3.
Run your function for three different pairs of movies. Copy the results of your interaction into your responses file.


Task 3: Analyze Actors

Next, we use our data structures to find some relationships among actors.

Step 1.
Write a function named acting_partners(actor), where the argument is the name of an actor.

The result is the union of several actor lists, one for each movie the actor has appeared in, so use the helper function you wrote for the previous task.
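The repeated union can be sketched as a loop that folds each movie's actor list into a running result. The module-level dictionaries actor_db and movie_db, the sample data, and the compact list_union are all stand-ins so that the example runs on its own. Leaving the actor out of his or her own partner list is also an assumption.

```python
# Hypothetical sample data; in the lab these come from the Task 1 functions.
actor_db = {
    'Kevin Bacon': ['Sleepers', 'A Few Good Men'],
    'Brad Pitt': ['Sleepers'],
    'Tom Cruise': ['A Few Good Men'],
}
movie_db = {
    'Sleepers': ['Brad Pitt', 'Kevin Bacon'],
    'A Few Good Men': ['Kevin Bacon', 'Tom Cruise'],
}

def list_union(list_1, list_2):
    """Compact stand-in for the helper from Task 2."""
    return list_1 + [item for item in list_2 if item not in list_1]

def acting_partners(actor):
    """Return everyone who shares at least one movie with the given actor."""
    partners = []
    for movie in actor_db[actor]:
        partners = list_union(partners, movie_db[movie])
    if actor in partners:
        partners.remove(actor)    # assumption: an actor is not their own partner
    return partners

print(acting_partners('Kevin Bacon'))   # prints ['Brad Pitt', 'Tom Cruise']
```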


Step 2.
Write a function named compare_actors(actor_1, actor_2), where the arguments are the names of two actors.

The result is the intersection of the two actors' movie lists, so use the helper function you wrote for the previous task.
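With an intersection helper in hand, this function can be very short. The sketch below assumes a module-level dictionary named actor_db and uses a compact stand-in for the helper. The data is a hand-written sample.

```python
# Hypothetical sample data; in the lab this comes from build_actor_database.
actor_db = {
    'Kevin Bacon': ['Sleepers', 'A Few Good Men'],
    'Brad Pitt': ['Sleepers', 'Twelve Monkeys'],
    'Bruce Willis': ['Die Hard', 'Twelve Monkeys'],
}

def list_intersect(list_1, list_2):
    """Compact stand-in for the helper from Task 2."""
    return [item for item in list_1 if item in list_2]

def compare_actors(actor_1, actor_2):
    """Return the movies in which both actors appear."""
    return list_intersect(actor_db[actor_1], actor_db[actor_2])

print(compare_actors('Kevin Bacon', 'Brad Pitt'))     # prints ['Sleepers']
print(compare_actors('Kevin Bacon', 'Bruce Willis'))  # prints []
```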


Step 3.
Run your functions for three different actors, or pairs of actors. Copy the results of your interaction into your responses file.


Finishing Up

Make sure that your program file is complete and saved. Save your responses.txt file.

Submit your files for grading on the electronic submission system, at lab14 -- Analyzing Movie Data with Dictionaries.

As always, make sure you see the verification screen that says The files listed above were uploaded.

If you need any help, let me know.



Eugene Wallingford ..... wallingf@cs.uni.edu ..... December 3, 2014