This week, we use dictionaries and lists to analyze data from IMDB. You will also implement some handy operations for lists.
Create a directory on your USB device for this lab, say, lab14, and launch IDLE. Create a new program file named lab14.py in which to do all your work.
This week, you will not submit your shell window at the end of the session. Instead, you will submit a responses.txt file. Download this template file and use it to record any answers or predictions that the exercises ask for.
IMDB has built a successful business out of amassing information about popular entertainment such as movies, television shows, and games, and making it available to you and me. You know the model: If you search for a movie on the website, it brings up a web page showing information about the movie, including all of the actors who perform in it. If you click on an actor's name, it displays a web page showing information about the actor including all of the movies he or she has performed in. This assignment should give you some insight into how such websites work.
To do this lab, we need data. IMDB makes much of its data available for use, but those data sets are too large and complex for today's lab. Instead you will use a small data file with information about a few movies and actors. Each line in the file consists of a single actor and a subset of the movies he or she has appeared in. It is in the now-familiar CSV format:
actor, movie1, movie2, movie3, ...
First, we build a couple of data structures that we can use to explore the data.
KEYS                 VALUES
'Brad Pitt'        → [ 'Sleepers', ... ]
'Anthony Hopkins'  → [ 'Hannibal', ... ]
...
'Bruce Willis'     → [ 'Die Hard', ... ]
'Kevin Bacon'      → [ 'A Few Good Men', ... ]
Be sure to strip whitespace from the file's lines and to capitalize all the names.
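As a rough illustration of the actor dictionary described above, here is a minimal sketch. The names parse_line, build_actor_dict, and sample_lines are hypothetical, the sample lines are made up for the demonstration, and the sketch assumes that "capitalize" means title case (str.title) and that no field contains a quoted comma. In your own program you would read the lines from the data file instead.

```python
def parse_line(line):
    """Split one CSV line into (actor, list of movies)."""
    # Assumption: fields are separated by plain commas, with no quoting.
    fields = [field.strip().title() for field in line.strip().split(',')]
    # Drop empty fields, e.g. from a trailing comma.
    fields = [field for field in fields if field]
    return fields[0], fields[1:]


def build_actor_dict(lines):
    """Map each actor's name to the list of movies on that actor's line."""
    movies_by_actor = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        actor, movies = parse_line(line)
        movies_by_actor[actor] = movies
    return movies_by_actor


# Made-up sample lines standing in for the lab's data file.
sample_lines = [
    "brad pitt, sleepers\n",
    "Kevin Bacon , a few good men,  Sleepers\n",
]
actors = build_actor_dict(sample_lines)
print(actors)
```

Passing an open file object to build_actor_dict should work the same way, since iterating over a file yields its lines.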
KEYS                 VALUES
'Sleepers'         → [ 'Brad Pitt', ... ]
'Hannibal'         → [ 'Anthony Hopkins', ... ]
...
'Die Hard'         → [ 'Bruce Willis', ... ]
'A Few Good Men'   → [ 'Kevin Bacon', ... ]
Building this dictionary takes a bit more care. For each movie you encounter:
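One possible way to build the movie dictionary is to walk through the actor dictionary and, for each movie, either start a new actor list or extend the existing one. This is only a sketch under assumptions: build_movie_dict and the sample dictionary are hypothetical names and data, and your own solution may build the dictionary directly while reading the file.

```python
def build_movie_dict(movies_by_actor):
    """Map each movie title to the list of actors who appear in it."""
    actors_by_movie = {}
    for actor, movies in movies_by_actor.items():
        for movie in movies:
            # First time we see this movie: start an empty actor list.
            if movie not in actors_by_movie:
                actors_by_movie[movie] = []
            # Avoid listing the same actor twice for one movie.
            if actor not in actors_by_movie[movie]:
                actors_by_movie[movie].append(actor)
    return actors_by_movie


# Made-up sample in the shape of the actor dictionary.
sample = {
    "Brad Pitt": ["Sleepers"],
    "Kevin Bacon": ["A Few Good Men", "Sleepers"],
}
inverted = build_movie_dict(sample)
print(inverted)
```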
Next, we use our data structures to find some relationships among movies.
The first list is the union of two actor lists, and the second is the intersection of two actor lists. So, write the two helper functions specified in Step 2 before finishing this step.
>>> list_1 = [1,2,3,4,5]
>>> list_2 = [1,3,5,7,9]
>>> list_union(list_1, list_2)
[1, 2, 3, 4, 5, 7, 9]
>>> list_intersect(list_1, list_2)
[1, 3, 5]
To demonstrate that your functions work, run them for these test cases and two others. Copy the results of your interaction into your responses file.
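For reference, the two helpers could be sketched roughly as follows. This is one straightforward approach, not necessarily the intended one: it assumes the result should keep items in first-seen order and contain no duplicates, which matches the sample session above.

```python
def list_union(list_a, list_b):
    """Return the items that appear in either list, without duplicates."""
    result = []
    for item in list_a + list_b:
        if item not in result:
            result.append(item)
    return result


def list_intersect(list_a, list_b):
    """Return the items that appear in both lists, without duplicates."""
    result = []
    for item in list_a:
        if item in list_b and item not in result:
            result.append(item)
    return result


list_1 = [1, 2, 3, 4, 5]
list_2 = [1, 3, 5, 7, 9]
print(list_union(list_1, list_2))      # [1, 2, 3, 4, 5, 7, 9]
print(list_intersect(list_1, list_2))  # [1, 3, 5]
```

Neither function modifies its arguments, so the same lists can be reused across several test cases.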
Next, we use our data structures to find some relationships among actors.
This is the union of several actor lists, so use the helper function you wrote for the previous task.
This is the intersection of two movie lists, so use the helper function you wrote for the previous task.
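To show how the helpers and the two dictionaries can fit together, here is a hedged sketch. It assumes the two tasks are "everyone who has appeared with a given actor" and "the movies two actors share". The function names co_actors and common_movies, and the sample dictionaries, are hypothetical. The helpers are repeated only so the example runs on its own.

```python
def list_union(list_a, list_b):
    """Items in either list, in first-seen order, without duplicates."""
    result = []
    for item in list_a + list_b:
        if item not in result:
            result.append(item)
    return result


def list_intersect(list_a, list_b):
    """Items in both lists, in list_a order, without duplicates."""
    result = []
    for item in list_a:
        if item in list_b and item not in result:
            result.append(item)
    return result


def co_actors(actor, movies_by_actor, actors_by_movie):
    """Union of the casts of every movie the actor appears in."""
    result = []
    for movie in movies_by_actor.get(actor, []):
        result = list_union(result, actors_by_movie.get(movie, []))
    # Assumption: an actor is not counted as their own co-actor.
    return [name for name in result if name != actor]


def common_movies(actor_a, actor_b, movies_by_actor):
    """Intersection of the two actors' movie lists."""
    return list_intersect(movies_by_actor.get(actor_a, []),
                          movies_by_actor.get(actor_b, []))


# Made-up sample dictionaries in the shape built earlier in the lab.
movies_by_actor = {
    "Brad Pitt": ["Sleepers"],
    "Kevin Bacon": ["A Few Good Men", "Sleepers"],
}
actors_by_movie = {
    "Sleepers": ["Brad Pitt", "Kevin Bacon"],
    "A Few Good Men": ["Kevin Bacon"],
}
print(co_actors("Kevin Bacon", movies_by_actor, actors_by_movie))
print(common_movies("Brad Pitt", "Kevin Bacon", movies_by_actor))
```

Using dict.get with an empty-list default means an unknown actor or movie produces an empty result rather than a KeyError.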
Make sure that your program file is complete and saved. Save your responses.txt file.
Submit your files for grading on the electronic submission system, at lab14 -- Analyzing Movie Data with Dictionaries.
As always, make sure you see the verification screen that says "The files listed above were uploaded."
If you need any help, let me know.