Lab14
Analyze customer data
Overall Background
While true analysis of customer data is normally done in a proper database with
proper database queries, we can ask and answer a remarkable number of simple
questions using a simple text file of data and some basic code that uses lists
and/or dictionaries to help us organize the data in that file.
In this assignment you will be working with a file of 30,000 fake customers.
(Yes, it really is fake information which was generated randomly, so no trying to sell it after class! Heh.)
- FakeCustomerData.zip (right-click and choose "Save as". Then unzip the
archive and you will find a txt file with all the information inside.)
In order to complete this assignment, you should create a file called lab14.py.
Complete the following steps in the order listed.
- Write the getStateDistribution() method:
- BACKGROUND :
- In order to understand where our customers come from we might want to
search to see how many customers come from each state.
- ACTION :
- Add to lab14.py the method called getStateDistribution().
- This method will be a specialized method that works on our one
file so it takes no parameters.
- This method:
- opens the file "FakeCustomerData.txt" and reads it in line by line
- for each line (not including the header line) it splits the data...
- pulls out the state for that customer...
- and keeps track of the number of times we have seen each state.
- This method sorts and prints the final counts for each state
(see the screenshots following this set of instructions).
- TESTING :
- Consider the screenshot below and compare your counts to the partial
list of counts listed there.
...
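The counting steps for getStateDistribution() can be sketched roughly as follows. This is only one possible shape, not the expected solution. The helper name count_states, the comma delimiter, and the state column index (4) are assumptions made for illustration, so check the header line of FakeCustomerData.txt for the real layout. Sorting by state name and the print format are also guesses; match them to the screenshot.

```python
def count_states(lines, state_column=4, delimiter=","):
    """Count how often each state appears in an iterable of data lines.

    ASSUMPTIONS: state_column=4 and a comma delimiter are placeholders,
    not facts about FakeCustomerData.txt. Check the header line first.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) <= state_column:
            continue  # skip blank or short lines
        state = fields[state_column]
        # dict.get supplies 0 the first time we see a state
        counts[state] = counts.get(state, 0) + 1
    return counts


def getStateDistribution():
    """Read the customer file, count the states, and print sorted counts."""
    with open("FakeCustomerData.txt") as infile:
        infile.readline()  # skip the header line
        counts = count_states(infile)
    # ASSUMPTION: sorted alphabetically by state; adjust to match the screenshot
    for state in sorted(counts):
        print(state, counts[state])
```

Keeping the counting in a helper that accepts any list of lines makes it easy to try the logic on three or four hand-made lines before running it on all 30,000 customers.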
- Write the getColumnDistribution(filename,columnNum) method.
- Write the getBirthYearDistribution() method.
- BACKGROUND :
- Method #2 looks like it is going to be really useful, but it isn't as
helpful as we might really like because sometimes the interesting data is
nested inside other data. For example, if I wanted to see how old my
customers are (by looking at the year they were born) I could try to use
method #2 on column 13 (birthdate). The problem is that method #2 only
counts whole column values, so every distinct birthdate gets its own entry
and I get a LONG dump of data that starts with:
- ACTION :
- Add to lab14.py the method called getBirthYearDistribution().
- This method will be a specialized method that works on our one
file so it takes no parameters.
- This method:
- opens the file "FakeCustomerData.txt" and reads it in line by line
- for each line (not including the header line) it splits the data...
- pulls out the birthdate for that customer...
- completes another split on the birthdate to pull out the birth YEAR...
- and keeps track of the number of times we have seen each year.
- This method sorts and prints the final counts for each year (see
the screenshots following this set of instructions).
- TESTING :
- Consider the screenshot below and compare your counts to the partial
list of counts listed there.
...
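The "split, then split again" idea in the getBirthYearDistribution() steps above might look something like the sketch below. Several things here are assumptions and not facts from the data file: the helper names extract_birth_year and count_birth_years, the comma delimiter, a birthdate written as month/day/year with "/" separators (so the year is the last piece), and column 13 being a zero-based index. Look at the header line and a few data lines before trusting any of them.

```python
def extract_birth_year(birthdate, date_separator="/"):
    """Return the year portion of a birthdate string.

    ASSUMPTION: the date looks like month/day/year, so the year is the
    last piece after splitting. Change the index if the file differs.
    """
    return birthdate.strip().split(date_separator)[-1]


def count_birth_years(lines, birthdate_column=13, delimiter=","):
    """Count how often each birth year appears in an iterable of data lines.

    ASSUMPTION: birthdate_column=13 is treated as a zero-based index and
    the fields are comma-separated; both are placeholders.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) <= birthdate_column:
            continue  # skip blank or short lines
        year = extract_birth_year(fields[birthdate_column])
        counts[year] = counts.get(year, 0) + 1
    return counts


def getBirthYearDistribution():
    """Read the customer file, count birth years, and print sorted counts."""
    with open("FakeCustomerData.txt") as infile:
        infile.readline()  # skip the header line
        counts = count_birth_years(infile)
    # ASSUMPTION: sorted by year; adjust the format to match the screenshot
    for year in sorted(counts):
        print(year, counts[year])
```

The second split is the only real difference from counting states: the full birthdate is pulled out of the line first, and then only the year piece is used as the dictionary key.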
Final Submission
This week I will again ask you to submit your code for electronic grading, using the eLearning submission system.
Follow the directions on the system to select the appropriate course and assignment and submit your file.
If you worked with a partner, make sure that both your name and your partner's name are in the comment header at the top of the file. Submit what you have by the end of lab.