Computational Statistics - STAT 4772/STAT 5772 - Fall 2014

  1. VIP: Final Exam Study Guide and practice questions and resources, etc.

    Final Exam period: 3-4:50 pm on Tuesday, December 16th.

  2. 1990s folder - population estimates - United States Census.

  3. Graduate credit presentations: graduateProj5772.txt - Thursday or Friday of finals week?

  4. Week #14 resources... (Excel, R, SPSS, study guide)

    For Excel and R simulation: When I Procrastinate, I Write Code - see November 20th, 2014 blog entry.

  5. Week #13 resources...

  6. Week #12 resources...

  7. R Assignment due on Thursday: Logistic Regression - GPA with GRE held constant and 4 ranks.

    Week #11 resources...
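
    A minimal sketch of the kind of model this assignment describes, assuming a 0/1 outcome and predictors named admit, gre, gpa, and rank. Those column names, the simulated data, and the coefficients used to generate it are all hypothetical stand-ins, since the assignment data set is not reproduced here. The idea is to fit the logistic regression, then predict across a range of GPA values while GRE is held at its mean, once for each of the 4 ranks.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in data; column names are assumptions, not the class data set.
n <- 400
sim <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 110)),
  gpa  = round(runif(n, min = 2.3, max = 4.0), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)

# Hypothetical "true" model, used only to generate 0/1 outcomes.
eta <- -4 + 0.002 * sim$gre + 0.8 * sim$gpa - 0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: rank enters as a factor (3 dummy variables).
fit <- glm(admit ~ gre + gpa + rank, data = sim, family = binomial)

# Prediction grid: GPA varies, GRE is fixed at its mean, one block per rank.
grid <- expand.grid(
  gpa  = seq(min(sim$gpa), max(sim$gpa), length.out = 50),
  gre  = mean(sim$gre),
  rank = factor(1:4, levels = levels(sim$rank))
)
grid$prob <- predict(fit, newdata = grid, type = "response")

print(head(grid))
```

    Plotting prob against gpa, with one line per rank, then shows how the predicted probability changes with GPA at a fixed GRE.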

  8. Week 10 - Oct 28-30 classes... Logistic Regression.

  9. Week #9 resources...

    Assignment due on Oct 30: ANOVA and ggplot and Regression...

  10. Quiz #1 data set...

    ...QUIZ on Tuesday, October 21st...

  11. VIP: Study Guide/Outline: Quiz One on Tuesday, October 21st.
    ... QUIZ ONE on Tuesday, October 21st...

    Quiz One - Readings/pages from DSUR, the fall of 2014 textbook:
    You have this as a HANDOUT from Thursday, October 21st.

  12. Week 8: ANOVA and Regression in R. Regression with n - 1 dummy variables == ANOVA with the factor variable having n categories?
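
    The question above can be checked numerically. This is a small sketch using the PlantGrowth data that ships with R (3 groups, so n - 1 = 2 dummy variables); it is not the class data set, and the dummy variable names d_trt1 and d_trt2 are made up for the illustration.

```r
# PlantGrowth ships with R: 30 plants in 3 groups (ctrl, trt1, trt2).
data(PlantGrowth)

# One-way ANOVA with the 3-level factor.
fit_aov <- aov(weight ~ group, data = PlantGrowth)

# The same model as a regression on 3 - 1 = 2 hand-made dummy variables
# (ctrl is the reference category).
pg <- PlantGrowth
pg$d_trt1 <- as.numeric(pg$group == "trt1")
pg$d_trt2 <- as.numeric(pg$group == "trt2")
fit_lm <- lm(weight ~ d_trt1 + d_trt2, data = pg)

# Overall F statistic from each fit.
f_aov <- anova(fit_aov)[["F value"]][1]
f_lm  <- unname(summary(fit_lm)$fstatistic["value"])

print(c(anova = f_aov, regression = f_lm))
print(all.equal(f_aov, f_lm))
```

    With this coding the regression intercept is the ctrl group mean, and each dummy coefficient is that group's mean minus the ctrl mean.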

  13. Week #7 - Oct 7 - Oct 9...

    Assignment due Tuesday October 14th: Oct7th2014Assignment.pdf...

  14. September 30th to October 2nd: Week #6...

  15. September 23rd: Week #5...

  16. Bootstrapping and R - watch the 15 minute 11 second long video. We did the BootStrapping.R script in class on Thursday, September 18th.
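
    For reference, a minimal bootstrap sketch in base R. This is not the BootStrapping.R script from class; the data set (the built-in faithful data), the statistic (the mean), and the number of resamples are illustrative choices.

```r
set.seed(4772)  # arbitrary seed so the resampling is reproducible

x <- faithful$eruptions   # built-in data: Old Faithful eruption times
n_boot <- 2000            # number of bootstrap resamples (illustrative choice)

# Resample x with replacement and record the mean of each resample.
boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))

boot_se <- sd(boot_means)                         # bootstrap standard error of the mean
boot_ci <- quantile(boot_means, c(0.025, 0.975))  # percentile 95% interval

print(mean(x))
print(boot_se)
print(boot_ci)
```

    Swapping mean() for median() or another statistic gives the bootstrap distribution of that statistic instead.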

  17. ISwR library, TTests, Dalgaard book examples.

  18. Data files for SPSS and for R: Predicting Album Sales - linear regression models and scatterplots in R and in SPSS.

    Up and Running with R by Barton Poulson.

    SPSS Statistics Essential Training 2011

    1. See 7. Charts for Two Variables: 7.2. Creating scatterplots
    2. See 10. Descriptive Statistics for Three or More Variables: 10.2. Calculating multiple regression.

    What to turn in for the assignment:
      1. Using the AlbumSales.csv file and 
         the R stat software through RStudio, you will produce 4 scatterplots. 
         There are 4 independent variables and 1 dependent variable.
         Investigate visually the relationship of each IV with the DV.
         You will do the regression in R using the 4 IVs and the 1 DV.
         If you need to review R, use the
         Up and Running with R video tutorial set.
         5. Charts for Associations 
            5.2. Creating scatterplots      4m 15s
         6. Statistics for Associations
            6.2. Computing a regression     6m 33s
      2. Using the AlbumSales.sav SPSS data set and SPSS software,
         produce the 4 different scatterplots (Each IV with the DV).
         Add the FIT line for the LINEAR best fitting regression line
         to each of your scatterplot charts.
         Run the SPSS Regression with the 4 IVs and the 1 DV.
         You do NOT need to do anything except choose the 4 IVs and the 1 DV.
         You will just use the DEFAULTS in the Regression dialogue.
         It is set to ENTER.   That is the default.
         I mentioned STEPWISE and BACKWARD and FORWARD methods in class
         just to see who has heard of those and to preview what we will
         get to 8 or so weeks from now, probably sometime in NOVEMBER!
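
    A rough sketch of the R half of the assignment, under stated assumptions: the column names iv1 to iv4 and dv are hypothetical, and simulated data stand in for AlbumSales.csv because its actual columns are not listed here. With the real file you would start from read.csv("AlbumSales.csv") and substitute the real column names. Base graphics are used to keep the sketch self-contained; qplot() from ggplot2 would work just as well.

```r
set.seed(2014)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in for the album sales data (hypothetical column names).
n <- 200
album <- data.frame(iv1 = rnorm(n), iv2 = rnorm(n), iv3 = rnorm(n), iv4 = rnorm(n))
album$dv <- 50 + 3 * album$iv1 + 2 * album$iv2 + album$iv3 + rnorm(n, sd = 5)

ivs <- c("iv1", "iv2", "iv3", "iv4")

# Four scatterplots, one per IV against the DV, each with its linear fit line.
op <- par(mfrow = c(2, 2))
for (iv in ivs) {
  plot(album[[iv]], album$dv, xlab = iv, ylab = "dv", main = paste("dv vs", iv))
  abline(lm(reformulate(iv, response = "dv"), data = album), col = "red")
}
par(op)

# Multiple regression with all 4 IVs entered together.
fit <- lm(dv ~ iv1 + iv2 + iv3 + iv4, data = album)
print(summary(fit))
```

    Entering all four predictors at once corresponds to the ENTER default in the SPSS Regression dialogue.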

  19. Thursday, 09/11/2014 class: Using the ggplot2 package and qplot():
    # Introduction to ggplot2 and the mpg dataset (from the ggplot2 package)
    library(ggplot2)
    # Look at the data from the ggplot2 package that we're going to use - miles per gallon
    head(mpg)
    # Basic scatterplot
    qplot(displ, hwy, data = mpg)
    # Add an additional variable with aesthetics: colour, shape, size
    qplot(displ, hwy, data = mpg, colour = class)
    qplot(displ, hwy, data = mpg, colour = cyl)
    qplot(displ, hwy, data = mpg, shape = factor(cyl))
    qplot(displ, hwy, data = mpg, shape = factor(cyl), colour = factor(cyl))
    # Add an additional variable with faceting
    qplot(displ, hwy, data = mpg)
    qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl)
    qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .)
    qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
    qplot(displ, hwy, data = mpg) + facet_wrap(~ class)
    # Deal with overplotting by using JITTER
    qplot(cty, hwy, data = mpg)
    qplot(cty, hwy, data = mpg, geom = "jitter")
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = year)
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
    # Note: On Thursday, 09/11,
    #      we did NOT do the following two R qplots,
    #      which add a smooth GEOM using method lm (linear model)
    qplot(cty, hwy, data = mpg) + geom_smooth(method = "lm")
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = class) +
             geom_smooth(method = "lm")
    # Reordering + boxplots
    qplot(class, hwy, data = mpg)
    qplot(reorder(class, hwy), hwy, data = mpg)
    qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
    qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
    qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))

  20. SPSS example files for week one introduction to SPSS, with one R example for comparison.

    Crosstabs and CHI SQUARE, recoding variables from interval to ordinal (birthrate and female life expectancy for 15 and for 122 countries), linear regression, scatter plots. The best fitting linear regression line goes through the point that is the mean for the DV and the mean for the IV. DV = dependent variable = y. IV = independent variable = x.
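
    The statement that the best fitting line passes through the point of means can be verified in R. The least squares intercept is mean(y) - slope * mean(x), so the fitted value at x = mean(x) is exactly mean(y). The sketch below uses the built-in cars data (speed as the IV, dist as the DV) purely as an illustration; it is not one of the week one example files.

```r
# cars is built into R: speed = IV (x), dist = DV (y).
fit <- lm(dist ~ speed, data = cars)

x_bar <- mean(cars$speed)
y_bar <- mean(cars$dist)

# Fitted value of the regression line at the mean of the IV.
y_hat_at_x_bar <- unname(predict(fit, newdata = data.frame(speed = x_bar)))

print(c(y_bar = y_bar, fitted_at_x_bar = y_hat_at_x_bar))
print(all.equal(y_bar, y_hat_at_x_bar))
```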

    SPSS Statistics Essential Training with Barton Poulson. This is another resource. (5 hours and 5 minutes).

    In this course, author Barton Poulson takes a practical, visual, and non-mathematical approach to the basics of statistical concepts and data analysis in SPSS, the statistical package for business, government, research, and academic organization. From importing spreadsheets to creating regression models to exporting presentation graphics, this course covers all the basics, with an emphasis on clarity, interpretation, communicability, and application.

  21. Log in with your UNI CatID and password. Watch some of the R tutorial material to prepare for class #2. We will use RStudio as the interface to R statistical software.

    Up and Running with R with Barton Poulson. (2 hours 25 minutes).

    Join author Barton Poulson as he introduces the R statistical processing language, including how to install R on your computer, read data from SPSS and spreadsheets, and use packages for advanced R functions.

    The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.

  22. STAT 497C - Topics in R Statistical Language course from Penn State.

  23. Verzani Simple R book - online PDF version.

  24. R and RStudio and data frames...

    1. R statements and script for class #3: class3.R.txt.

    2. worms.txt data file

    3. yvalues.txt data file for Thursday Class #4 practice.

  25. ...

Chapter 12: Graphics - The plot() function: The Art of R Programming: A Tour of Statistical Software Design ... UNI library online book.