Computational Statistics - STAT 4772/STAT 5772 - Fall 2015







  1. Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design.

  2. R/SPSS: Albums assignment, due December 28th. Simple linear and multiple regression with scatterplots in SPSS and R, predicting record album sales.
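    A minimal R sketch of the simple-versus-multiple regression workflow. The album-sales data file is not reproduced on this page, so the built-in mtcars data frame is used as a stand-in (mpg as the outcome, wt and hp as predictors). This is an illustration under that assumption, not the assignment's solution.

```r
# Stand-in data: built-in mtcars replaces the album-sales file (not shown here)
data(mtcars)

# Simple linear regression: one predictor
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: add a second predictor
fit_multiple <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit_multiple)            # coefficients, R-squared, overall F test
anova(fit_simple, fit_multiple)  # F test: does adding hp improve the fit?

# Scatterplot with the simple regression line overlaid
plot(mpg ~ wt, data = mtcars, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit_simple, col = "red")

# Prediction for a hypothetical new car (wt = 3, hp = 150)
predict(fit_multiple, newdata = data.frame(wt = 3, hp = 150))
```

    Adding a predictor to a nested least-squares model never lowers R-squared, which is why the anova() comparison (not R-squared alone) is used to judge whether the extra predictor is worth keeping.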

  3. Logistic regression with GRE, GPA, and rank as predictors and admit as the binary outcome variable.
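    A hedged sketch of fitting this model with glm() and family = binomial. The course's data file is not included here, so the sketch simulates a hypothetical data frame with the same variable names; rank is assumed to be a 1 to 4 institution-prestige category (1 = highest), as in a widely used admissions example. The simulated coefficients are invented purely to generate 0/1 outcomes and mean nothing substantively.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
n <- 400

# HYPOTHETICAL simulated applicants (not real admissions data)
admissions <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 115)),
  gpa  = round(pmin(4, rnorm(n, mean = 3.4, sd = 0.38)), 2),
  rank = sample(1:4, n, replace = TRUE)
)
# Hypothetical "true" log-odds, used only to generate admit (0/1)
eta <- -4 + 0.002 * admissions$gre + 0.8 * admissions$gpa - 0.5 * admissions$rank
admissions$admit <- rbinom(n, size = 1, prob = plogis(eta))

# rank is a category, not a measurement, so treat it as a factor
admissions$rank <- factor(admissions$rank)

fit <- glm(admit ~ gre + gpa + rank, data = admissions, family = binomial)
summary(fit)

# Odds ratios: exponentiate the log-odds coefficients
exp(coef(fit))

# Predicted probability of admission for a hypothetical applicant
p_admit <- predict(fit,
                   newdata = data.frame(gre = 600, gpa = 3.5,
                                        rank = factor(2, levels = 1:4)),
                   type = "response")
p_admit
```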

  4. Bootstrapping using R: watch the video (14 minutes, 11 seconds) before the Thursday 11/19 class.
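    As a companion to the video, a minimal base-R sketch of the nonparametric bootstrap for a sample mean: resample the data with replacement many times, recompute the statistic each time, and use the spread of those values. The built-in mtcars$mpg column stands in for a real sample, and B = 2000 is a common but arbitrary choice. The boot package offers a fuller implementation via boot() and boot.ci().

```r
set.seed(2015)   # arbitrary seed for reproducibility
x <- mtcars$mpg  # stand-in sample from built-in data
B <- 2000        # number of bootstrap resamples

# Resample with replacement B times, recording the mean of each resample
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

se_boot <- sd(boot_means)                         # bootstrap standard error
ci_95   <- quantile(boot_means, c(0.025, 0.975))  # percentile 95% interval

mean(x); se_boot; ci_95
```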

  5. R code for Chapter 6 (Basic Graphs) and Chapter 7 (Basic Statistics). Group presentations on Thursday, October 15th.

  6. R code from Thursday, October 8th: choosing groups for the group presentations using sample() with the random seed set to 88.
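    A sketch of how sample() and a fixed seed can assign students to presentation groups reproducibly. The roster and the group size of 4 are hypothetical placeholders; only the seed value 88 comes from the class.

```r
# HYPOTHETICAL roster and group size; replace with the real class list
students   <- paste("Student", 1:20)
group_size <- 4

set.seed(88)                  # fixed seed, so the same draw can be repeated
shuffled <- sample(students)  # random permutation of the roster

# Cut the shuffled roster into consecutive blocks: group 1, group 2, ...
groups <- split(shuffled, ceiling(seq_along(shuffled) / group_size))
groups
```

    Note that R 3.6.0 changed the default algorithm behind sample(), so the same seed gives a different draw than it did under the R versions current in 2015. Calling RNGversion("3.5.0") before set.seed(88) restores the older behavior (R issues a warning when you do).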

  7. WEKA: Data Mining with Weka. If possible, watch lessons 1.1, 1.2, and 1.3 before Tuesday 10/13.

    WEKA = Waikato Environment for Knowledge Analysis.

  8. Chapter 05 R code for Tuesday, October 6th.

  9. Assignment for Thursday, October 8th (or Friday, October 9th): Problems 2.1 through 2.6 and 6.1 through 6.12
    from the Verzani online PDF textbook.

  10. Tuesday, 09/29/2015, Class #11: comparing means with ANOVA; ANOVA and linear regression compared.
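    A short sketch of the ANOVA/regression connection, using the built-in PlantGrowth data (a control group and two treatment groups). A one-way ANOVA is a linear regression on dummy-coded group indicators, so aov() and lm() produce the same F statistic.

```r
data(PlantGrowth)  # built-in: plant weight for groups ctrl, trt1, trt2

# One-way ANOVA comparing the three group means
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)

# The same model as a linear regression; R dummy-codes the factor 'group'
fit_lm <- lm(weight ~ group, data = PlantGrowth)
summary(fit_lm)  # intercept = ctrl mean; other coefficients = differences from ctrl
anova(fit_lm)    # same F statistic and p-value as summary(fit_aov)
```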


  11. Homework using countries.csv handed out on Thursday, 09/24/2015.

  12. Thursday, September 24th R code on the country.csv data file.
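      # Read the data (adjust the path to where country.csv is saved).
      # Naming the data frame 'country' avoids masking R's c() function.
      country <- read.csv("C:/5772/country.csv", header = TRUE)
    
      # Keep countries with female life expectancy of at least 70 years,
      # and only the three listed columns
      lifeExpFemale70plus <- subset(country, lifeexpf >= 70,
                                    select = c("lifeexpf", "region", "birthrate"))
    
      ?subset   # open the help page for subset()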
    

    October 2014: Quiz #1 data set, country.csv, in comma-separated values (.csv) format.

    Study Guide/Outline: Quiz One and outline of readings from DSUR for homework assigned on 09/24/2015.

    DSUR: readings from Discovering Statistics Using R, the textbook used in fall 2013.
    These readings are needed for the country.csv R homework assignment (due October 1st, 2015).


  13. Tuesday, September 22nd class: multiple regression models and user-defined functions. What is 20 factorial (20!)?
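    One way to answer the 20 factorial question with a user-defined function: a small recursive sketch (the name my_factorial is hypothetical, chosen for illustration). Base R already provides factorial(), which is used here only as a check.

```r
# Hypothetical user-defined recursive factorial, for illustration only
my_factorial <- function(n) {
  stopifnot(n >= 0, n == round(n))  # accept non-negative whole numbers only
  if (n <= 1) return(1)             # 0! = 1! = 1
  n * my_factorial(n - 1)           # n! = n * (n - 1)!
}

my_factorial(20)                   # R prints this in scientific notation
factorial(20)                      # base R's built-in function agrees
sprintf("%.0f", my_factorial(20))  # all digits: "2432902008176640000"
```

    20! is still small enough to be stored exactly as a double; much larger factorials lose precision, and beyond 170! R returns Inf.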

  14. 09/03/Thu: Plotting and linear regression, from John Verzani, pages 77-84 (Class #4, September 3rd, 2015).

    Monday, September 14th: Verzani Section 13, Regression Analysis, covering the R code we did not get to on Thursday, September 10th. NEW and VIP: Assignment One modification (possibly a welcome change).

    Assignment One email: email090315.txt...
    Pages 83-84 of Verzani: Sections 13.1 and 13.2.

    Simplify the use of lm() with simple.lm() from the UsingR package.


    UsingR package, simple.lm(), and predict() examples: the class #6 topic from Thursday, 09/10/2015.
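    Because simple.lm()'s exact arguments are not shown on this page, this sketch sticks to the base R calls it builds on, lm() and predict(), using the built-in cars data (stopping distance in feet versus speed in mph). The speeds 10 and 20 are arbitrary example inputs.

```r
data(cars)  # built-in: stopping distance (ft) vs speed (mph)

fit <- lm(dist ~ speed, data = cars)
coef(fit)   # intercept and slope of the least-squares line

# Scatterplot with the fitted line
plot(dist ~ speed, data = cars)
abline(fit)

# Predictions at new (arbitrary) speeds
new_speeds <- data.frame(speed = c(10, 20))
predict(fit, newdata = new_speeds)  # point predictions

# Interval for the MEAN stopping distance at each speed
conf_int <- predict(fit, newdata = new_speeds, interval = "confidence")
# Wider interval for a SINGLE new car at each speed
pred_int <- predict(fit, newdata = new_speeds, interval = "prediction")
conf_int; pred_int
```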


  15. 09/01/Tue: R in Action Chapter 3 code, which we started in class on Tuesday, September 1st (class #3). Study Chapter 3 carefully: read, take notes, and try things in R through the RStudio interface.

  16. STAT 4772/5772 Syllabus...

  17. Textbook: R in Action: Data Analysis and Graphics with R by Robert Kabacoff (Second Edition, June 6th, 2015)

  18. Thursday, 09/11/2014 class: Using the ggplot2 package and qplot():
    
    # Introduction to ggplot2 and the mpg dataset (from the ggplot2 package)
    
    install.packages("ggplot2")   # only needed once per machine
    library(ggplot2)
    
    # Look at the ggplot2 data set we're going to use: mpg (fuel economy, miles per gallon)
    ?mpg
    head(mpg)
    str(mpg)
    names(mpg)
    
    # Basic scatterplot
    qplot(displ, hwy, data = mpg)
    
    # Add an additional variable with aesthetics: colour, shape, size
    qplot(displ, hwy, data = mpg, colour = class)
    qplot(displ, hwy, data = mpg, colour = cyl)
    qplot(displ, hwy, data = mpg, shape = factor(cyl))
    qplot(displ, hwy, data = mpg, shape = factor(cyl), colour = factor(cyl))
    
    # Add an additional variable with faceting
    qplot(displ, hwy, data = mpg)
    qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl)
    qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .)
    qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
    qplot(displ, hwy, data = mpg) + facet_wrap(~ class)
    
    # Deal with overplotting by using JITTER
    qplot(cty, hwy, data = mpg)
    qplot(cty, hwy, data = mpg, geom = "jitter")
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = year)
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = class)
    
    # Note: We did NOT get to the following two qplots in class on Thursday, 09/11.
    #       They add a fitted linear-model (lm) line with geom_smooth(method = "lm").
     
    qplot(cty, hwy, data = mpg) + geom_smooth(method = "lm")
    
    qplot(cty, hwy, data = mpg, geom = "jitter", colour = class) +
             geom_smooth(method = "lm")
    
    # Reordering + boxplots
    qplot(class, hwy, data = mpg)
    qplot(reorder(class, hwy), hwy, data = mpg)
    qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")
    qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")
    qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))
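    qplot() was deprecated in ggplot2 3.4.0, so newer versions print a warning when it is used. This sketch shows a few of the qplot() calls above rewritten with ggplot() and explicit geom layers; it is one reasonable translation, not the only one.

```r
library(ggplot2)

# qplot(displ, hwy, data = mpg, colour = class)
p1 <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

# qplot(cty, hwy, data = mpg, geom = "jitter", colour = class) + geom_smooth(method = "lm")
# colour is in the top-level aes(), so geom_smooth() fits one line PER class,
# matching the qplot() version
p3 <- ggplot(mpg, aes(cty, hwy, colour = class)) +
  geom_jitter() +
  geom_smooth(method = "lm")

# qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))
p4 <- ggplot(mpg, aes(reorder(class, hwy), hwy)) +
  geom_jitter() +
  geom_boxplot()

print(p1)
```

    Moving colour = class out of the top-level aes() and into geom_jitter(aes(colour = class)) would instead give a single overall regression line over the coloured points.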
    
    
  19. SPSS example files for the week one introduction to SPSS, with one R example for comparison.

    Crosstabs and chi-square tests, recoding variables from interval to ordinal (birthrate and female life expectancy, for 15 and for 122 countries), linear regression, and scatterplots. The least-squares regression line passes through the point (mean of the IV, mean of the DV), where DV = dependent variable = y and IV = independent variable = x.

    SPSS Statistics Essential Training with Barton Poulson. This is another lynda.uni.edu resource (5 hours, 5 minutes).

    In this course, author Barton Poulson takes a practical, visual, and non-mathematical approach to the basics of statistical concepts and data analysis in SPSS, the statistical package for business, government, research, and academic organizations. From importing spreadsheets to creating regression models to exporting presentation graphics, this course covers all the basics, with an emphasis on clarity, interpretation, communicability, and application.
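    For the R side of the week one topics, a hedged sketch of recoding interval variables to ordinal categories with cut(), a crosstab with a chi-square test, and a check that the least-squares line passes through (mean of x, mean of y). The country data are not reproduced here, so the built-in faithful data (eruption length and waiting time, both in minutes) stand in, and the cut points 70 and 3 are arbitrary illustrative choices.

```r
data(faithful)  # built-in stand-in: eruptions and waiting (minutes)

# Recode interval-scaled variables into ordered (ordinal) categories
faithful$wait_cat  <- cut(faithful$waiting, breaks = c(-Inf, 70, Inf),
                          labels = c("short", "long"), ordered_result = TRUE)
faithful$erupt_cat <- cut(faithful$eruptions, breaks = c(-Inf, 3, Inf),
                          labels = c("short", "long"), ordered_result = TRUE)

tab <- table(faithful$wait_cat, faithful$erupt_cat)  # crosstab
tab
chisq.test(tab)  # chi-square test of independence

# Least-squares line: DV (y) = eruptions, IV (x) = waiting
fit <- lm(eruptions ~ waiting, data = faithful)

# Because the intercept is b0 = mean(y) - b1 * mean(x), the fitted value
# at x = mean(x) is exactly mean(y)
y_at_xbar <- predict(fit, newdata = data.frame(waiting = mean(faithful$waiting)))
y_at_xbar
mean(faithful$eruptions)
```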

  20. Go to lynda.uni.edu and log in with your UNI CatID and password. Watch some of the R tutorial material to prepare for class #2. We will use RStudio as the interface to R statistical software.

    Up and Running with R with Barton Poulson. (2 hours 25 minutes).

    Join author Barton Poulson as he introduces the R statistical processing language, including how to install R on your computer, read data from SPSS and spreadsheets, and use packages for advanced R functions.

    The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.


    Chapter 12, Graphics (the plot() function), in The Art of R Programming: A Tour of Statistical Software Design, available as an online book through the UNI library.

    http://www.stat.wisc.edu/~st571-1/...

    ggplot2.org...

    Final Exam period: 3-4:50 pm on Tuesday, December 15th.