Problems are from:
Larose, Daniel T. 2011. Discovering Statistics, Brief Version. Freeman

p.5 (1.1) 2,8
2 - a) no car b) I did not have a car or a way to leave c) your choice, the table is more precise d) it would not fit on the graph
8 - except for sadness and anger, they were in the same rank order (this is why we will study ways to characterize the magnitude of such differences)

p.15 (1.2) 4,7,11,14,21,28,29(population cannot be determined),31,32,33,41,42,45,52
4 - a quantitative variable is characterized with a number, a qualitative variable is not (or if it is, the properties of the real numbers are not used in its analysis)
7 - a sample is a subset of a population; we would like samples to be representative, but that is not part of the definition
11 - a) quantitative b) interval
14 - a) quantitative b) ratio (actually ordinal is a better description, but since the person next to you did not take the SAT test, the question is moot)
21 - a) qualitative b) nominal
28 - the population is women, the sample is her (female) patients
29 - the population is not indicated (PCC students? community college students? college students? adults?); the sample is those who answered the question (a set of PCC students)
31 - Inferential, assuming it is intended to give information about housing prices in Newington
32 - Inferential, an inference is made from a sample
33 - Neither; there is no way one could infer that everyone will reduce cholesterol by 10% from an average reduction of 10% in a sample
41 - a) the companies b) company name, number of employees, economic sector c) number of employees d) company name, economic sector e) St. John's Health Center, 1755, Health services
42 - a) various states b) state name, proportion of corn which is GE, characterization of most frequent variety c) proportion of corn which is GE d) name of state, characterization of variety e) Texas, .79, Herbicide tolerant
45 - I included this because zip codes could be characterized as quantitative (nominal) or perhaps even quantitative (ordinal), but most people would describe them as qualitative. Also recall, as in the previous problems, that the actual hospitals are the elements, while their names are the values of the first variable; we use their names to refer to both the elements and the names of the elements
52 - a) it is a statistic, since it is calculated from data from a sample b) The expected life of new bulbs is 2000 hours, based on a sample of 100 bulbs.

p.40 (2.1) 1,2,7,9,13
1 - We are trying to communicate information; it is not easy to comprehend all the raw data at once
2 - frequency refers to the raw count, relative frequency is the proportion (or percent) of the whole
7 - a) Arid - 5, Temperate - 4, Tropical - 1 b) Arid - .5, Temperate - .4, Tropical - .1 c) d) e) f) I am not doing graphics; frequency and relative frequency graphs look the same, but are labelled differently
9 - 1213.9+1034.1+1736.0+1939.2=5923.2, 1213.9/5923.2=.205, etc. a) R - .205, C - .175, I - .293, T - .327 b) c) d) e) I do not do graphics here
13 - No, because they do not reflect parts of a whole; individuals may be in several categories (you can construct a pie chart for each activity, e.g., 79% listen to music while doing chores, 21% do not listen to music while doing chores)
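The arithmetic in problem 9 can be checked with a few lines of Python. This is only a sketch; the keys R, C, I, T are just the abbreviations used in the answer above.

```python
# Category totals from problem 9 (labels are the abbreviations used in the answer)
totals = {"R": 1213.9, "C": 1034.1, "I": 1736.0, "T": 1939.2}

grand_total = sum(totals.values())  # 5923.2

# Relative frequency = category total / grand total, rounded to 3 places
rel_freq = {k: round(v / grand_total, 3) for k, v in totals.items()}

print(round(grand_total, 1))
print(rel_freq)
```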

p.55 (2.2) 1,2,12,13,14,15,16,25,26,27,28,36(also stem-and-leaf plot)
1 - Bar charts and pie charts can be used with qualitative or quantitative data (but require that there are few categories in order to be useful). Stem-and-leaf plots, histograms, and box-and-whisker plots use the structure of the real numbers for their construction.
2 - With a histogram or box-and-whisker plot, information is lost in order to communicate information (one could not comprehend the raw data)
12 - a) There are 11 classes (I include 0-5 because 0 is so close to the body of the graph). b) the class width is 2.5 for each class c) frequency, since the actual number of stocks is listed on the axis
13 - a) divide the numbers on the vertical axis by 19 b) multiply the numbers on the vertical axis by 19 c) I do not think this is designed to give information about a larger population (such as the NYSE), hence I would say the population size is 19
14 - a) at most 2 b) at most 2/19 c) 5 d) 5/19
15 - a) 0 b) 0 c) 25-27.5, 4/19 d) at least 1, at most 3 e) 0
16 - left skewed
25 - a) Asia - 5, NA - 2, Africa - 1, SA - 1, Europe - 1 b) divide the answers in part a by 10 c) histograms are only for quantitative data, d) dotplots are defined with numbers labelling the horizontal axis e) stem-and-leaf displays are only for quantitative data
26 - 900-1100: .4, 1100-1300: .2, 1300-1500: 0, 1500-1700: .2, 1700-1900: .2
27 - I do not do graphs here
28 - I do not do graphs here, but the vertical axis would be relabelled
36 - a) 51-1, 59-1, 60-1, 67-1, 68-2, 70-2, 72-1, 75-1, 77-1, 78-1, 81-1, 82-1, 85-1, 86-2, 91-1, 94-1, 98-1 b) 51-.05, 59-.05, 60-.05, 67-.05, 68-.1, 70-.1, 72-.05, 75-.05, 77-.05, 78-.05, 81-.05, 82-.05, 85-.05, 86-.1, 91-.05, 94-.05, 98-.05
Stem-and-leaf plot:
5-19
6-0788
7-002578
8-12566
9-148
c) For a histogram I would use 48-62:3, 63-77:8, 78-92:7, 93-107:2 (most people would use the grouping given by the stem-and-leaf plot)
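A sketch that rebuilds parts a) and c) and the stem-and-leaf plot for problem 36. The list `data` is read directly off the frequency table in part a); nothing else is assumed.

```python
from collections import Counter

# The 20 data values, as listed in the frequency table of part a)
data = [51, 59, 60, 67, 68, 68, 70, 70, 72, 75,
        77, 78, 81, 82, 85, 86, 86, 91, 94, 98]

# a) frequency and b) relative frequency of each distinct value
freq = Counter(data)
rel_freq = {value: count / len(data) for value, count in sorted(freq.items())}

# Stem-and-leaf: stem = tens digit, leaf = units digit
stems = {}
for x in sorted(data):
    stems.setdefault(x // 10, []).append(str(x % 10))
stem_leaf = {stem: "".join(leaves) for stem, leaves in stems.items()}

# c) class counts for the classes 48-62, 63-77, 78-92, 93-107
classes = [(48, 62), (63, 77), (78, 92), (93, 107)]
class_counts = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

for stem, leaves in stem_leaf.items():
    print(f"{stem}-{leaves}")
print(class_counts)
```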

p.71 (2.4) 6,7
6- a) bar chart b) There is a little extra bread added at the bottom of the wheat slice ("bias distortion or embellishment"). The diagonal presentation of the slices distorts bun/roll vs. rye. c) a clearer presentation would have nonoverlapping loaves of varying lengths.
7- a) the zero point that would make the bodel heights proportional would have to be well up the side of the graph.

p.89 (3.1) 3,4(also midrange(from lecture)),13,19,21,22
3- The mean is likely to be changed more by an extreme value than the median
4- a) 28/8=3.5 b) (put in rank order) (3+3)/2=3; midrange: (1+7)/2=4
13- a) English, but it is only 1/5 of the majors, hence no b) no, this is qualitative data c) there are fewer Economics majors than English majors
19- a) skewed to the left, midrange less than mean less than median (I do not care about the mode) b) skewed to the right, midrange greater than mean greater than median c) symmetric (almost), mean, median, and midrange will be close to each other
21- mean greater than median and max further from either than the min suggests skewed to the right
22- yes, moving all the data to the left will move all the measures of location to the left by that amount

p.105 (3.2) 1,4,5,6,27,36
1- a distance, in this case from the mean
4- 25-0 = 25
5- (10+25+0+15+10)/5=12=x-bar; ((10-12)^2+(25-12)^2+(0-12)^2+(15-12)^2+(10-12)^2)/(5-1) = 82.5 = s^2; ((10-12)^2+(25-12)^2+(0-12)^2+(15-12)^2+(10-12)^2)/5 = 66 = sigma^2
6- 82.5^.5 = 9.08 = s; 66^.5 = 8.12 = sigma
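The same computations for problems 4 through 6, as a sketch; it makes the n - 1 versus n divisor explicit (the standard library's `statistics.variance` and `statistics.pvariance` compute the same two quantities).

```python
import math

# Data from problem 5
data = [10, 25, 0, 15, 10]
n = len(data)

data_range = max(data) - min(data)         # problem 4: 25 - 0
x_bar = sum(data) / n                      # mean
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations from the mean

s2 = ss / (n - 1)    # sample variance: divide by n - 1
sigma2 = ss / n      # population variance: divide by n
s, sigma = math.sqrt(s2), math.sqrt(sigma2)

print(data_range, x_bar, s2, sigma2, round(s, 2), round(sigma, 2))
```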
27- a) 3 inches (we are only considering the magnitudes of the deviations, the average (mean) deviation is always 0) b) 2 (the mean is 69, the deviations range from 1 to 3 in magnitude)
36- A = {0, 5, 5, 5, 10} (range = 10, s = 3.54); B = {1, 1, 5, 9, 9} (range = 8, s = 4) (the hint is bogus; the range is more sensitive to extremes than the standard deviation)
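To check problem 36 numerically, a quick sketch using the standard library's sample standard deviation (which divides by n - 1):

```python
import statistics

# The two data sets proposed for problem 36
sets = {"A": [0, 5, 5, 5, 10], "B": [1, 1, 5, 9, 9]}

# For each set: (range, sample standard deviation)
summary = {
    name: (max(data) - min(data), statistics.stdev(data))
    for name, data in sets.items()
}

for name, (data_range, s) in summary.items():
    print(name, data_range, round(s, 2))
```

Set A has the larger range but the smaller standard deviation, which is the point of the answer.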

p.112 (3.3) 1,5,9,11(also median(from lecture))
1- You are approximating all the data in a category with a single value, hence the calculated statistics will be approximations
5- .25x50+.4x80+.35x70 = 69
9- .995, 2.995, 4.995, 6.995, 8.995
11- (10x5+20x10+20x15+10x20+10x25)/(10+20+20+10+10)=14.29; for the median, there are 70 data, hence the median is halfway between the 35th and 36th data; those data are in the 12.5-17.5 category; assuming a uniform distribution in that category, the data are 5/20 = .25 apart (.125 from the endpoints); the median is between the 5th and 6th data in that category; 12.5+5x.25 = 13.75 is the estimate for the median.
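A sketch of the grouped-data calculations in problem 11. The median uses the usual interpolation formula, lower boundary + ((n/2 - cumulative frequency before the class) / class frequency) x width, which here gives the same value as the evenly spaced argument above; the function name `grouped_median` is my own.

```python
# Grouped data from problem 11: class midpoints and frequencies
midpoints = [5, 10, 15, 20, 25]
freqs = [10, 20, 20, 10, 10]
width = 5  # classes run 2.5-7.5, 7.5-12.5, ..., 22.5-27.5

n = sum(freqs)
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n


def grouped_median(midpoints, freqs, width):
    """Median estimate assuming the data are spread uniformly within each class."""
    half = sum(freqs) / 2
    cumulative = 0
    for m, f in zip(midpoints, freqs):
        if cumulative + f >= half:
            lower = m - width / 2  # lower class boundary of the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")


print(round(grouped_mean, 2), grouped_median(midpoints, freqs, width))
```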

p.120 (3.4) 1,4,7,9,13,16
1- it is a value that 5% of the data is above, and 95% is below
4- if a z-score is 0, the datum is at the mean, if a z-score is positive, the datum is above the mean, if a z-score is negative the datum is below the mean
7- .50x6=3, (20+20)/2=20 (all definitions of the 50th percentile or Q2 are consistent with the definition of median); .75x6=4.5, round up, the fifth datum is 24; .25x6=1.5, round up, the second datum is 18 (you are welcome to use alternative definitions of quartiles and percentiles) (quartiles and percentiles are not of interest for such small data sets)
9- (36+24+20+20+18+17)/6=22.5, s = 7.04; a) (24-22.5)/7.04 = .21 b) (17-22.5)/7.04 = -.78 c) (18-22.5)/7.04 = -.64
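A sketch of the percentile rule used in problem 7 (compute i = (p/100)n; if i is a whole number, average the i-th and (i+1)-th data, otherwise round up and take that datum) together with the z-scores from problem 9. Spreadsheets and statistics packages often use other conventions and may give different quartiles.

```python
import math
import statistics

data = sorted([36, 24, 20, 20, 18, 17])  # data from problems 7 and 9


def percentile(sorted_data, p):
    """p-th percentile by the rule above (1-based positions in sorted data)."""
    i = p / 100 * len(sorted_data)
    if i == int(i):
        i = int(i)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(i) - 1]


q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation
z = {x: round((x - mean) / s, 2) for x in (24, 17, 18)}

print(q1, q2, q3, round(s, 2), z)
```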
13- a) .25x12 = 3, (100+110)/2 = 105 b) .95x12 = 11.4, the 12th datum is 130
16- a) .05x15=.75, round up, the first datum is 2.0 b) .95x15=14.25, round up, the 15th datum is 14.7 (answers are in millions)

p.127 (3.5) 2,6,7,13ab
2- the magnitude of the datum minus the mean is less than 2s
6- a) 95% b) 99.7%
7- a) 80 to 120 b) 120 c) 80
13ab- a) i- 1602-3x500=102 (recall that all numbers reflect millions) ii- it would indeed be a very slow day, but it could occur b) 68% within 1 standard deviation means 32% outside, which means 16% have z-scores less than -1; since 25% is more than 16%, the 25th percentile has a z-score above -1, hence more shares are traded on the day when trading was at the 25th percentile

p.140 (3.6) 2,5,6,7,9,21,22,24
2- a) e.g., {0,1,1,1,1,5,17} b) e.g., {7,7,7,7,7,7,7,7} c) e.g., {1,1,1,1,1,1,1,1,17} d) not possible, the median is Q2 by definition
5- Q1=68, Q2=76, Q3=85.5
6- 85.5-68=17.5
7- 51 68 76 85.5 98
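A sketch of problems 5 through 7, assuming they use the 20 values from problem 36 of section 2.2 (the minimum 51 and maximum 98 match); `percentile` applies the same rule as problem 7 of section 3.4.

```python
import math

# The 20 values from problem 36 in section 2.2, sorted
data = [51, 59, 60, 67, 68, 68, 70, 70, 72, 75,
        77, 78, 81, 82, 85, 86, 86, 91, 94, 98]


def percentile(sorted_data, p):
    # average two neighbors when (p/100)*n is whole, otherwise round up
    i = p / 100 * len(sorted_data)
    if i == int(i):
        i = int(i)
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(i) - 1]


q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

print(five_number, iqr)
```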
9- I do not do graphs here
21- 2.0 2.8 4.2 7.1 14.7 (all numbers represent millions)
22- 7.1-2.8=4.3; the middle half of the supplements span about 4.3 million, from the low-usage end (Q1 = 2.8) to the high-usage end (Q3 = 7.1)
24- I do not do graphics here

[First Test]

p.205 (5.1) 1-
8- all probabilities must be between 0 and 1.
11-
12- 1/6 (there are 6 equally likely outcomes)
16- 1/3 (= 1/6 + 1/6)
17-
27-
28- the event 'observing a 3' is a subset of the event 'observing a 3 or a 5'

31-
37-
43-
44- they are not mutually exclusive, and you do not know the nature of the overlap (at this stage all you can say is that the probabilities do not sum to one)

p.214 (5.2) 4- a male, because football players are a subset of male students
18- 12/52 = 3/13
19-
20- 1 - 3/13 = 10/13
21-
22- 12/52 + 13/52 - 3/52 = 22/52 = 11/26
23-
27-
34- 6/36 + 2/36 = 8/36 = 2/9
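The counts behind problems 18 through 34 can be verified by listing every equally likely outcome. This sketch reads problem 22 as "face card or heart" and problem 34 as "a sum of 7 or 11", which is what the counts in the answers suggest.

```python
from fractions import Fraction
from itertools import product

# Standard 52-card deck: 13 ranks x 4 suits
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))

# Two six-sided dice: 36 equally likely ordered outcomes
dice = list(product(range(1, 7), repeat=2))


def prob(event, outcomes):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


def is_face(card):
    return card[0] in ("J", "Q", "K")


def is_heart(card):
    return card[1] == "hearts"


p_face = prob(is_face, deck)                                        # problem 18
p_not_face = 1 - p_face                                             # problem 20 (complement rule)
p_face_or_heart = prob(lambda c: is_face(c) or is_heart(c), deck)   # problem 22
p_sum_7_or_11 = prob(lambda roll: sum(roll) in (7, 11), dice)       # problem 34

print(p_face, p_not_face, p_face_or_heart, p_sum_7_or_11)
```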
37-
41-

p.228 (5.3) 3-
7-
8- 0.6
9-
10- 0.6 + 0.4 - 0.6*0.4 = 0.76
15-
19-
29-
30- no, their intersection is not empty
39-
42- preferring and owning a pet are different; many people own more than one type of pet, hence "owns" must be replaced by "prefers" for this problem to make any sense. Given this interpretation: a) 100/180; b) 50/120; c) 30/180; d) 20/120
43-
45-

p.241 (5.4) 3-
7-
9-
13-
15-
17-
19-
21-
25-
27-
29-
31-
32- ABC, ABD, ACD, BCD; 4 (choosing 3 is excluding 1)
37-
41-
48- 15C3 = (15*14*13)/(3*2*1) = 455
55-

p.260 (6.1) 5-
7-
15-
16- yes, the probabilities are non-negative and sum to one
21-
25-

p.271 (6.2) 5-
7-
15-
16- ((20*19*18*17*16)/(5*4*3*2*1))*.5^20 = .0148
18- mean = 10*.5 = 5, variance = 10*.5*.5 = 2.5, standard deviation = 2.5^.5 = 1.58; if you flip 10 coins many times, the mean number of heads will be close to 5
19-
21-
24- mean = 3*(1/6) = .5; variance = 3*(1/6)*(5/6) = 5/12 = .42; standard deviation = .42^.5 = .65; if you roll three pairs of dice many times, the average number of pairs will be close to .5
32- mean: 6*.2 = 1.2; variance 6*.2*.8 = .96; standard deviation .96^.5 = .98; the average score of ignorant students would be 1.2
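The binomial formulas used in problems 16 through 32, as a Python sketch (`math.comb` needs Python 3.8 or later). The parameter choices in the comments are read off the answers above.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_mean_var_sd(n, p):
    """Mean np, variance np(1 - p), and standard deviation of Binomial(n, p)."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, math.sqrt(var)


p16 = binom_pmf(5, 20, 0.5)          # problem 16: 5 successes in 20 trials, p = .5
m18 = binom_mean_var_sd(10, 0.5)     # problem 18: 10 coin flips
m24 = binom_mean_var_sd(3, 1 / 6)    # problem 24: 3 rolls of a pair of dice, p(pair) = 1/6
m32 = binom_mean_var_sd(6, 0.2)      # problem 32: 6 questions answered by guessing, p = .2

print(round(p16, 4), m18, [round(v, 2) for v in m24], [round(v, 2) for v in m32])
```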
33-
35-
43-
45-

p.282 (6.3) 1-
5-
6- .5, the distribution is symmetric
7-
9-
10- from the z-scores -1 and positive 1; .8413-.1587 = .6826
12- (785-3285)/500 = -5 ~ 0; (5785-3285)/500 = 5 ~ 1; 1-0 = 1
14- B is more spread out, hence has the larger standard deviation (6)
23-
29-

p.293 (6.4) 3-
5-
7-
11-
14- .025 to the left and .10 to the right from known values, hence total = .125 (.1253)
19-
21-
37-
43-
48- 1-.5120 = .4880 to the left; z=-.03 from the table
50- use .7486 or .7517 in the table, hence .67 or .68
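Instead of a table, Python's `statistics.NormalDist` (Python 3.8+) gives these areas and z-values directly. For problem 14 the z-values -1.96 and 1.28 are my assumption (they give the tail areas .025 and .10 quoted above), and for problem 50 I assume the target area is .75.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Problem 14: area left of z = -1.96 plus area right of z = 1.28 (assumed z-values)
p14 = Z.cdf(-1.96) + (1 - Z.cdf(1.28))

# Problem 48: the z-value with area .4880 to its left
z48 = Z.inv_cdf(0.4880)

# Problem 50: the z-value with area .75 to its left (assumed target area),
# which falls between the table entries for .67 and .68
z50 = Z.inv_cdf(0.75)

print(round(p14, 4), round(z48, 2), round(z50, 3))
```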
57-

p.310 (6.5) 1-
11-
15-
19-
22- .99 between means .005 in each tail, or .995 to the left of the right value; z = 2.57 or 2.58; 70-2.58*10=44.2, 70+2.58*10=95.8
27-
34- mean = 40*.5=20, standard deviation = (40*.5*.5)^.5=3.16; use 17.5 and 22.5 to include 18 and 22; (17.5-20)/3.16=-.79, (22.5-20)/3.16=.79; from the table .7852-.2148=.5704
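A sketch comparing the normal approximation in problem 34 with the exact binomial probability. Without rounding z to .79 the approximation comes out near .571, while the exact binomial value is about .5704.

```python
import math
from statistics import NormalDist

n, p = 40, 0.5
mu = n * p                            # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))    # standard deviation of the binomial

# Normal approximation with continuity correction:
# P(18 <= X <= 22) is approximated by the normal area between 17.5 and 22.5
dist = NormalDist(mu, sigma)
approx = dist.cdf(22.5) - dist.cdf(17.5)

# Exact binomial probability, summing P(X = k) for k = 18, ..., 22
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(18, 23))

print(round(approx, 4), round(exact, 4))
```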
35-
41-
47-
49-
55-(is this continuous or discrete?)
59-
61-
69-
71-

p.331 (7.1) 2,5,11,17,25,27,30,32,39,40,43
p.340 (7.2) 3,7,8,13,14,19,20,25,26,31

[Second Test]

p.351 (7.3) 3 -
16 - z = (.011-.01)/(.01×.99/500)^.5 = .22, which has area .5871 ≈ .59 to its left; 1-.59 = .41
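The same calculation without a table, as a sketch; it assumes, as the answer does, that the question asks for the probability that p-hat exceeds .011.

```python
import math
from statistics import NormalDist

p, n = 0.01, 500     # population proportion and sample size
p_hat = 0.011        # sample proportion of interest

se = math.sqrt(p * (1 - p) / n)        # standard error of p-hat
z = (p_hat - p) / se
area_above = 1 - NormalDist().cdf(z)   # P(p-hat > .011)

print(round(z, 2), round(area_above, 2))
```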
21 -
25 -
27 -

p.369 (8.1) 4 -
15 -
17 -
21 -
23 -
29 -
31 -
33 -
37 -
41 -

p.382 (8.2) 19 (use s for sigma)

p.392 (8.3) 5 -
9 -
11 -
12 - a) 2.58; b) n×p-hat = 5 and n×(1-p-hat) = 20, so the conditions are barely met; c) 2.58 × (.2×.8/25)^.5 = .2064; d) .2 ± .2064 yields (-.0064, .4064)
13 -
20 - a) 1.96 × (.1×.9/100)^.5 = .0588; b) 1.96 × (.2×.8/100)^.5 = .0784; c) 1.96 × (.3×.7/100)^.5 = .0898; d) 1.96 × (.4×.6/100)^.5 = .0960; e) 1.96 × (.5×.5/100)^.5 = .0980
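A sketch of the margin-of-error formula z × (p-hat(1 - p-hat)/n)^.5 used in problems 12 and 20; `margin_of_error` is a name I made up.

```python
import math


def margin_of_error(z, p_hat, n):
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n) for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Problem 12: 99% confidence (z = 2.58), p-hat = .2, n = 25
e12 = margin_of_error(2.58, 0.2, 25)
ci12 = (0.2 - e12, 0.2 + e12)

# Problem 20: 95% confidence (z = 1.96), n = 100, p-hat from .1 to .5
e20 = {p_hat: round(margin_of_error(1.96, p_hat, 100), 4)
       for p_hat in (0.1, 0.2, 0.3, 0.4, 0.5)}

print(round(e12, 4), [round(b, 4) for b in ci12], e20)
```

The margin of error grows as p-hat approaches .5, which is why .5 is the conservative choice when nothing is known about p.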
31 -
43 -
45 -

p.416 (9.1) 5 -
10 - H0: µ (less than or) equal to -4; HA: µ greater than -4
11 -
16 - a) H0: µ = 339.1; HA: µ not equal 339.1; b) conclude there was no change when there was no change (actually conclude there was a small change when there was a small change), conclude there was a change when there was a change; c) conclude that there was a change when in fact there was not a change; d) conclude there was no change when in fact there was a change [actually, I am confused how they got a fractional number of accidents in one year]
17 -

p.430 (9.2) 4 - no, alpha is decided before the experiment is done (it is the significance level at which the test is being performed).
5 -
11 -
15 -
16 - H0: µ = 20, HA: µ not equal 20; reject if p less than .01; z = (27-20)/(5/49^.5) = 9.8; c) with z so large, the p-value is essentially 0; d) since 0 is less than .01, reject the null hypothesis, there is evidence that the mean is different from 20
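The z statistic and a two-tailed p-value for problem 16, as a sketch; it assumes, as the answer does, that 5 is the standard deviation used in the z statistic.

```python
import math
from statistics import NormalDist

mu_0 = 20     # hypothesized mean under H0
x_bar = 27    # sample mean
sd = 5        # standard deviation used in the z statistic
n = 49
alpha = 0.01

z = (x_bar - mu_0) / (sd / math.sqrt(n))         # 7 / (5/7)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed; essentially 0 here
reject_h0 = p_value < alpha

print(round(z, 2), p_value, reject_h0)
```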
27 -

p.441 (9.3) 8,9,11,15,16,19,22,23
p.464(9.5) 4,7,9,11,15,17,23
p.532 (11.1) 2,5,9,11,13,15,21,27
p.543 (11.2) 4,5,7,11,12,15,20,25
20 - The expected values are 3540.18, 1560.90, 1136.92, 3406.82, 1502.10, 1094.08; these produce X^2 = 17.57, which is far above the .05 critical value of 5.991 for two degrees of freedom, hence we reject H0 and conclude that year and educational level are not independent.
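For two degrees of freedom the chi-square distribution has a closed-form upper tail, P(X^2 > x) = e^(-x/2), so the p-value and the critical value in problem 20 can be checked without a table (a sketch that only applies to 2 degrees of freedom):

```python
import math

chi2_stat = 17.57

# Upper-tail probability for chi-square with 2 degrees of freedom: exp(-x / 2)
p_value = math.exp(-chi2_stat / 2)

# Critical value at alpha = .05 solves exp(-c / 2) = .05, i.e. c = -2 ln(.05)
critical_05 = -2 * math.log(0.05)

reject_h0 = chi2_stat > critical_05
print(round(p_value, 6), round(critical_05, 3), reject_h0)
```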

[Third Test]

p.155 (4.1) 4,9,10,21
p.164 (4.2) 3,10,14,15,16,21,22,26,27
p.174 (4.3) 4,10,11,12
p.185 (4.4) 3,7ab,11,15abc,19

[Final]