Proportions

Confidence intervals and tests of hypotheses for count data can be done using the mean and standard deviation of the binomial distribution. However, it is sometimes convenient to use proportions (e.g., the fraction of the population who approve of Clinton) rather than the actual count (the number of people who approve of Clinton). If the sample size is n, the proportion is obtained from the count by dividing by n: p-hat = X/n, where X is the number of people who approve of Clinton and p-hat is defined below.

Notation: We shall use p to denote the proportion in the entire population (this is µ, the mean of the entire population if you score yes as 1 and no as 0). We shall use p-hat (a lowercase p with a caret (^), or circumflex, over it) to denote the proportion in the sample (this is x-bar, the mean of the sample).
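A minimal Python sketch of the point above, using a small made-up list of yes/no answers (the data are hypothetical): once yes is scored as 1 and no as 0, the sample proportion X/n and the sample mean x-bar are the same number.

```python
# Hypothetical survey answers, scored 1 for "yes" and 0 for "no".
responses = [1, 0, 1, 1, 0, 1, 0, 1]

n = len(responses)
x = sum(responses)                    # X, the count of "yes" answers
p_hat = x / n                         # p-hat = X / n
mean_of_scores = sum(responses) / n   # x-bar of the 0/1 scores

print(x, n, p_hat)  # 5 8 0.625
```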

Converting count data to a proportion entails division by n; hence p-hat = X/n as noted above, and the standard deviation of p-hat (recall that p-hat is a random variable) is (p(1-p)/n)^.5. This is the binomial standard deviation of X divided by n: ((np(1-p))^.5)/n. The standard deviation of p-hat is denoted sigma-p-hat (a lowercase sigma with p-hat as a subscript); it plays the same role as sigma-x-bar.
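The formula for sigma-p-hat can be checked numerically. This is a small Python sketch; the function names sigma_p_hat and sigma_p_hat_from_count are illustrative, not from any library. It shows that (p(1-p)/n)^.5 and ((np(1-p))^.5)/n give the same value.

```python
import math

def sigma_p_hat(p, n):
    """Standard deviation of p-hat for population proportion p and sample size n."""
    return math.sqrt(p * (1 - p) / n)

def sigma_p_hat_from_count(p, n):
    """Same quantity: the binomial standard deviation of X, divided by n."""
    return math.sqrt(n * p * (1 - p)) / n

print(round(sigma_p_hat(0.5, 197), 4))  # 0.0356
```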

Examples:

Confidence interval
If 612 out of 1100 students are male, what is the 95% confidence interval for the proportion of students that are male?
p-hat = 612/1100 = .5564.
Although sigma-p-hat is defined using p, the proportion of the population, we do not know what p is (we are constructing a confidence interval for p). Therefore we must use p-hat in lieu of p. sigma-p-hat = (.56 × .44 /1100)^.5 = .015
The z-score for a 95% confidence interval is 1.96.
Therefore, the 95% confidence interval for p is
(.56 - 1.96 × .015, .56 + 1.96 × .015) = (.53, .59)
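The same interval can be computed in Python. This is a sketch under the usual normal approximation; proportion_ci is a hypothetical helper name, and the z-score comes from the standard library's statistics.NormalDist. It carries the unrounded p-hat (.5564) instead of .56, so the endpoints differ slightly in the third decimal but round to the same (.53, .59).

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Approximate confidence interval for p, using p-hat in place of p in sigma-p-hat."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)               # estimated sigma-p-hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.96 for 95%
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci(612, 1100)
print(f"({low:.2f}, {high:.2f})")  # (0.53, 0.59)
```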

Two-tailed test of hypothesis
Suppose you are told that 50% of rabbits are male, but you find that 94 out of a sample of 197 are male. Do you question the hypothesis?
p-hat = 94/197 = .4772
sigma-p-hat = (.5 × .5 / 197)^.5 = .0356. Note that in a test of hypothesis we use the hypothesized value of p (here .5), not p-hat, in sigma-p-hat, because the test asks how unusual p-hat would be if the hypothesis were true.
z = (.477 - .5)/.0356 = -.65, which gives a two-tailed P-value of .52. Since this is large, you would not reject the hypothesis.
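A Python sketch of this two-tailed test, assuming the normal approximation; z_test_proportion is a hypothetical helper name, and the normal CDF comes from statistics.NormalDist. With the unrounded p-hat = 94/197 it gives z of about -.64 rather than -.65, and the two-tailed P-value is still about .52.

```python
import math
from statistics import NormalDist

def z_test_proportion(x, n, p0):
    """z statistic for the hypothesis p = p0, using p0 (not p-hat) in sigma-p-hat."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

z = z_test_proportion(94, 197, 0.5)
p_two_tailed = 2 * NormalDist().cdf(-abs(z))   # area in both tails beyond |z|
print(round(z, 2), round(p_two_tailed, 2))  # -0.64 0.52
```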

One-tailed test of hypothesis
If President Clinton said 80% of Americans approve of his policies, but you found only 112 out of a sample of 144 approved of his policies, would you question him?
He is really claiming that at least 80% approve of his policies, so you would question him only if too few people approved; hence this is a one-tailed test.
p-hat = 112/144 = .7778
sigma-p-hat = (.8 × .2/144)^.5 = .0333
z = (.7778 - .8)/.0333 = -.67; the corresponding one-tailed P-value is .25. Since this is large, you would not reject the hypothesis.
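A Python sketch of the one-tailed version, again assuming the normal approximation. Only the lower tail counts, since the claim is rejected only when too few approve. Carrying the unrounded p-hat = 112/144 through the calculation gives z of about -.67 and a one-tailed P-value of about .25, so the conclusion is the same.

```python
import math
from statistics import NormalDist

x, n, p0 = 112, 144, 0.8
p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)     # sigma-p-hat under the claimed p = .8
z = (p_hat - p0) / se
p_one_tailed = NormalDist().cdf(z)    # lower tail only: alternative is p < .8
print(round(z, 2), round(p_one_tailed, 2))  # -0.67 0.25
```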

Remarks
Recall that 35% = .35; do not get confused about the position of the decimal point.

The value of sigma-p-hat depends on the value of p. However, p(1-p) is largest at p = .5 (where it equals .25), so sigma-p-hat is never larger than its value at p = .5. Therefore problems of the form "how large must n be to obtain a confidence interval with a radius less than or equal to a given value?" can be solved by using p = .5.
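A minimal Python sketch of that sample-size calculation. The target radius of .03 is a hypothetical example value, and sample_size_for_radius is an illustrative name. With p = .5 the radius is z × (.25/n)^.5 = z/(2n^.5), so requiring it to be at most the target gives n ≥ (z/(2 × radius))^2, rounded up.

```python
import math

def sample_size_for_radius(radius, z=1.96):
    """Smallest n whose confidence-interval radius is <= radius for any p.

    Uses the worst case p = .5, where p(1 - p) = .25, so the radius
    z * sqrt(p(1 - p) / n) is at most z / (2 * sqrt(n)).
    """
    return math.ceil((z / (2 * radius)) ** 2)

print(sample_size_for_radius(0.03))  # 1068
```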

Competencies: If you get 527 heads when you flip a coin 1000 times, do you question whether it is fair? (At what significance level?)
If 712 out of 1200 persons like the color blue, what is the 95% confidence interval for the proportion of the population that likes the color blue?
