7.4 Inference for a Proportion
[latexpage]
If we are working with categorical data, our parameter of interest is often the population proportion, p. The point estimate for p is [latex]\hat{p}[/latex] = [latex]\frac{x}{n}[/latex], where x is the number of successes and n is the sample size. It is also sometimes denoted [latex]{p}^{\prime }[/latex]. We saw previously that if the conditions np ≥ 10 and n(1 − p) ≥ 10 are met, we can apply the central limit theorem and assume:
[latex]\hat{p}[/latex] ~ N[latex]\left(p,\sqrt{\frac{p\cdot q}{n}}\right)[/latex], where q = 1 − p.
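A minimal sketch of this check in Python may help make the conditions concrete. The function name `sampling_distribution` and the values p = 0.5 and n = 100 are illustrative assumptions, not part of the text above.

```python
import math

def sampling_distribution(p, n, threshold=10):
    """Return the (mean, standard deviation) of p-hat under the normal approximation.

    Raises ValueError if np or n(1 - p) falls below the threshold,
    since the normal approximation is then not justified.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    if expected_successes < threshold or expected_failures < threshold:
        raise ValueError("success-failure condition not met")
    return p, math.sqrt(p * (1 - p) / n)

# Illustrative values only (assumed for this sketch)
mean, sd = sampling_distribution(0.5, 100)
print(f"mean = {mean:.2f}, sd = {sd:.3f}")  # mean = 0.50, sd = 0.050
```

With a small sample, for instance `sampling_distribution(0.5, 10)`, the function raises an error because only 5 successes are expected.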
How do you know you are dealing with a proportion problem? First, the underlying distribution is a binomial distribution. The data will be categorical, with no mention of a mean or average. If X is a binomial random variable, then X ~ B(n, p), where n is the number of trials and p is the probability of a success.
Hypothesis Tests for p
When you perform a hypothesis test of a single population proportion p, the steps are exactly the same as what we have seen before; however, we will calculate our test statistic differently. When conducting a test for p, our hypotheses will look as follows:
- [latex]H_0: p = p_0[/latex]
- [latex]H_a: p \; (<, >, \neq) \; p_0[/latex]
Recall, the general form of a test statistic is:
[latex]Z=\frac{\text{point estimate} - \text{null value}}{\text{SE}}[/latex]
For the normal distribution of proportions, if [latex]\hat{p}[/latex] ~ N[latex]\left(p,\sqrt{\frac{p\cdot q}{n}}\right)[/latex], then the z-score formula is:
\(z=\frac{\hat{p}-p}{\sqrt{\frac{pq}{n}}}\)
Intuitively, you might expect us to use this as our test statistic, but remember two things:
- We do not actually know p
- In a hypothesis test we begin by assuming the null is true
Due to these facts, we substitute [latex]p_0[/latex] for p in the standard error, which gives us:
[latex]\sigma_{\hat{p}}=\sqrt{\frac{p_0(1-p_0)}{n}}[/latex]
We can then find a p-value and make our decision as usual.
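Combining the general form of the test statistic with this standard error, the test statistic for a single proportion can be written out in full. This only restates the formulas above in one expression:

[latex]z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}[/latex]

Because the null value is assumed true, the conditions for normality are also checked with the null value: [latex]np_0 \geq 10[/latex] and [latex]n(1-p_0) \geq 10[/latex].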
Example
Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine if the percentage is the same or different from 50%. Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the hypothesis test, she uses a 1% level of significance.
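One way to work through Joon's test is sketched below in Python. The hypotheses are taken to be H0: p = 0.50 against Ha: p ≠ 0.50, which makes this a two-tailed test. The variable names are illustrative, and the p-value is computed with the standard library's `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

n = 100        # first-time brides sampled
x = 53         # number who are younger than their grooms
p0 = 0.50      # null value
alpha = 0.01   # level of significance

# Conditions, checked with the null value: n*p0 = 50 and n*(1 - p0) = 50, both at least 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n                          # point estimate, 0.53
se = math.sqrt(p0 * (1 - p0) / n)      # standard error under the null, 0.05
z = (p_hat - p0) / se                  # test statistic

# Two-tailed p-value: area in both tails beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
# z = 0.60, p-value = 0.5485, reject H0: False
```

Since the p-value is much larger than 0.01, Joon does not reject the null hypothesis: the sample does not provide sufficient evidence that the percentage differs from 50%.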
Confidence Intervals for p
During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Election polls are often calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favor the candidate is between 0.37 and 0.43: (0.40 – 0.03, 0.40 + 0.03).
Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.
Constructing Confidence Intervals for p
The structure of a confidence interval for a proportion, and the procedure for finding it, are similar to those for a population mean, but the formulas are different.
The general format of a confidence interval is:
\((PE-MoE, PE+MoE)\)
The population parameter is p. The point estimate for p is [latex]\hat{p}[/latex], the sample proportion.
The margin of error for a proportion is:
\(MoE=\left({z}_{\frac{\alpha }{2}}\right)\left(\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)\) where [latex]\hat{q}=1-\hat{p}[/latex]
This formula is similar to the margin of error formula for a mean, except that the appropriate standard deviation is different. For a mean, when the population standard deviation is known, the appropriate standard deviation is [latex]\frac{\sigma }{\sqrt{n}}[/latex]. For a proportion, the appropriate standard deviation is \(\sqrt{\frac{pq}{n}}\).
However, in the margin of error formula, we use \(\sqrt{\frac{\hat{p}\hat{q}}{n}}\) as the standard deviation, instead of \(\sqrt{\frac{pq}{n}}\).
In the margin of error formula, the sample proportions [latex]\hat{p}[/latex] and [latex]\hat{q}[/latex] are estimates of the unknown population proportions p and q. They are used because p and q are not known. Both are calculated from the data: [latex]\hat{p}[/latex] is the estimated proportion of successes, and [latex]\hat{q}[/latex] is the estimated proportion of failures.
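The critical value [latex]{z}_{\frac{\alpha }{2}}[/latex] in the formula depends only on the confidence level. As a small sketch, it can be looked up with Python's standard library instead of a z-table. The helper name `critical_value` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_(alpha/2), the z-score that leaves alpha/2 in the upper tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} confidence: z = {critical_value(cl):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same data.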
Example
Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes – they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.
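The calculation for this example can be sketched in Python as follows. The variable names are illustrative, and the critical value is obtained from `statistics.NormalDist` rather than a table, so the result may differ from a hand calculation in the last decimal place.

```python
import math
from statistics import NormalDist

n = 500                  # adults surveyed
x = 421                  # number who own cell phones
confidence_level = 0.95

p_hat = x / n            # point estimate, 0.842
q_hat = 1 - p_hat        # estimated proportion of failures, 0.158

# Observed successes (421) and failures (79) are both at least 10
assert x >= 10 and n - x >= 10

# Critical value z_(alpha/2), about 1.96 for 95% confidence
z_crit = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

# Margin of error uses the estimated standard error sqrt(p_hat * q_hat / n)
moe = z_crit * math.sqrt(p_hat * q_hat / n)
lower, upper = p_hat - moe, p_hat + moe

print(f"p-hat = {p_hat:.3f}, MoE = {moe:.3f}, CI = ({lower:.3f}, {upper:.3f})")
# p-hat = 0.842, MoE = 0.032, CI = (0.810, 0.874)
```

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.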
Your turn!
Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.
Categorical data: Data that describes qualities, or puts individuals into categories
Population proportion: The number of individuals that have a characteristic we are interested in divided by the total number in the population
Point estimate: The value that is calculated from a sample used to estimate an unknown population parameter
Binomial random variable: A random variable that counts the number of successes in a fixed number (n) of independent Bernoulli trials each with probability of a success (p)
Test statistic: A measure of how far what you observed is from the hypothesized (or claimed) value
Confidence interval: An interval built around a point estimate for an unknown population parameter
Margin of error: How much a point estimate can be expected to differ from the true population value; made up of the standard error multiplied by the critical value