I. Introduction to Normal Distribution
The normal distribution is a symmetrical, bell-shaped curve characterized by its mean \( \mu \) and standard deviation \( \sigma \). Mathematically, it is defined by the probability density function:
\[
f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x - \mu)^2}{2\sigma^2}}
\]
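As a quick check on the formula, the sketch below evaluates the density directly and compares it with `statistics.NormalDist` from the Python standard library. The values of `mu`, `sigma`, and the point `110.0` are illustrative assumptions, not taken from these notes.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Evaluate f(x | mu, sigma^2) term by term from the formula above."""
    coefficient = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
by_formula = normal_pdf(110.0, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(110.0)
print(by_formula, by_library)  # the two values should agree
```

The density is highest at \( x = \mu \) and falls off symmetrically on both sides, which is what gives the curve its bell shape.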
Empirical Rule
For normally distributed data, approximately 68% of values fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
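These percentages can be recovered from the standard normal cumulative distribution function. The sketch below is one way to check them, using `statistics.NormalDist` from the Python standard library (a tool choice made here, not one named in the notes).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-k < Z < k) for k = 1, 2, 3 standard deviations
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, probability in within.items():
    print(f"within {k} SD of the mean: {probability:.4f}")
# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```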
II. Importance of Statistical Inference
Statistical inference allows us to make educated guesses about a population based on data from a sample. This is invaluable in various fields, from medicine to politics.
III. Percentiles and the Normal Distribution
A percentile indicates the relative standing of a data point within a data set: the pth percentile is the value at or below which p% of the data fall. To find the 90th percentile, for instance, one can use the TI-84’s invNorm(0.9, mean, standard deviation) function. To find the percentile of a standardized value, say 1.9, we can use the TI-84’s normalcdf(-1E99, 1.9, 0, 1) function, where -1E99 stands in for negative infinity as the lower bound.
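For readers without a calculator at hand, the same two lookups can be sketched in Python. Here `inv_cdf` plays the role of the TI-84's inverse-normal command and `cdf` plays the role of its cumulative-normal command with a lower bound of negative infinity. The mean of 70 and standard deviation of 10 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, standard deviation 10
scores = NormalDist(mu=70, sigma=10)

# Value at the 90th percentile (area 0.90 to the left)
p90 = scores.inv_cdf(0.90)

# Percentile of a standardized value z = 1.9 (area to the left of 1.9)
percentile_of_z = NormalDist().cdf(1.9)

print(f"90th percentile: {p90:.2f}")               # 82.82
print(f"z = 1.9 sits at: {percentile_of_z:.4f}")   # 0.9713
```

A standardized value of 1.9 therefore sits at roughly the 97th percentile.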
IV. Sampling Distributions of Sample Means
General Overview
The sampling distribution of the sample mean describes the values the sample mean can take, and how often, across all possible samples of a given size. The concept is pivotal for understanding the Central Limit Theorem. The distribution shows how much sample means vary around the population mean.
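A small simulation can make this concrete. The sketch below repeatedly draws samples from a hypothetical normal population (mean 50, standard deviation 12, sample size 36, all assumed for illustration) and looks at how the resulting sample means are distributed. It also compares their spread with the standard error \( \sigma / \sqrt{n} \), a standard result that these notes do not state explicitly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

mu, sigma, n = 50.0, 12.0, 36  # hypothetical population and sample size


def sample_mean():
    """Draw one sample of size n and return its mean."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))


# Approximate the sampling distribution with 5000 simulated sample means
means = [sample_mean() for _ in range(5000)]

center = statistics.fmean(means)   # should land near mu
spread = statistics.stdev(means)   # should land near sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5  # 12 / 6 = 2.0

print(f"mean of sample means: {center:.2f}")
print(f"SD of sample means:   {spread:.2f} (theory: {theoretical_se:.2f})")
```

The sample means cluster around the population mean, and they vary much less than individual observations do.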
Relationship to Test Statistics and P-values
The sampling distribution serves as the foundation for hypothesis testing. A test statistic is calculated from sample data and is compared to the sampling distribution to determine how extreme the test statistic is. In essence, the test statistic serves a role similar to a z-score; it quantifies how many standard errors a sample statistic lies from the hypothesized population parameter, assuming the null hypothesis is true. The p-value is then the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed, under the assumption that the null hypothesis is true.
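To illustrate the link between a test statistic and a p-value, here is a minimal sketch of a one-sample z-test for a mean with known population standard deviation. The function name `one_sample_z` and all numbers are hypothetical, and a two-sided alternative is assumed.

```python
from statistics import NormalDist


def one_sample_z(xbar, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: mu = mu0."""
    standard_error = sigma / n ** 0.5
    z = (xbar - mu0) / standard_error
    # Two-sided: area in both tails beyond |z| under the standard normal
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: sample mean 52.5 from n = 64, testing mu0 = 50, sigma = 10
z, p = one_sample_z(xbar=52.5, mu0=50.0, sigma=10.0, n=64)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.00, p-value = 0.0455
```

The sample mean lies two standard errors above the hypothesized mean, and a result at least that extreme would occur about 4.6% of the time if the null hypothesis were true.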
V. Central Limit Theorem
The Central Limit Theorem (CLT) states that for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution (provided the population has a finite variance).
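The theorem can be seen in a simulation with a strongly right-skewed population. The sketch below assumes an exponential population with rate 1 and a sample size of 40, both chosen only for illustration, and uses a hand-written `skewness` helper (0 indicates a symmetric distribution).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def skewness(data):
    """Average cubed standardized deviation; 0 for a symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)


# Individual observations from a right-skewed (exponential) population
population_draws = [random.expovariate(1.0) for _ in range(20000)]

# Means of 5000 samples, each of size 40, from the same population
means_n40 = [
    statistics.fmean(random.expovariate(1.0) for _ in range(40))
    for _ in range(5000)
]

print(f"skewness of individual values: {skewness(population_draws):.2f}")
print(f"skewness of sample means:      {skewness(means_n40):.2f}")
```

The individual values are heavily skewed, while the sample means are much closer to symmetric, as the CLT predicts.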
VI. Assumptions and Tests
1. Normality
One-Sample T-Test
If the sample size is large (\( n \geq 30 \)), the Central Limit Theorem justifies treating the sampling distribution of the sample mean as approximately normal. For smaller sample sizes, make a dot plot or histogram and check that the data show no strong skew or outliers.
Two-Sample T-Test
If either sample size is smaller than 30, graphical methods such as dot plots or histograms should be used to check that sample for normality. If both sample sizes are large (\( n_1 \geq 30 \) and \( n_2 \geq 30 \)), the CLT can be assumed to hold.
One-Sample Z-Test for Proportion
Both \( np \) and \( n(1-p) \) should be greater than or equal to 10, as per AP guidelines.
Two-Sample Z-Test for Proportion
All four quantities \( n_1p_1 \), \( n_1(1-p_1) \), \( n_2p_2 \), and \( n_2(1-p_2) \) should be greater than or equal to 10.
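The conditions for the proportion tests reduce to simple arithmetic. The sketch below checks them for a one-sample and a two-sample case. The helper name `large_counts_ok` and the sample sizes and proportions are hypothetical.

```python
def large_counts_ok(n, p, threshold=10):
    """Check that both n*p and n*(1-p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# One-sample case: n = 100, p = 0.5 gives counts of 50 and 50
one_sample_ok = large_counts_ok(100, 0.5)

# Two-sample case: every group must pass the check
groups = [(100, 0.5), (40, 0.1)]  # second group has n*p = 4
two_sample_ok = all(large_counts_ok(n, p) for n, p in groups)

print(one_sample_ok)  # True
print(two_sample_ok)  # False
```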
Chi-Squared Test for Independence
Each cell in the contingency table should have an expected frequency of 5 or more.
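Expected frequencies are computed as (row total × column total) / grand total for each cell. The following sketch applies that standard formula to a hypothetical 2×2 table of observed counts and then checks the condition. The helper name `expected_counts` is made up for this example.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[20, 30],
            [30, 20]]  # hypothetical observed counts

expected = expected_counts(observed)
condition_met = all(cell >= 5 for row in expected for cell in row)

print(expected)       # [[25.0, 25.0], [25.0, 25.0]]
print(condition_met)  # True
```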
2. Independence
When sampling without replacement, the sample size should not exceed 10% of the population (the 10% rule) so that the observations can be treated as approximately independent.
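As a minimal sketch, the 10% rule is a single comparison. The helper name `ten_percent_ok` and the sizes used are hypothetical.

```python
def ten_percent_ok(n, population_size):
    """Return True when the sample is at most 10% of the population."""
    return 10 * n <= population_size  # integer form of n <= 0.10 * N


print(ten_percent_ok(50, 1000))   # True: 50 is 5% of 1000
print(ten_percent_ok(150, 1000))  # False: 150 is 15% of 1000
```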
3. Random Sampling
Data should be randomly sampled from the population.
VII. Exercises
Answer the following questions based on the lecture notes.
- What is the Empirical Rule and why is it important?
- Calculate the z-score for a data point that is 3 units above a mean of zero, given a standard deviation of 2.
- How do you find the 90th percentile using a TI-84 calculator?
- Explain the concept of a sampling distribution of sample means.
- What is the Central Limit Theorem, and why is it significant?
- For a One-Sample T-Test, how do we check the assumption of normality?
- For a Two-Sample Z-Test for Proportion, what are the AP guidelines for checking normality?
- Explain the 10% rule in the context of statistical independence.
- How do test statistics relate to z-scores?
- What is a p-value and how is it calculated?