Statistics Memory Tricks

🔢 Statistics

Mean = Add and Divide. Median = Middle. Mode = Most.

Measures of Central Tendency

Mean, median, and mode — locked in 3 seconds

Mean: add all values, divide by count. Median: the middle value when sorted. Mode: the value that appears most often. One sentence each.
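
A quick way to check the three definitions, sketched with Python's standard `statistics` module. The list `scores` is made-up example data, not from the card.

```python
import statistics

# Hypothetical example data (already sorted for readability).
scores = [2, 3, 3, 5, 7]

mean_value = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7) / 5 = 4
median_value = statistics.median(scores)  # middle value of the sorted list = 3
mode_value = statistics.mode(scores)      # most frequent value = 3

print(mean_value, median_value, mode_value)
```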

"Reject if p is less than alpha"

Hypothesis Testing Decision Rule

When to reject the null hypothesis: the rule that never changes

If your p-value < α (usually 0.05), reject H₀. If p ≥ α, fail to reject. You never "accept" H₀; you only fail to reject it.
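
The decision rule fits in a few lines. This is a minimal sketch: the function name `decide` is hypothetical, and it treats p exactly equal to α as "fail to reject", which is the usual convention.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p is strictly below alpha."""
    if p_value < alpha:
        return "reject H0"
    # Note the wording: we never "accept" H0.
    return "fail to reject H0"


print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```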

68 · 95 · 99.7

Empirical Rule (Normal Distribution)

The empirical rule — the three numbers every stats student memorizes

In a normal distribution, about 68% of data falls within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD. These three numbers answer most empirical-rule questions about normal distributions.
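
The three percentages can be recovered from the standard normal CDF. A small sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))  # roughly 68.3, 95.4, 99.7
```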

Type I = False Alarm. Type II = Missed Call.

Error Types

Type I and Type II errors — impossible to mix up

Type I (α): reject H₀ when it's true — a false alarm. Type II (β): fail to reject H₀ when it's false — a missed call. Think: crying wolf vs. ignoring the wolf.
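
The two error types are just two cells of a 2×2 table (truth vs. decision). A sketch that labels each cell; the function name `classify_outcome` is hypothetical.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label a test outcome given the (usually unknown) truth and the decision."""
    if h0_is_true and rejected_h0:
        return "Type I error (false alarm)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (missed call)"
    return "correct decision"


print(classify_outcome(h0_is_true=True, rejected_h0=True))    # Type I error (false alarm)
print(classify_outcome(h0_is_true=False, rejected_h0=False))  # Type II error (missed call)
```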

r closer to ±1 = stronger

Correlation Coefficient

Reading correlation: closer to 1 or −1 is stronger

r = +1 is perfect positive. r = −1 is perfect negative. r = 0 means no linear relationship. The closer to either extreme, the stronger the correlation.

Standard Deviation

Standard deviation = spread of data. Small SD = data clustered near mean. Large SD = spread out.

Standard Deviation

How much the data typically varies from the mean

Variance = average squared deviation from the mean (a sample variance divides by n − 1 instead of n). SD = √variance. Low SD: data points cluster tightly around the mean. High SD: data is spread widely. About 68% of data falls within 1 SD of the mean in a normal distribution (68-95-99.7 rule).
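
A worked example of variance and SD, sketched with the standard `statistics` module. The list `data` is made-up; `pvariance`/`pstdev` divide by n (population), while `variance` divides by n − 1 (sample).

```python
import statistics

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Squared deviations from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32).
population_variance = statistics.pvariance(data)  # 32 / 8 = 4
population_sd = statistics.pstdev(data)           # sqrt(4) = 2.0
sample_variance = statistics.variance(data)       # 32 / 7, about 4.57

print(population_variance, population_sd, round(sample_variance, 2))
```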

Basic Probability Rules

Probability: P(A and B) = P(A) × P(B) if independent. P(A or B) = P(A) + P(B) - P(A and B).

Basic Probability Rules

Two essential probability formulas — AND and OR

AND (both events occur): multiply probabilities if independent. P(heads AND heads) = 0.5 × 0.5 = 0.25. OR (at least one occurs): add probabilities, subtract the overlap. P(A or B) = P(A) + P(B) - P(A∩B). For mutually exclusive events: P(A or B) = P(A) + P(B).
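
Both formulas as a sketch, using `fractions.Fraction` to keep the card-deck arithmetic exact. The function names and the card and die examples are illustrative assumptions, not part of the original card.

```python
from fractions import Fraction


def p_and_independent(p_a, p_b):
    """P(A and B) for independent events: multiply."""
    return p_a * p_b


def p_or(p_a, p_b, p_a_and_b=0):
    """P(A or B): add, then subtract the overlap (0 if mutually exclusive)."""
    return p_a + p_b - p_a_and_b


two_heads = p_and_independent(0.5, 0.5)  # 0.25

# Drawing a heart OR a king from a 52-card deck: the king of hearts is the overlap.
heart_or_king = p_or(Fraction(13, 52), Fraction(4, 52), Fraction(1, 52))  # 16/52 = 4/13

# Rolling a 1 OR a 2 on a fair die: mutually exclusive, so no overlap.
one_or_two = p_or(Fraction(1, 6), Fraction(1, 6))  # 1/3

print(two_heads, heart_or_king, one_or_two)
```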

Confidence Intervals

Confidence interval: estimate ± margin of error. Wider CI = less precise but more confident.

Confidence Intervals

A range of plausible values for a population parameter

95% CI means: if you repeated the study 100 times, about 95 of the intervals would contain the true population parameter. At a fixed sample size, a wider interval is more confident but less precise. Increasing sample size narrows the interval without sacrificing confidence.
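
"Estimate ± margin of error" can be sketched for a sample mean as follows. Assumptions: the normal (z) approximation is used for the critical value, which is a simplification (small samples usually call for a t critical value instead); the function name and the `sample` values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean: estimate +/- z * standard error."""
    estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print((low95, high95), (low99, high99))
```

With the same data, the 99% interval comes out wider than the 95% one: more confidence, less precision.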

Correlation Coefficient

Correlation vs causation: r measures linear relationship strength, NOT cause and effect

Correlation Coefficient

What r tells you — and what it doesn't

r ranges from -1 to +1. r = 1: perfect positive linear relationship. r = -1: perfect negative. r = 0: no linear relationship. Strong correlation does NOT mean one variable causes the other. Always look for lurking variables (confounders).
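
A sketch of Pearson's r computed from its definition, with three made-up data sets. The last one shows why r = 0 only rules out a linear relationship: y depends entirely on x (y = x²), yet r is 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfect positive: 1
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # perfect negative: -1
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # y = x**2: r is 0

print(r_positive, r_negative, r_curved)
```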

Normal Distribution

Normal distribution: symmetric, bell-shaped. Mean = median = mode. Described by μ and σ.

Normal Distribution

The bell curve — the most important distribution in statistics

Perfectly symmetric around the mean. About 68% of data within 1σ of the mean, 95% within 2σ, 99.7% within 3σ. Z-score = (x - μ)/σ converts a value from any normal distribution to the standard normal scale (μ=0, σ=1). Use a z-table to find probabilities.
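
A z-score lookup, sketched with `statistics.NormalDist` standing in for the z-table. The numbers (μ = 100, σ = 15, x = 130) are a made-up example.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z = z_score(130, mu=100, sigma=15)  # (130 - 100) / 15 = 2.0

# What a z-table lookup gives: P(Z < 2.0), about 0.977.
p_below = NormalDist().cdf(z)

print(z, round(p_below, 4))
```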

Chi-Square Test

Chi-square test: tests whether observed frequencies differ from expected frequencies

Chi-Square Test

Testing whether categorical data fits a pattern or shows an association

χ² = Σ(observed - expected)²/expected. Large χ² → observed data far from expected → more evidence against null hypothesis. Two uses: goodness-of-fit (does data fit a distribution?) and test of independence (are two categorical variables related?).
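
The χ² formula as a short sketch, applied to a goodness-of-fit example. The die-roll counts are made-up, and the function name is hypothetical. Only the statistic is computed here; turning it into a p-value needs the chi-square distribution (with k − 1 degrees of freedom for goodness-of-fit), which is not covered by this snippet.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: a die rolled 60 times; a fair die expects 10 per face.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6

# (4 + 4 + 0 + 1 + 1 + 0) / 10 = 1.0
chi_square = chi_square_statistic(observed, expected)
print(chi_square)
```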

Linear Regression

Regression line: ŷ = b₀ + b₁x. Slope b₁ = change in predicted y per unit change in x. Intercept b₀ = predicted y when x=0.

Linear Regression

The line of best fit — predicting one variable from another

The regression line minimizes the sum of squared residuals (least squares). Slope: for each 1-unit increase in x, predicted y changes by b₁ units. Only predict within the range of your data (don't extrapolate). R² = proportion of variation in y explained by x.
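
A least-squares fit by hand, as a sketch. The data points and function names are made-up for illustration; the closed-form slope and intercept are the standard simple-regression formulas.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)        # about 2.2 and 0.6
r2 = r_squared(xs, ys, b0, b1)   # about 0.6
print(round(b0, 2), round(b1, 2), round(r2, 2))
```

Here x only runs from 1 to 5, so a prediction at, say, x = 50 would be extrapolation.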

Stats mnemonics that make probability stick
