🔢 Math · Statistics

Stats mnemonics that make probability stick

Mean, median, mode, hypothesis testing — the concepts every stats student needs cold.

🔢 Statistics

Memory tricks

Proven mnemonics — fast to learn, hard to forget.

🔢 Statistics
"Reject if p is less than alpha"
Hypothesis Testing Decision Rule
When to reject the null hypothesis, every time
If your p-value < α (usually 0.05), reject H₀. Otherwise, fail to reject. You never "accept" H₀; you only fail to reject it.
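The rule is short enough to write as a one-line check. A minimal sketch, assuming a hypothetical helper named `decide` and the usual default of α = 0.05:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value and significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject only when p is below alpha; the other branch is never "accept H0".
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.03))               # reject H0
print(decide(0.20))               # fail to reject H0
print(decide(0.03, alpha=0.01))   # fail to reject H0
```

The same p-value can lead to different decisions under different α, which is why α should be chosen before looking at the data.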
🔢 Statistics
68 · 95 · 99.7
Empirical Rule (Normal Distribution)
The empirical rule — the three numbers every stats student memorizes
In a normal distribution, about 68% of data falls within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD. Most introductory normal-distribution questions rely on these three numbers.
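The three numbers are rounded values of the normal curve's area within k standard deviations of the mean. A quick sketch using only Python's standard library (the function name `share_within` is just for illustration):

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k SDs of the mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed shares come out close to 0.6827, 0.9545, and 0.9973, which round to the 68 · 95 · 99.7 of the mnemonic.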
🔢 Statistics
Type I = False Alarm. Type II = Missed Call.
Error Types
Type I and Type II errors — impossible to mix up
Type I (α): reject H₀ when it's true — a false alarm. Type II (β): fail to reject H₀ when it's false — a missed call. Think: crying wolf vs. ignoring the wolf.
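One way to keep the two errors straight is to lay out all four combinations of "what is true" and "what you decided". A small sketch with a hypothetical `classify` helper:

```python
def classify(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label the outcome of a test given the truth and the decision."""
    if h0_is_true and rejected_h0:
        return "Type I error (false alarm)"    # probability alpha
    if not h0_is_true and not rejected_h0:
        return "Type II error (missed call)"   # probability beta
    return "correct decision"


for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 true={truth}, rejected={rejected}: {classify(truth, rejected)}")
```

Only two of the four combinations are errors; the other two are correct decisions.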
🔢 Statistics
r closer to ±1 = stronger
Correlation Coefficient
Reading correlation: closer to 1 or −1 is stronger
r = +1 is perfect positive. r = −1 is perfect negative. r = 0 means no linear relationship. The closer to either extreme, the stronger the correlation.
Standard Deviation
Standard deviation = spread of data. Small SD = data clustered near mean. Large SD = spread out.
How much the data typically varies from the mean
Variance = average squared deviation from mean. SD = √variance. Low SD: data points cluster tightly around the mean. High SD: data is spread widely. About 68% of data falls within 1 SD of the mean in a normal distribution (68-95-99.7 rule).
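A worked example of the two formulas, as a sketch. It uses the population versions (divide by n), matching "average squared deviation"; many courses divide by n − 1 for a sample instead. The data set is made up for illustration.

```python
import math


def population_variance(data):
    """Average squared deviation from the mean (divides by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def population_sd(data):
    """Standard deviation = square root of the variance."""
    return math.sqrt(population_variance(data))


scores = [2, 4, 4, 4, 5, 5, 7, 9]    # illustrative data, mean = 5
print(population_variance(scores))   # 4.0
print(population_sd(scores))         # 2.0
```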
Basic Probability Rules
Probability: P(A and B) = P(A) × P(B) if independent. P(A or B) = P(A) + P(B) - P(A and B).
Two essential probability formulas — AND and OR
AND (both events occur): multiply probabilities if independent. P(heads AND heads) = 0.5 × 0.5 = 0.25. OR (at least one occurs): add probabilities, subtract the overlap. P(A or B) = P(A) + P(B) - P(A∩B). For mutually exclusive events: P(A or B) = P(A) + P(B).
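Both formulas can be checked with exact fractions. A minimal sketch; the helper names and the playing-card example are illustrative additions, not part of the card above.

```python
from fractions import Fraction


def p_and_independent(p_a, p_b):
    """P(A and B) for independent events: multiply."""
    return p_a * p_b


def p_or(p_a, p_b, p_a_and_b=0):
    """P(A or B): add, then subtract the overlap (0 if mutually exclusive)."""
    return p_a + p_b - p_a_and_b


half = Fraction(1, 2)
print(p_and_independent(half, half))   # 1/4, two heads in a row

# Drawing one card: king (4/52) or heart (13/52), overlap = king of hearts (1/52).
print(p_or(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)))   # 4/13
```

Leaving `p_a_and_b` at 0 gives the mutually exclusive case, where the two probabilities simply add.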
Confidence Intervals
Confidence interval: estimate ± margin of error. Wider CI = less precise but more confident.
A range of plausible values for a population parameter
A 95% CI means: if you repeated the study 100 times, about 95 of the resulting intervals would contain the true population parameter. At a fixed sample size, a higher confidence level gives a wider interval: more confident but less precise. Increasing the sample size narrows the interval without sacrificing confidence.
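To see the sample-size effect in numbers, here is a sketch of "estimate ± margin of error" for a mean. It assumes a z-based interval with the common critical value 1.96 for 95% confidence; the mean, SD, and sample sizes are hypothetical.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, z_star=1.96):
    """Return (low, high) = estimate minus/plus the margin of error.

    Assumes a z-based interval; z_star=1.96 corresponds to about 95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    margin = z_star * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Same mean and SD, two sample sizes: the larger sample gives a narrower interval.
for n in (25, 100):
    low, high = mean_confidence_interval(50.0, 10.0, n)
    print(f"n={n}: ({low:.2f}, {high:.2f})")
# n=25: (46.08, 53.92)
# n=100: (48.04, 51.96)
```

Quadrupling the sample size halves the margin of error, because the margin shrinks with the square root of n.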
Correlation Coefficient
Correlation vs causation: r measures linear relationship strength, NOT cause and effect
What r tells you — and what it doesn't
r ranges from -1 to +1. r = 1: perfect positive linear relationship. r = -1: perfect negative. r = 0: no linear relationship. Strong correlation does NOT mean one variable causes the other. Always look for lurking variables (confounders).
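A sketch of how r is computed, using the standard Pearson formula and made-up data. The second data set shows why "r = 0" only rules out a linear relationship: y depends perfectly on x there, just not along a line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0, perfect positive line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, y = x**2 is not linear
```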
Normal Distribution
Normal distribution: symmetric, bell-shaped. Mean = median = mode. Described by μ and σ.
The bell curve — the most important distribution in statistics
Perfectly symmetric around the mean. About 68% of data falls within 1σ, 95% within 2σ, and 99.7% within 3σ. The z-score z = (x - μ)/σ converts a value from any normal distribution to the standard normal scale (μ=0, σ=1). Use a z-table to find probabilities.
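The z-table lookup can be reproduced in a few lines. A sketch, assuming a hypothetical score of 130 on a scale with μ = 100 and σ = 15; the cumulative probability uses the error function from Python's standard library in place of a printed table.

```python
import math


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, i.e. the z-table value."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z = z_score(130, mu=100, sigma=15)
print(z)                                   # 2.0
print(f"{standard_normal_cdf(z):.4f}")     # 0.9772
```

So a value 2 SD above the mean sits at roughly the 97.7th percentile.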
Chi-Square Test
Chi-square test: tests whether observed frequencies differ from expected frequencies
Testing whether categorical data fits a pattern or shows an association
χ² = Σ [(observed - expected)² / expected], summed over all categories. Large χ² → observed data far from expected → more evidence against the null hypothesis. Two uses: goodness-of-fit (does the data fit a distribution?) and test of independence (are two categorical variables related?).
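A goodness-of-fit example, as a sketch: 60 hypothetical die rolls compared against the 10-per-face counts a fair die would give on average. Only the statistic is computed here; turning it into a p-value needs a chi-square table or a statistics library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 10, 10, 9, 11]   # illustrative counts for faces 1 to 6
expected = [10] * 6                 # fair die: 60 rolls / 6 faces
chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")   # chi-square = 1.00
```

With 6 categories there are 5 degrees of freedom, and the table's critical value at α = 0.05 is about 11.07, so a statistic near 1 gives no evidence against a fair die.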
Linear Regression
Regression line: ŷ = b₀ + b₁x. Slope b₁ = change in predicted y per unit change in x. Intercept b₀ = predicted y when x=0.
The line of best fit — predicting one variable from another
The regression line minimizes the sum of squared residuals (least squares). Slope: for each 1-unit increase in x, the predicted y changes by b₁ units. Predict only within the range of your data; extrapolating beyond it is unreliable. R² = proportion of the variation in y explained by the linear relationship with x.
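The least-squares slope and intercept have closed-form formulas, sketched here with a small made-up data set. The function names are illustrative; the formulas are the standard ones, b₁ = Sxy / Sxx and b₀ = ȳ − b₁x̄.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Proportion of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")                 # y-hat = 2.2 + 0.6x
print(f"R^2 = {r_squared(xs, ys, b0, b1):.2f}")        # R^2 = 0.60
```

Here each 1-unit increase in x raises the predicted y by 0.6, and the line accounts for 60% of the variation in y.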