Statistics — Measures of Spread and Probability
At Year 4 Advanced, statistical analysis moves beyond describing data to evaluating the reliability and limitations of statistical measures and to applying formal probability rules to multi-step problems. You must justify which measure of spread is appropriate and why.
What You'll Learn
- Calculate and interpret range, interquartile range (IQR), variance, and standard deviation
- Evaluate the strengths and limitations of each measure of spread
- Apply the addition rule (general and mutually exclusive) for probability
- Apply the multiplication rule for independent and dependent events
- Construct and interpret tree diagrams and Venn diagrams
- Distinguish between theoretical and experimental probability
IB Assessment Focus
Criterion A: Select appropriate statistical measures and probability rules for multi-step, unfamiliar problems.
Criterion B: Justify why variance uses squared deviations; prove the addition rule using a Venn diagram argument.
Criterion C: Communicate statistical conclusions with correct notation; interpret findings in context.
Criterion D: Evaluate the reasonableness of statistical conclusions; discuss limitations of sample size and data collection.
Key Vocabulary
| Term | Definition |
|---|---|
| Variance (σ²) | The mean of the squared deviations from the mean; measures spread in squared units |
| Standard deviation (σ) | The square root of variance; measures spread in the original units of the data |
| Interquartile range (IQR) | Q3 − Q1; the spread of the middle 50% of data; resistant to outliers |
| Mutually exclusive events | Events that cannot occur at the same time; P(A ∩ B) = 0 |
| Independent events | Events where the outcome of one does not affect the probability of the other |
| P(A ∩ B) | Probability of both A AND B occurring simultaneously |
| P(A ∪ B) | Probability of A OR B (or both) occurring |
| Complementary event | P(A') = 1 − P(A); the probability that A does NOT occur |
Measures of Spread
A measure of spread describes how dispersed or clustered data values are around the centre. At Year 4, you must choose the appropriate measure and justify your choice.
Range
Range = Max − Min (the largest value minus the smallest value).
Limitation: Heavily influenced by outliers; uses only two data values and ignores the distribution of all the others.
Interquartile Range (IQR)
- Order the data from smallest to largest.
- Find the median (Q2) — the middle value.
- Q1 = median of the lower half; Q3 = median of the upper half.
- IQR = Q3 − Q1.
Limitation: Ignores the lowest 25% and the highest 25% of values, so it says nothing about how far the extreme values lie from the rest of the data.
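A minimal Python sketch of the four IQR steps above. It assumes the common convention that the median is left out of both halves when n is odd; textbooks and software differ here, so quartiles from other methods (for example `statistics.quantiles`) may not match exactly. The function name `iqr` is illustrative only.

```python
from statistics import median

def iqr(data):
    """Return (Q1, Q3, IQR) using the 'median of each half' method.

    Assumption: when n is odd, the middle value (Q2) is excluded
    from both halves.
    """
    values = sorted(data)              # Step 1: order the data
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]              # lower half (median excluded if n is odd)
    upper = values[half + n % 2:]      # upper half (median excluded if n is odd)
    q1 = median(lower)                 # Q1 = median of the lower half
    q3 = median(upper)                 # Q3 = median of the upper half
    return q1, q3, q3 - q1             # IQR = Q3 - Q1

print(iqr([7, 1, 3, 8, 2, 6, 4, 5]))   # (2.5, 6.5, 4.0)
print(iqr([1, 2, 3, 4, 5, 6, 7]))      # (2, 6, 4)
```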
Comparing Measures of Spread
| Measure | Formula | Use when | Limitation |
|---|---|---|---|
| Range | Max − Min | Quick comparison, no outliers | Distorted by outliers |
| IQR | Q3 − Q1 | Data has outliers or a skewed distribution | Ignores extreme values |
| Variance | Σ(x − μ)² / n | Further statistical work is needed | Units are squared; harder to interpret |
| Standard deviation | √Variance | Most statistical contexts, especially roughly symmetric (e.g. normally distributed) data | Sensitive to outliers; less informative for skewed data |
Variance and Standard Deviation
Standard deviation is the most widely used measure of spread because it uses every data value and is expressed in the original units of the data. Understanding why we square deviations, rather than simply adding them, requires mathematical justification at Year 4 Advanced.
Why We Square Deviations
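A short derivation, writing the data as x₁, …, xₙ with mean μ, shows why plain deviations cannot measure spread: they always sum to zero, whereas squared deviations cannot cancel.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \mu) &= \sum_{i=1}^{n} x_i - n\mu = n\mu - n\mu = 0
  \qquad \text{since } \mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \\
(x_i - \mu)^2 &\ge 0 \text{ for every } i
  \quad\Longrightarrow\quad
  \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \ge 0,
\end{aligned}
```

with σ² = 0 only when every xᵢ equals μ. For example, the deviations −4, −1, −1, 1, 5 in the first worked example sum to 0, while their squares sum to 44.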
Calculating Standard Deviation — Step by Step
Interpreting Standard Deviation
| Scenario | Interpretation |
|---|---|
| Small σ (close to 0) | Data is clustered tightly around the mean; low variability |
| Large σ | Data is spread widely around the mean; high variability |
| Comparing two datasets | The dataset with smaller σ is more consistent |
Probability Rules
At Year 4, probability moves from single events to combined events using formal rules. You must identify which rule applies and justify your selection.
Fundamental Rules
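The probability rules used throughout this guide can be collected as follows. The last line is the general multiplication rule, which covers dependent events such as drawing without replacement (as in P(both red) = P(R) × P(R|R)).

```latex
\begin{aligned}
&\text{Complement:} && P(A') = 1 - P(A) \\
&\text{General addition:} && P(A \cup B) = P(A) + P(B) - P(A \cap B) \\
&\text{Mutually exclusive:} && P(A \cup B) = P(A) + P(B) \quad (\text{since } P(A \cap B) = 0) \\
&\text{Independent:} && P(A \cap B) = P(A) \times P(B) \\
&\text{Dependent:} && P(A \cap B) = P(A) \times P(B \mid A)
\end{aligned}
```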
Mutually Exclusive vs Independent
| Concept | Meaning | Example | Key formula |
|---|---|---|---|
| Mutually exclusive | Cannot happen simultaneously | Rolling a 3 AND a 5 on one die | P(A ∩ B) = 0 |
| Independent | One outcome does not affect the other | Flipping heads AND rolling a 6 | P(A ∩ B) = P(A) × P(B) |
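Example: P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.2. Find P(A ∪ B).
- Events A and B can both occur (P(A ∩ B) ≠ 0), so use the general addition rule.
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.5 − 0.2 = 0.7.
- Sanity check: 0.7 ≤ 1 ✓; also P(A ∪ B) ≥ P(A) and P(A ∪ B) ≥ P(B) ✓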
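The general addition rule and its sanity checks can be sketched in Python. This is an illustration only: the function name `prob_union` is hypothetical, and `Fraction` is used so the arithmetic is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_a_and_b):
    """General addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)."""
    # The overlap cannot be larger than either event on its own.
    if not 0 <= p_a_and_b <= min(p_a, p_b):
        raise ValueError("P(A ∩ B) must lie between 0 and min(P(A), P(B))")
    p_union = p_a + p_b - p_a_and_b
    # Sanity checks: the union is at least as likely as each event, and at most 1.
    if not max(p_a, p_b) <= p_union <= 1:
        raise ValueError("inconsistent probabilities")
    return p_union

result = prob_union(Fraction("0.4"), Fraction("0.5"), Fraction("0.2"))
print(result, float(result))  # 7/10 0.7
```

For mutually exclusive events, passing `p_a_and_b = 0` reduces the rule to P(A) + P(B).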
Combined Events — Tree Diagrams and Venn Diagrams
Tree diagrams and Venn diagrams are tools for organising information about combined events. At Year 4, you must use them to solve multi-step probability problems and interpret results critically.
Tree Diagrams
- Each branch shows a possible outcome with its probability labelled.
- Probabilities along a branch multiply (AND rule).
- To find P(A OR B), add the probabilities of the relevant end branches (each end branch is a separate, mutually exclusive outcome).
- The probabilities on the branches leaving each branch point must sum to 1.
Example: a bag contains 3 red and 2 blue balls, and two balls are drawn without replacement.
- First draw: P(Red) = 3/5, P(Blue) = 2/5.
- Second draw (given first was red): P(Red | Red) = 2/4 = 1/2 (only 2 red left, 4 balls total).
- P(both red) = P(R) × P(R|R) = 3/5 × 1/2 = 3/10.
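A tree diagram for two draws without replacement can be sketched by enumerating every path and multiplying along it. This assumes a bag of 3 red and 2 blue balls (as implied by P(Red) = 3/5 and "2 red left, 4 balls total"); the function name `two_draw_tree` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def two_draw_tree(bag):
    """Return {(first, second): probability} for two draws WITHOUT replacement.

    `bag` maps colour -> count, e.g. {"Red": 3, "Blue": 2}.
    """
    total = sum(bag.values())
    if total < 2:
        raise ValueError("need at least two balls")
    outcomes = {}
    for first, second in product(bag, repeat=2):
        p_first = Fraction(bag[first], total)
        # The second draw uses the bag with one ball of colour `first` removed.
        remaining = bag[second] - (1 if second == first else 0)
        p_second_given_first = Fraction(remaining, total - 1)
        # Multiply along the path (AND rule).
        outcomes[(first, second)] = p_first * p_second_given_first
    return outcomes

tree = two_draw_tree({"Red": 3, "Blue": 2})
print(tree[("Red", "Red")])  # 3/10
print(sum(tree.values()))    # 1 (all end branches together)
```

The same sketch applies to other bags; for 4 red and 3 green, adding the "Red then Green" and "Green then Red" paths gives 4/7.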
Venn Diagrams
- The intersection (overlap) = P(A ∩ B).
- A only (excluding overlap) = P(A) − P(A ∩ B).
- Outside both circles = P(neither) = 1 − P(A ∪ B).
- The probabilities of all regions must sum to 1.
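The four regions of a two-circle Venn diagram follow directly from P(A), P(B) and P(A ∩ B). A minimal sketch, using the French/Spanish class figures from the worked examples (30 students: 18 French, 14 Spanish, 7 both); the function name `venn_regions` is hypothetical.

```python
from fractions import Fraction

def venn_regions(p_a, p_b, p_a_and_b):
    """Split two events into the four regions of a two-circle Venn diagram."""
    regions = {
        "A only": p_a - p_a_and_b,               # P(A) − P(A ∩ B)
        "B only": p_b - p_a_and_b,               # P(B) − P(A ∩ B)
        "A and B": p_a_and_b,                    # the overlap
        "neither": 1 - (p_a + p_b - p_a_and_b),  # 1 − P(A ∪ B)
    }
    # Each region must be a valid probability; by construction they sum to 1.
    if any(not 0 <= p <= 1 for p in regions.values()):
        raise ValueError("inconsistent probabilities")
    return regions

regions = venn_regions(Fraction(18, 30), Fraction(14, 30), Fraction(7, 30))
for name, p in regions.items():
    print(name, p)
print(sum(regions.values()))  # 1
```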
Theoretical vs Experimental Probability
| Type | Definition | Strength | Limitation |
|---|---|---|---|
| Theoretical | Based on equally likely outcomes: P = favourable / total | Exact; no need to experiment | Assumes ideal conditions; may not match reality |
| Experimental | Based on observed results: P ≈ frequency / total trials | Reflects real-world results | Only an estimate; varies between experiments and tends to be more reliable with more trials |
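The difference between the two types can be explored by simulation. This sketch assumes a fair six-sided die modelled with Python's pseudo-random generator; the helper names `experimental_probability` and `rolled_six` are hypothetical. The estimates change with the seed and number of trials, which is exactly the limitation in the table.

```python
import random

def experimental_probability(event, trials, seed=None):
    """Estimate P(event) as (number of times it happened) / (number of trials)."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    successes = sum(1 for _ in range(trials) if event(rng))
    return successes / trials

def rolled_six(rng):
    return rng.randint(1, 6) == 6  # one roll of a simulated fair die

theoretical = 1 / 6
for n in (10, 100, 10_000):
    estimate = experimental_probability(rolled_six, n, seed=1)
    print(f"{n:>6} trials: {estimate:.4f}  (theoretical {theoretical:.4f})")
```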
Worked Examples
Multi-step solutions showing the reasoning expected at Year 4 Advanced.
Data: 2, 5, 5, 7, 11.
Step 1. Mean: μ = (2 + 5 + 5 + 7 + 11)/5 = 30/5 = 6.
Step 2. Deviations from the mean: −4, −1, −1, 1, 5 (these sum to 0).
Step 3. Squared deviations: 16, 1, 1, 1, 25.
Step 4. Variance: σ² = (16 + 1 + 1 + 1 + 25)/5 = 44/5 = 8.8.
Step 5. Standard deviation: σ = √8.8 ≈ 2.97.
Interpretation: a typical data value lies about 2.97 units from the mean of 6.
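The calculation steps can be checked with a short Python sketch. It assumes population variance (dividing by n, matching Σ(x − μ)²/n), and uses the data 2, 5, 5, 7, 11 implied by the mean of 6 and the deviations −4, −1, −1, 1, 5. The function name `population_sd_steps` is hypothetical.

```python
import math
import statistics

def population_sd_steps(data):
    """Follow the worked steps: mean, deviations, squares, variance, square root."""
    n = len(data)
    mean = sum(data) / n                    # Step 1: mean μ
    deviations = [x - mean for x in data]   # Step 2: x − μ (these sum to 0)
    squared = [d ** 2 for d in deviations]  # Step 3: (x − μ)²
    variance = sum(squared) / n             # Step 4: σ² = Σ(x − μ)² / n
    sd = math.sqrt(variance)                # Step 5: σ = √σ²
    return mean, deviations, squared, variance, sd

data = [2, 5, 5, 7, 11]
mean, devs, sq, var, sd = population_sd_steps(data)
print(mean, devs, sq, var, round(sd, 2))
# Cross-check against the standard library's population standard deviation.
print(math.isclose(sd, statistics.pstdev(data)))
```

`statistics.stdev`, by contrast, divides by n − 1 (the sample standard deviation), so it would give a larger value for the same data.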
Test for independence: P(A) × P(B) = 0.6 × 0.4 = 0.24. Since this equals the given P(A ∩ B) = 0.24, A and B are independent.
P(A ∪ B): use the general addition rule (the events are not mutually exclusive, since P(A ∩ B) ≠ 0):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.4 − 0.24 = 0.76.
Check: P(neither) = 1 − 0.76 = 0.24. Since P(A') = 0.4 and P(B') = 0.6, P(A' ∩ B') = 0.4 × 0.6 = 0.24 ✓ (independence confirmed).
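The independence test and the complement check can be expressed in a few lines. A sketch only: `are_independent` is a hypothetical name, and exact `Fraction` values are used so that `==` is safe (with floats, a tolerance such as `math.isclose` would be needed).

```python
from fractions import Fraction

def are_independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A ∩ B) = P(A) × P(B)."""
    return p_a_and_b == p_a * p_b

p_a, p_b, p_ab = Fraction("0.6"), Fraction("0.4"), Fraction("0.24")
print(are_independent(p_a, p_b, p_ab))              # True
p_union = p_a + p_b - p_ab                          # general addition rule
print(p_union, 1 - p_union, (1 - p_a) * (1 - p_b))  # 19/25 6/25 6/25
```

Here 19/25 = 0.76 and 6/25 = 0.24, matching the worked values.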
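Dataset A: every value equals the mean of 5, so every deviation is 0, the variance is 0 and σ = 0.
Dataset B (1, 3, 5, 7, 9): Deviations: −4, −2, 0, 2, 4. Squared: 16, 4, 0, 4, 16. Variance = 40/5 = 8. σ = √8 ≈ 2.83.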
Interpretation: Both datasets have the same mean (5), but A has zero spread — every value is identical — while B has a standard deviation of approximately 2.83, indicating values typically differ from the mean by about 2.83 units. This demonstrates that the mean alone does not adequately describe a dataset; a measure of spread is essential for a complete picture.
(a) P(head AND 6): the coin and the die do not affect each other, so use the multiplication rule for independent events.
P(H ∩ 6) = P(H) × P(6) = 1/2 × 1/6 = 1/12.
(b) P(head OR 6): These events CAN both occur, so use the general addition rule.
P(H ∪ 6) = P(H) + P(6) − P(H ∩ 6) = 1/2 + 1/6 − 1/12 = 6/12 + 2/12 − 1/12 = 7/12.
Verify: Count directly: 12 equally likely outcomes. Outcomes with head: {H1, H2, H3, H4, H5, H6} = 6. Add outcomes with 6 not already counted: {T6} = 1. Total = 7. P = 7/12 ✓
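The direct count can be automated by listing the whole sample space. A minimal sketch, assuming a fair coin and a fair six-sided die so all 12 (coin, die) pairs are equally likely; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all equally likely.
space = list(product("HT", range(1, 7)))

def prob(event):
    """Theoretical probability = favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

print(len(space))                                 # 12
print(prob(lambda o: o[0] == "H" and o[1] == 6))  # 1/12
print(prob(lambda o: o[0] == "H" or o[1] == 6))   # 7/12
```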
Range: 98 − 45 = 53. This is inflated by the outlier (45) and does not reflect the spread of the majority of scores.
IQR: Ordered data has 10 values. Q1 = (80+82)/2 = 81, Q3 = (90+93)/2 = 91.5. IQR = 91.5 − 81 = 10.5.
Evaluation: The IQR (10.5) is the more appropriate measure because it focuses on the middle 50% and is not affected by the outlier score of 45. The range of 53 misleadingly suggests high variability when in fact 9 of the 10 scores are within a narrow 20-point band. However, the IQR does not communicate anything about the outlier — a complete analysis should note the outlier separately.
P(R then G): P(R) × P(G|R) = 4/7 × 3/6 = 12/42 = 2/7.
P(G then R): P(G) × P(R|G) = 3/7 × 4/6 = 12/42 = 2/7.
The two orders (R then G, G then R) are mutually exclusive outcomes, so add their probabilities:
P(one of each) = 2/7 + 2/7 = 4/7.
Verify: Total ways to choose 2 from 7 = C(7,2) = 21. Ways to choose 1 red from 4 AND 1 green from 3 = 4 × 3 = 12. P = 12/21 = 4/7 ✓
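The combination count in the verification can be reproduced with `math.comb` (available from Python 3.8). The sketch assumes 4 red and 3 green balls with 2 drawn, as in the example.

```python
from fractions import Fraction
from math import comb

red, green, drawn = 4, 3, 2
total_ways = comb(red + green, drawn)        # C(7, 2)
one_of_each = comb(red, 1) * comb(green, 1)  # 4 × 3
p = Fraction(one_of_each, total_ways)
print(total_ways, one_of_each, p)            # 21 12 4/7
```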
P(F) = 18/30, P(S) = 14/30, P(F ∩ S) = 7/30.
P(F ∪ S) = 18/30 + 14/30 − 7/30 = 25/30 = 5/6.
P(neither) = 1 − 5/6 = 1/6.
Verify: Students in French only = 18 − 7 = 11; Spanish only = 14 − 7 = 7; Both = 7; Total = 11 + 7 + 7 = 25. Neither = 30 − 25 = 5. P = 5/30 = 1/6 ✓
Practice Q&A
Attempt each question before reading the model answer. Focus on justifying your method and evaluating your answer in context.
This is why we square deviations: squaring makes every term non-negative, so the deviations cannot cancel and their sum measures total spread (it is zero only when every value equals the mean).
P(even ∪ >5) = 4/8 + 3/8 − 2/8 = 5/8.
Verify: {2, 4, 6, 7, 8} = 5 numbers. P = 5/8 ✓
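The addition-rule answer and the direct listing can both be checked with Python sets. The question itself is not shown here, so this assumes (from the denominators of 8 and the listed outcomes) that a number is chosen at random from 1 to 8 with each equally likely; the helper name `p` is hypothetical.

```python
from fractions import Fraction

numbers = range(1, 9)                        # assumed: 1..8, equally likely
even = {n for n in numbers if n % 2 == 0}    # {2, 4, 6, 8}
over_5 = {n for n in numbers if n > 5}       # {6, 7, 8}

def p(event):
    return Fraction(len(event), len(numbers))

print(p(even) + p(over_5) - p(even & over_5))  # 5/8 (general addition rule)
print(p(even | over_5), sorted(even | over_5))  # 5/8 [2, 4, 6, 7, 8]
```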
Note: the events are NOT independent (the probability of passing Part 2 depends on whether Part 1 was passed), so use the multiplication rule with a conditional probability: P(both) = P(pass Part 1) × P(pass Part 2 | passed Part 1).
Flashcard Review
Try to answer each card from memory first.
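Standard deviation: take the square root of the mean of the squared deviations from the mean.
General addition rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B): the overlap must be subtracted to avoid double-counting.
Mutually exclusive events: the addition rule simplifies to P(A ∪ B) = P(A) + P(B).
Multiplication rule P(A ∩ B) = P(A) × P(B): valid only when the outcome of A does NOT affect B (independent events).
Complement rule, P(A') = 1 − P(A): the probability that A does NOT occur equals 1 minus the probability that it does.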