Statistics — Measures of Spread and Probability

At Year 4 Advanced, statistical analysis moves beyond describing data to evaluating the reliability and limitations of statistical measures, and using formal probability rules to solve multi-step problems. You must justify which measure of spread is appropriate and why.

What You'll Learn

  • Calculate and interpret range, interquartile range (IQR), variance, and standard deviation
  • Evaluate the strengths and limitations of each measure of spread
  • Apply the addition rule (general and mutually exclusive) for probability
  • Apply the multiplication rule for independent and dependent events
  • Construct and interpret tree diagrams and Venn diagrams
  • Distinguish between theoretical and experimental probability

IB Assessment Focus

Criterion A: Select appropriate statistical measures and probability rules for multi-step, unfamiliar problems.

Criterion B: Justify why variance uses squared deviations; prove the addition rule using a Venn diagram argument.

Criterion C: Communicate statistical conclusions with correct notation; interpret findings in context.

Criterion D: Evaluate the reasonableness of statistical conclusions; discuss limitations of sample size and data collection.

Key Vocabulary

Term | Definition
Variance (σ²) | The mean of the squared deviations from the mean; measures spread in squared units
Standard deviation (σ) | The square root of variance; measures spread in the original units of the data
Interquartile range (IQR) | Q3 − Q1; the spread of the middle 50% of data; resistant to outliers
Mutually exclusive events | Events that cannot occur at the same time; P(A ∩ B) = 0
Independent events | Events where the outcome of one does not affect the probability of the other
P(A ∩ B) | Probability of both A AND B occurring simultaneously
P(A ∪ B) | Probability of A OR B (or both) occurring
Complementary event | P(A') = 1 − P(A); the probability that A does NOT occur

Measures of Spread

A measure of spread describes how dispersed or clustered data values are around the centre. At Year 4, you must choose the appropriate measure and justify your choice.

Range

Range = Maximum value − Minimum value
Strength: Simple to calculate and easy to interpret.
Limitation: Heavily influenced by outliers; uses only two data values and ignores the distribution of all others.

Interquartile Range (IQR)

IQR = Q3 − Q1
  1. Order the data from smallest to largest.
  2. Find the median (Q2) — the middle value.
  3. Q1 = median of the lower half; Q3 = median of the upper half.
  4. IQR = Q3 − Q1.
Strength: Resistant to outliers; measures the spread of the middle 50% of data, so it is usually more representative than the range.
Limitation: Ignores the lowest 25% and the highest 25% of the data, so it cannot detect changes in the tails.
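The four steps above can be sketched in Python. This is a minimal illustration that assumes the median-of-halves method described in this section; for an odd number of values it leaves the median out of both halves, which is one common convention and an assumption here. The function name `quartiles` and the sample data are illustrative only. Calculators and software often use interpolation instead and may report slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)              # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median position
    upper = ordered[half + (n % 2):]    # skip the middle value when n is odd (assumed convention)
    return median(lower), median(ordered), median(upper)

data = [3, 7, 8, 12, 14, 19, 21, 25]    # illustrative dataset
q1, q2, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1
print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Comparing `data_range` with `iqr` for the same data shows how much of the total spread comes from the tails rather than the middle 50%.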

Comparing Measures of Spread

Measure | Formula | Use when | Limitation
Range | Max − Min | Quick comparison, no outliers | Distorted by outliers
IQR | Q3 − Q1 | Data has outliers or skewed distribution | Ignores extreme values
Variance | Σ(x − μ)² / n | Further statistical work needed | Units are squared; harder to interpret
Standard deviation | √Variance | Most statistical contexts; normally distributed data | Sensitive to outliers; assumes symmetry
Criterion D Tip: Always state why you chose a particular measure of spread, referencing the data's distribution. If the data has outliers, justify using IQR over standard deviation. Saying "I used the IQR because the data contains outliers that would inflate the standard deviation" earns marks for mathematical reasoning.

Variance and Standard Deviation

Standard deviation is the most widely used measure of spread because it takes every data value into account. Understanding why we square deviations, rather than simply adding them, requires mathematical justification at Year 4 Advanced.

Population Variance and Standard Deviation
σ² = Σ(x − μ)² / n      σ = √(Σ(x − μ)² / n)
[Figure: normal distribution curve centred on μ. About 68% of data lie within ±1σ, 95% within ±2σ, and 99.7% within ±3σ.]

Why We Square Deviations

Justification: If we simply added the deviations (x − μ) without squaring, positive and negative deviations would cancel out and the sum would always equal zero, giving no useful information about spread. Squaring makes every deviation non-negative and weights larger deviations more heavily. Taking the square root at the end restores the units to the original scale.
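A short numerical check can make the cancellation concrete. The sketch below uses a small illustrative dataset (3, 7, 7, 9, 4); the variable names are arbitrary.

```python
data = [3, 7, 7, 9, 4]                  # illustrative dataset
mean = sum(data) / len(data)

deviations = [x - mean for x in data]   # signed distances from the mean
squared = [d ** 2 for d in deviations]  # squaring removes the sign

print(sum(deviations))  # positive and negative deviations cancel
print(sum(squared))     # squared deviations cannot cancel
```

The first sum is zero for any dataset, while the second is zero only when every value equals the mean.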

Calculating Standard Deviation — Step by Step

Example: Dataset: 3, 7, 7, 9, 4. Find σ.
Worked Example — Standard Deviation
μ = (3 + 7 + 7 + 9 + 4) / 5 = 30 / 5 = 6   (find the mean)
Deviations: −3, 1, 1, 3, −2   (subtract μ from each value)
Squared: 9, 1, 1, 9, 4   (square each deviation)
σ² = (9 + 1 + 1 + 9 + 4) / 5 = 24 / 5 = 4.8   (variance = mean of squares)
σ = √4.8 ≈ 2.19   (standard deviation = √variance)
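The same calculation can be sketched in Python as a check on hand working. The function name `population_std` is illustrative; the cross-check uses `statistics.pstdev` from the standard library, which divides by n (population form). Note that `statistics.stdev` divides by n − 1 instead (sample form) and would give a different answer.

```python
import math
from statistics import pstdev

def population_std(data):
    """Population standard deviation, following the worked steps."""
    mu = sum(data) / len(data)                 # find the mean
    squared = [(x - mu) ** 2 for x in data]    # square each deviation
    variance = sum(squared) / len(data)        # mean of the squares
    return math.sqrt(variance)                 # back to original units

data = [3, 7, 7, 9, 4]
sigma = population_std(data)
print(round(sigma, 2))

# Cross-check against the standard library's population formula.
print(math.isclose(sigma, pstdev(data)))
```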

Interpreting Standard Deviation

Scenario | Interpretation
Small σ (close to 0) | Data is clustered tightly around the mean; low variability
Large σ | Data is spread widely around the mean; high variability
Comparing two datasets | The dataset with smaller σ is more consistent
Common Mistake: Forgetting to square root at the end gives variance, not standard deviation. Standard deviation and the original data share the same units; variance uses squared units (e.g., cm² vs cm).

Probability Rules

At Year 4, probability moves from single events to combined events using formal rules. You must identify which rule applies and justify your selection.

Fundamental Rules

Addition Rule — General (A and B can both occur)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Addition Rule — Mutually Exclusive (A and B cannot both occur)
P(A ∪ B) = P(A) + P(B)     [since P(A ∩ B) = 0]
Multiplication Rule — Independent Events
P(A ∩ B) = P(A) × P(B)
Complementary Events
P(A') = 1 − P(A)

Mutually Exclusive vs Independent

Concept | Meaning | Example | Key formula
Mutually exclusive | Cannot happen simultaneously | Rolling a 3 AND a 5 on one die | P(A ∩ B) = 0
Independent | One outcome does not affect the other | Flipping heads AND rolling a 6 | P(A ∩ B) = P(A) × P(B)
Critical Rule: For the general addition rule, you must subtract P(A ∩ B) to avoid counting the overlap twice. If you apply the mutually exclusive formula when events can both occur, your probability may exceed 1 — which is impossible. Always check whether events can occur simultaneously.
Example: P(A) = 0.4, P(B) = 0.5, P(A ∩ B) = 0.2. Find P(A ∪ B).
  1. Events A and B can both occur (P(A ∩ B) ≠ 0), so use the general addition rule.
  2. P(A ∪ B) = 0.4 + 0.5 − 0.2 = 0.7.
  3. Sanity check: 0.7 ≤ 1 ✓; also P(A ∪ B) ≥ P(A) and P(B) ✓
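The rule and its sanity checks can be sketched as a small Python function. The name `p_union` is illustrative, and `Fraction` is used only so the arithmetic stays exact rather than picking up floating-point rounding.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_both):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    result = p_a + p_b - p_both
    # Sanity checks: the union is at least as likely as either event
    # and can never exceed 1.
    if not (max(p_a, p_b) <= result <= 1):
        raise ValueError("inconsistent probabilities")
    return result

p = p_union(Fraction("0.4"), Fraction("0.5"), Fraction("0.2"))
print(p, float(p))
```

Setting `p_both` to 0 gives the mutually exclusive form of the rule, so one function covers both cases.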

Combined Events — Tree Diagrams and Venn Diagrams

Tree diagrams and Venn diagrams are tools for organising information about combined events. At Year 4, you must use them to solve multi-step probability problems and interpret results critically.

Tree Diagrams

Key rules for tree diagrams:
  • Each branch shows a possible outcome with its probability labelled.
  • Probabilities along a branch multiply (AND rule).
  • To find P(A OR B), add the probabilities of relevant end branches.
  • All probabilities at each branch point must sum to 1.
Example: A bag has 3 red and 2 blue balls. Two balls are drawn without replacement. Find P(both red).
  1. First draw: P(Red) = 3/5, P(Blue) = 2/5.
  2. Second draw (given first was red): P(Red | Red) = 2/4 = 1/2 (only 2 red left, 4 balls total).
  3. P(both red) = P(R) × P(R|R) = 3/5 × 1/2 = 3/10.
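One way to check a tree-diagram answer is to list every equally likely ordered pair of draws and count. The sketch below does this for the bag above; labelling the balls by position so that identical colours stay distinguishable is a modelling choice made here for illustration.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R", "R", "R", "B", "B"]          # 3 red, 2 blue

# Ordered pairs of distinct ball positions model drawing without replacement.
draws = list(permutations(range(len(bag)), 2))
both_red = [d for d in draws if bag[d[0]] == "R" and bag[d[1]] == "R"]

p_both_red = Fraction(len(both_red), len(draws))
print(len(both_red), "of", len(draws), "ordered draws ->", p_both_red)
```

The count agrees with multiplying along the branch: 3/5 × 2/4.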

Venn Diagrams

Reading a Venn diagram:
  • The intersection (overlap) = P(A ∩ B).
  • A only (excluding overlap) = P(A) − P(A ∩ B).
  • Outside both circles = P(neither) = 1 − P(A ∪ B).
  • Total of all regions must sum to 1.
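These region rules can be sketched in Python. The function name `venn_regions` and the input values (P(A) = 0.4, P(B) = 0.5, P(A ∩ B) = 0.2) are illustrative. Because "neither" is computed as whatever is left over, the regions sum to 1 by construction, so the useful check is that no region comes out negative.

```python
from fractions import Fraction

def venn_regions(p_a, p_b, p_both):
    """Split a two-event Venn diagram into its four regions."""
    regions = {
        "A only": p_a - p_both,
        "B only": p_b - p_both,
        "both": p_both,
    }
    regions["neither"] = 1 - sum(regions.values())   # 1 - P(A or B)
    if any(p < 0 for p in regions.values()):
        raise ValueError("inconsistent probabilities")
    return regions

regions = venn_regions(Fraction(4, 10), Fraction(5, 10), Fraction(2, 10))
for name, p in regions.items():
    print(f"{name}: {p}")
```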

Theoretical vs Experimental Probability

Type | Definition | Strength | Limitation
Theoretical | Based on equally likely outcomes: P = favourable / total | Exact; no need to experiment | Assumes ideal conditions; may not match reality
Experimental | Based on observed results: P ≈ frequency / total trials | Reflects real-world results | Varies with sample size; never exact
Law of Large Numbers: As the number of trials increases, experimental probability approaches theoretical probability. This means a small sample is unreliable — a key limitation to address in Criterion D.
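A simulation can illustrate this behaviour. The sketch below models a fair coin with Python's pseudo-random generator; the seed and the trial counts are arbitrary choices made for repeatability, and the exact estimates will differ with other seeds.

```python
import random

def experimental_p_heads(trials, rng):
    """Estimate P(heads) from simulated flips of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

rng = random.Random(1)   # fixed seed, chosen arbitrarily
for trials in (10, 100, 1_000, 10_000, 100_000):
    estimate = experimental_p_heads(trials, rng)
    print(f"{trials:>7} flips: experimental P(heads) = {estimate:.4f}")
```

Estimates from small numbers of flips can sit well away from 0.5, while the largest runs tend to land very close to it.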

Worked Examples

Multi-step solutions showing the reasoning expected at Year 4 Advanced.

EXAMPLE 1: Calculate the standard deviation of 2, 5, 5, 7, 11. Show all working.
Full Solution
Step 1 — Mean: μ = (2+5+5+7+11)/5 = 30/5 = 6.

Step 2 — Deviations from mean: −4, −1, −1, 1, 5.

Step 3 — Squared deviations: 16, 1, 1, 1, 25.

Step 4 — Variance: σ² = (16+1+1+1+25)/5 = 44/5 = 8.8.

Step 5 — Standard deviation: σ = √8.8 ≈ 2.97.

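Interpretation: Data values typically lie about 2.97 units from the mean of 6. (Standard deviation is a root-mean-square distance, so it is not exactly the average distance from the mean, which here is 2.4.)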
EXAMPLE 2: P(A) = 0.6, P(B) = 0.4, P(A ∩ B) = 0.24. Are A and B independent? Find P(A ∪ B).
Full Solution
Testing independence: If A and B are independent, P(A ∩ B) = P(A) × P(B) = 0.6 × 0.4 = 0.24.
Since the given P(A ∩ B) = 0.24 matches, A and B are independent.

P(A ∪ B): Using the general addition rule (events are not mutually exclusive since P(A ∩ B) ≠ 0):
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.4 − 0.24 = 0.76.

Check: P(neither) = 1 − 0.76 = 0.24. Since P(A') = 0.4 and P(B') = 0.6, P(A' ∩ B') = 0.4 × 0.6 = 0.24 ✓ (independence confirmed).
EXAMPLE 3: Two datasets: A = {5, 5, 5, 5, 5} and B = {1, 3, 5, 7, 9}. Both have mean 5. Compare their standard deviations and interpret.
Full Solution
Dataset A: All values equal the mean, so all deviations = 0. σ = 0.

Dataset B: Deviations: −4, −2, 0, 2, 4. Squared: 16, 4, 0, 4, 16. Variance = 40/5 = 8. σ = √8 ≈ 2.83.

Interpretation: Both datasets have the same mean (5), but A has zero spread — every value is identical — while B has a standard deviation of approximately 2.83, indicating values typically differ from the mean by about 2.83 units. This demonstrates that the mean alone does not adequately describe a dataset; a measure of spread is essential for a complete picture.
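The comparison can be reproduced with Python's standard library as a quick check. `statistics.pstdev` is the population standard deviation (divides by n); the variable names are illustrative.

```python
from statistics import mean, pstdev

dataset_a = [5, 5, 5, 5, 5]
dataset_b = [1, 3, 5, 7, 9]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # Same mean, very different spread.
    print(f"Dataset {name}: mean = {mean(data)}, sigma = {pstdev(data):.2f}")
```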
EXAMPLE 4: A coin is flipped and a die is rolled. Find: (a) P(head AND 6), (b) P(head OR 6).
Full Solution
The coin and die are independent events.

(a) P(head AND 6): Use multiplication rule for independent events.
P(H ∩ 6) = P(H) × P(6) = 1/2 × 1/6 = 1/12.

(b) P(head OR 6): These events CAN both occur, so use the general addition rule.
P(H ∪ 6) = P(H) + P(6) − P(H ∩ 6) = 1/2 + 1/6 − 1/12 = 6/12 + 2/12 − 1/12 = 7/12.

Verify: Count directly: 12 equally likely outcomes. Outcomes with head: {H1, H2, H3, H4, H5, H6} = 6. Add outcomes with 6 not already counted: {T6} = 1. Total = 7. P = 7/12 ✓
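The direct count in the verification step can also be done by listing the sample space in Python. This is a sketch that assumes a fair coin and a fair six-sided die, so all 12 pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# Every (coin, die) pair: 2 x 6 = 12 equally likely outcomes.
outcomes = list(product("HT", range(1, 7)))

head_and_six = [o for o in outcomes if o[0] == "H" and o[1] == 6]
head_or_six = [o for o in outcomes if o[0] == "H" or o[1] == 6]

p_and = Fraction(len(head_and_six), len(outcomes))
p_or = Fraction(len(head_or_six), len(outcomes))
print(p_and, p_or)
```

Because the outcome (H, 6) appears only once in the list, counting the union directly never double-counts the overlap.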
EXAMPLE 5: Evaluate which measure of spread is more appropriate for this dataset: exam scores 45, 78, 80, 82, 85, 88, 90, 93, 95, 98.
Full Solution
The dataset contains an outlier: 45 is far below the rest of the values (which cluster between 78 and 98).

Range: 98 − 45 = 53. This is inflated by the outlier (45) and does not reflect the spread of the majority of scores.

IQR: The ordered data has 10 values, so the lower half is 45, 78, 80, 82, 85 and the upper half is 88, 90, 93, 95, 98. Q1 = 80 (median of the lower half) and Q3 = 93 (median of the upper half). IQR = 93 − 80 = 13.

Evaluation: The IQR (13) is the more appropriate measure because it focuses on the middle 50% and is not inflated by the outlier score of 45. The range of 53 misleadingly suggests high variability when in fact 9 of the 10 scores are within a narrow 20-point band. However, the IQR does not communicate anything about the outlier, so a complete analysis should note the outlier separately.
EXAMPLE 6: A bag contains 4 red and 3 green balls. Two are drawn without replacement. Find P(one of each colour).
Full Solution
There are two ways to get one of each: Red then Green, OR Green then Red.

P(R then G): P(R) × P(G|R) = 4/7 × 3/6 = 12/42 = 2/7.

P(G then R): P(G) × P(R|G) = 3/7 × 4/6 = 12/42 = 2/7.

These are mutually exclusive outcomes, so:
P(one of each) = 2/7 + 2/7 = 4/7.

Verify: Total ways to choose 2 from 7 = C(7,2) = 21. Ways to choose 1 red from 4 AND 1 green from 3 = 4 × 3 = 12. P = 12/21 = 4/7 ✓
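The combinations check can be sketched with `math.comb` (available in Python 3.8 and later), alongside the two-branch tree calculation for comparison. Variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Counting method: unordered pairs of balls.
total_pairs = comb(7, 2)                  # choose any 2 of the 7 balls
mixed_pairs = comb(4, 1) * comb(3, 1)     # 1 red from 4 AND 1 green from 3
p_counting = Fraction(mixed_pairs, total_pairs)

# Tree method: red then green, OR green then red.
p_tree = Fraction(4, 7) * Fraction(3, 6) + Fraction(3, 7) * Fraction(4, 6)

print(p_counting, p_tree)
```

Both methods give the same probability because each unordered pair corresponds to exactly two ordered draws.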
EXAMPLE 7: In a class of 30 students, 18 study French, 14 study Spanish, and 7 study both. Find P(a student studies neither).
Full Solution
Use the general addition rule to find P(French OR Spanish), then subtract from 1.

P(F) = 18/30, P(S) = 14/30, P(F ∩ S) = 7/30.

P(F ∪ S) = 18/30 + 14/30 − 7/30 = 25/30 = 5/6.

P(neither) = 1 − 5/6 = 1/6.

Verify: Students in French only = 18 − 7 = 11; Spanish only = 14 − 7 = 7; Both = 7; Total = 11 + 7 + 7 = 25. Neither = 30 − 25 = 5. P = 5/30 = 1/6 ✓

Practice Q&A

Attempt each question before reading the model answer. Focus on justifying your method and evaluating your answer in context.

CALCULATE: Find the IQR of: 3, 7, 8, 12, 14, 19, 21, 25.
Model Answer
8 values. Lower half: 3, 7, 8, 12 → Q1 = (7+8)/2 = 7.5. Upper half: 14, 19, 21, 25 → Q3 = (19+21)/2 = 20. IQR = 20 − 7.5 = 12.5.
PROVE: Why does summing (x − μ) without squaring always equal zero?
Model Answer
Σ(x − μ) = Σx − nμ. Since μ = Σx / n, we have nμ = Σx. Therefore Σ(x − μ) = Σx − Σx = 0. ∎

This is why we square deviations — squaring removes sign and gives a useful, non-zero measure of total spread.
APPLY: P(A) = 0.3, P(B) = 0.5. If A and B are mutually exclusive, find P(A ∪ B).
Model Answer
Mutually exclusive: P(A ∩ B) = 0. Therefore P(A ∪ B) = P(A) + P(B) = 0.3 + 0.5 = 0.8.
EVALUATE: Dataset: {1, 2, 3, 100}. Discuss which measure of spread best represents this data.
Model Answer
Range = 99 (dominated entirely by the outlier 100). Population standard deviation ≈ 42.4 (also distorted). IQR: Q1 = 1.5, Q3 = 51.5, IQR = 50 (also distorted by 100 being in the upper half). For this dataset, none of the standard measures fully captures the picture. The most honest approach is to report the median and IQR, note the outlier (100) separately, and acknowledge that a dataset of only 4 values is too small for reliable statistical inference.
PROBABILITY: A spinner has sections numbered 1–8. What is P(even OR greater than 5)?
Model Answer
Even numbers: {2, 4, 6, 8}. P(even) = 4/8. Greater than 5: {6, 7, 8}. P(>5) = 3/8. Numbers that are both even AND >5: {6, 8}. P(both) = 2/8.
P(even ∪ >5) = 4/8 + 3/8 − 2/8 = 5/8.
Verify: {2, 4, 6, 7, 8} = 5 numbers. P = 5/8 ✓
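Python sets mirror the notation directly: `|` is union and `&` is intersection. The sketch below assumes the 8 sections are equally likely; the variable names are illustrative.

```python
from fractions import Fraction

sections = set(range(1, 9))                       # spinner sections 1 to 8
even = {n for n in sections if n % 2 == 0}
greater_than_5 = {n for n in sections if n > 5}

# Direct count of the union.
p_direct = Fraction(len(even | greater_than_5), len(sections))

# General addition rule: subtract the overlap once.
p_rule = (Fraction(len(even), len(sections))
          + Fraction(len(greater_than_5), len(sections))
          - Fraction(len(even & greater_than_5), len(sections)))

print(sorted(even | greater_than_5), p_direct, p_rule)
```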
TREE DIAGRAM: A test has two parts. P(pass Part 1) = 0.8. If pass Part 1, P(pass Part 2) = 0.9. If fail Part 1, P(pass Part 2) = 0.4. Find P(pass both parts).
Model Answer
P(pass both) = P(pass 1) × P(pass 2 | pass 1) = 0.8 × 0.9 = 0.72.
Note: the events are NOT independent (the probability of passing Part 2 depends on whether Part 1 was passed), so we use the conditional multiplication.
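The whole tree can be laid out in Python to confirm that the four end branches sum to 1. This is a sketch using exact fractions; the dictionary layout and variable names are illustrative.

```python
from fractions import Fraction

p_pass1 = Fraction(8, 10)
p_pass2_given_pass1 = Fraction(9, 10)
p_pass2_given_fail1 = Fraction(4, 10)

# Multiply along each branch: (Part 1 result, Part 2 result).
leaves = {
    ("pass", "pass"): p_pass1 * p_pass2_given_pass1,
    ("pass", "fail"): p_pass1 * (1 - p_pass2_given_pass1),
    ("fail", "pass"): (1 - p_pass1) * p_pass2_given_fail1,
    ("fail", "fail"): (1 - p_pass1) * (1 - p_pass2_given_fail1),
}

for path, p in leaves.items():
    print(path, float(p))
print("total:", sum(leaves.values()))
```

Adding the two end branches that finish with a Part 2 pass gives the overall P(pass Part 2), which is a typical follow-up question for this kind of tree.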
COMPARE: Team A's scores over a season: mean = 70, σ = 15. Team B's scores: mean = 70, σ = 3. Which team is more consistent and what are the implications?
Model Answer
Both teams have the same mean (70), but Team B has a much smaller standard deviation (3 vs 15). Team B is far more consistent — their scores typically fall within 3 points of 70. Team A is highly variable — scores may range widely. Implication: Team B is more predictable and reliable; Team A might produce both very high and very low scores. In practice, consistency (small σ) is often as important as the mean performance level.
DISCUSS: A student conducts 20 coin flips and gets 14 heads. They claim "the coin is biased." Evaluate this claim.
Model Answer
Experimental probability of heads = 14/20 = 0.7, compared to theoretical probability of 0.5 for a fair coin. However, 20 trials is a very small sample. The law of large numbers states that experimental probability only reliably approaches theoretical probability with a large number of trials. Getting 14/20 heads can occur by chance even with a fair coin. To evaluate bias, the student would need hundreds of flips and a formal statistical test (e.g., a chi-squared test or binomial test). Conclusion: The claim is premature — the result is consistent with normal variation in a small sample.
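To put a number on "can occur by chance", the sketch below computes the exact probability that a fair coin gives 14 or more heads in 20 flips, using the binomial counts. This is a one-sided calculation offered as an illustration, not a full hypothesis test; the function name `p_at_least` is illustrative.

```python
from fractions import Fraction
from math import comb

def p_at_least(k, n):
    """P(at least k heads in n flips of a fair coin)."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)    # 2**n equally likely sequences

p = p_at_least(14, 20)
print(f"P(14 or more heads in 20 flips) = {float(p):.4f}")
```

The result is a little under 6%, so a fair coin would produce a result at least this extreme roughly once in every 17 sets of 20 flips. That supports the conclusion that the evidence for bias is weak.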

Flashcard Review

Try to answer each question from memory before reading the answer.

Q: State the formula for population standard deviation.
A: σ = √(Σ(x − μ)² / n). Take the square root of the mean of squared deviations from the mean.

Q: Why do we square deviations when calculating variance?
A: To prevent positive and negative deviations from cancelling to zero, and to weight larger deviations more heavily. Without squaring, Σ(x − μ) = 0 always.

Q: State the general addition rule for probability.
A: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Must subtract the overlap to avoid double-counting.

Q: What does mutually exclusive mean?
A: Events cannot occur simultaneously. P(A ∩ B) = 0. Addition rule simplifies to: P(A ∪ B) = P(A) + P(B).

Q: Multiplication rule for independent events?
A: P(A ∩ B) = P(A) × P(B). Valid only when the outcome of A does NOT affect B.

Q: What is the IQR and how do you calculate it?
A: IQR = Q3 − Q1. It measures the spread of the middle 50% of data. Less sensitive to outliers than range or standard deviation.

Q: What is the complementary event rule?
A: P(A') = 1 − P(A). The probability that A does NOT occur equals 1 minus the probability it does.

Q: Strength and limitation of standard deviation.
A: Strength: uses all data values; most powerful measure of spread. Limitation: sensitive to outliers; assumes data is roughly symmetric.

Q: What is the law of large numbers?
A: As the number of trials increases, experimental probability approaches theoretical probability. Small samples are unreliable.

Q: Difference between variance and standard deviation.
A: Variance (σ²) = mean of squared deviations; units are squared. Standard deviation (σ) = √variance; units match the original data.

Q: In a Venn diagram, what does the intersection represent?
A: P(A ∩ B), the probability that BOTH events A and B occur simultaneously.

Q: How do you know if two events are independent?
A: Test: P(A ∩ B) = P(A) × P(B). If this equality holds, the events are independent.

Q: When should you use IQR instead of standard deviation?
A: When data is skewed or contains outliers. IQR is resistant to extreme values; standard deviation is inflated by them.

Q: In a tree diagram, how do you find the probability of a compound event?
A: Multiply the probabilities along the branches (AND). To find P(A OR B), add the probabilities of the relevant end branches.

Q: What does a standard deviation of 0 mean?
A: All data values are identical (equal to the mean). There is no spread: zero variability in the dataset.
