Statistics and Probability

Statistics describes and interprets data. Probability measures how likely events are to occur. Together, these tools help us make sense of the world using numbers and evidence.

What You'll Learn

  • Classify and display data using appropriate graphs and charts
  • Calculate mean, median, mode, and range from a data set
  • Understand and apply theoretical probability
  • Calculate experimental probability from trials and compare it to theoretical probability
  • Use the complement rule: P(A') = 1 − P(A)
  • Read and construct two-way tables

IB Assessment Focus

Criterion A: Calculate averages and probabilities correctly in familiar and unfamiliar situations.

Criterion B: Discover patterns by comparing theoretical and experimental results across multiple trials.

Criterion C: Communicate statistical reasoning clearly, including correct notation (e.g., P(A) = 3/10).

Criterion D: Apply probability and statistics to real-world contexts (surveys, games, scientific experiments).

Key Vocabulary at a Glance

Term | Definition
Experimental probability | P = (times event occurred) ÷ (total trials); based on observed results
Theoretical probability | P = (favourable outcomes) ÷ (total outcomes); based on equally likely outcomes
Sample space | The set of all possible outcomes
Complementary event | The event that A does NOT happen: P(A') = 1 − P(A)
Two-way table | A table showing the frequency of two categorical variables simultaneously
Relative frequency | Frequency expressed as a fraction or percentage of the total

Data Types & Display

Before analysing data, you must understand what type of data you have and choose the best way to display it.

Types of Data

Type | Description | Examples
Discrete | Countable, separate values (usually whole numbers) | Number of goals, number of students
Continuous | Any value within a range (measurable) | Height, temperature, time
Categorical | Non-numerical, placed in groups | Favourite colour, genre of book

Choosing the Right Graph

Graph Type | Best Used For
Bar chart | Comparing discrete or categorical data
Pie chart | Showing proportions of a whole
Line graph | Showing trends over time (continuous data)
Histogram | Showing the distribution of continuous data in intervals
Stem-and-leaf plot | Displaying all individual values while showing shape
Two-way table | Comparing two categorical variables simultaneously

Two-Way Tables

A two-way table organises data by two variables at once. Each cell shows the frequency (count) for a combination.

Example: 30 students chose a sport and were categorised by gender.
Gender | Football | Basketball | Swimming | Total
Boys | 8 | 5 | 2 | 15
Girls | 3 | 4 | 8 | 15
Total | 11 | 9 | 10 | 30

P(student chose swimming) = 10/30 = 1/3 ≈ 33.3%

Key Point: Always check that the row and column totals add up correctly. The grand total in the bottom-right corner should equal both the sum of all row totals and the sum of all column totals.
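
The totals check described in the Key Point can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the names `table`, `row_totals`, and `col_totals` are assumptions made for this sketch, and the frequencies are the ones from the sport example.

```python
from fractions import Fraction

# Frequencies from the sport-by-gender example.
table = {
    "Boys":  {"Football": 8, "Basketball": 5, "Swimming": 2},
    "Girls": {"Football": 3, "Basketball": 4, "Swimming": 8},
}

# Row totals: one per gender.
row_totals = {group: sum(counts.values()) for group, counts in table.items()}

# Column totals: one per sport.
sports = ["Football", "Basketball", "Swimming"]
col_totals = {s: sum(table[g][s] for g in table) for s in sports}

# The grand total must agree whichever way it is summed.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

# P(student chose swimming) = swimming column total / grand total.
p_swimming = Fraction(col_totals["Swimming"], grand_total)
print(row_totals, col_totals, grand_total, p_swimming)
```

Using `Fraction` keeps the answer as an exact fraction (1/3) rather than a rounded decimal.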

Measures of Average

Measures of average (central tendency) summarise a data set with a single value. The range measures the spread of the data.

Figure: Box-and-whisker plot with Min = 10, Q1 = 25, Median = 40, Q3 = 60, Max = 75, and IQR = Q3 − Q1 = 35. The box spans the interquartile range (IQR); the line inside is the median.
Mean
Mean = (Sum of all values) ÷ (Number of values)
Range
Range = Largest value − Smallest value

Mean, Median, Mode, Range

Measure | How to Find It | Best Used When
Mean | Add all values, divide by number of values | Data has no extreme outliers
Median | Order the data; find the middle value (or average of two middle values) | Data has outliers that would skew the mean
Mode | The value that appears most often (can be more than one) | Categorical data or finding most popular item
Range | Largest value minus smallest value | Measuring how spread out the data is
Example: Data set: 3, 7, 7, 8, 12, 14, 15
  • Mean = (3 + 7 + 7 + 8 + 12 + 14 + 15) ÷ 7 = 66 ÷ 7 ≈ 9.4
  • Median = 8 (the 4th value when ordered)
  • Mode = 7 (appears twice, more than any other value)
  • Range = 15 − 3 = 12
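
As a quick check of the four results above, the same data set can be run through Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
import statistics

data = [3, 7, 7, 8, 12, 14, 15]

mean = statistics.mean(data)        # 66 / 7
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest minus smallest

print(round(mean, 1), median, mode, data_range)  # 9.4 8 7 12
```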

Finding the Median with an Even Number of Values

Example: Data set: 4, 6, 9, 11, 14, 20
Data is already ordered; 6 values (even) → median = average of 3rd and 4th
3rd value = 9, 4th value = 11
Median = (9 + 11) ÷ 2 = 10
Common Mistake: You MUST order the data first before finding the median. The median is not simply the middle-positioned number in the original list — it must be the middle value after ordering from smallest to largest.
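
The ordering step can be made explicit in code. The function below is a hypothetical helper written for this sketch (`median_of` is not a library function); it sorts first, then picks the middle value or averages the two middle values.

```python
def median_of(values):
    """Return the median, sorting first so the input order does not matter."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

unordered = [14, 4, 20, 9, 6, 11]  # same values as the example, shuffled
print(median_of(unordered))  # 10.0
```

Reading the middle of the unsorted list (20 and 9) would give the wrong answer, which is exactly the mistake described above.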

Probability Basics

Theoretical probability is calculated by reasoning about equally likely outcomes, without carrying out an experiment.

Theoretical Probability
P(event) = (number of favourable outcomes) ÷ (total number of equally likely outcomes)
  • Probability is always between 0 and 1: 0 ≤ P(A) ≤ 1
  • P(certain event) = 1 (will definitely happen)
  • P(impossible event) = 0 (cannot happen)
  • P(A) + P(A') = 1 (an event and its complement sum to 1)

The Complement Rule

Complement Rule
P(A') = 1 − P(A)    where A' means "A does NOT happen"
Example: A bag contains 4 red and 6 green balls. Find P(not red).
P(red) = 4/10 = 2/5
P(not red) = 1 − 2/5 = 3/5 = 0.6 = 60%
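
The same calculation can be sketched with exact fractions in Python. The variable names are illustrative; the counts are those of the bag in the example.

```python
from fractions import Fraction

red, green = 4, 6
p_red = Fraction(red, red + green)   # 4/10, stored as 2/5
p_not_red = 1 - p_red                # complement rule

print(p_red, p_not_red, float(p_not_red))  # 2/5 3/5 0.6

# Counting the green balls directly gives the same answer.
assert p_not_red == Fraction(green, red + green)
```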

Sample Space and Listing Outcomes

The sample space is the complete list of all possible outcomes. For a fair six-sided die:

Sample space = {1, 2, 3, 4, 5, 6}
  • P(rolling a 4) = 1/6
  • P(rolling an even number) = 3/6 = 1/2 (outcomes: 2, 4, 6)
  • P(rolling a number greater than 4) = 2/6 = 1/3 (outcomes: 5, 6)
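
Listing outcomes and counting the favourable ones is easy to mirror in code. In this sketch, `probability` is a hypothetical helper that assumes every outcome in `sample_space` is equally likely.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

def probability(event):
    """P(event) for equally likely outcomes; `event` is a true/false test on one outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

print(probability(lambda x: x == 4))       # 1/6
print(probability(lambda x: x % 2 == 0))   # 1/2
print(probability(lambda x: x > 4))        # 1/3
```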

Probability on a Scale

Probability | Description | Example
0 | Impossible | Rolling a 7 on a standard die
0.1 – 0.3 | Unlikely | Picking a heart from a deck of cards (P = 1/4)
0.5 | Even chance | Flipping heads on a fair coin
0.7 – 0.9 | Likely | Not rolling a 6 on a die (P = 5/6)
1 | Certain | Rolling a number from 1 to 6 on a standard die

Experimental Probability

Experimental probability is based on the results of actual trials. It is also called relative frequency.

Experimental Probability
P(event) = (number of times event occurred) ÷ (total number of trials)

Comparing Experimental and Theoretical Probability

Key Principle:
  • Experimental and theoretical probabilities are not always equal
  • As the number of trials increases, experimental probability tends to get closer to theoretical probability
  • With a small number of trials, results can differ significantly from theoretical predictions
Example: A coin is flipped 100 times. It lands heads 47 times.

Experimental P(heads) = 47/100 = 0.47
Theoretical P(heads) = 1/2 = 0.50

The results are close but not exactly equal. The difference (0.03) is due to chance variation. If the experiment were repeated with 10,000 flips, the experimental probability would be expected to be much closer to 0.50.
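
A simulation can illustrate how the number of trials affects the result. The sketch below is not a real experiment: it assumes a fair coin modelled by Python's pseudo-random generator, and `experimental_p_heads` is a hypothetical helper written for this illustration. The exact values printed depend on the seed.

```python
import random

def experimental_p_heads(trials, seed=0):
    """Flip a simulated fair coin `trials` times and return heads / trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# Compare a small, medium, and large number of flips.
for trials in (10, 100, 10_000):
    print(trials, experimental_p_heads(trials))
```

With only 10 flips the result can be far from 0.5; with 10,000 flips it is expected to be much closer.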

Relative Frequency Tables

Example: A spinner is spun 40 times. Results:
Colour | Frequency | Relative Frequency | Theoretical P
Red | 12 | 12/40 = 0.30 | 1/4 = 0.25
Blue | 11 | 11/40 = 0.275 | 1/4 = 0.25
Green | 9 | 9/40 = 0.225 | 1/4 = 0.25
Yellow | 8 | 8/40 = 0.20 | 1/4 = 0.25
Total | 40 | 1.00 | 1.00

The experimental results are close to the theoretical 0.25, but not identical after only 40 trials.
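
The relative frequency column can be reproduced from the raw counts. This is a small sketch with illustrative names; it assumes, as the table does, that the spinner has four equal sections so each colour has theoretical probability 1/4.

```python
from fractions import Fraction

frequencies = {"Red": 12, "Blue": 11, "Green": 9, "Yellow": 8}
total = sum(frequencies.values())  # 40 spins

# Relative frequency = frequency / total.
relative = {colour: Fraction(count, total) for colour, count in frequencies.items()}
theoretical = Fraction(1, len(frequencies))  # assumes four equal sections

for colour, rel in relative.items():
    print(colour, float(rel), "difference from 0.25:", float(rel - theoretical))

# Relative frequencies always sum to 1.
assert sum(relative.values()) == 1
```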

Critical Rule: Experimental and theoretical probability are not always equal, but the experimental value tends to approach the theoretical value as the number of trials increases. With a small number of trials, experimental results can differ significantly from theoretical predictions; this difference is called chance variation.

Worked Examples

These examples show the reasoning expected at Grade 7. Lay out your work clearly and state your method.

EXAMPLE 1: A bag contains 3 red, 5 blue, and 2 green marbles. What is the probability of picking a blue marble?
Full Solution
Step 1: Find the total number of marbles = 3 + 5 + 2 = 10
Step 2: Identify the number of favourable outcomes (blue) = 5
Step 3: Apply the formula: P(blue) = 5/10 = 1/2 = 0.5 = 50%
EXAMPLE 2: Find the mean, median, mode, and range of: 8, 3, 12, 8, 5, 10, 8
Full Solution
Ordered data: 3, 5, 8, 8, 8, 10, 12
Mean = (3+5+8+8+8+10+12) ÷ 7 = 54 ÷ 7 ≈ 7.7
Median = 4th value = 8 (middle of 7 values)
Mode = 8 (appears 3 times, more than any other)
Range = 12 − 3 = 9
EXAMPLE 3: A fair six-sided die is rolled. Find: (a) P(even) (b) P(greater than 4) (c) P(not a 3)
Full Solution
Sample space = {1, 2, 3, 4, 5, 6}; total outcomes = 6

(a) Even numbers: {2, 4, 6} → P(even) = 3/6 = 1/2
(b) Greater than 4: {5, 6} → P(>4) = 2/6 = 1/3
(c) P(3) = 1/6, so P(not 3) = 1 − 1/6 = 5/6
EXAMPLE 4: A coin is tossed 200 times and lands heads 94 times. Calculate the experimental probability and compare to theoretical probability.
Full Solution
Experimental P(heads) = 94/200 = 0.47
Theoretical P(heads) = 1/2 = 0.50

The experimental probability (0.47) is close to but not exactly equal to the theoretical probability (0.50). The difference of 0.03 is due to chance variation. With more trials, the experimental probability would be expected to get closer to 0.50.
EXAMPLE 5: Use the two-way table to find (a) P(girl who swims) and (b) P(football player given they are a boy).
Full Solution
Using the earlier table (30 students: Boys: Football=8, Basketball=5, Swimming=2; Girls: Football=3, Basketball=4, Swimming=8):

(a) Girls who swim = 8 out of 30 total students
P(girl who swims) = 8/30 = 4/15 ≈ 0.267

(b) Boys who play football = 8; total boys = 15
P(football | boy) = 8/15 ≈ 0.533
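
The key difference between parts (a) and (b) is the denominator: the grand total in (a), the boys' row total in (b). A minimal sketch, using an illustrative dictionary for the table:

```python
from fractions import Fraction

table = {
    "Boys":  {"Football": 8, "Basketball": 5, "Swimming": 2},
    "Girls": {"Football": 3, "Basketball": 4, "Swimming": 8},
}
grand_total = sum(sum(row.values()) for row in table.values())  # 30

# (a) Divide the single cell by the grand total.
p_girl_and_swims = Fraction(table["Girls"]["Swimming"], grand_total)

# (b) "Given they are a boy": divide by the row total for boys only.
boys_total = sum(table["Boys"].values())  # 15
p_football_given_boy = Fraction(table["Boys"]["Football"], boys_total)

print(p_girl_and_swims, p_football_given_boy)  # 4/15 8/15
```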
EXAMPLE 6: Scores: 14, 22, 19, 22, 31, 18, 22, 27. Find the mean and median. Which is the better measure of average and why?
Full Solution
Ordered: 14, 18, 19, 22, 22, 22, 27, 31
Mean = (14+18+19+22+22+22+27+31) ÷ 8 = 175 ÷ 8 = 21.875
Median = average of 4th and 5th values = (22+22)/2 = 22

The data has no extreme outliers, so the mean (21.875) is a suitable measure, but the median (22) is also very close. Both represent the data well. If there were an extreme value like 100, the median would be preferred as it is more resistant to outliers.
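
The effect of an extreme value can be seen by adding the hypothetical score of 100 mentioned above and recalculating. This is an illustrative sketch; the value 100 is not part of the original data.

```python
import statistics

scores = [14, 22, 19, 22, 31, 18, 22, 27]
with_outlier = scores + [100]  # hypothetical extreme value

# Original data: mean and median are close together.
print(statistics.mean(scores), statistics.median(scores))  # 21.875 22.0

# With the outlier: the mean rises to about 30.6, the median stays at 22.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

One extra score moves the mean by almost 9 points while the median does not change, which is why the median is preferred when outliers are present.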
EXAMPLE 7: A pack of 52 cards has 4 suits (13 cards each). A card is drawn at random. Find P(Ace) and P(not a heart).
Full Solution
P(Ace) = 4 aces out of 52 cards = 4/52 = 1/13 ≈ 0.077

P(heart) = 13/52 = 1/4
P(not a heart) = 1 − 1/4 = 3/4 = 0.75
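
For a larger sample space such as a deck of cards, the outcomes can be generated rather than listed by hand. The sketch below builds a standard 52-card deck (the rank and suit labels are chosen for this illustration) and counts favourable outcomes.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Count favourable cards and divide by the size of the deck.
p_ace = Fraction(sum(rank == "A" for rank, _ in deck), len(deck))
p_heart = Fraction(sum(suit == "hearts" for _, suit in deck), len(deck))
p_not_heart = 1 - p_heart  # complement rule

print(len(deck), p_ace, p_not_heart)  # 52 1/13 3/4
```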

Practice Q&A

Attempt each question before revealing the model answer. Show your working clearly.

CALCULATE: Find the mean of: 15, 22, 18, 30, 10, 25
Model Answer
Mean = (15+22+18+30+10+25) ÷ 6 = 120 ÷ 6 = 20
CALCULATE: Find the median of: 7, 14, 3, 9, 21, 6
Model Answer
Ordered: 3, 6, 7, 9, 14, 21 (6 values)
Median = (7 + 9) ÷ 2 = 16 ÷ 2 = 8
CALCULATE: A bag has 4 red, 3 blue, and 5 yellow sweets. If one is picked at random, find P(yellow).
Model Answer
Total = 4 + 3 + 5 = 12 sweets.
P(yellow) = 5/12 ≈ 0.417 (41.7%)
CALCULATE: Using the bag above, find P(not blue).
Model Answer
P(blue) = 3/12 = 1/4
P(not blue) = 1 − 1/4 = 3/4 = 0.75 (75%)
DESCRIBE: A die is rolled 60 times and a 6 appears 8 times. Describe the experimental probability and compare to theoretical probability.
Model Answer
Experimental P(6) = 8/60 = 2/15 ≈ 0.133
Theoretical P(6) = 1/6 ≈ 0.167
The experimental probability is lower than the theoretical. This difference (0.033) is due to chance variation. With more trials, experimental probability would converge towards 1/6.
IDENTIFY: Find the mode of: 5, 3, 8, 5, 9, 2, 5, 3, 7, 3
Model Answer
Tally: 5 appears 3 times and 3 appears 3 times; every other value appears once. The data set is bimodal, with modes 3 and 5.
APPLY: The probability that it rains on any day in April is 0.35. What is the probability that it does NOT rain on a randomly chosen day in April?
Model Answer
P(rain) = 0.35
P(no rain) = 1 − 0.35 = 0.65
APPLY: A student scored: 68, 75, 80, 68, 72 on five tests. The teacher says the typical score is 68. Is the teacher using the mean, median, or mode? Is this a fair summary?
Model Answer
The teacher is using the mode (68 appears twice).
Mean = (68+75+80+68+72) ÷ 5 = 363 ÷ 5 = 72.6
Median (ordered: 68, 68, 72, 75, 80) = 72
The mode of 68 is not the fairest summary — both the mean (72.6) and median (72) are higher and more representative of the student's actual performance. The teacher's use of the mode understates the student's ability.

Flashcard Review

Try to answer each question from memory before reading the answer.

What is theoretical probability?
P(event) = favourable outcomes ÷ total equally likely outcomes. Calculated by reasoning, not by experiment.

What is experimental probability?
P(event) = times event occurred ÷ total trials. Based on actual results from an experiment.

What is the complement rule?
P(A') = 1 − P(A). The probability that an event does NOT happen equals 1 minus the probability it does happen.

What is the sample space?
The complete set of all possible outcomes. For a fair die: {1, 2, 3, 4, 5, 6}.

How do you calculate the mean?
Add all values together, then divide by the number of values. Mean = sum ÷ count.

How do you find the median?
Order the data. The median is the middle value. For an even number of values, average the two middle values.

What is the mode?
The value that appears most often in a data set. A data set can have no mode, one mode, or multiple modes (bimodal, multimodal).

What is the range?
The largest value minus the smallest value. It measures the spread of the data.

What is the probability range?
0 ≤ P(A) ≤ 1. Impossible events have P = 0. Certain events have P = 1. Every other probability lies between these.

As the number of trials increases, what happens to experimental probability?
It tends to get closer to (converge toward) the theoretical probability. With very many trials, the two should be nearly equal.

What is a two-way table?
A table that organises data by two categorical variables simultaneously. Each cell shows the frequency for that combination.

What is relative frequency?
Frequency divided by total; expressed as a fraction, decimal, or percentage. Same as experimental probability.

When should you use the median instead of the mean?
When the data contains outliers (extreme values) that would distort the mean. The median is more resistant to outliers.

A bag has 3 red and 7 green balls. Find P(red).
Total = 10. P(red) = 3/10 = 0.3 = 30%.

What does it mean for a data set to be bimodal?
It has two modes: two values that are tied for the highest frequency, each appearing more often than every other value.
