Statistics and Probability
Statistics describes and interprets data. Probability measures how likely events are to occur. Together, these tools help us make sense of the world using numbers and evidence.
What You'll Learn
- Classify and display data using appropriate graphs and charts
- Calculate mean, median, mode, and range from a data set
- Understand and apply theoretical probability
- Calculate experimental probability from trials and compare it to theoretical probability
- Use the complement rule: P(A') = 1 − P(A)
- Read and construct two-way tables
IB Assessment Focus
Criterion A: Calculate averages and probabilities correctly in familiar and unfamiliar situations.
Criterion B: Discover patterns by comparing theoretical and experimental results across multiple trials.
Criterion C: Communicate statistical reasoning clearly, including correct notation (e.g., P(A) = 3/10).
Criterion D: Apply probability and statistics to real-world contexts (surveys, games, scientific experiments).
Key Vocabulary at a Glance
| Term | Definition |
|---|---|
| Experimental probability | P = (times event occurred) ÷ (total trials) — based on observed results |
| Theoretical probability | P = (favourable outcomes) ÷ (total outcomes) — based on equally likely outcomes |
| Sample space | The set of all possible outcomes |
| Complementary event | The event that A does NOT happen: P(A') = 1 − P(A) |
| Two-way table | A table showing the frequency of two categorical variables simultaneously |
| Relative frequency | Frequency expressed as a fraction or percentage of the total |
Data Types & Display
Before analysing data, you must understand what type of data you have and choose the best way to display it.
Types of Data
| Type | Description | Examples |
|---|---|---|
| Discrete | Countable, separate values (often whole numbers) | Number of goals, number of students |
| Continuous | Any value within a range (measurable) | Height, temperature, time |
| Categorical | Non-numerical, placed in groups | Favourite colour, genre of book |
Choosing the Right Graph
| Graph Type | Best Used For |
|---|---|
| Bar chart | Comparing discrete or categorical data |
| Pie chart | Showing proportions of a whole |
| Line graph | Showing trends over time (continuous data) |
| Histogram | Showing the distribution of continuous data in intervals |
| Stem-and-leaf plot | Displaying all individual values while showing shape |
| Two-way table | Comparing two categorical variables simultaneously |
Two-Way Tables
A two-way table organises data by two variables at once. Each cell shows the frequency (count) for a combination.
| | Football | Basketball | Swimming | Total |
|---|---|---|---|---|
| Boys | 8 | 5 | 2 | 15 |
| Girls | 3 | 4 | 8 | 15 |
| Total | 11 | 9 | 10 | 30 |
P(student chose swimming) = 10/30 = 1/3 ≈ 33.3%
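As a rough sketch, the same calculation can be checked in Python. The counts are copied from the table above; the variable names (`table`, `grand_total`, `swimming_total`) are made up for this illustration.

```python
from fractions import Fraction

# Counts copied from the two-way table above (rows: group, columns: sport).
table = {
    "Boys": {"Football": 8, "Basketball": 5, "Swimming": 2},
    "Girls": {"Football": 3, "Basketball": 4, "Swimming": 8},
}

# Grand total: add every cell in the table.
grand_total = sum(sum(row.values()) for row in table.values())

# Column total for swimming: add the swimming cell from each row.
swimming_total = sum(row["Swimming"] for row in table.values())

# Fraction reduces 10/30 to lowest terms automatically.
p_swimming = Fraction(swimming_total, grand_total)

print(grand_total)     # 30
print(swimming_total)  # 10
print(p_swimming)      # 1/3
```

Working with `Fraction` rather than decimals keeps the answer exact, which matches the way probabilities are written in this guide.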
Measures of Average
Measures of average (central tendency) summarise a data set with a single value. The range measures the spread of the data.
Mean, Median, Mode, Range
| Measure | How to Find It | Best Used When |
|---|---|---|
| Mean | Add all values, divide by number of values | Data has no extreme outliers |
| Median | Order the data; find the middle value (or average of two middle values) | Data has outliers that would skew the mean |
| Mode | The value that appears most often (can be more than one) | Categorical data or finding most popular item |
| Range | Largest value minus smallest value | Measuring how spread out the data is |
For the data set 3, 7, 7, 8, 12, 14, 15:
- Mean = (3 + 7 + 7 + 8 + 12 + 14 + 15) ÷ 7 = 66 ÷ 7 ≈ 9.4
- Median = 8 (the 4th value when ordered)
- Mode = 7 (appears twice, more than any other value)
- Range = 15 − 3 = 12
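If you want to check these four values, one possible approach is Python's built-in `statistics` module. This is only a sketch using the data set 3, 7, 7, 8, 12, 14, 15 from the mean calculation above; the variable names are chosen for illustration.

```python
import statistics

data = [3, 7, 7, 8, 12, 14, 15]

mean = statistics.mean(data)        # 66 / 7
median = statistics.median(data)    # middle (4th) value of the sorted data
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest minus smallest

print(round(mean, 1))  # 9.4
print(median)          # 8
print(mode)            # 7
print(data_range)      # 12
```

When a data set has more than one mode, `statistics.multimode(data)` returns all of them as a list.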
Finding the Median with an Even Number of Values
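With an even number of values there is no single middle value, so the median is the average of the two middle values once the data is ordered. The sketch below shows one way to code that rule. The function name `median_of` and the list `scores` are hypothetical, invented for this illustration; the second list reuses the eight values from the worked examples.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value sits exactly in the middle.
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data set with six values, invented for illustration.
scores = [11, 2, 9, 5, 14, 7]
print(median_of(scores))  # 8.0  (ordered: 2, 5, 7, 9, 11, 14; middle pair 7 and 9)

# Eight values from the worked examples in this guide.
print(median_of([14, 18, 19, 22, 22, 22, 27, 31]))  # 22.0
```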
Probability Basics
Theoretical probability is calculated by reasoning about equally likely outcomes, without carrying out an experiment.
- Probability is always between 0 and 1: 0 ≤ P(A) ≤ 1
- P(certain event) = 1 (will definitely happen)
- P(impossible event) = 0 (cannot happen)
- P(A) + P(A') = 1 (an event and its complement sum to 1)
The Complement Rule
P(red) = 4/10 = 2/5
P(not red) = 1 − 2/5 = 3/5 = 0.6 = 60%
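The complement rule can be written as a short helper. This is a minimal sketch, assuming the probability is given as an exact fraction; the function name `complement` is chosen for this example.

```python
from fractions import Fraction

def complement(p):
    """Return P(A') = 1 - P(A) for a probability p between 0 and 1."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

p_red = Fraction(4, 10)        # stored in lowest terms as 2/5
p_not_red = complement(p_red)

print(p_red)             # 2/5
print(p_not_red)         # 3/5
print(float(p_not_red))  # 0.6
```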
Sample Space and Listing Outcomes
The sample space is the complete list of all possible outcomes. For a fair six-sided die, the sample space is {1, 2, 3, 4, 5, 6}, so:
- P(rolling a 4) = 1/6
- P(rolling an even number) = 3/6 = 1/2 (outcomes: 2, 4, 6)
- P(rolling a number greater than 4) = 2/6 = 1/3 (outcomes: 5, 6)
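The three results above all follow the same pattern: list the sample space, count the favourable outcomes, then divide. Here is a small sketch of that pattern, assuming every outcome is equally likely; the helper name `probability` is made up for this example.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]

def probability(event):
    """P(event) = favourable outcomes / total outcomes (equally likely outcomes)."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

print(probability(lambda x: x == 4))      # 1/6
print(probability(lambda x: x % 2 == 0))  # 1/2  (outcomes 2, 4, 6)
print(probability(lambda x: x > 4))       # 1/3  (outcomes 5, 6)
```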
Probability on a Scale
| Probability | Description | Example |
|---|---|---|
| 0 | Impossible | Rolling a 7 on a standard die |
| 0.1 – 0.3 | Unlikely | Picking a heart from a deck of cards (P = 1/4) |
| 0.5 | Even chance | Flipping heads on a fair coin |
| 0.7 – 0.9 | Likely | Not rolling a 6 on a die (P = 5/6) |
| 1 | Certain | Rolling a number from 1 to 6 on a standard die |
Experimental Probability
Experimental probability is based on the results of actual trials. It is also called relative frequency.
Comparing Experimental and Theoretical Probability
- Experimental and theoretical probabilities are not always equal
- As the number of trials increases, experimental probability tends to get closer to theoretical probability
- With a small number of trials, results can differ significantly from theoretical predictions
Experimental P(heads) = 47/100 = 0.47
Theoretical P(heads) = 1/2 = 0.50
The results are close but not exactly equal. The difference (0.03) is due to chance variation. If the experiment were repeated with 10,000 flips, the experimental probability would be expected to be much closer to 0.50.
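A computer simulation is one way to explore this idea without flipping a real coin thousands of times. The sketch below is an illustration only: it assumes a fair coin modelled with Python's `random` module, and the function name `experimental_p_heads` and the seed value are arbitrary choices.

```python
import random

def experimental_p_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the relative frequency of heads."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

# Compare the experimental probability with the theoretical value of 0.5.
for n in (10, 100, 10_000, 100_000):
    p = experimental_p_heads(n)
    print(f"{n:>7} flips: P(heads) = {p:.4f}, gap from 0.5 = {abs(p - 0.5):.4f}")
```

The gap usually shrinks as the number of flips grows, although it does not have to shrink at every single step. Try changing the seed to see how much short experiments vary.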
Relative Frequency Tables
| Colour | Frequency | Relative Frequency | Theoretical P |
|---|---|---|---|
| Red | 12 | 12/40 = 0.30 | 1/4 = 0.25 |
| Blue | 11 | 11/40 = 0.275 | 1/4 = 0.25 |
| Green | 9 | 9/40 = 0.225 | 1/4 = 0.25 |
| Yellow | 8 | 8/40 = 0.20 | 1/4 = 0.25 |
| Total | 40 | 1.00 | 1.00 |
The experimental results are close to the theoretical 0.25, but not identical after only 40 trials.
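The relative frequency column can be rebuilt from the frequency column alone. This is a rough sketch using the counts from the table above; the theoretical value of 1/4 is taken from the table's last column, and the variable names are chosen for illustration.

```python
from fractions import Fraction

# Observed frequencies copied from the table above.
frequencies = {"Red": 12, "Blue": 11, "Green": 9, "Yellow": 8}
total = sum(frequencies.values())

# Theoretical probability for each colour, as given in the table.
theoretical = Fraction(1, 4)

# Relative frequency = frequency / total number of trials.
relative = {colour: Fraction(count, total) for colour, count in frequencies.items()}

for colour, rel in relative.items():
    print(f"{colour:<7} {float(rel):.3f}  (theoretical {float(theoretical):.2f})")

# The relative frequencies always add up to exactly 1.
print(sum(relative.values()))  # 1
```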
Worked Examples
These examples show the reasoning expected at Grade 7. Lay out your work clearly and state your method.
Step 1: Identify the total number of equally likely outcomes = 10
Step 2: Identify the number of favourable outcomes (blue) = 5
Step 3: Apply the formula: P(blue) = 5/10 = 1/2 = 0.5 = 50%
Mean = (3+5+8+8+8+10+12) ÷ 7 = 54 ÷ 7 ≈ 7.7
Median = 4th value = 8 (middle of 7 values)
Mode = 8 (appears 3 times, more than any other)
Range = 12 − 3 = 9
(a) Even numbers: {2, 4, 6} → P(even) = 3/6 = 1/2
(b) Greater than 4: {5, 6} → P(>4) = 2/6 = 1/3
(c) P(3) = 1/6, so P(not 3) = 1 − 1/6 = 5/6
Theoretical P(heads) = 1/2 = 0.50
The experimental probability (0.47) is close to but not exactly equal to the theoretical probability (0.50). The difference of 0.03 is due to chance variation. With more trials, the experimental probability would be expected to get closer to 0.50.
(a) Girls who swim = 8 out of 30 total students
P(girl who swims) = 8/30 = 4/15 ≈ 0.267
(b) Boys who play football = 8; total boys = 15
P(football | boy) = 8/15 ≈ 0.533
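Parts (a) and (b) use different denominators: (a) divides by all 30 students, while (b) divides only by the 15 boys. The sketch below makes that difference explicit, using the counts from the two-way table earlier in this guide; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the two-way table (rows: group, columns: sport).
table = {
    "Boys": {"Football": 8, "Basketball": 5, "Swimming": 2},
    "Girls": {"Football": 3, "Basketball": 4, "Swimming": 8},
}
grand_total = sum(sum(row.values()) for row in table.values())

# (a) Girl AND swimming: one cell divided by the grand total.
p_girl_and_swims = Fraction(table["Girls"]["Swimming"], grand_total)

# (b) Football GIVEN boy: one cell divided by the boys' row total only.
boys_total = sum(table["Boys"].values())
p_football_given_boy = Fraction(table["Boys"]["Football"], boys_total)

print(p_girl_and_swims)      # 4/15
print(p_football_given_boy)  # 8/15
```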
Mean = (14+18+19+22+22+22+27+31) ÷ 8 = 175 ÷ 8 = 21.875
Median = average of 4th and 5th values = (22+22)/2 = 22
The data has no extreme outliers, so the mean (21.875) is a suitable measure, but the median (22) is also very close. Both represent the data well. If there were an extreme value like 100, the median would be preferred as it is more resistant to outliers.
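To see why the median resists outliers, it can help to compare both measures before and after an extreme value appears. The sketch below assumes the value 100 is added to the original eight values as a ninth data point; that choice is an assumption made for illustration.

```python
import statistics

data = [14, 18, 19, 22, 22, 22, 27, 31]
with_outlier = data + [100]  # assumption: 100 is added as a ninth value

print(statistics.mean(data))                    # 21.875
print(statistics.median(data))                  # 22.0
print(round(statistics.mean(with_outlier), 2))  # 30.56
print(statistics.median(with_outlier))          # 22
```

The mean jumps by almost 9, while the median does not move at all.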
P(heart) = 13/52 = 1/4
P(not a heart) = 1 − 1/4 = 3/4 = 0.75
Practice Q&A
Attempt each question before checking the model answer. Show your working clearly.
Median = (7 + 9) ÷ 2 = 16 ÷ 2 = 8
P(yellow) = 5/12 ≈ 0.417 (41.7%)
P(not blue) = 1 − 1/4 = 3/4 = 0.75 (75%)
Theoretical P(6) = 1/6 ≈ 0.167
The experimental probability is lower than the theoretical. This difference (0.033) is due to chance variation. With more trials, experimental probability would converge towards 1/6.
P(no rain) = 1 − 0.35 = 0.65
Mean = (68+75+80+68+72) ÷ 5 = 363 ÷ 5 = 72.6
Median (ordered: 68, 68, 72, 75, 80) = 72
The mode of 68 is not the fairest summary — both the mean (72.6) and median (72) are higher and more representative of the student's actual performance. The teacher's use of the mode understates the student's ability.