Statistics

Comparing & Analysing Data Sets

1

Summary Statistics Vocabulary

Draw a line from each measure to its correct definition.

Mean
Median
Mode
Range
Interquartile range (IQR)
The sum of all values divided by the number of values
The middle value when data is arranged in order
The most frequently occurring value
The difference between the maximum and minimum values
The difference between the upper and lower quartiles
2

Calculate Summary Statistics

Calculate the mean, median, mode and range for each data set.

Test scores: 72, 85, 91, 68, 85, 77, 90, 62, 85, 74
Mean = ___   Median = ___   Mode = ___   Range = ___

Daily temperatures (°C): 18, 22, 25, 30, 22, 19, 21, 28, 22, 25
Mean = ___   Median = ___   Mode = ___   Range = ___
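To check answers by computer, here is a minimal Python sketch using the standard-library `statistics` module. The helper name `summarise` is hypothetical. It uses `multimode` so that a data set with two equally common values reports both modes.

```python
from statistics import mean, median, multimode

def summarise(values):
    """Return the mean, median, mode(s) and range of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode lists every most-frequent value, so ties are not hidden
        "mode": multimode(values),
        "range": max(values) - min(values),
    }

test_scores = [72, 85, 91, 68, 85, 77, 90, 62, 85, 74]
temperatures = [18, 22, 25, 30, 22, 19, 21, 28, 22, 25]

for name, data in [("Test scores", test_scores), ("Temperatures", temperatures)]:
    print(name, summarise(data))
```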

3

Comparing Two Data Sets

Use the statistics provided to compare two classes.

Class A: Mean = 72, Median = 74, Range = 40, IQR = 18
Class B: Mean = 71, Median = 73, Range = 20, IQR = 10
Which class performed more consistently? Justify using the statistics.

Which statistic (mean or median) is more useful when comparing these classes? Explain why.

4

Shape of a Distribution

Sort each description into the correct distribution shape column.

Mean > Median > Mode
Mean ≈ Median ≈ Mode
Mode > Median > Mean
Long tail to the right
Long tail to the left
Bell-shaped curve
Symmetric / Normal
Positively Skewed
Negatively Skewed
5

Identifying Outliers

An outlier is a data value far from the rest of the data.

Data: 12, 14, 15, 13, 16, 14, 15, 47, 13, 14
Identify the outlier: ___
Mean WITH outlier = ___
Mean WITHOUT outlier = ___

How does the outlier affect the mean? Which measure (mean or median) is more resistant to outliers? Explain.
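A short Python sketch, assuming the outlier has already been identified as 47, that prints the mean and median with and without it so the two measures can be compared side by side.

```python
from statistics import mean, median

data = [12, 14, 15, 13, 16, 14, 15, 47, 13, 14]
outlier = 47                                  # the value far from the rest
without = [x for x in data if x != outlier]   # drops every copy of 47 (there is only one)

for label, values in [("with outlier", data), ("without outlier", without)]:
    print(f"{label:>16}: mean = {mean(values):.1f}, median = {median(values):.1f}")
```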

6

Back-to-Back Stem-and-Leaf Plot

Interpret the back-to-back stem-and-leaf plot showing sprint times (seconds) for two athletes.

Athlete A | Stem | Athlete B
    9 8 7 |  11  | 2 3 5
  6 5 3 2 |  12  | 1 4 7
      4 1 |  13  | 0 3
Median for Athlete A = ___
Median for Athlete B = ___
Who is the faster athlete overall? Justify your answer.

7

Sampling and Conclusions

Read each scenario and answer the questions.

A school surveys 20 students near the tuck shop at lunch about their favourite food. The results are used to plan the canteen menu for all 800 students. Describe TWO problems with this sampling method.

A news headline says 8 out of 10 dentists recommend Brand X toothpaste. What question would you ask before trusting this statistic?

8

Real Data Investigation

Collect and analyse your own data.

  1. Record the daily maximum temperature in your area for 10 days. Calculate the mean, median, mode and range. Does the distribution appear symmetric or skewed?
  2. Compare two types of cereal using nutritional labels (sugar per 100 g). Use statistics to decide which is healthier.
  3. Survey 10 family members or friends: How many hours of screen time did you have yesterday? Compare two groups (e.g. under 18 vs adults) using a back-to-back stem-and-leaf plot.
9

Back-to-Back Stem-and-Leaf Plots -- Building One

Create a back-to-back stem-and-leaf plot to display both data sets on the same diagram.

Class A scores: 52, 67, 71, 73, 75, 78, 82, 85, 88, 91
Class B scores: 44, 58, 63, 67, 70, 72, 79, 83, 86, 90
Draw the back-to-back stem-and-leaf plot using stems 4, 5, 6, 7, 8, 9:

Draw here

Compare the two classes: which has the higher median? Which is more spread out?
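One way to check a hand-drawn plot is the Python sketch below. It assumes whole-number scores from 0 to 99, so the tens digit is the stem and the units digit is the leaf. The function names `leaves_by_stem` and `back_to_back` are made up for this example.

```python
from collections import defaultdict

def leaves_by_stem(values):
    """Map each tens digit (stem) to the sorted units digits (leaves) of whole numbers 0-99."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups

def back_to_back(left, right, stems):
    """Return the rows of a back-to-back stem-and-leaf plot as text."""
    stems = list(stems)
    left_groups, right_groups = leaves_by_stem(left), leaves_by_stem(right)
    # Left leaves are written largest-first, so they still increase moving away from the stem
    left_text = {s: " ".join(str(d) for d in reversed(left_groups[s])) for s in stems}
    width = max(len(t) for t in left_text.values())
    rows = []
    for s in stems:
        right_text = " ".join(str(d) for d in right_groups[s])
        rows.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return rows

class_a = [52, 67, 71, 73, 75, 78, 82, 85, 88, 91]
class_b = [44, 58, 63, 67, 70, 72, 79, 83, 86, 90]

print("Class A | Stem | Class B")
for row in back_to_back(class_a, class_b, range(4, 10)):
    print(row)
```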

10

Five-Number Summary

Calculate the five-number summary (minimum, Q1, median, Q3, maximum) for each data set.

Data: 5, 8, 10, 12, 14, 16, 18, 22, 24, 30
Min = ___   Q1 = ___   Median = ___   Q3 = ___   Max = ___

Data: 31, 35, 40, 42, 44, 48, 50, 53, 58, 60
Min = ___   Q1 = ___   Median = ___   Q3 = ___   Max = ___
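The split-in-half method for quartiles can be sketched in Python as follows. This is one convention among several (an assumption here): when there is an odd number of values, the middle value is left out of both halves, so a calculator or spreadsheet may give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the split-in-half method.

    The ordered list is cut into a lower half and an upper half; when the
    number of values is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]   # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

for data in ([5, 8, 10, 12, 14, 16, 18, 22, 24, 30],
             [31, 35, 40, 42, 44, 48, 50, 53, 58, 60]):
    print(five_number_summary(data))
```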

11

IQR and Outlier Detection

Use the IQR rule: a value is an outlier if it is below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

Data set: 10, 12, 14, 15, 16, 18, 20, 22, 55
Q1 = 13, Q3 = 21, IQR = 8
Lower fence = Q1 − 1.5 × IQR = ___
Upper fence = Q3 + 1.5 × IQR = ___
Is 55 an outlier? ___

Explain how you would report a data set that contains one outlier: would you include the outlier or remove it? Give a reason for each choice.
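A minimal Python sketch of the 1.5 × IQR rule, assuming quartiles are found with the split-in-half method (middle value excluded when the count is odd). The helper names `quartiles` and `iqr_outliers` are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]   # middle value excluded when n is odd
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Apply the k x IQR rule; return quartiles, IQR, both fences and any outliers."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, q3, iqr, lower_fence, upper_fence, outliers

q1, q3, iqr, low, high, out = iqr_outliers([10, 12, 14, 15, 16, 18, 20, 22, 55])
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, fences = ({low}, {high}), outliers = {out}")
```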

12

Comparing Spread Using Standard Deviation Concept

Standard deviation measures how spread out values are from the mean. A low standard deviation means values are clustered closely around the mean; a high standard deviation means they are widely spread out.

Data A: 48, 50, 50, 51, 51 (mean = 50). Data B: 30, 40, 50, 60, 70 (mean = 50). Which data set has the higher standard deviation? Explain without calculating.

Two athletes run 100 m five times each. Athlete X times (seconds): 12.1, 12.0, 11.9, 12.2, 12.0. Athlete Y times: 11.5, 12.8, 11.9, 13.1, 11.7. Which athlete is more consistent? Justify using spread.
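To confirm a prediction after reasoning it out, this Python sketch uses `statistics.pstdev`, which treats each list as a whole population. Whether to use the population or the sample standard deviation (`statistics.stdev`) is an assumption the worksheet does not settle; because every list here has five values, both give the same ordering.

```python
from statistics import mean, pstdev

data_sets = {
    "Data A": [48, 50, 50, 51, 51],
    "Data B": [30, 40, 50, 60, 70],
    "Athlete X": [12.1, 12.0, 11.9, 12.2, 12.0],
    "Athlete Y": [11.5, 12.8, 11.9, 13.1, 11.7],
}

for name, values in data_sets.items():
    # pstdev: square root of the average squared distance from the mean
    print(f"{name}: mean = {mean(values):.2f}, standard deviation = {pstdev(values):.2f}")
```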

13

Choosing the Best Summary Statistic

Circle the most appropriate measure for each situation.

A data set contains one very large outlier. The best measure of centre is:

Mean
Median
Mode

To measure spread that is not affected by extreme values, use:

Range
IQR
Standard deviation

To find the most common shoe size sold in a store, use:

Mean
Median
Mode

To compare how spread out two data sets are overall, use:

Range or IQR
Median
Mean
15

Interpret Box Plots

Answer the questions using the box plot data for two schools.

School A: Min=40, Q1=55, Median=65, Q3=78, Max=95. School B: Min=50, Q1=62, Median=70, Q3=80, Max=90. Which school has the higher median score?

Calculate the IQR for each school. Which school has greater spread in the middle 50% of scores?

School A has a larger range than School B. What does this tell you?

Which school performed more consistently? Justify using at least two statistics.

Tip: When comparing box plots, always comment on centre (median), spread (IQR and range), and any outliers.
19

Calculating IQR Step by Step

Show full working for finding the IQR.

Data: 4, 7, 8, 10, 12, 15, 18, 21, 25, 30. Sort and find Q1, Q3 and IQR.

Does this data set contain any outliers? Use the IQR rule to check.

Create your own data set of 8 values with an IQR of exactly 10. Show your working.

Tip: When there is an even number of values, split the ordered list exactly in half and find the median of each half.
23

Back-to-Back Stem-and-Leaf — Comparison Writing

Write a full statistical comparison using this back-to-back stem-and-leaf plot.

Group X | Stem | Group Y
  8 7 5 |  2   | 1 4 6
9 6 4 3 |  3   | 2 5 8
  7 5 2 |  4   | 0 3 7
      1 |  5   | 2 5
Find the median for each group.

Find the range for each group.

Write a paragraph comparing the two groups using median, range and shape.

Tip: A full comparison mentions centre, spread, shape and any unusual features.
26

Box Plot Construction

Construct a box plot from the given data.

Data: 15, 18, 22, 25, 28, 30, 32, 35, 40, 45. Find the five-number summary.

Construct a box plot for this data on the number line below.

Draw here

Describe the shape of this distribution based on the box plot.

Tip: Use the five-number summary as the skeleton, then draw the box and whiskers to scale.
27

Data Display — Match to Purpose

Match each display to its best use.

Back-to-back stem-and-leaf plot
Side-by-side box plots
Dot plot
Histogram
Compare two small data sets in detail
Compare centre and spread of two large data sets
Show individual values for small data sets
Show frequency distribution of continuous data
Tip: Choosing the right data display is part of effective statistical communication.
28

Real Data — Australian Weather

Use the following monthly rainfall data (mm) to answer the questions.

City A rainfall (mm): 45, 52, 38, 60, 90, 120, 130, 125, 95, 70, 55, 48. Calculate mean, median and range.

City B rainfall (mm): 80, 85, 78, 82, 79, 83, 80, 82, 81, 79, 80, 84. Calculate mean, median and range.

Which city has more consistent rainfall? Which has more seasonal variation? Justify using statistics.

Tip: Real data is rarely perfect; it may have gaps, rounding, or outliers.
31

Evaluating Statistical Claims

Read each statistical claim and identify possible problems.

'9 out of 10 customers prefer Brand X.' Describe two questions you would ask before trusting this claim.

A school reports its average NAPLAN score improved this year. A student says the outlier scores of high achievers probably raised the mean. Explain how you would investigate this claim.

An internet poll asks visitors to a news website to vote on a political issue. Why might the results not represent the general population?

Tip: Being a critical consumer of statistics is an important life skill. Always ask WHO collected the data, HOW, and WHY.
34

Comparing Spreads — Standard Deviation Concept

Explore what standard deviation tells us about spread.

Class A test scores: 70, 72, 68, 71, 69. Class B: 55, 80, 65, 90, 60. Both have mean = 70. Which class has greater spread? Explain.

Without calculating, which data set has a higher standard deviation? Data X: 10, 10, 10, 10 or Data Y: 5, 8, 12, 15? Justify.

In science, what would it mean if repeated measurements of the same object had a very high standard deviation?

Tip: Standard deviation is the most commonly used measure of spread in science, medicine, and economics.
35

Centre and Spread — Circle the Best Description

Circle the correct description.

A data set where most values are clustered near the mean has:

Low spread
High spread
Large IQR
Many outliers

Which measure of spread is most resistant to outliers?

IQR
Range
Standard deviation
Mean absolute deviation

Two distributions have the same median but different IQRs. The one with the larger IQR is:

More spread out in the middle
More concentrated
The same as the other
Always skewed

A box plot with very long whiskers compared to the box suggests:

High variability in the tails
Low spread
Symmetric distribution
All values are close to the median
37

Full Comparison Task

Write a full statistical comparison of two data sets.

Team A points per game: 85, 88, 92, 78, 95, 100, 82, 75, 98, 91. Team B: 70, 72, 74, 76, 78, 80, 82, 84, 86, 88. Calculate mean, median, range and IQR for each team.
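A Python sketch that computes the four requested statistics for both teams and prints them as a small table. Quartiles again use the split-in-half method, which is an assumption; the function names are illustrative only.

```python
from statistics import mean, median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value left out if n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])

def summary(values):
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }

team_a = [85, 88, 92, 78, 95, 100, 82, 75, 98, 91]
team_b = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88]

a, b = summary(team_a), summary(team_b)
print(f"{'Statistic':<10}{'Team A':>8}{'Team B':>8}")
for stat in a:
    print(f"{stat:<10}{a[stat]:>8.1f}{b[stat]:>8.1f}")
```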

Write a full statistical comparison paragraph. Include comments on centre, spread, shape and consistency.

Tip: Aim to write 4–6 sentences that each make a specific, evidence-based claim.
40

Interpreting Box Plots — Written Analysis

Describe what each box plot tells you.

A box plot shows: Min=10, Q1=20, Median=22, Q3=40, Max=90. Describe the distribution's shape and comment on the likely presence of outliers.

Compare two box plots: A (Min=5, Q1=15, Median=20, Q3=25, Max=35) and B (Min=5, Q1=10, Median=20, Q3=30, Max=35). Which has greater spread? Is there a difference in centre?
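Because a box plot shows only five numbers, one quick check is whether the minimum or maximum lies beyond a 1.5 × IQR fence. This Python sketch assumes the whiskers run all the way to the minimum and maximum; a maximum beyond the upper fence means at least one outlier exists, but the plot alone cannot say how many. The name `check_box_plot` is hypothetical.

```python
def check_box_plot(mn, q1, med, q3, mx, k=1.5):
    """Summarise a box plot given only its five-number summary."""
    iqr = q3 - q1
    return {
        "IQR": iqr,
        "range": mx - mn,
        # Unequal parts of the box hint at skew (median closer to Q1 suggests a right tail)
        "median to Q1": med - q1,
        "Q3 to median": q3 - med,
        # Assumes whiskers reach the true min and max, so only those two can be tested
        "min beyond lower fence": mn < q1 - k * iqr,
        "max beyond upper fence": mx > q3 + k * iqr,
    }

for label, summary in [("Plot 1", (10, 20, 22, 40, 90)),
                       ("A", (5, 15, 20, 25, 35)),
                       ("B", (5, 10, 20, 30, 35))]:
    print(label, check_box_plot(*summary))
```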

Tip: Box plots are used in medicine, education, and business to compare groups at a glance.
41

Steps for Comparing Two Data Sets

Sort these steps into the correct order for a full statistical comparison.

Calculate summary statistics for each data set
Write a comparison paragraph with named statistics
Identify any outliers and decide whether to investigate them
Describe the shape of each distribution
Step 1
Step 2
Step 3
Step 4
Tip: A systematic approach ensures you never miss an important comparison.
42

Outliers — Effect on Statistics

Investigate how removing an outlier changes summary statistics.

Data: 23, 25, 26, 27, 28, 29, 30, 31, 32, 85. Calculate mean and median WITH the outlier (85).

Remove 85 and recalculate mean and median.

How did removing the outlier change each measure? Which changed more — mean or median?

Give a real-world example where you would include an outlier in your analysis and one where you would exclude it.

Tip: Always calculate statistics both with and without outliers and report both values.
45

Distribution Shape — Real Contexts

For each context, predict the likely shape of the distribution and explain why.

The annual salary of all employees in a large company (including the CEO). What shape? Why?

The age at which people in Australia learn to ride a bike. What shape? Why?

The number of minutes students study each day. What shape? Why?

The heights of adult women in Australia. What shape? Why?

Tip: Connecting statistics to real contexts develops deep understanding.
48

Designing a Statistical Investigation

Plan and describe a statistical investigation comparing two groups.

State a question you could investigate by comparing two groups (e.g. Year 9 boys vs Year 9 girls, two Australian cities).

Describe how you would collect your data. What is your sample? How will you ensure it is representative?

Which summary statistics and displays will you use? Justify your choices.

What conclusions might you reach? What limitations should you acknowledge?

Tip: Good investigations have a clear question, appropriate sampling, and honest reporting of limitations.
49

Statistical Investigation at Home

Collect real data and perform a comparison.

  1. Record how long it takes you to fall asleep each night for two weeks. Split the data into school nights vs weekend nights and compare the distributions using median, range and IQR.
  2. Compare the nutritional information (kilojoules per 100 g) of two types of snack food (e.g. chips vs biscuits) across 10 brands each. Calculate summary statistics and create side-by-side box plots.
  3. Visit bom.gov.au and download one month of daily temperature data for two cities. Compare using five-number summaries.
  4. Compare your reading speed on two different types of text (fiction vs non-fiction) over 10 timed sessions each. Are the distributions similar?
  5. Ask family members to estimate the length of one minute without looking at a clock. Compare estimates for adults vs children using back-to-back stem-and-leaf plots.
52

Statistical Comparison — Exam Style

Write a full comparison using these summary statistics.

Group A: Mean=72, Median=75, IQR=14, Range=38. Group B: Mean=72, Median=68, IQR=28, Range=50. Write a paragraph comparing the two groups. Mention centre, spread, shape, and which group you would say performed better.

Tip: In exams, always name both groups explicitly and link each statistic to a conclusion.
54

Choosing Summary Statistics — Justification Task

For each scenario, state which statistics you would use and why.

You want to compare the test performance of two classes fairly. Which statistics would you report and why?

You want to decide which suburb has more affordable house prices. Which statistic is better — mean or median? Why?

You want to describe the consistency of a factory's production quality. Which measure of spread would you choose?

Tip: The choice of statistics is as important as calculating them correctly.
56

Statistics Review — Circle the Correct Answer

Circle the best answer for each question.

Which display best compares two small data sets at the individual value level?

Back-to-back stem-and-leaf plot
Histogram
Pie chart
Line graph

Which measure of spread is best for skewed data?

IQR
Standard deviation
Range
Mean

A data set has IQR = 12. A value 20 below Q1 is:

Likely an outlier (> 1.5 × IQR below Q1)
Not an outlier
The minimum value
Equal to the median

When mean > median, the distribution is:

Positively skewed
Negatively skewed
Symmetric
Bimodal
Tip: These review questions cover the full range of this worksheet's content.
57

Comparing Data Sets — Your Own Investigation

Collect, display and compare your own two data sets.

State your investigation question (comparing two groups).

Collect at least 10 values for each group and record them here.

Draw here

Calculate mean, median, range and IQR for each group.

Create a back-to-back stem-and-leaf plot or side-by-side box plots.

Draw here

Write a statistical conclusion answering your investigation question.

Tip: The best statistical investigations start with a genuine question you find interesting.
62

Misleading Statistics in the Media

Analyse each statistical claim.

'Crime fell by 50% last year — from 2 cases to 1.' Why is this claim potentially misleading?

A graph shows home prices rising sharply, but the y-axis starts at $700 000 rather than $0. How does this affect the impression the graph gives?

Find or make up your own example of a misleading statistical claim. Explain why it is misleading and how to fix it.

Tip: Critical thinking about statistics is one of the most valuable life skills you can develop.
64

Matching Summary Statistics to Context

For each context, identify which summary statistic is most appropriate and justify.

You are reporting the 'typical' weekly wage in Australia. Choose mean or median and justify.

A quality control engineer checks the consistency of a production line. Which measure of spread would be most useful?

A teacher wants to report on the 'most common' mark in a class test. Which measure is most appropriate?

A weather bureau wants to describe how variable summer temperatures are in Darwin compared to Hobart. Which statistic should they compare?

Tip: The choice of statistic shapes the story being told, so always choose thoughtfully.
65

Statistical Reasoning — Circle the Best Answer

Circle the most statistically correct answer.

When reporting house prices to give buyers a realistic expectation:

Use median — less affected by very expensive properties
Use mean — easier to calculate
Use mode — most common price
Use range — shows all prices

A factory has target output = 500 units/day. They want to check consistency. Use:

IQR or standard deviation to measure spread
Mode
Median alone
Minimum value

The most complete comparison of two data sets includes:

Centre, spread, shape, and any outliers
Only the mean
Only a graph
The maximum and minimum

Which survey design is MOST likely to produce a biased result?

A voluntary online survey
Randomly selecting names from a school roll
Stratified random sampling
Systematic sampling from a complete list
66

Comparing Data — Extended Investigation

Conduct a statistical investigation comparing two groups.

Choose two groups to compare (examples: Year 9 maths scores vs Year 8, sleep duration on school vs weekend nights). State your question clearly.

Collect at least 15 data values per group. Write them here.

Draw here

Calculate the five-number summary for each group.

Create appropriate displays (back-to-back stem-and-leaf or side-by-side box plots).

Draw here

Write a 5-sentence statistical conclusion comparing centre, spread, shape, and any outliers.

Tip: An extended investigation typically takes 1–2 hours and produces a rich piece of mathematical writing.
67

Statistics Terminology — Final Match

Match each term to its definition.

Population
Sample
Representative sample
Voluntary response sample
Sampling bias
A sample where every subgroup is proportionally included
The entire group being studied
Bias that arises when the sample does not represent the population
A subset selected from the population
A sample where participants choose to respond
Tip: A strong statistical vocabulary allows you to communicate findings precisely.
69

Statistics — Peer Teaching Task

Explain key concepts as if teaching a Year 7 student.

Explain what the median is and why it is sometimes better than the mean. Use a simple example.

Explain what the IQR measures and how to calculate it. Use the data set 5, 8, 10, 12, 15 as an example.

Explain what distribution shape means and give an example of each shape.

Tip: Teaching someone else is the deepest form of learning: it reveals gaps in your own understanding.
72

Year 9 Statistics — Self-Assessment

Reflect on everything you have learned in this worksheet.

List five statistical concepts you now understand well.

List two concepts you found challenging. What will you do to improve?

How has your understanding of data comparison changed through this worksheet?

Where might you use data comparison skills in your future study or career?

Tip: Honest self-assessment is the starting point for effective study planning.
74

Dot Plot Construction and Interpretation

Use dot plots to display and compare data.

Data: 12, 14, 14, 15, 16, 16, 16, 17, 18, 20. Create a dot plot for this data.

Draw here

Identify any clusters, gaps, or outliers in the dot plot.

Find the median and mode directly from the dot plot.

How does the dot plot help you see distribution shape more clearly than a list of numbers?
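A rough text dot plot can be produced in Python with `collections.Counter`. This sketch assumes whole-number data and draws one column per value from the minimum to the maximum, so values that never occur show up as empty columns (gaps). The name `dot_plot` is hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return text lines for a dot plot of whole-number data, one 3-character column per value."""
    counts = Counter(values)
    xs = range(min(values), max(values) + 1)   # include values with no dots, so gaps show
    lines = []
    for level in range(max(counts.values()), 0, -1):
        # A column gets a dot on this level if the value occurs at least `level` times
        row = "".join(f"{'o' if counts[x] >= level else '':^3}" for x in xs)
        lines.append(row.rstrip())
    lines.append("".join(f"{x:^3}" for x in xs).rstrip())   # number line
    return lines

data = [12, 14, 14, 15, 16, 16, 16, 17, 18, 20]
print("\n".join(dot_plot(data)))
```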

Tip: Dot plots are especially useful for noticing clusters, gaps and outliers.
78

Reading a Histogram

Interpret a frequency histogram for exam scores.

A histogram has bars: 40–50 (3 students), 50–60 (8 students), 60–70 (14 students), 70–80 (10 students), 80–90 (4 students), 90–100 (1 student). What is the modal class interval?

How many students scored below 70?

Describe the shape of this distribution.

Can you calculate the exact mean from a histogram? Explain.
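The following Python sketch illustrates why a histogram only supports an estimate of the mean: each student's score is replaced by the midpoint of their bar. It assumes a score on a boundary (such as 70) belongs to the higher bar.

```python
# Each bar: (lower bound, upper bound, number of students).
# Assumption: a score on a boundary (e.g. 50) belongs to the higher bar.
bars = [(40, 50, 3), (50, 60, 8), (60, 70, 14), (70, 80, 10), (80, 90, 4), (90, 100, 1)]

total = sum(f for _, _, f in bars)
below_70 = sum(f for _, upper, f in bars if upper <= 70)
modal_bar = max(bars, key=lambda bar: bar[2])   # bar with the highest frequency

# Every score in a bar is replaced by the bar's midpoint, so this is an estimate only
estimated_mean = sum((lower + upper) / 2 * f for lower, upper, f in bars) / total

print(f"Students: {total}, below 70: {below_70}, modal class: {modal_bar[0]}-{modal_bar[1]}")
print(f"Estimated mean: {estimated_mean:.2f}")
```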

Tip: Histograms show the shape of a distribution most clearly when the data set is large.
80

Statistics and Society

Explore how statistics influence decisions that affect people.

Describe a situation where statistics were used to make an important public health decision (e.g. vaccine rollout, road safety policy).

How could a government use statistics about unemployment to support two opposite political arguments?

Why is it important for citizens to understand basic statistics?

Tip: Statistical literacy is a form of civic power: it allows you to evaluate the claims of politicians, businesses, and media.
81

Statistical Thinking — Circle the Correct Approach

Circle the most statistically sound approach.

To decide which of two teaching methods is more effective:

Compare class median scores using randomised groups
Ask teachers which method they prefer
Use the higher mean regardless of sample size
Choose the method with the larger sample

An outlier is found in a data set. You should:

Investigate its cause before deciding to include or exclude it
Always remove it to improve the mean
Always keep it to show the full range
Ignore it completely

When comparing two skewed data sets, the best pair of statistics is:

Median and IQR
Mean and standard deviation
Mode and range
Min and max

Which measure best describes the 'typical' value in a symmetric distribution?

Mean and median (they are approximately equal)
Mode only
Range
IQR
82

Statistics — Written Test Preparation

Answer these exam-style questions with full written responses.

Data set A: 45, 50, 52, 54, 56, 60, 62, 65, 70, 90. Data set B: 48, 50, 51, 53, 55, 57, 59, 61, 63, 65. Calculate mean, median, IQR and range for each. Show all working.

Write a paragraph comparing the two data sets. Mention centre, spread, shape and any outliers.

Tip: In tests, always use statistical language precisely and support claims with specific values.
85

Statistics — Create Your Own Questions

Write and solve your own statistics questions at three different levels.

Write a FOUNDATIONAL question about finding the median from a list of values. Include your answer.

Write a DEVELOPING question involving comparing two data sets using IQR. Include your answer.

Write an EXTENDING question involving identifying outliers using the IQR rule and discussing their effect on statistics. Include a full solution.

Tip: Creating your own questions is the highest form of mathematical understanding.
88

Statistics from a Frequency Table

Calculate summary statistics from the frequency table.

Hours of TV | 0 | 1 | 2 | 3 | 4 | 5
Frequency   | 4 | 7 | 9 | 5 | 3 | 2
Total: 30 students. Calculate the mean hours of TV watched.

Find the median hours of TV watched.

What is the mode?

Describe the shape of this distribution.
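A Python sketch of the same calculations from the frequency table. It uses cumulative frequency to locate the middle position(s) for the median; `value_at` is a hypothetical helper name.

```python
# Frequency table: hours of TV -> number of students
freq = {0: 4, 1: 7, 2: 9, 3: 5, 4: 3, 5: 2}

n = sum(freq.values())                                 # total number of students
mean_hours = sum(h * f for h, f in freq.items()) / n   # each value counted f times
mode_hours = max(freq, key=freq.get)                   # value with the highest frequency

def value_at(position):
    """Return the value at a 1-based position in the ordered data, using cumulative frequency."""
    running = 0
    for hours in sorted(freq):
        running += freq[hours]
        if position <= running:
            return hours
    raise ValueError("position is larger than n")

if n % 2 == 0:
    median_hours = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
else:
    median_hours = value_at((n + 1) // 2)

print(f"n = {n}, mean = {mean_hours:.2f}, median = {median_hours}, mode = {mode_hours}")
```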

Tip: Find the total number of values (the sum of the frequencies) first; this tells you what n is.
90

Types of Data — Match to Description

Match each data type to its description.

Categorical (nominal)
Categorical (ordinal)
Numerical (discrete)
Numerical (continuous)
Data that can take any value in a range (e.g. height)
Data with categories that have a natural order (e.g. survey ratings)
Data with distinct, countable values (e.g. number of siblings)
Data with categories but no natural order (e.g. eye colour)
Tip: Knowing whether data is categorical or numerical determines which statistics and displays you can use.
91

Classifying Data and Choosing Statistics

Classify each variable and choose appropriate statistics.

For each variable, state whether it is categorical or numerical, and whether numerical data is discrete or continuous: (a) Number of pets; (b) Favourite sport; (c) Height in cm; (d) Survey satisfaction rating 1–5.

Which summary statistics would you calculate for numerical data? Which would you use for categorical data?

Tip: Numerical data has statistics like mean and median; categorical data uses counts and percentages.
93

Statistics — Final Reflection

Reflect on your statistical learning journey.

Explain in your own words why we need more than just the mean to compare two data sets.

Describe a real-world situation where you could apply data comparison skills.

What is the most important idea you learned in this worksheet? Why?

What questions do you still have about statistics? What would you like to explore next?

Tip: Deep learners always connect new concepts back to big ideas and think about how to use what they have learned.
95

Measures of Spread — Which Is Correct?

Circle the correct statement.

The range is:

Maximum − minimum
Q3 − Q1
Mean − median
Q3 + Q1

The IQR is:

Q3 − Q1
Max − min
The middle value
Mean ÷ median

Which measure of spread is MOST affected by outliers?

Range
IQR
Both equally
Neither