Comparing & Analysing Data Sets
Summary Statistics Vocabulary
Draw a line from each measure to its correct definition.
Calculate Summary Statistics
Calculate the mean, median, mode and range for each data set.
Test scores: 72, 85, 91, 68, 85, 77, 90, 62, 85, 74
Mean = ___ Median = ___ Mode = ___ Range = ___
Daily temperatures (°C): 18, 22, 25, 30, 22, 19, 21, 28, 22, 25
Mean = ___ Median = ___ Mode = ___ Range = ___
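One way to check hand calculations like these is with Python's standard `statistics` module. The sketch below uses a made-up data set rather than one from this worksheet, and the function name `summarise` is only illustrative.

```python
import statistics

def summarise(values):
    """Return the mean, median, mode(s) and range of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-common value, in case there is a tie
        "mode": statistics.multimode(values),
        "range": max(values) - min(values),
    }

# Hypothetical data, chosen only for illustration
sample = [4, 7, 7, 9, 13]
print(summarise(sample))
```

The mode is returned as a list because a data set can have more than one mode.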
Comparing Two Data Sets
Use the statistics provided to compare two classes.
Class A: Mean = 72, Median = 74, Range = 40, IQR = 18
Class B: Mean = 71, Median = 73, Range = 20, IQR = 10
Which class performed more consistently? Justify using the statistics.
Which statistic (mean or median) is more useful when comparing these classes? Explain why.
Shape of a Distribution
Sort each description into the correct distribution shape column.
Identifying Outliers
An outlier is a data value that lies far from the rest of the data.
Data: 12, 14, 15, 13, 16, 14, 15, 47, 13, 14
Identify the outlier: ___
Mean WITH outlier = ___ Mean WITHOUT outlier = ___
How does the outlier affect the mean? Which measure (mean or median) is more resistant to outliers? Explain.
Back-to-Back Stem-and-Leaf Plot
Interpret the back-to-back stem-and-leaf plot showing sprint times (seconds) for two athletes.
Athlete A | Stem | Athlete B
9 8 7 | 11 | 2 3 5 6 5 3 2 | 12 | 1 4 7 4 1 | 13 | 0 3
Median for Athlete A = ___
Median for Athlete B = ___
Who is the faster athlete overall? Justify your answer.
Sampling and Conclusions
Read each scenario and answer the questions.
A school surveys 20 students near the tuck shop at lunch about their favourite food. The results are used to plan the canteen menu for all 800 students. Describe TWO problems with this sampling method.
A news headline says 8 out of 10 dentists recommend Brand X toothpaste. What question would you ask before trusting this statistic?
Real Data Investigation
Collect and analyse your own data.
1. Record the daily maximum temperature in your area for 10 days. Calculate the mean, median, mode and range. Does the distribution appear symmetric or skewed?
2. Compare two types of cereal using nutritional labels (sugar per 100 g). Use statistics to decide which is healthier.
3. Survey 10 family members or friends: How many hours of screen time did you have yesterday? Compare two groups (e.g. under 18 vs adults) using a back-to-back stem-and-leaf plot.
Back-to-Back Stem-and-Leaf Plots -- Building One
Create a back-to-back stem-and-leaf plot to display both data sets on the same diagram.
Class A scores: 52, 67, 71, 73, 75, 78, 82, 85, 88, 91
Class B scores: 44, 58, 63, 67, 70, 72, 79, 83, 86, 90
Draw the back-to-back stem-and-leaf plot using stems 4, 5, 6, 7, 8, 9:
Compare the two classes: which has the higher median? Which is more spread out?
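The layout of a back-to-back stem-and-leaf plot can also be sketched in code. The example below is a minimal illustration, assuming every value is a two-digit whole number (stem = tens digit, leaf = units digit). The function names and the data are hypothetical and are not taken from this worksheet.

```python
from collections import defaultdict

def split_by_stem(values):
    """Group two-digit values as {tens digit: sorted list of units digits}."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return rows

def back_to_back(left, right):
    """Return the plot as a list of text rows, one row per stem."""
    left_rows, right_rows = split_by_stem(left), split_by_stem(right)
    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10
    lines = []
    for stem in range(lowest, highest + 1):
        # Left leaves are reversed so the smallest leaf sits next to the stem
        left_leaves = " ".join(str(d) for d in reversed(left_rows.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right_rows.get(stem, []))
        lines.append(f"{left_leaves:>10} | {stem} | {right_leaves}")
    return lines

# Hypothetical data, chosen only for illustration
group_1 = [21, 25, 33, 38, 40]
group_2 = [24, 29, 31, 35, 36]
for line in back_to_back(group_1, group_2):
    print(line)
```

Every stem between the lowest and highest is printed, even when one side has no leaves, so gaps in the data stay visible.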
Five-Number Summary
Calculate the five-number summary (minimum, Q1, median, Q3, maximum) for each data set.
Data: 5, 8, 10, 12, 14, 16, 18, 22, 24, 30
Min = ___ Q1 = ___ Median = ___ Q3 = ___ Max = ___
Data: 31, 35, 40, 42, 44, 48, 50, 53, 58, 60
Min = ___ Q1 = ___ Median = ___ Q3 = ___ Max = ___
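A five-number summary can be checked with a short program. The sketch below assumes the "median of each half" method for quartiles: the data is split at the median, and the median itself is left out of both halves when the number of values is odd. Calculators and software sometimes use other quartile conventions and may give slightly different answers. The function name and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Hypothetical data, chosen only for illustration
print(five_number_summary([2, 4, 6, 8, 10, 12, 14, 16]))
```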
IQR and Outlier Detection
Use the IQR rule: a value is an outlier if it is below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
Data set: 10, 12, 14, 15, 16, 18, 20, 22, 55
Q1 = 13, Q3 = 21, IQR = 8
Lower fence = Q1 - 1.5 × IQR = ___
Upper fence = Q3 + 1.5 × IQR = ___
Is 55 an outlier? ___
Explain how you would report a data set to someone if there is one outlier: should you include it or remove it? Give a reason for each choice.
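The IQR rule can be written as two small functions. This is a minimal sketch: the function names are illustrative, and the data and quartiles are made up for the example rather than taken from this worksheet.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data, chosen only for illustration (Q1 = 21, Q3 = 29)
data = [3, 20, 22, 24, 26, 28, 30, 60]
print(outlier_fences(21, 29))
print(find_outliers(data, 21, 29))
```

A value exactly on a fence is not counted as an outlier here, which matches the "below" and "above" wording of the rule.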
Comparing Spread Using the Standard Deviation Concept
Standard deviation measures how spread out values are from the mean. A low standard deviation means values are clustered closely around the mean; a high standard deviation means they are spread far from it.
Data A: 48, 50, 50, 51, 51 (mean = 50). Data B: 30, 40, 50, 60, 70 (mean = 50). Which data set has the higher standard deviation? Explain without calculating.
Two athletes run 100 m five times each. Athlete X times (seconds): 12.1, 12.0, 11.9, 12.2, 12.0. Athlete Y times: 11.5, 12.8, 11.9, 13.1, 11.7. Which athlete is more consistent? Justify using spread.
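For readers who want to see the calculation behind the idea, the sketch below computes a standard deviation step by step. It assumes the population form (dividing by the number of values); some courses and calculators divide by one less than that instead. The function name and both data sets are hypothetical.

```python
import math

def population_sd(values):
    """Square root of the mean squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_gaps = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_gaps) / len(values))

# Hypothetical data sets with the same mean (10), chosen only for illustration
clustered = [9, 10, 10, 10, 11]
spread_out = [2, 6, 10, 14, 18]
print(population_sd(clustered))
print(population_sd(spread_out))
```

Running it shows how two data sets with the same mean can have very different standard deviations.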
Choosing the Best Summary Statistic
Circle the most appropriate measure for each situation.
A data set contains one very large outlier. The best measure of centre is:
To measure spread that is not affected by extreme values, use:
To find the most common shoe size sold in a store, use:
To compare how spread out two data sets are overall, use:
Interpret Box Plots
Answer questions based on box plots for two schools.
School A: Min = 40, Q1 = 55, Median = 65, Q3 = 78, Max = 95.
School B: Min = 50, Q1 = 62, Median = 70, Q3 = 80, Max = 90.
Which school has the higher median score?
Calculate the IQR for each school. Which school has greater spread in the middle 50% of scores?
School A has a larger range than School B. What does this tell you?
Which school performed more consistently? Justify using at least two statistics.
Calculating IQR Step by Step
Show full working for finding the IQR.
Data: 4, 7, 8, 10, 12, 15, 18, 21, 25, 30. Sort and find Q1, Q3 and IQR.
Does this data set contain any outliers? Use the IQR rule to check.
Create your own data set of 8 values with an IQR of exactly 10. Show your working.
Back-to-Back Stem-and-Leaf — Comparison Writing
Write a full statistical comparison using this back-to-back stem-and-leaf plot.
Group X | Stem | Group Y
8 7 5 | 2 | 1 4 6 9 6 4 3 | 3 | 2 5 8 7 5 2 | 4 | 0 3 7 1 | 5 | 2 5
Find the median for each group.
Find the range for each group.
Write a paragraph comparing the two groups using median, range and shape.
Box Plot Construction
Construct a box plot from the given data.
Data: 15, 18, 22, 25, 28, 30, 32, 35, 40, 45. Find the five-number summary.
Construct a box plot for this data on the number line below.
Describe the shape of this distribution based on the box plot.
Data Display — Match to Purpose
Match each display to its best use.
Real Data — Australian Weather
Use the following monthly rainfall data (mm) to answer questions.
City A rainfall (mm): 45, 52, 38, 60, 90, 120, 130, 125, 95, 70, 55, 48. Calculate mean, median and range.
City B rainfall (mm): 80, 85, 78, 82, 79, 83, 80, 82, 81, 79, 80, 84. Calculate mean, median and range.
Which city has more consistent rainfall? Which has more seasonal variation? Justify using statistics.
Evaluating Statistical Claims
Read each statistical claim and identify possible problems.
'9 out of 10 customers prefer Brand X.' Describe two questions you would ask before trusting this claim.
A school reports its average NAPLAN score improved this year. A student says the outlier scores of high achievers probably raised the mean. Explain how you would investigate this claim.
An internet poll asks visitors to a news website to vote on a political issue. Why might the results not represent the general population?
Comparing Spreads — Standard Deviation Concept
Explore what standard deviation tells us about spread.
Class A test scores: 70, 72, 68, 71, 69. Class B: 55, 80, 65, 90, 60. Both have mean = 70. Which class has greater spread? Explain.
Without calculating, which data set has a higher standard deviation? Data X: 10, 10, 10, 10 or Data Y: 5, 8, 12, 15? Justify.
In science, what would it mean if repeated measurements of the same object had a very high standard deviation?
Centre and Spread — Circle the Best Description
Circle the correct description.
A data set where most values are clustered near the mean has:
Which measure of spread is most resistant to outliers?
Two distributions have the same median but different IQRs. The one with the larger IQR is:
A box plot with very long whiskers compared to the box suggests:
Full Comparison Task
Write a full statistical comparison of two data sets.
Team A points per game: 85, 88, 92, 78, 95, 100, 82, 75, 98, 91.
Team B points per game: 70, 72, 74, 76, 78, 80, 82, 84, 86, 88.
Calculate mean, median, range and IQR for each team.
Write a full statistical comparison paragraph. Include comments on centre, spread, shape and consistency.
Interpreting Box Plots — Written Analysis
Describe what each box plot tells you.
A box plot shows: Min=10, Q1=20, Median=22, Q3=40, Max=90. Describe the distribution's shape and comment on the likely presence of outliers.
Compare two box plots: A (Min=5, Q1=15, Median=20, Q3=25, Max=35) and B (Min=5, Q1=10, Median=20, Q3=30, Max=35). Which has greater spread? Is there a difference in centre?
Steps for Comparing Two Data Sets
Sort these steps into the correct order for a full statistical comparison.
Outliers — Effect on Statistics
Investigate how removing an outlier changes summary statistics.
Data: 23, 25, 26, 27, 28, 29, 30, 31, 32, 85. Calculate mean and median WITH the outlier (85).
Remove 85 and recalculate mean and median.
How did removing the outlier change each measure? Which changed more — mean or median?
Give a real-world example where you would include an outlier in your analysis and one where you would exclude it.
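An investigation like this can be repeated quickly on other data with a short program. The sketch below uses a made-up data set and an illustrative function name; it reports the mean and median before and after one chosen value is removed.

```python
import statistics

def centre_with_and_without(values, outlier):
    """Compare (mean, median) before and after removing one value."""
    trimmed = list(values)
    trimmed.remove(outlier)   # removes the first matching value only
    return {
        "with": (statistics.mean(values), statistics.median(values)),
        "without": (statistics.mean(trimmed), statistics.median(trimmed)),
    }

# Hypothetical data, chosen only for illustration
print(centre_with_and_without([5, 6, 7, 8, 9, 40], 40))
```

Comparing the two pairs side by side makes it easy to see which measure moved further.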
Distribution Shape — Real Contexts
For each context, predict the likely shape of the distribution and explain why.
The annual salary of all employees in a large company (including the CEO). What shape? Why?
The age at which people in Australia learn to ride a bike. What shape? Why?
The number of minutes students study each day. What shape? Why?
The heights of adult women in Australia. What shape? Why?
Designing a Statistical Investigation
Plan and describe a statistical investigation comparing two groups.
State a question you could investigate by comparing two groups (e.g. Year 9 boys vs Year 9 girls, two Australian cities).
Describe how you would collect your data. What is your sample? How will you ensure it is representative?
Which summary statistics and displays will you use? Justify your choices.
What conclusions might you reach? What limitations should you acknowledge?
Statistical Investigation at Home
Collect real data and perform a comparison.
1. Record how long it takes you to fall asleep each night for two weeks. Split the data into school nights vs weekend nights and compare the distributions using median, range and IQR.
2. Compare the nutritional information (kilojoules per 100 g) of two types of snack food (e.g. chips vs biscuits) across 10 brands each. Calculate summary statistics and create side-by-side box plots.
3. Visit bom.gov.au and download one month of daily temperature data for two cities. Compare using five-number summaries.
4. Compare your reading speed on two different types of text (fiction vs non-fiction) over 10 timed sessions each. Are the distributions similar?
5. Ask family members to estimate the length of one minute without looking at a clock. Compare estimates for adults vs children using back-to-back stem-and-leaf plots.
Statistical Comparison — Exam Style
Write a full comparison using these summary statistics.
Group A: Mean=72, Median=75, IQR=14, Range=38. Group B: Mean=72, Median=68, IQR=28, Range=50. Write a paragraph comparing the two groups. Mention centre, spread, shape, and which group you would say performed better.
Choosing Summary Statistics — Justification Task
For each scenario, state which statistics you would use and why.
You want to compare the test performance of two classes fairly. Which statistics would you report and why?
You want to decide which suburb has more affordable house prices. Which statistic is better — mean or median? Why?
You want to describe the consistency of a factory's production quality. Which measure of spread would you choose?
Statistics Review — Circle the Correct Answer
Circle the best answer for each question.
Which display best compares two small data sets at the individual value level?
Which measure of spread is best for skewed data?
A data set has IQR = 12. A value 20 below Q1 is:
When mean > median, the distribution is:
Comparing Data Sets — Your Own Investigation
Collect, display and compare your own two data sets.
State your investigation question (comparing two groups).
Collect at least 10 values for each group and record them here.
Calculate mean, median, range and IQR for each group.
Create a back-to-back stem-and-leaf plot or side-by-side box plots.
Write a statistical conclusion answering your investigation question.
Misleading Statistics in the Media
Analyse each statistical claim.
'Crime fell by 50% last year — from 2 cases to 1.' Why is this claim potentially misleading?
A graph shows home prices rising sharply, but the y-axis starts at $700 000 rather than $0. How does this affect the impression the graph gives?
Find or make up your own example of a misleading statistical claim. Explain why it is misleading and how to fix it.
Matching Summary Statistics to Context
For each context, identify which summary statistic is most appropriate and justify.
You are reporting the 'typical' weekly wage in Australia. Choose mean or median and justify.
A quality control engineer checks the consistency of a production line. Which measure of spread would be most useful?
A teacher wants to report on the 'most common' mark in a class test. Which measure is most appropriate?
A weather bureau wants to describe how variable summer temperatures are in Darwin compared to Hobart. Which statistic should they compare?
Statistical Reasoning — Circle the Best Answer
Circle the most statistically correct answer.
When reporting house prices to give buyers a realistic expectation:
A factory has target output = 500 units/day. They want to check consistency. Use:
The most complete comparison of two data sets includes:
Which survey design is MOST likely to produce a biased result?
Comparing Data — Extended Investigation
Conduct a statistical investigation comparing two groups.
Choose two groups to compare (examples: Year 9 maths scores vs Year 8, sleep duration on school vs weekend nights). State your question clearly.
Collect at least 15 data values per group. Write them here.
Calculate the five-number summary for each group.
Create appropriate displays (back-to-back stem-and-leaf or side-by-side box plots).
Write a 5-sentence statistical conclusion comparing centre, spread, shape, and any outliers.
Statistics Terminology — Final Match
Match each term to its definition.
Statistics — Peer Teaching Task
Explain key concepts as if teaching a Year 7 student.
Explain what the median is and why it is sometimes better than the mean. Use a simple example.
Explain what the IQR measures and how to calculate it. Use the data set 5, 8, 10, 12, 15 as an example.
Explain what distribution shape means and give an example of each shape.
Year 9 Statistics — Self-Assessment
Reflect on everything you have learned in this worksheet.
List five statistical concepts you now understand well.
List two concepts you found challenging. What will you do to improve?
How has your understanding of data comparison changed through this worksheet?
Where might you use data comparison skills in your future study or career?
Dot Plot Construction and Interpretation
Use dot plots to display and compare data.
Data: 12, 14, 14, 15, 16, 16, 16, 17, 18, 20. Create a dot plot for this data.
Identify any clusters, gaps, or outliers in the dot plot.
Find the median and mode directly from the dot plot.
How does the dot plot help you see distribution shape more clearly than a list of numbers?
Reading a Histogram
Interpret a frequency histogram for exam scores.
A histogram has bars: 40–50 (3 students), 50–60 (8 students), 60–70 (14 students), 70–80 (10 students), 80–90 (4 students), 90–100 (1 student). What is the modal class interval?
How many students scored below 70?
Describe the shape of this distribution.
Can you calculate the exact mean from a histogram? Explain.
Statistics and Society
Explore how statistics influence decisions that affect people.
Describe a situation where statistics were used to make an important public health decision (e.g. vaccine rollout, road safety policy).
How could a government use statistics about unemployment to support two opposite political arguments?
Why is it important for citizens to understand basic statistics?
Statistical Thinking — Circle the Correct Approach
Circle the most statistically sound approach.
To decide which of two teaching methods is more effective:
An outlier is found in a data set. You should:
When comparing two skewed data sets, the best pair of statistics is:
Which measure best describes the 'typical' value in a symmetric distribution?
Statistics — Written Test Preparation
Answer these exam-style questions with full written responses.
Data set A: 45, 50, 52, 54, 56, 60, 62, 65, 70, 90.
Data set B: 48, 50, 51, 53, 55, 57, 59, 61, 63, 65.
Calculate mean, median, IQR and range for each. Show all working.
Write a paragraph comparing the two data sets. Mention centre, spread, shape and any outliers.
Statistics — Create Your Own Questions
Write and solve your own statistics questions at three different levels.
Write a FOUNDATIONAL question about finding the median from a list of values. Include your answer.
Write a DEVELOPING question involving comparing two data sets using IQR. Include your answer.
Write an EXTENDING question involving identifying outliers using the IQR rule and discussing their effect on statistics. Include a full solution.
Statistics from a Frequency Table
Calculate summary statistics from the frequency table.
Hours of TV | Frequency
0 | 4
1 | 7
2 | 9
3 | 5
4 | 3
5 | 2
Total: 30 students.
Calculate the mean hours of TV watched.
Find the median hours of TV watched.
What is the mode?
Describe the shape of this distribution.
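The method for a frequency table can be sketched in code: the mean is the sum of (value × frequency) divided by the total frequency, and the median and mode can be found by listing every value as many times as it occurs. The table below is made up for illustration, and the function names are hypothetical.

```python
import statistics

def expand(freq_table):
    """Turn {value: frequency} into a sorted list of individual values."""
    return [value for value, count in sorted(freq_table.items()) for _ in range(count)]

def frequency_mean(freq_table):
    """Mean = sum of (value x frequency) divided by the total frequency."""
    total = sum(value * count for value, count in freq_table.items())
    return total / sum(freq_table.values())

# Hypothetical table, chosen only for illustration: {number of siblings: students}
table = {0: 2, 1: 5, 2: 3}
print(frequency_mean(table))
print(statistics.median(expand(table)))
print(statistics.multimode(expand(table)))
```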
Types of Data — Match to Description
Match each data type to its description.
Classifying Data and Choosing Statistics
Classify each variable and choose appropriate statistics.
For each variable, state whether it is categorical or numerical, and whether numerical data is discrete or continuous: (a) Number of pets; (b) Favourite sport; (c) Height in cm; (d) Survey satisfaction rating 1–5.
Which summary statistics would you calculate for numerical data? Which would you use for categorical data?
Statistics — Final Reflection
Reflect on your statistical learning journey.
Explain in your own words why we need more than just the mean to compare two data sets.
Describe a real-world situation where you could apply data comparison skills.
What is the most important idea you learned in this worksheet? Why?
What questions do you still have about statistics? What would you like to explore next?
Measures of Spread — Which Is Correct?
Circle the correct statement.
The range is:
The IQR is:
Which measure of spread is MOST affected by outliers?