Bivariate Data & Scatter Plots
Correlation Type Match
Draw a line from each correlation type to its description.
Positive or Negative Correlation
Circle the correct correlation direction for each scenario.
As temperature increases, ice cream sales increase. This is:
As the age of a car increases, its resale value decreases. This is:
As altitude increases, air temperature generally decreases. This is:
As hours of study increase, exam marks tend to increase. This is:
Strong or Weak Correlation
Circle whether each description suggests a strong or weak correlation.
Data points cluster very tightly around an upward-sloping line:
Data points are widely scattered but show a slight downward trend:
Almost every increase in x leads to a predictable decrease in y:
Points form a rough cloud with only a vague upward drift:
Classify the Correlation Strength
Sort each scenario into the correct column: Strong, Weak, or No Correlation.
Variable Pairs & Expected Correlation
Draw a line from each variable pair to the expected correlation direction.
Independent vs Dependent Variable
Circle the correct identification of the independent variable (the one we control or expect to influence the other).
Investigating how study time affects exam results. The independent variable is:
Testing whether temperature affects plant growth. The independent variable is:
Exploring the link between age and reaction time. The independent variable is:
Does distance from school affect travel time? The independent variable is:
Reading Scatter Plot Axes
Circle the correct answer about what each axis represents on a scatter plot.
The horizontal axis (x-axis) of a scatter plot typically shows:
The vertical axis (y-axis) of a scatter plot typically shows:
On a scatter plot of 'Hours of sunlight vs Plant growth', the x-axis should show:
Each point on a scatter plot represents:
Categorical vs Numerical Bivariate Data
Sort each variable pair into the correct column: both variables are numerical (suitable for a scatter plot) or at least one is categorical (not suitable for a scatter plot).
Line of Best Fit Properties
Circle the correct statement about lines of best fit.
A line of best fit should:
If a scatter plot shows a strong negative correlation, the line of best fit:
A line of best fit is most useful when:
If the data has no correlation, a line of best fit:
Interpolation vs Extrapolation
Circle the correct answer about using a line of best fit for predictions.
Using a line of best fit to predict a value within the range of the collected data is called:
Using a line of best fit to predict a value beyond the range of the collected data is called:
Which type of prediction is generally more reliable?
Data was collected for students who studied between 1 and 6 hours. Predicting the mark of a student who studied 12 hours is:
Correlation vs Causation
Circle the correct answer about the difference between correlation and causation.
Correlation means:
Causation means:
Ice cream sales and drowning rates both increase in summer. This is an example of:
A randomised controlled experiment can help establish:
Identifying Outliers
Circle the correct answer about outliers in scatter plots.
An outlier on a scatter plot is:
What effect can an outlier have on a line of best fit?
Before removing an outlier, you should:
A student recorded the heights and weights of 20 classmates. One point is far from all others. The best first step is:
Steps in a Bivariate Investigation
Put the steps for conducting a bivariate data investigation in the correct order.
Scatter Plot Pattern Match
Draw a line from each scatter plot description to its correlation type.
Predict Using a Line of Best Fit
Use the described line of best fit to circle the best prediction.
A line of best fit for 'Hours studied (x) vs Exam mark (y)' passes through (2, 50) and (6, 80). Predict the mark for 4 hours of study:
Using the same line, predict the mark for 5 hours of study:
The line of best fit for 'Temperature (x) vs Hot drinks sold (y)' passes through (10°C, 60) and (30°C, 20). Predict sales at 20°C:
Using the same line, would you trust a prediction for sales at 50°C?
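One way to check these predictions with technology is sketched below in Python. The helper names `line_through` and `predict` are illustrative assumptions, not part of the worksheet; the points are the ones given in the questions above.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    gradient = (y2 - y1) / (x2 - x1)
    intercept = y1 - gradient * x1
    return gradient, intercept


def predict(p1, p2, x):
    """Read a predicted y-value off the line through p1 and p2."""
    gradient, intercept = line_through(p1, p2)
    return gradient * x + intercept


# Hours studied (x) vs exam mark (y): line through (2, 50) and (6, 80)
mark_4h = predict((2, 50), (6, 80), 4)
mark_5h = predict((2, 50), (6, 80), 5)

# Temperature (x) vs hot drinks sold (y): line through (10, 60) and (30, 20)
drinks_20c = predict((10, 60), (30, 20), 20)
drinks_50c = predict((10, 60), (30, 20), 50)

print(mark_4h, mark_5h, drinks_20c, drinks_50c)
```

The value returned for 50°C is negative, which cannot be a real sales figure. That is a useful reminder that a line can produce numbers far outside the region where it was fitted, whether or not they make sense.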
Interpreting r-values
The correlation coefficient (r) measures the strength and direction of a linear association. Circle the correct interpretation.
An r-value of +0.95 indicates:
An r-value of −0.82 indicates:
An r-value of +0.15 indicates:
An r-value of 0 indicates:
An r-value of −1 indicates:
Valid vs Invalid Conclusions
Sort each conclusion into the correct column: Valid or Invalid based on scatter plot data.
Confounding (Third) Variables
Circle the most likely confounding variable that could explain the observed correlation.
Correlation: cities with more fire stations have more crime. The confounding variable is likely:
Correlation: children who eat breakfast score higher on tests. A possible confounding variable is:
Correlation: people who sleep more tend to weigh less. A confounding variable could be:
Correlation: countries with higher chocolate consumption per capita have more Nobel Prize winners. The confounding variable is likely:
Design a Bivariate Investigation
Design a detailed bivariate data investigation.
Design a statistical investigation to test whether there is a relationship between the number of hours people exercise per week and their resting heart rate. In your response, describe: (a) the variables and which is independent/dependent, (b) how you would collect data (sample size, method, potential bias), (c) what you would expect the scatter plot to look like and what correlation you predict, (d) how you would draw and use a line of best fit, and (e) whether finding a correlation would prove that exercise causes a lower resting heart rate. Explain your reasoning.
Analyse a Dataset
Analyse the following bivariate dataset and describe the association.
A teacher recorded the number of hours each student spent on their phone per day and their average test score:
Phone hours: 1, 2, 2, 3, 3, 4, 4, 5, 6, 7
Test score: 88, 82, 85, 75, 78, 70, 65, 60, 55, 50
(a) What type of correlation does this data suggest? (b) Estimate the strength of the correlation (strong, moderate, or weak). (c) If you drew a line of best fit, would its gradient be positive or negative? (d) Predict the test score for a student who uses their phone for 3.5 hours per day. (e) Would it be appropriate to predict the score for a student who uses their phone for 15 hours per day? Why or why not?
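After answering by eye, the estimates can be compared with a calculated result. The sketch below uses the standard least squares formulas and Pearson's r on the dataset above. The function name `summarise` is an assumption made for illustration, and a line drawn by eye will usually differ slightly from the calculated one.

```python
from math import sqrt

phone_hours = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
test_scores = [88, 82, 85, 75, 78, 70, 65, 60, 55, 50]


def summarise(xs, ys):
    """Return (r, gradient, intercept) for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)           # Pearson's correlation coefficient
    gradient = sxy / sxx                # least squares gradient
    intercept = mean_y - gradient * mean_x
    return r, gradient, intercept


r, gradient, intercept = summarise(phone_hours, test_scores)
score_at_3_5 = gradient * 3.5 + intercept   # 3.5 h lies inside the 1-7 h range

print(f"r = {r:.2f}, gradient = {gradient:.2f}, intercept = {intercept:.1f}")
print(f"Predicted score at 3.5 hours: {score_at_3_5:.1f}")
```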
Correlation vs Causation Explained
Explain the difference between correlation and causation using examples.
Using your own examples, explain the difference between correlation and causation. Include: (a) one example where two variables are correlated AND one causes the other, (b) one example where two variables are correlated but neither causes the other (identify the confounding variable), and (c) an explanation of why scientists use controlled experiments rather than observational studies to establish causation.
Critique a Study's Conclusions
Read the study summary and critique its conclusions.
A newspaper reports: 'A study of 500 adults found that people who drink more coffee tend to live longer. Researchers concluded that coffee extends your lifespan.' Critique this conclusion by addressing: (a) Does correlation prove causation here? (b) What confounding variables might explain this relationship? (c) What type of study would be needed to establish whether coffee actually extends lifespan? (d) How might the sample or data collection method affect the reliability of the findings?
Compare Two Scatter Plots
Compare two bivariate datasets and their scatter plots.
Two investigations were conducted at a school:
Investigation A: Hours of sleep vs Reaction time (ms)
Sleep: 5, 6, 6, 7, 7, 8, 8, 9, 9, 10
Reaction: 420, 380, 400, 340, 350, 300, 310, 270, 280, 250
Investigation B: Hours of TV vs Reaction time (ms)
TV: 1, 2, 2, 3, 3, 4, 5, 5, 6, 7
Reaction: 310, 280, 350, 300, 370, 320, 290, 340, 360, 300
(a) Describe the correlation you would expect in each investigation. (b) Which investigation would likely show a stronger correlation? Explain why. (c) For the investigation with the stronger correlation, describe what the line of best fit would look like. (d) Can either investigation prove causation? Why or why not?
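A comparison made by eye can be checked by calculating r for each investigation. The following is a minimal sketch using the two datasets above; `pearson_r` is an illustrative helper name, not something the worksheet defines.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Investigation A: hours of sleep vs reaction time (ms)
sleep_hours = [5, 6, 6, 7, 7, 8, 8, 9, 9, 10]
reaction_a = [420, 380, 400, 340, 350, 300, 310, 270, 280, 250]

# Investigation B: hours of TV vs reaction time (ms)
tv_hours = [1, 2, 2, 3, 3, 4, 5, 5, 6, 7]
reaction_b = [310, 280, 350, 300, 370, 320, 290, 340, 360, 300]

r_a = pearson_r(sleep_hours, reaction_a)
r_b = pearson_r(tv_hours, reaction_b)

print(f"Investigation A: r = {r_a:.2f}")
print(f"Investigation B: r = {r_b:.2f}")
```

The size of r describes only how closely the points follow a straight line. It says nothing about whether one variable causes the other.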
True or False — Statistics Concepts
Circle TRUE or FALSE for each statement about bivariate data and scatter plots.
A correlation coefficient (r) can have a value of 1.5.
A scatter plot can only show positive correlations.
If r = 0, there is definitely no relationship between the variables.
Interpolation is more reliable than extrapolation.
An outlier should always be removed from a dataset.
The independent variable is placed on the x-axis of a scatter plot.
Collect Data and Predict
Plan a real data collection and make predictions.
You want to investigate whether there is a relationship between the distance students live from school and the time it takes them to travel to school. (a) Which variable is independent and which is dependent? (b) Describe how you would collect data from at least 15 students. (c) What type of correlation do you predict? Explain your reasoning. (d) Sketch what you think the scatter plot might look like (describe it in words). (e) Identify one potential source of bias in your data collection and how you would minimise it.
Identify Confounding Variables
Identify and explain confounding variables in real-world correlations.
For each of the following correlations, identify at least one confounding variable and explain how it could account for the observed relationship: (a) Students who eat breakfast tend to get better grades. (b) Countries with more televisions per household have longer life expectancies. (c) People who own more books tend to earn higher salaries. (d) Suburbs with more parks have lower rates of obesity. For one of these examples, describe how you could design a study to test whether the relationship is causal rather than just a correlation.
Collect Bivariate Data at Home
Collect your own bivariate data and create a scatter plot.
1. Record the temperature and the number of people at a local park over several days. Create a scatter plot. Is there a correlation?
2. Survey family members: compare their height with their arm span. Plot the data and describe the association.
3. Track your screen time and hours of sleep for a week. Create a scatter plot and describe any pattern you observe.
4. Measure the length and width of 10 different leaves from the same type of tree. Plot the data and describe the association.
Find Correlations in Daily Life
Look for examples of correlation (and possible causation) in your everyday life and in the media.
1. Find a news article that claims one thing causes another. Identify whether the evidence shows correlation or causation. What confounding variables might be involved?
2. Over a week, record two variables you think might be related (e.g., time spent outdoors vs mood rating 1–10). Create a scatter plot and describe what you find.
3. Look at the nutrition labels on 10 food items. Plot sugar content vs calorie count. Is there a correlation? Is it what you expected?
4. Ask five people to estimate how far they live from the nearest shop (in km) and how often they visit per week. Plot the data and describe any pattern.
Correlation — Describe and Classify
Describe scatter plot correlations accurately.
For each pair of variables, state whether you would expect a positive correlation, negative correlation, or no correlation, and give a brief reason: (a) Study hours and exam score (b) Temperature and hot chocolate sales (c) Shoe size and intelligence (d) Height and weight of adults (e) Daily exercise and resting heart rate
Scatter Plot Description to Correlation Type
Draw a line from each scatter plot description to the correct correlation type.
Line of Best Fit — Equation and Interpretation
Find and interpret the equation of a line of best fit.
A scatter plot shows study hours (x) and exam scores (y) for 10 students. The line of best fit passes through (2, 55) and (8, 85). Find: (a) The gradient (m) of the line of best fit. (b) The y-intercept. (c) The equation of the line. (d) Predict the score for a student who studies 5 hours. (e) Explain what the gradient means in context.
Correlation Coefficient r — Interpret
Circle the correct interpretation of each correlation coefficient.
r = 0.92 means:
r = −0.15 means:
r = 0 means:
r = −0.85 means:
Causation vs Correlation
Sort each example: Correlation implies Causation (likely), or Correlation does NOT imply Causation.
Residuals and Goodness of Fit
Calculate and interpret residuals from a line of best fit.
Using the model: Exam score = 5 × (study hours) + 45, calculate the residual for each student: (a) Studied 3 hrs, scored 62 (b) Studied 6 hrs, scored 72 (c) Studied 9 hrs, scored 92 For each, state whether the line overestimates or underestimates the actual score. What does a pattern of large residuals suggest about the model?
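The residual calculation can be sketched in a few lines of Python. This assumes the usual convention residual = actual value − predicted value, so a positive residual means the line underestimates the actual score. The names `predicted_score` and `students` are illustrative.

```python
def predicted_score(study_hours):
    """Model from the question: exam score = 5 x (study hours) + 45."""
    return 5 * study_hours + 45


# (study hours, actual score) for students (a), (b) and (c)
students = [(3, 62), (6, 72), (9, 92)]

residuals = []
for hours, actual in students:
    residual = actual - predicted_score(hours)   # actual minus predicted
    residuals.append(residual)
    verdict = "underestimates" if residual > 0 else "overestimates"
    print(f"{hours} h: predicted {predicted_score(hours)}, "
          f"actual {actual}, residual {residual:+d} (line {verdict})")
```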
Extrapolation — When to Be Careful
Critique the use of extrapolation beyond the data range.
A model for plant height over time gives h = 1.5t + 3 (h in cm, t in weeks) based on data from weeks 1–8. A student uses this to predict the height at week 52. (a) What prediction does the model give? (b) Explain why this prediction is likely unreliable. (c) What factors might limit the plant's actual growth?
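The check below is a small sketch of how a prediction can be flagged as interpolation or extrapolation before it is trusted. The model and the week 1–8 data range come from the question; the function names are assumptions made for illustration.

```python
DATA_RANGE_WEEKS = (1, 8)   # weeks over which the data were collected


def plant_height_cm(t_weeks):
    """Model from the question: h = 1.5t + 3."""
    return 1.5 * t_weeks + 3


def is_extrapolation(t_weeks, data_range=DATA_RANGE_WEEKS):
    """True when t lies outside the range of the collected data."""
    lowest, highest = data_range
    return not (lowest <= t_weeks <= highest)


for week in (5, 52):
    kind = "extrapolation" if is_extrapolation(week) else "interpolation"
    print(f"Week {week}: model gives {plant_height_cm(week)} cm ({kind})")
```

The model returns a number for any week it is given. Deciding whether that number is believable is a judgement about the data range and the context, not something the equation can do.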
Explain the difference between interpolation and extrapolation. Which is more reliable and why? Give an example of each using a scatter plot context.
Scatter Plot Correlations in a Research Study
Record correlation types observed across 20 variable pairs in a dataset.
| Item | Tally | Total |
|---|---|---|
| Strong positive correlation (r > 0.7) | | |
| Moderate positive correlation (0.3 < r ≤ 0.7) | | |
| Weak/no correlation (−0.3 ≤ r ≤ 0.3) | | |
| Moderate negative correlation (−0.7 ≤ r < −0.3) | | |
| Strong negative correlation (r < −0.7) | | |
Two-Way Tables — Bivariate Categorical Data
Construct and analyse a two-way frequency table.
100 students were surveyed about sport preferences and gender:
• 60 are female: 25 prefer netball, 20 prefer swimming, 15 prefer soccer
• 40 are male: 5 prefer netball, 10 prefer swimming, 25 prefer soccer
(a) Construct the two-way table. (b) What percentage of females prefer swimming? (c) What percentage of the students who prefer soccer are male? (d) Is there an association between gender and sport preference? Justify your answer.
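A two-way table and its percentages can be checked with a short Python sketch. The counts are those given in the question; the variable names are illustrative. Note that part (b) divides by a row total, while part (c) divides by a column total.

```python
sports = ["Netball", "Swimming", "Soccer"]
counts = {
    "Female": {"Netball": 25, "Swimming": 20, "Soccer": 15},
    "Male": {"Netball": 5, "Swimming": 10, "Soccer": 25},
}

# Row totals (by gender) and column totals (by sport)
row_totals = {gender: sum(row.values()) for gender, row in counts.items()}
column_totals = {s: sum(counts[g][s] for g in counts) for s in sports}
grand_total = sum(row_totals.values())

# (b) Of the females, what percentage prefer swimming? (divide by row total)
pct_females_swimming = 100 * counts["Female"]["Swimming"] / row_totals["Female"]

# (c) Of those who prefer soccer, what percentage are male? (divide by column total)
pct_soccer_male = 100 * counts["Male"]["Soccer"] / column_totals["Soccer"]

print("Gender  " + "".join(f"{s:>10}" for s in sports) + f"{'Total':>10}")
for gender, row in counts.items():
    print(f"{gender:<8}" + "".join(f"{row[s]:>10}" for s in sports)
          + f"{row_totals[gender]:>10}")
print(f"{'Total':<8}" + "".join(f"{column_totals[s]:>10}" for s in sports)
      + f"{grand_total:>10}")
print(f"(b) {pct_females_swimming:.1f}%   (c) {pct_soccer_male:.1f}%")
```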
Outliers in Bivariate Data
Identify and analyse outliers in scatter plots.
Explain what an outlier means in the context of bivariate data. How does an outlier differ from a point that is merely an extreme value on one axis? Describe how outliers can affect the line of best fit and the correlation coefficient r.
In a scatter plot of shoe size vs reading level for 30 children aged 5–15, there is a strong positive correlation (r = 0.82). Does this mean bigger feet cause better reading? Identify the confounding variable and explain how it creates a spurious correlation.
Scatter Plot Variables — Independent vs Dependent
Sort each variable: which is the independent variable (x-axis) and which is dependent (y-axis)?
Collect Your Own Bivariate Data
Design and conduct a data collection activity to investigate bivariate relationships.
1. Measure your reaction time (use an online reaction time test) 10 times at different times of day (morning, afternoon, evening). Record time-of-day and reaction time. Create a scatter plot and describe any pattern you see.
2. Record the outside temperature and the number of people wearing jackets when you go out for 7 different days. Create a scatter plot. Is there a negative correlation?
3. Survey at least 15 people on two numerical variables (e.g. hours of sleep vs energy rating out of 10). Plot the scatter graph and calculate the correlation coefficient using a spreadsheet.
Pearson's Correlation Coefficient — Calculation
Calculate and interpret Pearson's correlation coefficient.
For the 5 data points: (1,2), (2,4), (3,5), (4,4), (5,7): (a) Calculate the mean of x and the mean of y. (b) Calculate Σ(x − x̄)(y − ȳ), Σ(x − x̄)², and Σ(y − ȳ)². (c) Use r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² × Σ(y−ȳ)²] to find r. (d) Interpret the value of r you found.
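The hand calculation can be checked step by step with the sketch below, which follows parts (a) to (c) in order using the five points from the question. The variable names `sxy`, `sxx` and `syy` are shorthand chosen here for the three sums.

```python
from math import sqrt

points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 7)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

# (a) Means of x and y
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# (b) The three sums that appear in the formula
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)   # sum of (x - x̄)(y - ȳ)
sxx = sum((x - mean_x) ** 2 for x in xs)                     # sum of (x - x̄)²
syy = sum((y - mean_y) ** 2 for y in ys)                     # sum of (y - ȳ)²

# (c) Pearson's correlation coefficient
r = sxy / sqrt(sxx * syy)

print(f"mean x = {mean_x}, mean y = {mean_y}")
print(f"Sxy = {sxy:.2f}, Sxx = {sxx:.2f}, Syy = {syy:.2f}")
print(f"r = {r:.3f}")
```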
Steps to Draw a Line of Best Fit
Put the steps in the correct order for drawing a line of best fit by eye.
Critique a Statistical Claim
Critically evaluate a statistical claim involving correlation.
A newspaper headline reads: 'Research shows children who eat breakfast score higher on tests — proof that breakfast improves brain function.' Critically evaluate this claim. Identify: (a) what type of study this might be, (b) at least two confounding variables, (c) why correlation does not prove causation, (d) what type of study would be needed to establish causation.
Correlation Strength — Match the Description
Draw a line from each correlation coefficient to its description.
Least Squares Regression Line
Understand and apply the line of best fit equation.
Explain what the least squares regression line minimises. Why is it called 'least squares'?
The regression line for study hours (x) vs test score (y) is ŷ = 42 + 8x. Interpret the slope and y-intercept in context.
Predict the test score for a student who studies 6 hours. Is this interpolation or extrapolation?
Predict the score for a student who studies 15 hours. Why should this prediction be treated with caution?
Correlation or Causation?
Sort each claim as showing genuine causation or merely correlation.
Collecting and Graphing Bivariate Data
Design and carry out a small bivariate data investigation.
Choose two variables you believe might be correlated (e.g. temperature and ice cream sales, hours of sleep and concentration). State a hypothesis about their relationship.
Describe how you would collect data for your two variables. How many data points would you collect? What controls would you apply?
Sketch the shape of the scatter plot you would expect to see if your hypothesis is correct.
How would you calculate r for your data? What value of r would support your hypothesis?
Scatter Plot Patterns Identified
Tally each type of correlation pattern observed in the scatter plots you studied.
| Item | Tally | Total |
|---|---|---|
| Strong positive | | |
| Weak positive | | |
| Strong negative | | |
| Weak negative | | |
| No correlation | | |
Identify the Correct Interpretation
Circle the best interpretation of each statistical statement.
r = 0.85 between height and shoe size means:
The slope of the regression line is 2.5. This means:
An outlier in a scatter plot:
Extrapolation beyond the data range is unreliable because:
Residuals and Model Quality
Assess how well a regression model fits the data.
Define a residual in the context of regression analysis.
A student scores 68 on a test. The regression model predicts 74. Calculate and interpret the residual.
If residuals are randomly scattered above and below the regression line, what does this suggest about the model?
If residuals show a curved pattern, what does this suggest? What model might be better?
Bivariate Data Investigation at Home
Design and conduct a small bivariate data study using household data.
1. Collect data on two variables for at least 10 observations (e.g. temperature vs electricity bill for 10 months). Draw a scatter plot and estimate the correlation.
2. Research a real Australian dataset (e.g. ABS website). Find two related variables and describe their correlation.
3. Look at a health or fitness app on your phone or family member's phone. Find two variables that are tracked and describe any pattern you see.
4. Research Simpson's Paradox, a situation where a trend appears in groups of data but disappears or reverses when groups are combined. Write a short summary.
5. Find a scatter plot in a scientific journal or newspaper. Write three observations about the data shown, including the direction, strength, and any outliers.
Non-Linear Relationships in Data
Recognise when a linear model is not appropriate.
Sketch scatter plots showing: (a) a linear relationship, (b) a curved (quadratic) relationship, (c) no relationship. Label each.
Population data for a city over 10 years shows exponential growth. Why would a linear regression model be inappropriate here?
What transformations (e.g. log, square root) could linearise an exponential relationship in data? Explain how you would apply them.
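The sketch below illustrates one possible approach, a log transformation. The population figures are hypothetical, generated from P = 1000 × 1.1^t purely for illustration, and `fit_line` is an assumed helper name. Taking the natural log of P turns the curve into a straight line, because ln P = ln 1000 + t × ln 1.1.

```python
from math import exp, log

# Hypothetical data: a population growing by 10% per year for 10 years
years = list(range(10))
population = [1000 * 1.1 ** t for t in years]

# Transform the dependent variable only: plot ln(P) against t
log_population = [log(p) for p in population]


def fit_line(xs, ys):
    """Least squares gradient and intercept for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


gradient, intercept = fit_line(years, log_population)

# Undo the transformation to interpret the fitted line
growth_factor = exp(gradient)        # multiplier per year
initial_population = exp(intercept)  # fitted value at t = 0

print(f"Growth factor per year: {growth_factor:.3f}")
print(f"Fitted initial population: {initial_population:.0f}")
```

Real data would not fall exactly on the transformed line, so the recovered growth factor would only be an estimate.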
Pearson's Correlation Coefficient
Understand and calculate Pearson's r.
Explain what Pearson's correlation coefficient r measures. What are its maximum and minimum values?
For data: x = {2, 4, 6, 8, 10}, y = {5, 9, 13, 16, 21}. Calculate the mean of x and mean of y. Then calculate r using the formula or technology. Interpret the result.
Can two variables have r ≈ 0 but still have a strong non-linear relationship? Explain and give an example.
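One possible example to test is sketched below, using hypothetical values that follow y = x² exactly for x from −2 to 2. The helper name `pearson_r` is an assumption made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y is completely determined by x, but not linearly
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

Because r measures only linear association, the falling left half of the parabola cancels the rising right half, even though every y-value is determined by x.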
Scatter Plot Vocabulary
Match each scatter plot term to its correct description.
Confounding Variables and Study Design
Identify confounding variables and distinguish study types.
Define a confounding variable. Give an example of how a confounder could lead to a misleading correlation.
A study finds that areas with more hospitals have higher death rates. Does this mean hospitals cause death? Identify the confounding variable.
Explain the difference between an observational study and a randomised controlled experiment. Which one can establish causation?
Design a controlled experiment to test whether lack of sleep causes lower test scores. Describe your key controls.