Statistics

Bivariate Data & Scatter Plots

Correlation Type Match

Draw a line from each correlation type to its description.

Strong positive

Weak positive

Strong negative

No correlation

Weak negative

Points scattered randomly with no pattern

Points cluster tightly along an upward line

Points loosely trend upward with much scatter

Points cluster tightly along a downward line

Points loosely trend downward with much scatter

Positive or Negative Correlation

Circle the correct correlation direction for each scenario.

As temperature increases, ice cream sales increase. This is:

Positive correlation

Negative correlation

No correlation

As the age of a car increases, its resale value decreases. This is:

Negative correlation

Positive correlation

No correlation

As altitude increases, air temperature generally decreases. This is:

Negative correlation

Positive correlation

No correlation

As hours of study increase, exam marks tend to increase. This is:

Positive correlation

Negative correlation

No correlation

Strong or Weak Correlation

Circle whether each description suggests a strong or weak correlation.

Data points cluster very tightly around an upward-sloping line:

Strong correlation

Weak correlation

No correlation

Data points are widely scattered but show a slight downward trend:

Weak correlation

Strong correlation

No correlation

Almost every increase in x leads to a predictable decrease in y:

Strong correlation

Weak correlation

No correlation

Points form a rough cloud with only a vague upward drift:

Weak correlation

Strong correlation

No correlation

Classify the Correlation Strength

Sort each scenario into the correct column: Strong, Weak, or No Correlation.

Height vs shoe size

Hours of study vs exam mark

Shoe size vs favourite colour

Temperature vs ice cream sales

Age of car vs car value

Number of pets vs maths score

Distance driven vs fuel used

Hours of sleep vs number of siblings

Rainfall vs umbrella sales

Strong Correlation

Weak Correlation

No Correlation

Variable Pairs & Expected Correlation

Draw a line from each variable pair to the expected correlation direction.

Hours of exercise per week & fitness level

Number of cigarettes smoked & lung capacity

Shoe size & IQ

Advertising spend & product sales

Distance from equator & average temperature

No correlation expected

Positive — more exercise, higher fitness

Negative — more cigarettes, lower lung capacity

Positive — more advertising, more sales

Negative — further from equator, lower temperature

Independent vs Dependent Variable

Circle the correct identification of the independent variable (the one we control or expect to influence the other).

Investigating how study time affects exam results. The independent variable is:

Study time (hours)

Exam result (%)

Student name

Testing whether temperature affects plant growth. The independent variable is:

Temperature (°C)

Plant height (cm)

Type of soil

Exploring the link between age and reaction time. The independent variable is:

Age (years)

Reaction time (ms)

Number of trials

Does distance from school affect travel time? The independent variable is:

Distance from school (km)

Travel time (minutes)

Mode of transport

Reading Scatter Plot Axes

Circle the correct answer about what each axis represents on a scatter plot.

The horizontal axis (x-axis) of a scatter plot typically shows:

The independent variable

The dependent variable

The frequency

The vertical axis (y-axis) of a scatter plot typically shows:

The dependent variable

The independent variable

The sample size

On a scatter plot of 'Hours of sunlight vs Plant growth', the x-axis should show:

Hours of sunlight

Plant growth (cm)

Number of plants

Each point on a scatter plot represents:

One data pair (one observation of both variables)

The average of all data

A single variable measurement

Categorical vs Numerical Bivariate Data

Sort each variable pair into the correct column: both variables are numerical (suitable for a scatter plot) or at least one is categorical (not suitable for a scatter plot).

Height (cm) vs Weight (kg)

Favourite sport vs Gender

Temperature (°C) vs Rainfall (mm)

Eye colour vs Hair colour

Age (years) vs Reaction time (ms)

Brand of phone vs Satisfaction rating (1–5)

Hours of screen time vs Hours of sleep

State of residence vs Annual income ($)

Both Numerical (scatter plot)

Includes Categorical (not scatter plot)

Line of Best Fit Properties

Circle the correct statement about lines of best fit.

A line of best fit should:

Pass through or near as many points as possible with roughly equal points above and below

Connect the first and last data points

Pass through every data point

If a scatter plot shows a strong negative correlation, the line of best fit:

Slopes downward from left to right

Slopes upward from left to right

Is horizontal

A line of best fit is most useful when:

There is a clear linear trend in the data

The data points form a curved pattern

There is no correlation between the variables

If the data has no correlation, a line of best fit:

Is not meaningful and should not be drawn

Should still be drawn through the middle

Will always be perfectly horizontal

Interpolation vs Extrapolation

Circle the correct answer about using a line of best fit for predictions.

Using a line of best fit to predict a value within the range of the collected data is called:

Interpolation

Extrapolation

Correlation

Using a line of best fit to predict a value beyond the range of the collected data is called:

Extrapolation

Interpolation

Regression

Which type of prediction is generally more reliable?

Interpolation, because the trend is supported by nearby data

Extrapolation, because it extends the known pattern

Both are equally reliable

Data was collected for students who studied between 1 and 6 hours. Predicting the mark of a student who studied 12 hours is:

Extrapolation and may be unreliable

Interpolation and is reliable

Not possible with a line of best fit

Correlation vs Causation

Circle the correct answer about the difference between correlation and causation.

Correlation means:

Two variables tend to change together in a predictable pattern

One variable directly causes the other to change

The variables are always related by a formula

Causation means:

A change in one variable directly produces a change in the other

Two variables happen to change at the same time

The correlation coefficient is close to 1

Ice cream sales and drowning rates both increase in summer. This is an example of:

Correlation without causation — a third variable (hot weather) drives both

Causation — ice cream causes drowning

No correlation at all

A randomised controlled experiment can help establish:

Causation

Only correlation

Neither correlation nor causation

Identifying Outliers

Circle the correct answer about outliers in scatter plots.

An outlier on a scatter plot is:

A data point that lies far from the overall pattern of the other points

The point closest to the line of best fit

Any point on the x-axis

What effect can an outlier have on a line of best fit?

It can pull the line toward itself, making the fit less accurate for the rest of the data

It has no effect on the line of best fit

It always improves the accuracy of the line

Before removing an outlier, you should:

Investigate whether it is a data entry error or a genuine unusual observation

Always remove it because outliers are mistakes

Ignore it completely

A student recorded the heights and weights of 20 classmates. One point is far from all others. The best first step is:

Check whether the data was recorded correctly for that student

Delete the point immediately

Draw the line of best fit through it

Steps in a Bivariate Investigation

Put the steps for conducting a bivariate data investigation in the correct order.

Formulate a question about the relationship between two numerical variables

Plan data collection: decide on sample size, method, and how to record both variables

Collect the data systematically, recording paired values

Organise the data in a table of ordered pairs

Construct a scatter plot with the independent variable on the x-axis

Describe the association: direction, form, and strength

Draw a line of best fit if the trend is approximately linear

Use the line to make predictions (interpolation) and draw conclusions

Scatter Plot Pattern Match

Draw a line from each scatter plot description to its correlation type.

Points rise steeply from left to right in a tight band

Points fall gradually from left to right with wide scatter

Points form a random cloud with no trend

Points fall steeply from left to right in a tight band

Points rise gradually from left to right with wide scatter

Strong positive correlation

Weak negative correlation

No correlation

Strong negative correlation

Weak positive correlation

Predict Using a Line of Best Fit

Use the described line of best fit to circle the best prediction.

A line of best fit for 'Hours studied (x) vs Exam mark (y)' passes through (2, 50) and (6, 80). Predict the mark for 4 hours of study:

Using the same line, predict the mark for 5 hours of study:

72.5

The line of best fit for 'Temperature (x) vs Hot drinks sold (y)' passes through (10°C, 60) and (30°C, 20). Predict sales at 20°C:

Using the same line, would you trust a prediction for sales at 50°C?

No — 50°C is far outside the data range (extrapolation)

Yes — the line can be extended indefinitely

Yes — as long as we use the equation
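The predictions in this section come from the straight line through the two given points. A minimal Python sketch, assuming the line of best fit is exactly the line through those two points (the helper names `line_through` and `predict` are hypothetical), checks the arithmetic:

```python
def line_through(p1, p2):
    """Return (gradient, y-intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # solve y1 = m*x1 + c for c
    return m, c


def predict(line, x):
    """Evaluate y = m*x + c for the given line."""
    m, c = line
    return m * x + c


study = line_through((2, 50), (6, 80))     # hours studied -> exam mark
drinks = line_through((10, 60), (30, 20))  # temperature (°C) -> hot drinks sold

print(predict(study, 4))    # 65.0
print(predict(study, 5))    # 72.5
print(predict(drinks, 20))  # 40.0
print(predict(drinks, 50))  # -20.0, an impossible negative sales figure
```

The 50°C value comes out negative, which is one concrete sign that extending the line far beyond the data (extrapolation) breaks down.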

Interpreting r-values

The correlation coefficient (r) measures the strength and direction of a linear association. Circle the correct interpretation.

An r-value of +0.95 indicates:

Strong positive linear correlation

Weak positive correlation

No correlation

An r-value of −0.82 indicates:

Strong negative linear correlation

Weak negative correlation

Strong positive correlation

An r-value of +0.15 indicates:

Weak positive correlation (close to no linear relationship)

Strong positive correlation

Perfect correlation

An r-value of 0 indicates:

No linear correlation (but a non-linear relationship may still exist)

A perfect negative correlation

The data has no variability

An r-value of −1 indicates:

A perfect negative linear correlation — all points lie exactly on a downward line

No correlation

A weak negative correlation

Valid vs Invalid Conclusions

Sort each conclusion into the correct column: Valid or Invalid based on scatter plot data.

There is a positive association between hours studied and exam marks

Studying more hours causes higher exam marks

As temperature increases, hot drink sales tend to decrease

Hot weather causes people to stop drinking hot drinks entirely

There appears to be no linear relationship between shoe size and IQ

Countries that eat more chocolate win more Nobel Prizes, so chocolate makes people smarter

The data suggests a strong negative correlation between car age and resale value

Since ice cream sales and sunburn rates are correlated, eating ice cream causes sunburn

Valid Conclusion

Invalid Conclusion

Confounding (Third) Variables

Circle the most likely confounding variable that could explain the observed correlation.

Correlation: cities with more fire stations have more crime. The confounding variable is likely:

City population size

Number of firefighters

Colour of fire trucks

Correlation: children who eat breakfast score higher on tests. A possible confounding variable is:

Overall family socioeconomic status and home support

The brand of cereal eaten

The colour of the breakfast bowl

Correlation: people who sleep more tend to weigh less. A confounding variable could be:

Overall health habits (exercise, diet, stress levels)

Pillow type

Bedroom wall colour

Correlation: countries with higher chocolate consumption per capita have more Nobel Prize winners. The confounding variable is likely:

National wealth and investment in education and research

Type of chocolate preferred

Average temperature of the country

Design a Bivariate Investigation

Design a detailed bivariate data investigation.

Design a statistical investigation to test whether there is a relationship between the number of hours people exercise per week and their resting heart rate. In your response, describe: (a) the variables and which is independent/dependent, (b) how you would collect data (sample size, method, potential bias), (c) what you would expect the scatter plot to look like and what correlation you predict, (d) how you would draw and use a line of best fit, and (e) whether finding a correlation would prove that exercise causes a lower resting heart rate. Explain your reasoning.

Analyse a Dataset

Analyse the following bivariate dataset and describe the association.

A teacher recorded the number of hours each student spent on their phone per day and their average test score:
Phone hours: 1, 2, 2, 3, 3, 4, 4, 5, 6, 7
Test score: 88, 82, 85, 75, 78, 70, 65, 60, 55, 50
(a) What type of correlation does this data suggest?
(b) Estimate the strength of the correlation (strong, moderate, or weak).
(c) If you drew a line of best fit, would its gradient be positive or negative?
(d) Predict the test score for a student who uses their phone for 3.5 hours per day.
(e) Would it be appropriate to predict the score for a student who uses their phone for 15 hours per day? Why or why not?

Correlation vs Causation Explained

Explain the difference between correlation and causation using examples.

Using your own examples, explain the difference between correlation and causation. Include: (a) one example where two variables are correlated AND one causes the other, (b) one example where two variables are correlated but neither causes the other (identify the confounding variable), and (c) an explanation of why scientists use controlled experiments rather than observational studies to establish causation.

Critique a Study's Conclusions

Read the study summary and critique its conclusions.

A newspaper reports: 'A study of 500 adults found that people who drink more coffee tend to live longer. Researchers concluded that coffee extends your lifespan.' Critique this conclusion by addressing: (a) Does correlation prove causation here? (b) What confounding variables might explain this relationship? (c) What type of study would be needed to establish whether coffee actually extends lifespan? (d) How might the sample or data collection method affect the reliability of the findings?

Compare Two Scatter Plots

Compare two bivariate datasets and their scatter plots.

Two investigations were conducted at a school:
Investigation A: Hours of sleep vs Reaction time (ms)
Sleep: 5, 6, 6, 7, 7, 8, 8, 9, 9, 10
Reaction: 420, 380, 400, 340, 350, 300, 310, 270, 280, 250
Investigation B: Hours of TV vs Reaction time (ms)
TV: 1, 2, 2, 3, 3, 4, 5, 5, 6, 7
Reaction: 310, 280, 350, 300, 370, 320, 290, 340, 360, 300
(a) Describe the correlation you would expect in each investigation.
(b) Which investigation would likely show a stronger correlation? Explain why.
(c) For the investigation with the stronger correlation, describe what the line of best fit would look like.
(d) Can either investigation prove causation? Why or why not?

True or False — Statistics Concepts

Circle TRUE or FALSE for each statement about bivariate data and scatter plots.

A correlation coefficient (r) can have a value of 1.5.

FALSE — r always lies between −1 and +1

TRUE

A scatter plot can only show positive correlations.

FALSE — scatter plots can show positive, negative, or no correlation

TRUE

If r = 0, there is definitely no relationship between the variables.

FALSE — r = 0 means no linear relationship, but a non-linear relationship may exist

TRUE

Interpolation is more reliable than extrapolation.

TRUE — interpolation predicts within the data range where the trend is supported

FALSE

An outlier should always be removed from a dataset.

FALSE — outliers should be investigated before deciding whether to keep or remove them

TRUE

The independent variable is placed on the x-axis of a scatter plot.

TRUE

FALSE

Collect Data and Predict

Plan a real data collection and make predictions.

You want to investigate whether there is a relationship between the distance students live from school and the time it takes them to travel to school. (a) Which variable is independent and which is dependent? (b) Describe how you would collect data from at least 15 students. (c) What type of correlation do you predict? Explain your reasoning. (d) Sketch what you think the scatter plot might look like (describe it in words). (e) Identify one potential source of bias in your data collection and how you would minimise it.

Identify Confounding Variables

Identify and explain confounding variables in real-world correlations.

For each of the following correlations, identify at least one confounding variable and explain how it could account for the observed relationship: (a) Students who eat breakfast tend to get better grades. (b) Countries with more televisions per household have longer life expectancies. (c) People who own more books tend to earn higher salaries. (d) Suburbs with more parks have lower rates of obesity. For one of these examples, describe how you could design a study to test whether the relationship is causal rather than just a correlation.

Collect Bivariate Data at Home

Collect your own bivariate data and create a scatter plot.

1. Record the temperature and the number of people at a local park over several days. Create a scatter plot. Is there a correlation?
2. Survey family members: compare their height with their arm span. Plot the data and describe the association.
3. Track your screen time and hours of sleep for a week. Create a scatter plot and describe any pattern you observe.
4. Measure the length and width of 10 different leaves from the same type of tree. Plot the data and describe the association.

Find Correlations in Daily Life

Look for examples of correlation (and possible causation) in your everyday life and in the media.

1. Find a news article that claims one thing causes another. Identify whether the evidence shows correlation or causation. What confounding variables might be involved?
2. Over a week, record two variables you think might be related (e.g. time spent outdoors vs mood rating 1–10). Create a scatter plot and describe what you find.
3. Look at the nutrition labels on 10 food items. Plot sugar content vs calorie count. Is there a correlation? Is it what you expected?
4. Ask five people to estimate how far they live from the nearest shop (in km) and how often they visit per week. Plot the data and describe any pattern.

Correlation — Describe and Classify

Describe scatter plot correlations accurately.

For each pair of variables, state whether you would expect a positive correlation, negative correlation, or no correlation, and give a brief reason: (a) Study hours and exam score (b) Temperature and hot chocolate sales (c) Shoe size and intelligence (d) Height and weight of adults (e) Daily exercise and resting heart rate

Scatter Plot Description to Correlation Type

Draw a line from each scatter plot description to the correct correlation type.

Points cluster fairly closely from bottom-left to top-right

Points are randomly scattered with no pattern

Points cluster loosely from top-left to bottom-right

Points curve upward steeply then level off

Points cluster very tightly along a nearly perfect line (upward)

No correlation

Non-linear relationship

Strong positive linear correlation

Weak negative linear correlation

Moderate positive linear correlation

Line of Best Fit — Equation and Interpretation

Find and interpret the equation of a line of best fit.

A scatter plot shows study hours (x) and exam scores (y) for 10 students. The line of best fit passes through (2, 55) and (8, 85). Find: (a) The gradient (m) of the line of best fit. (b) The y-intercept. (c) The equation of the line. (d) Predict the score for a student who studies 5 hours. (e) Explain what the gradient means in context.
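The working for (a) to (d) can be checked with exact fractions, so the gradient and intercept are not disturbed by rounding. A minimal sketch (the helper name `line_from_points` is hypothetical):

```python
from fractions import Fraction


def line_from_points(p1, p2):
    """Return the exact gradient m and y-intercept c of the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # (85 - 55) / (8 - 2)
    c = y1 - m * x1                 # rearrange y = mx + c at (x1, y1)
    return m, c


m, c = line_from_points((2, 55), (8, 85))
print(f"gradient m = {m}")      # 5: about 5 extra marks per extra hour of study
print(f"y-intercept c = {c}")   # 45
print(f"y = {m}x + {c}")
print(f"prediction for 5 hours: {m * 5 + c}")  # 70
```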

Correlation Coefficient r — Interpret

Circle the correct interpretation of each correlation coefficient.

r = 0.92 means:

Strong positive linear correlation

Weak positive linear correlation

Strong negative linear correlation

r = −0.15 means:

Weak or no negative linear correlation

Strong negative linear correlation

Perfect negative correlation

r = 0 means:

No linear relationship (but could have non-linear)

Perfect correlation

Exactly half strong and half weak

r = −0.85 means:

Strong negative linear correlation

Weak negative correlation

No relationship

Causation vs Correlation

Sort each example: Correlation implies Causation (likely), or Correlation does NOT imply Causation.

Higher cigarette smoking rates → higher rates of lung cancer

Ice cream sales and drowning rates both peak in summer

More study hours → better exam scores

Number of Nicolas Cage films per year correlates with pool drowning deaths

Higher alcohol consumption → increased liver disease risk

Countries with more TVs per capita have higher life expectancy (wealth confound)

Likely Causal

Correlation but NOT Causation

Residuals and Goodness of Fit

Calculate and interpret residuals from a line of best fit.

Using the model: Exam score = 5 × (study hours) + 45, calculate the residual for each student: (a) Studied 3 hrs, scored 62 (b) Studied 6 hrs, scored 72 (c) Studied 9 hrs, scored 92 For each, state whether the line overestimates or underestimates the actual score. What does a pattern of large residuals suggest about the model?
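A residual is the actual value minus the value predicted by the line, so a positive residual means the line underestimates and a negative one means it overestimates. A short Python sketch applying that definition to the three students:

```python
def model(hours):
    """Exam score predicted by the line: score = 5 * hours + 45."""
    return 5 * hours + 45


students = [(3, 62), (6, 72), (9, 92)]  # (hours studied, actual score)

residuals = []
for hours, actual in students:
    predicted = model(hours)
    residual = actual - predicted  # residual = actual - predicted
    residuals.append(residual)
    if residual > 0:
        verdict = "line underestimates"
    elif residual < 0:
        verdict = "line overestimates"
    else:
        verdict = "line is exact"
    print(f"{hours} h: predicted {predicted}, actual {actual}, "
          f"residual {residual:+d} ({verdict})")
```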

Extrapolation — When to Be Careful

Critique the use of extrapolation beyond the data range.

A model for plant height over time gives h = 1.5t + 3 (h in cm, t in weeks) based on data from weeks 1–8. A student uses this to predict the height at week 52. (a) What prediction does the model give? (b) Explain why this prediction is likely unreliable. (c) What factors might limit the plant's actual growth?

Explain the difference between interpolation and extrapolation. Which is more reliable and why? Give an example of each using a scatter plot context.
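The model returns a number for any week; only the data range tells us whether to trust it. A minimal sketch (function names are illustrative) that computes the plant-height prediction and flags whether it is interpolation or extrapolation, assuming the data range is exactly weeks 1 to 8:

```python
T_MIN, T_MAX = 1, 8  # the model was fitted to weeks 1-8 only


def height_cm(t_weeks):
    """Plant height from the fitted model h = 1.5t + 3."""
    return 1.5 * t_weeks + 3


def prediction_type(t_weeks):
    """Interpolation if t lies inside the data range, otherwise extrapolation."""
    return "interpolation" if T_MIN <= t_weeks <= T_MAX else "extrapolation"


for t in (5, 52):
    print(f"week {t}: h = {height_cm(t)} cm ({prediction_type(t)})")
```

The week 52 value (81 cm) assumes the plant keeps growing at the same rate for a whole year, which is exactly the assumption part (b) asks you to question.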

Scatter Plot Correlations in a Research Study

Record correlation types observed across 20 variable pairs in a dataset.

Item	Tally	Total
Strong positive correlation (r > 0.7)
Moderate positive correlation (0.3 < r ≤ 0.7)
Weak/no correlation (−0.3 ≤ r ≤ 0.3)
Moderate negative correlation (−0.7 ≤ r < −0.3)
Strong negative correlation (r < −0.7)
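The band boundaries in this tally table can be written as a small classification function, which makes it clear which band a boundary value (such as exactly 0.7) falls into. A sketch, with the function name `r_band` chosen for illustration and sample values taken from the r-value questions in this worksheet:

```python
from collections import Counter


def r_band(r):
    """Classify r using the bands in the tally table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r > 0.7:
        return "Strong positive"
    if r > 0.3:
        return "Moderate positive"
    if r >= -0.3:
        return "Weak/no"
    if r >= -0.7:
        return "Moderate negative"
    return "Strong negative"


# r-values taken from the multiple-choice questions in this worksheet
sample_r = [0.95, -0.82, 0.15, 0, -1, 0.92, -0.15, -0.85]
tally = Counter(r_band(r) for r in sample_r)
for band, count in tally.items():
    print(band, count)
```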

Two-Way Tables — Bivariate Categorical Data

Construct and analyse a two-way frequency table.

100 students were surveyed about sport preferences and gender:
• 60 are female: 25 prefer netball, 20 prefer swimming, 15 prefer soccer
• 40 are male: 5 prefer netball, 10 prefer swimming, 25 prefer soccer
(a) Construct the two-way table.
(b) What percentage of females prefer swimming?
(c) What percentage of soccer players are male?
(d) Is there an association between gender and sport preference? Justify.

Draw here
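The key distinction in (b) and (c) is which total you divide by: a row total (all females) or a column total (all soccer players). A minimal Python sketch that builds the table with totals and computes both percentages:

```python
table = {
    "Female": {"Netball": 25, "Swimming": 20, "Soccer": 15},
    "Male": {"Netball": 5, "Swimming": 10, "Soccer": 25},
}
sports = ["Netball", "Swimming", "Soccer"]

row_totals = {g: sum(counts.values()) for g, counts in table.items()}
col_totals = {s: sum(table[g][s] for g in table) for s in sports}
grand_total = sum(row_totals.values())

# (a) print the two-way table with row and column totals
print(f"{'':8}" + "".join(f"{s:>10}" for s in sports) + f"{'Total':>8}")
for g, counts in table.items():
    print(f"{g:8}" + "".join(f"{counts[s]:>10}" for s in sports) + f"{row_totals[g]:>8}")
print(f"{'Total':8}" + "".join(f"{col_totals[s]:>10}" for s in sports) + f"{grand_total:>8}")

# (b) row percentage: share of females who prefer swimming
pct_females_swimming = 100 * table["Female"]["Swimming"] / row_totals["Female"]
# (c) column percentage: share of soccer players who are male
pct_soccer_male = 100 * table["Male"]["Soccer"] / col_totals["Soccer"]
print(f"{pct_females_swimming:.1f}% of females prefer swimming")
print(f"{pct_soccer_male:.1f}% of soccer players are male")

# (d) compare row percentages: clearly different profiles suggest an association
for g in table:
    print(g, {s: round(100 * table[g][s] / row_totals[g], 1) for s in sports})
```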

Outliers in Bivariate Data

Identify and analyse outliers in scatter plots.

Explain what an outlier means in the context of bivariate data. How does an outlier differ from a point that is merely an extreme value on one axis? Describe how outliers can affect the line of best fit and the correlation coefficient r.

In a scatter plot of shoe size vs reading level for 30 children aged 5–15, there is a strong positive correlation (r = 0.82). Does this mean bigger feet cause better reading? Identify the confounding variable and explain how it creates a spurious correlation.

Scatter Plot Variables — Independent vs Dependent

Sort each variable: which is the independent variable (x-axis) and which is dependent (y-axis)?

Hours of sunlight per day

Crop yield per hectare

Advertising spend ($)

Monthly sales revenue ($)

Daily temperature (°C)

Number of beach visitors

Years of experience

Annual salary

Independent Variable (x)

Dependent Variable (y)

Collect Your Own Bivariate Data

Design and conduct a data collection activity to investigate bivariate relationships.

1. Measure your reaction time (use an online reaction time test) 10 times at different times of day (morning, afternoon, evening). Record the clock time of each test as a number of hours (e.g. 14.5 for 2:30 pm) alongside your reaction time, so that both variables are numerical. Create a scatter plot and describe any pattern you see.
2. On 7 different days, record the outside temperature and the number of people you see wearing jackets when you go out. Create a scatter plot. Is there a negative correlation?
3. Survey at least 15 people on two numerical variables (e.g. hours of sleep vs energy rating out of 10). Plot the scatter graph and calculate the correlation coefficient using a spreadsheet.

Pearson's Correlation Coefficient — Calculation

Calculate and interpret Pearson's correlation coefficient.

For the 5 data points: (1,2), (2,4), (3,5), (4,4), (5,7): (a) Calculate the mean of x and the mean of y. (b) Calculate Σ(x − x̄)(y − ȳ), Σ(x − x̄)², and Σ(y − ȳ)². (c) Use r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² × Σ(y−ȳ)²] to find r. (d) Interpret the value of r you found.

Draw here
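Each of steps (a) to (c) maps onto one line of arithmetic. A minimal Python sketch that follows the same formula step by step, so hand working can be checked at each stage:

```python
from math import sqrt

points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 7)]
n = len(points)

# (a) means
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# (b) the three sums in the formula
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
s_xx = sum((x - x_bar) ** 2 for x, _ in points)
s_yy = sum((y - y_bar) ** 2 for _, y in points)

# (c) r = S_xy / sqrt(S_xx * S_yy)
r = s_xy / sqrt(s_xx * s_yy)

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}, S_yy = {s_yy:.2f}")
print(f"r = {r:.3f}")
```

For (d): a value of about 0.87 falls in the "r > 0.7" band used in this worksheet's tally tables, i.e. a strong positive linear correlation.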

Steps to Draw a Line of Best Fit

Put the steps in the correct order for drawing a line of best fit by eye.

Plot all data points on a clearly labelled scatter graph

Identify the overall trend (positive, negative, no correlation)

Draw a straight line that best represents the trend

Ensure approximately equal numbers of points above and below the line

Make sure the line passes through or near the mean point (x̄, ȳ)

Select two points on the line (not data points) to calculate the equation

Critique a Statistical Claim

Critically evaluate a statistical claim involving correlation.

A newspaper headline reads: 'Research shows children who eat breakfast score higher on tests — proof that breakfast improves brain function.' Critically evaluate this claim. Identify: (a) what type of study this might be, (b) at least two confounding variables, (c) why correlation does not prove causation, (d) what type of study would be needed to establish causation.

Correlation Strength — Match the Description

Draw a line from each correlation coefficient to its description.

r = 1.0

r = 0.8

r = 0.3

r = 0

r = −0.7

r = −1.0

Moderate negative correlation

Perfect positive correlation

No linear correlation

Strong positive correlation

Weak positive correlation

Perfect negative correlation

Least Squares Regression Line

Understand and apply the line of best fit equation.

Explain what the least squares regression line minimises. Why is it called 'least squares'?

The regression line for study hours (x) vs test score (y) is ŷ = 42 + 8x. Interpret the slope and y-intercept in context.

Predict the test score for a student who studies 6 hours. Is this interpolation or extrapolation?

Predict the score for a student who studies 15 hours. Why should this prediction be treated with caution?
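The least squares regression line is the line whose residuals have the smallest possible sum of squares. The sketch below reuses the five points from the Pearson calculation question, fits the line with the standard formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope × x̄), and checks that nudging the slope or intercept makes the sum of squared residuals larger. It then evaluates the question's model ŷ = 42 + 8x; the function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 7]
a, b = least_squares(xs, ys)
best = sum_sq_residuals(a, b, xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x, sum of squared residuals = {best:.2f}")

# Nudging the intercept or the slope away from the fitted values increases the sum
nudges = [(0.2, 0), (-0.2, 0), (0, 0.1), (0, -0.1)]
for da, db in nudges:
    print(f"a{da:+}, b{db:+}: {sum_sq_residuals(a + da, b + db, xs, ys):.2f}")


# The question's model: intercept 42, slope 8
def y_hat(hours):
    return 42 + 8 * hours


print(y_hat(6), y_hat(15))  # 90 162
```

The question does not state the maximum test score; if it is 100, the 15-hour prediction of 162 is impossible, a clear warning sign of extrapolation.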

Correlation or Causation?

Sort each claim as showing genuine causation or merely correlation.

Smoking and lung cancer

Ice cream sales and drowning rates

Exercise and improved cardiovascular health

Number of TVs owned and life expectancy

Vaccination and reduced disease incidence

Shoe size and reading ability in children

Likely causation

Correlation only

Collecting and Graphing Bivariate Data

Design and carry out a small bivariate data investigation.

Choose two variables you believe might be correlated (e.g. temperature and ice cream sales, hours of sleep and concentration). State a hypothesis about their relationship.

Describe how you would collect data for your two variables. How many data points would you collect? What controls would you apply?

Sketch the shape of the scatter plot you would expect to see if your hypothesis is correct.

Draw here

How would you calculate r for your data? What value of r would support your hypothesis?

Scatter Plot Patterns Identified

Tally each type of correlation pattern observed in the scatter plots you studied.

Item	Tally	Total
Strong positive
Weak positive
Strong negative
Weak negative
No correlation

Identify the Correct Interpretation

Circle the best interpretation of each statistical statement.

r = 0.85 between height and shoe size means:

Height causes larger shoe size

There is a strong positive linear association

Knowing height exactly predicts shoe size

The slope of the regression line is 2.5. This means:

For each 1-unit increase in x, y increases by 2.5 on average

x is 2.5 times y

When x = 0, y = 2.5

An outlier in a scatter plot:

Can strongly affect the regression line

Should always be deleted

Proves the data is wrong

Extrapolation beyond the data range is unreliable because:

The linear pattern may not continue outside the data range

The formula changes outside the range

We run out of decimal places

Residuals and Model Quality

Assess how well a regression model fits the data.

Define a residual in the context of regression analysis.

A student scores 68 on a test. The regression model predicts 74. Calculate and interpret the residual.

If residuals are randomly scattered above and below the regression line, what does this suggest about the model?

If residuals show a curved pattern, what does this suggest? What model might be better?

Bivariate Data Investigation at Home

Design and conduct a small bivariate data study using household data.

1. Collect data on two variables for at least 10 observations (e.g. temperature vs electricity bill for 10 months). Draw a scatter plot and estimate the correlation.
2. Research a real Australian dataset (e.g. ABS website). Find two related variables and describe their correlation.
3. Look at a health or fitness app on your phone or a family member's phone. Find two variables that are tracked and describe any pattern you see.
4. Research Simpson's Paradox, a situation where a trend appears in groups of data but disappears or reverses when the groups are combined. Write a short summary.
5. Find a scatter plot in a scientific journal or newspaper. Write three observations about the data shown, including the direction, strength, and any outliers.

Non-Linear Relationships in Data

Recognise when a linear model is not appropriate.

Sketch scatter plots showing: (a) a linear relationship, (b) a curved (quadratic) relationship, (c) no relationship. Label each.

Draw here

Population data for a city over 10 years shows exponential growth. Why would a linear regression model be inappropriate here?

What transformations (e.g. log, square root) could linearise an exponential relationship in data? Explain how you would apply them.

Pearson's Correlation Coefficient

Understand and calculate Pearson's r.

Explain what Pearson's correlation coefficient r measures. What are its maximum and minimum values?

For data: x = {2, 4, 6, 8, 10}, y = {5, 9, 13, 16, 21}. Calculate the mean of x and mean of y. Then calculate r using the formula or technology. Interpret the result.

Can two variables have r ≈ 0 but still have a strong non-linear relationship? Explain and give an example.

Scatter Plot Vocabulary

Match each scatter plot term to its correct description.

Response variable

Explanatory variable

Outlier

Cluster

Line of best fit

Extrapolation

Predicting outside the range of the data

A point clearly separated from the main pattern

Placed on the x-axis; independent variable

Minimises the sum of squared residuals

A group of points separate from others

Placed on the y-axis; depends on x

Confounding Variables and Study Design

Identify confounding variables and distinguish study types.

Define a confounding variable. Give an example of how a confounder could lead to a misleading correlation.

A study finds that areas with more hospitals have higher death rates. Does this mean hospitals cause death? Identify the confounding variable.

Explain the difference between an observational study and a randomised controlled experiment. Which one can establish causation?

Design a controlled experiment to test whether lack of sleep causes lower test scores. Describe your key controls.
