How would you expect the rent of properties in London's commuter belt to be affected by the distance from London?
The following data was collected in order to investigate the relationship between the cost of rent of a double room and the distance from London:
| Distance from London (km) | Rent Cost (GBP) |
|---|---|
| 10 | 1500 |
| 20 | 1300 |
| 30 | 1100 |
| 40 | 900 |
| 50 | 700 |
| 60 | 2000 |
| 70 | 600 |
A scatter graph was then constructed:

We can see that, overall, the points go down from left to right. This makes sense: the further away from London we are, the less rent tends to cost!
We call this negative correlation.
Negative correlation is a relationship between two variables where, as one variable goes up, the other tends to go down.
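One way to check the direction of a trend with numbers rather than by eye is to compute a correlation coefficient. The sketch below uses Pearson's r, which is an assumption on our part (the lesson itself only reads the trend off the graph), and `pearson_r` is a hypothetical helper name. A value below 0 suggests negative correlation, and a value above 0 suggests positive correlation.

```python
from math import sqrt

# Data from the table above: (distance from London in km, rent in GBP).
data = [(10, 1500), (20, 1300), (30, 1100), (40, 900),
        (50, 700), (60, 2000), (70, 600)]

def pearson_r(points):
    """Return Pearson's correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

r = pearson_r(data)
print(f"r = {r:.2f}")  # r = -0.27
```

The result is negative, which agrees with the downward trend. It is closer to 0 than to -1 because the point at 60 km sits far above the others.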
The heights of plants and the number of weeks since the plants were planted, on the other hand, display a positive correlation:

As the number of weeks went up, so did the heights of the plants!

Sometimes we have no correlation.
That is when there is no apparent relationship between the two variables, i.e. knowing the value of one variable tells us nothing about the other!
The scatter graph would then look something like this:

We can see that the points aren't going up as in positive correlation or down as with negative correlation - they're 'random'!
Let's come back to our rent vs distance from London graph one more time:

We can see that there is one 'random point' at (60, 2000) that doesn't really fit our negative correlation.
We call this an outlier because it lies outside the trend we've spotted (points going down).
Here, the outlier at (60, 2000) means a rent of £2,000 at 60 km from London, which is far higher than the trend would suggest.
Maybe there is a particularly beautiful or popular town 60 km from London and that's why we have an outlier there!
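As a rough illustration, a point that breaks a downward trend can also be spotted in code. The sketch below is a crude heuristic of our own, not a standard outlier test: it flags any point whose rent is higher than the rent of the point just before it (the next one closer to London). The function name `points_breaking_downward_trend` is hypothetical.

```python
# Data from the table above: (distance from London in km, rent in GBP).
data = [(10, 1500), (20, 1300), (30, 1100), (40, 900),
        (50, 700), (60, 2000), (70, 600)]

def points_breaking_downward_trend(points):
    """Return points whose rent is higher than that of the previous point."""
    ordered = sorted(points)  # sort by distance from London
    flagged = []
    for previous, current in zip(ordered, ordered[1:]):
        # In a downward trend, rent should not rise as distance increases.
        if current[1] > previous[1]:
            flagged.append(current)
    return flagged

suspects = points_breaking_downward_trend(data)
print(suspects)  # [(60, 2000)]
```

This only works because the rest of the data falls so steadily. With noisier data we would still need to look at the scatter graph and use judgement.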
We omit any outliers when we draw the line of best fit:

A line of best fit (also called a regression line or a trend line) is a straight line that best represents the relationship between the two variables plotted on a scatter graph.
We can then use it to predict what values we could have for the parts of the graph where we have no data.
For example, let's say we were looking to live at most 35 km from London, but we wanted to get the cheapest rent possible.
We can use the scatter graph with a line of best fit to see what the rent would be 35 km away from London based on the trend:

So we can see that the cheapest rent we can expect is just above £1,000!
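The same reading can be checked with a calculation. The sketch below assumes the line of best fit is found by the least-squares method (the lesson draws it on the graph, so this is our assumption), leaves out the outlier (60, 2000), and then evaluates the line at 35 km. The helper name `fit_line` is hypothetical.

```python
# Data from the table above, with the outlier (60, 2000) left out.
data = [(10, 1500), (20, 1300), (30, 1100), (40, 900), (50, 700), (70, 600)]

def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the mean point
    return slope, intercept

slope, intercept = fit_line(data)
predicted_rent = slope * 35 + intercept

print(f"rent = {slope:.2f} * distance + {intercept:.2f}")
print(f"predicted rent at 35 km: GBP {predicted_rent:.0f}")  # GBP 1043
```

Under these assumptions the prediction is about £1,043, which agrees with the "just above £1,000" read from the graph.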

Ready to have a go at some questions?





