Graphical meaning of correlation in linear regression
First, guess: which plot has the higher correlation?
par(mfrow = c(2, 1))
# Plot 1: noisy x and even noisier y around the same upward trend
x1 = rnorm(1000) + seq(0.01, 10, 0.01)
y1 = rnorm(1000, 0, 2) + seq(0.01, 10, 0.01)
plot(x1, y1, xlim = c(-2, 12), ylim = c(-2, 12))
abline(lm(y1 ~ x1))
abline(0, 1, col = "pink")  # reference line y = x
# Plot 2: noisy x, noise-free y with a much shallower trend
x2 = rnorm(1000) + seq(0.01, 10, 0.01)
y2 = seq(-1, 0.998, 0.002) + seq(0.001, 1, 0.001)
plot(x2, y2, xlim = c(-2, 12), ylim = c(-2, 12), col = "red")
abline(lm(y2 ~ x2))
abline(0, 1, col = "pink")
The answer is the second plot!
par(mfrow = c(2, 1))
plot(x1, y1, main = paste("correlation = ", signif(cor(x1, y1))), xlim = c(-2,
12), ylim = c(-2, 12))
abline(lm(y1 ~ x1))
plot(x2, y2, main = paste("correlation = ", signif(cor(x2, y2))), xlim = c(-2,
12), ylim = c(-2, 12), col = "red")
abline(lm(y2 ~ x2))
Then, how about these?
par(mfrow = c(3, 1))
# Plot 1: steep line, points tightly packed around it
x3 = seq(-1, 0.998, 0.002) + seq(0.01, 10, 0.01)
y3 = rnorm(1000, 0, 0.01) + seq(0.01, 10, 0.01)
plot(x3, y3, xlim = c(-2, 12), ylim = c(-2, 12), main = paste("correlation =",
cor(x3, y3)))
abline(lm(y3 ~ x3))
# Plot 2: shallow line, points still tightly packed around it
x4 = seq(-1, 0.998, 0.002) + seq(0.01, 10, 0.01)
y4 = rnorm(1000, 0, 0.01) + seq(0.001, 1, 0.001)
plot(x4, y4, xlim = c(-2, 12), ylim = c(-2, 12), col = "red", main = paste("correlation =",
cor(x4, y4)))
abline(lm(y4 ~ x4))
# Plot 3: flat line, y is pure noise and has no trend in x
x5 = seq(-1, 0.998, 0.002) + seq(0.01, 10, 0.01)
y5 = rnorm(1000, 0, 0.01)
plot(x5, y5, xlim = c(-2, 12), ylim = c(-2, 12), col = "blue", main = paste("correlation =",
cor(x5, y5)))
abline(lm(y5 ~ x5))
Then why is the correlation of the third plot almost zero?
What does the correlation stand for: slope, or density? Probably density, that is, how tightly the dots are concentrated around the linear regression line.
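One way to probe the "slope or density" question is to keep the slope fixed and change only the scatter around the line. The following is a minimal sketch, not part of the original analysis: the helper `cor_with_noise`, the noise levels, and the seed are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility (assumption)
t = seq(0.01, 10, 0.01)  # same underlying trend used in the plots above

# Hypothetical helper: y follows the line y = t plus Gaussian noise of the given sd
cor_with_noise = function(noise_sd) {
  y = t + rnorm(length(t), 0, noise_sd)
  cor(t, y)
}

noise_sds = c(0.1, 1, 5)  # illustrative noise levels (assumption)
cors = sapply(noise_sds, cor_with_noise)
print(data.frame(noise_sd = noise_sds, correlation = cors))
```

The true slope is 1 in every case, yet the correlation should fall as the noise grows, which is consistent with the "density" reading.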
In linear regression, the fitted line is y = b0 + b1 * x, where b1 = sd(y)/sd(x) * cor(x, y).
In other words, correlation is the slope of lm(z-transformed y ~ z-transformed x).
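Both statements can be checked numerically before plotting anything. This is a small sketch under assumptions: the data mimic the first example, and the seed and the variable names (`b1_formula`, `zx`, `zy`, `bz`) are made up for this check.

```r
set.seed(2)  # arbitrary seed (assumption)
x = rnorm(1000) + seq(0.01, 10, 0.01)
y = rnorm(1000, 0, 2) + seq(0.01, 10, 0.01)

b1 = unname(coef(lm(y ~ x))[2])          # fitted slope on the raw scale
b1_formula = sd(y) / sd(x) * cor(x, y)   # slope rebuilt from sd(y)/sd(x) * cor(x, y)

zx = (x - mean(x)) / sd(x)               # z-transformed x
zy = (y - mean(y)) / sd(y)               # z-transformed y
bz = unname(coef(lm(zy ~ zx))[2])        # fitted slope on the z scale

print(c(b1 = b1, b1_formula = b1_formula))
print(c(z_slope = bz, correlation = cor(x, y)))
```

Within each printed pair the two numbers should agree up to floating-point error.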
Let's see it graphically.
par(mfrow = c(2, 1))
plot(x4, y4, main = paste(" Original plot (correlation =", cor(x4, y4), ")"))
abline(lm(y4 ~ x4))
abline(0, 1, col = "pink")
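# z-transform both variables, then refit; the fitted slope equals the correlation
zx4 = (x4 - mean(x4))/sd(x4)
zy4 = (y4 - mean(y4))/sd(y4)
plot(zx4, zy4, main = paste("Z-transformed x, y plot (correlation =", cor(zx4, zy4), ")"))
abline(lm(zy4 ~ zx4))  # abline() needs a fitted model here, not a bare formula
abline(0, 1, col = "pink")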
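That's the reason for the near-zero correlation of the horizontal line: the shallow but tight line above turns into a slope of almost 1 after z-transformation, whereas a flat line, whose y does not depend on x, keeps a slope of about 0, and that slope is the correlation.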
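The same point can be read off the slope formula rearranged as cor(x, y) = b1 * sd(x)/sd(y). Below is a minimal sketch that rebuilds data in the same way as x5 and y5; the seed and the name `r_from_slope` are assumptions for illustration.

```r
set.seed(3)  # arbitrary seed (assumption)
x = seq(-1, 0.998, 0.002) + seq(0.01, 10, 0.01)  # same construction as x5
y = rnorm(1000, 0, 0.01)                          # same construction as y5: no trend in y

b1 = unname(coef(lm(y ~ x))[2])    # fitted slope, expected to be close to 0
r_from_slope = b1 * sd(x) / sd(y)  # correlation recovered from the slope

print(c(slope = b1, r_from_slope = r_from_slope, correlation = cor(x, y)))
```

Since y carries no trend in x, the fitted slope is close to zero, and rescaling it by sd(x)/sd(y) reproduces the small value that cor() reports.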