# linearity without the scatterplot

February 28, 2009 2:48 PM

How do I check for linearity?

Is it possible, without ever drawing a scatterplot, or trying to interpret one, to know whether there's a linear relationship between two variables?

I have two columns in Excel with the numbers for the dependent and independent variable, and I would like to determine with 100% certainty whether they have a linear relationship for purposes of regression.


Well, and conceivably 0 is linear as well, but only if at least one of the columns only has one possible value.

posted by Flunkie at 3:05 PM on February 28, 2009


Best answer: The correlation coefficient will tell you how strong a linear correlation is, and is commonly used for the purpose you mention. To be more correct, F-test can tell you how well a linear regression model fits your data.

There is no 100% certainty in statistics, but an F-test might reveal that the probability that the model does not fit well is, say, less than 10^-16.

Excel's statistics functions are notoriously buggy (web page down right now, see the Google cache).
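Both checks mentioned above can be computed by hand. Here's a minimal sketch in Python, for example, using only the standard library; the function names are my own, not from any particular package, and for simple regression the F-statistic can be derived directly from r²:

```python
# Pearson's r and the regression F-test, hand-rolled (no SciPy).
# For simple linear regression, F = r^2 * (n - 2) / (1 - r^2),
# with 1 and n-2 degrees of freedom. Names here are illustrative.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression_f(x, y):
    """F-statistic for H0: slope = 0 in y = a + b*x."""
    n = len(x)
    r2 = pearson_r(x, y) ** 2
    return r2 * (n - 2) / (1 - r2)

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # roughly y = 2x, with a little noise

print(pearson_r(x, y))    # close to 1
print(regression_f(x, y))  # very large, i.e. a strong linear fit
```

Note the formula blows up at r² = 1 exactly, which only happens for perfectly noise-free data.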

posted by grouse at 3:09 PM on February 28, 2009 [1 favorite]


Best answer: I don't know how set on using Excel you are, but here's a quick example in R. 'seq' just generates sequences: 0, 1, 2, ..., 100 for the first, and 0, -10, -20, ..., -1000 for the second. 'lm' fits a linear model, y ~ x means 'y varies with x' (y is your data, x is your predictor), and summary prints a bunch of useful statistics about the fit.

R reads in .csv files, which can be exported from Excel very easily, with read.csv(). Just another option.

```
> x = seq(from=0, to=100, by=1)
> y = seq(from=0, to=-1000, by=-10)
> summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
       Min         1Q     Median         3Q        Max
-1.475e-13 -7.112e-14 -3.273e-14  1.264e-14  2.289e-12

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept)  6.335e-13  4.890e-14  1.296e+01   <2e-16 ***
x           -1.000e+01  8.449e-16 -1.184e+16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.475e-13 on 99 degrees of freedom
Multiple R-squared: 1,     Adjusted R-squared: 1
F-statistic: 1.401e+32 on 1 and 99 DF,  p-value: < 2.2e-16
```

posted by devilsbrigade at 3:27 PM on February 28, 2009


In a correlation, +/- 1 would mean totally linear ***AND*** without error -- all the points fall on a line. If the relationship itself is linear but the data are noisy, then your correlation might be .8 or .7, depending on how big the error term is. And data from the real world are almost always pretty noisy, especially data that relate to human behavior.
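The point above is easy to demonstrate numerically. For example, in Python (a minimal sketch: `pearson_r` is a hand-rolled helper, and the noise level and seed are arbitrary choices of mine):

```python
# The underlying relationship is exactly linear in both cases below,
# but adding noise pulls the correlation well below 1.
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # arbitrary, just for reproducibility
x = list(range(100))
exact = [2 * v + 3 for v in x]                    # perfectly linear
noisy = [v + random.gauss(0, 40) for v in exact]  # same line plus noise

print(pearson_r(x, exact))  # 1.0, up to rounding
print(pearson_r(x, noisy))  # noticeably less than 1
```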

Is there some pressing reason why you don't want to toss up a scatterplot? If you're not in excel, doing so is trivial. "scatter y x" in stata, or "plot(x,y)" in R. Up it pops, big as life and twice as ugly.

Likewise, is there a pressing reason why you need to check for linearity *before* you run the regression, instead of *by* running the regression, or after running it by scatterplotting the residuals?

posted by ROU_Xenophobe at 6:00 PM on February 28, 2009

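The residual check suggested above can even be done without the plot, though a plot is still the better tool. For instance, a sketch in Python (`fit_line` is my own helper, and the quadratic data is a deliberately non-linear example):

```python
# Fit a straight line to obviously curved data, then look for
# structure left over in the residuals. A systematic sign pattern
# (positive, negative, positive) means the line is the wrong model.

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

x = list(range(-10, 11))
y = [v * v for v in x]  # a parabola, clearly not linear

a, b = fit_line(x, y)
resid = [c - (a + b * v) for v, c in zip(x, y)]

# Residuals are positive at both ends and negative in the middle:
print(resid[0] > 0, resid[10] < 0, resid[20] > 0)  # True True True
```

On symmetric data like this the fitted slope is essentially zero, yet the residuals are huge and patterned, which is exactly what the scatterplot would have shown at a glance.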

Is it possible, without ever drawing a scatterplot, or trying to interpret one, to know whether there's a linear relationship between two variables?

Nope. You GOTTA look at the data. The numbers are meaningless. Check this out:

http://www.jerrydallal.com/LHSP/anscombe.htm
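The page above shows Anscombe's quartet; the same point can be made in a few lines of Python, for example (data values transcribed from Anscombe's 1973 paper, `pearson_r` a hand-rolled helper rather than a library call):

```python
# Two of Anscombe's datasets: near-identical correlation, but only
# the first is actually linear; the second is a smooth curve.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

print(round(pearson_r(x, y1), 3))  # about 0.816 (noisy but linear)
print(round(pearson_r(x, y2), 3))  # also about 0.816, yet y2 is curved
```

Identical summary statistics, completely different shapes: the numbers alone can't tell them apart.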

posted by tiburon at 7:54 PM on March 1, 2009 [1 favorite]


This thread is closed to new comments.
