The name ‘simple linear regression’ can be deceiving. ‘Simple’ here means that there is one dependent (response) variable and one independent (explanatory) variable whose influence on the dependent variable is being assessed. Having only two variables does not necessarily make the analysis simple, judging by the common mistakes people make along the way. So, here are some steps you should follow when performing a simple linear regression.

  1. Draw the scatterplot. This is an important exploratory step. It helps you judge whether the relationship between the two variables is linear or non-linear, and spot pairs of observations that do not conform to the pattern. If the scatterplot shows a non-linear pattern, a transformation of one or both variables might work. If there are outliers, you may consider removing them or conducting a robust regression; this guide is helpful.
  2. Fit the model. Once you are convinced that a linear model could describe the observed relationship between the variables, fit it. Usually the ordinary least-squares method is used to find the line of best fit.
  3. Check the assumptions. Are the residuals normally distributed? You can use a histogram of the residuals and check whether they are symmetrical around a mean of zero, or use a normal Q-Q plot. Are the residuals related to the explanatory variable? Under a linear regression there should be no relationship, so plot the residuals against the predictor (independent variable) and check that the points scatter evenly along a horizontal (zero-gradient) line. Does the size of the residuals depend on the magnitude of the independent variable? The residuals should be homoscedastic, so check that the plot of residuals versus the independent variable does not show a pattern that funnels out or in. See Figure 1 below. If the assumptions of the model appear not to be met, a transformation may be necessary, or a different model altogether.

    Figure 1: Sample plot showing homoscedastic and heteroscedastic residuals

  4. Report. When the model meets the assumptions, you can comfortably write it up! As a general practice, you report the coefficients (an intercept and a gradient in the simple case) with their standard errors and confidence intervals, the R-squared, and a significance test of whether the explanatory variable is a significant predictor of the dependent variable.
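
To make steps 2 to 4 concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not a replacement for a statistics package: the data are made up, and the helper names `fit_simple_ols` and `spread_ratio` are hypothetical ones chosen for this example. The spread ratio is only a rough numeric stand-in for eyeballing the funnel shape in a residual plot, not a formal test.

```python
import math
import random


def fit_simple_ols(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)        # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation
    r_squared = 1 - sse / sst

    # Standard error of the slope; residual variance uses n - 2
    # degrees of freedom because two coefficients were estimated.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "r_squared": r_squared,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,  # compare with a t distribution, n - 2 df
    }


def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


def spread_ratio(x, residuals):
    """Rough funnel check: residual spread at large x over spread at small x."""
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return rms(high) / rms(low)


# Made-up data for illustration: a true line plus constant-variance noise.
random.seed(0)
x = [i / 2 for i in range(1, 41)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

fit = fit_simple_ols(x, y)
print(f"intercept = {fit['intercept']:.3f}")
print(f"slope     = {fit['slope']:.3f} (SE {fit['se_slope']:.3f})")
print(f"R-squared = {fit['r_squared']:.3f}")
print(f"t (slope) = {fit['t_slope']:.2f}")
print(f"residual spread ratio = {spread_ratio(x, fit['residuals']):.2f}")
```

A spread ratio far from 1 hints that the residuals funnel out or in and that the residual plot deserves a closer look. Confidence intervals and p-values need quantiles of the t distribution, which the standard library does not provide, so in practice you would take those from a statistics package.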

There you go! But don’t forget: if your variables are correlated, however strongly, it doesn’t mean the independent variable causes the dependent variable!

Happy analyzing!!