Why is the mean the best estimator in linear regression?

It might seem obvious, at first glance, that we use the mean in most of the linear regressions we run, treating outliers so as to avoid the pitfalls of this measure. The use of linear regression has become so mainstream in the human and social sciences that this practice seems cast in stone. However, the initial reason for it is quite straightforward and simple, yet somehow unknown to some of us.


Since your goal is to represent your data as well as possible, without over- or underfitting them (see https://thinkaftertank.wordpress.com/2015/07/31/modevol-i-introduction-hawk-dove-game/), you want to minimize the error through the residual sum of squares, or RSS, which is the sum of the squared residuals (the deviations of your predicted values from your empirical values) in a simple model. But how do we know that the mean indeed produces the smallest RSS when compared to the mode or the median?
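As a small illustration (not from the original post), here is how the RSS of a constant predictor can be computed for a toy sample; the function name `rss` and the sample values are my own choices:

```python
def rss(values, c):
    """Residual sum of squares when every point is predicted by the constant c."""
    return sum((y - c) ** 2 for y in values)

data = [2.0, 3.0, 5.0, 10.0]
# Deviations from 5.0 are -3, -2, 0, 5, so the RSS is 9 + 4 + 0 + 25 = 38.
print(rss(data, 5.0))
```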

By a proof by contradiction, suppose that $Y^*$, the best estimator for reducing the RSS, is different from $\bar{Y}$ (the mean). Expanding the RSS around the mean gives

$$\mathrm{RSS}(Y^*) = \sum_{i=1}^{n} (Y_i - Y^*)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + n(\bar{Y} - Y^*)^2,$$

since the cross term $2(\bar{Y} - Y^*)\sum_{i=1}^{n}(Y_i - \bar{Y})$ vanishes (the deviations from the mean sum to zero).

Now, if we want the RSS to be as small as possible, we can only play with the second term. Being a square, it is de facto non-negative, and only $Y^* = \bar{Y}$ makes it equal to zero. Any other choice would produce a bigger RSS. In a simple model, the mean is thus the best estimator for reducing the RSS.

