It might sound obvious, at first glance, to use the mean in most of the linear regressions we run, and to treat outliers in order to avoid the pitfalls of this measure. The use of linear regression has become so mainstream in the human and social sciences that this practice seems cast in stone. However, the original reason for it is quite straightforward and simple, yet somehow unknown to some of us.
Since your goal is to represent your data as faithfully as possible without over- or underfitting them (see https://thinkaftertank.wordpress.com/2015/07/31/modevol-i-introduction-hawk-dove-game/), you want to minimize the error through the residual sum of squares, or RSS, which is the sum of the squared residuals (the deviations of your empirical values from the predicted ones) in a simple model. But how do we know that the mean indeed produces the smallest RSS when compared to the mode or the median?
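Before the proof, we can check this empirically. The sketch below (with made-up sample data, and the mean/median/mode taken from Python's standard `statistics` module) computes the RSS around each of the three measures of central tendency:

```python
import statistics

# Hypothetical sample data (purely illustrative)
y = [2, 3, 3, 5, 8, 13]

def rss(data, estimate):
    """Residual sum of squares around a single constant estimate."""
    return sum((v - estimate) ** 2 for v in data)

mean = statistics.mean(y)      # ~5.667
median = statistics.median(y)  # 4.0
mode = statistics.mode(y)      # 3

for name, est in [("mean", mean), ("median", median), ("mode", mode)]:
    print(f"RSS around {name} ({est}): {rss(y, est):.3f}")
```

On this sample the RSS around the mean is the smallest of the three, which is exactly what the demonstration below establishes in general.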
By reductio ad absurdum, suppose that Y*, the best estimator to reduce the RSS, is different from Ȳ (the mean). Writing each residual as Yᵢ − Y* = (Yᵢ − Ȳ) + (Ȳ − Y*), the RSS expands to

RSS = Σ(Yᵢ − Ȳ)² + 2(Ȳ − Y*)Σ(Yᵢ − Ȳ) + n(Ȳ − Y*)²

The middle term vanishes, because deviations from the mean always sum to zero, leaving

RSS = Σ(Yᵢ − Ȳ)² + n(Ȳ − Y*)²

Now, if we want the RSS to be as small as possible, we can only play with the second term. Being a square, it is de facto non-negative, and only Y* = Ȳ can make it equal to 0. Any other choice would produce a bigger RSS. In a simple model, the mean is thus the best estimator to reduce the RSS.