Errors and residuals in linear regression
I think in the common literature on statistics the authors are often very imprecise when it comes to residuals and errors. So far I have not been able to work out the difference completely, and therefore have several questions.
Setting:
Given simple linear regression,
$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$
with the error terms being normally distributed, i.e., $\epsilon_i \sim N(0, \sigma^2)$. In this setting we consider the $y_i$ and the errors $\epsilon_i$ to be random variables, and the independent variables $x_i$ to be nonrandom variables. The parameters $\beta_0$ and $\beta_1$ are unknown and fixed.
Therefore we have $E[y_i] = \beta_0 + \beta_1 x_i$ and $\operatorname{Var}[y_i] = \sigma^2$, which means the $y_i$ are also normally distributed, $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
The errors $\epsilon_i$ are now defined as the deviations of the observations $y_i$ from the "true" (deterministic) model $E[y_i] = \beta_0 + \beta_1 x_i$, i.e.
$$ y_i - E[y_i] = (\beta_0 + \beta_1 x_i + \epsilon_i) - (\beta_0 + \beta_1 x_i) = \epsilon_i. $$
We now want to estimate the model $E[y_i] = \beta_0 + \beta_1 x_i$, since the coefficients $\beta_0$ and $\beta_1$ are unknown. We do so by
$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. $$
The residuals, let us call them $\delta_i$, are defined as the deviations of the observations $y_i$ from the fitted values $\hat{y}_i$, i.e.
$$ \delta_i = y_i - \hat{y}_i. $$
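To make the distinction concrete, here is a minimal sketch in Python (numpy only; the parameter values and variable names are my own choice), which simulates data from the model above so that the errors $\epsilon_i$ are actually known, then fits the line by ordinary least squares and computes the residuals $\delta_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" parameters, known here only because the data are simulated
beta0, beta1, sigma = 1.0, 2.0, 0.5

n = 50
x = np.linspace(0.0, 10.0, n)       # nonrandom regressors x_i
eps = rng.normal(0.0, sigma, n)     # errors eps_i ~ N(0, sigma^2), unobservable in practice
y = beta0 + beta1 * x + eps         # observations y_i

# Ordinary least squares estimates of beta_0 and beta_1
b1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x         # fitted values
delta = y - y_hat                   # residuals delta_i = y_i - y_hat_i

print(np.allclose(delta, eps))      # False: the residuals are not the errors
print(abs(delta.sum()) < 1e-8)      # True: residuals of a fit with intercept sum to zero
```

In real data only `x`, `y`, `y_hat`, and `delta` are available; `eps` exists here purely because the data were simulated.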
Now the questions:
1.) In least squares estimation, some authors minimize the sum of squared errors (SSE), $\sum \epsilon_i^2$, and some minimize the residual sum of squares (RSS), $\sum \delta_i^2$, which is obviously not the same thing. Some even write that they minimize $\sum \left(y_i - \hat{y}_i\right)^2$ but then minimize the SSE, which is not even consistent within their own chosen framework. What is the correct procedure in least squares estimation: minimizing the SSE or the RSS? (See also the short numerical illustration after these questions.)
2.) How come there are so many contradicting definitions of residuals, errors, and least squares in the textbooks, and is there any book that is precise and consistent?
3.) Can we say anything about the distribution of the residuals, i.e., are they also normally distributed, and if so, with which mean and variance?
4.) Some authors also write the estimated formula as $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\epsilon}_i$, where they denote the residuals $\delta_i$ by $\hat{\epsilon}_i$. But I think this formula is wrong, because then $y_i - \hat{y}_i$, which is the definition of the residuals, leads to a result that is different from the residuals. Is there any justification for writing the residuals into the estimated formula?
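(Numerical illustration for question 1, continuing the sketch from the setting above, so the errors are only available because the data were simulated: the two sums in question can be computed side by side and are generally different numbers, since one uses the unobservable errors and the other the residuals of the fit.)

```python
sse = np.sum(eps ** 2)     # sum of squared errors: needs the unobservable eps_i
rss = np.sum(delta ** 2)   # residual sum of squares: computable from the fit alone
print(sse, rss)            # generally two different numbers
```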
statistics least-squares linear-regression