What does this expected value notation mean?
From: Learning from Data, section 2.3.1 - Bias and Variance:
Let $f : X \rightarrow Y$, and let $D = \{(x, y = f(x)) : x \in A \subseteq X\}$, where each $x \in A$ is chosen independently with distribution $P(X)$. Assume we've chosen some function $g^D : X \rightarrow Y$ to approximate $f$ on $D$ with some error function.
Define $E_{\text{out}}(g^{D}) = \Bbb E_x[(g^{D}(x)-f(x))^2]$.
$(1.)$ What does $\Bbb E_x$ mean?
I understand the definition of the expected value of a discrete random variable $$E[R] = \sum_{r} r \, P(R=r)$$ or of a random vector (length $N$) $$E[\bar R] = \Big[\sum_{r_j} r_j \, P(R_k = r_j)\Big]_{k=0, 1, \dots, N-1}$$
But what does the notation $\Bbb E_x$ mean in this context?
$(2.)$ What does $\Bbb E_D[E_{\text{out}}(g^{D})]$ mean?
The book says it's the "expectation with respect to all data sets", but what does that mean? Expectations are operators on random variables. What is the random variable here? And how would I use proper notation to describe this?
As in $\Bbb E_D[E_{\text{out}}(g^{D})] = \sum_{?} ? \, P(?)$
probability notation random-variables definition machine-learning
asked Nov 23 at 18:38
Oliver G
2 Answers
$(1.)$ What does $\Bbb E_x$ mean?
This means the expectation of the quantity in the brackets, with respect to $x \in X$ drawn from the probability distribution $P(X)$. That is, written as an integral:
$$ \int_X (g^D(x) - f(x))^2 \, p(x) \, dx $$
where $p(x)$ is the density function of the distribution $P(X)$. This is the quantity often estimated from a sample with the in-sample error of $g^D$:
$$ \frac{1}{n} \sum_i (g^D(x_i) - y_i)^2 $$
$(2.)$ What does $\Bbb E_D[E_{\text{out}}(g^D)]$ mean?
The data set $D$ is random here. That is, we treat the data set used to train our predictive model as random, and we average over all possible training data sets according to their distribution.
$E_{\text{out}}$ is itself an average, so there are really two independent sources of randomness being averaged over in this calculation:
- $D$, the training data set, used to construct $g^D$.
- An unnamed one averaged over inside $E_{\text{out}}$: the testing data.
So the notation indicates the average test error, averaged across all possible training data sets. This is the quantity estimated with cross validation.
answered Nov 23 at 18:48
Matthew Drury
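The two nested averages can be made concrete with a small simulation (my own sketch, not from the book): take $f(x) = x^2$ with $P(X)$ uniform on $[-1,1]$, let $g^D$ be the least-squares line fit to a 5-point data set $D$, estimate $E_{\text{out}}(g^D)$ by a Monte Carlo average over test points, and estimate $\Bbb E_D[E_{\text{out}}(g^D)]$ by averaging over many freshly drawn training sets.

```python
import random

random.seed(0)

def f(x):
    return x * x  # the target function

def fit_line(D):
    # ordinary least-squares fit of y = a*x + b to the data set D
    n = len(D)
    mx = sum(x for x, _ in D) / n
    my = sum(y for _, y in D) / n
    sxx = sum((x - mx) ** 2 for x, _ in D)
    sxy = sum((x - mx) * (y - my) for x, y in D)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def E_out(g, n_test=2000):
    # inner expectation: Monte Carlo estimate of E_x[(g(x) - f(x))^2],
    # with x ~ Uniform[-1, 1]
    xs = [random.uniform(-1, 1) for _ in range(n_test)]
    return sum((g(x) - f(x)) ** 2 for x in xs) / n_test

def E_D_E_out(n_datasets=500, n_train=5):
    # outer expectation: average E_out(g^D) over many random training sets D
    total = 0.0
    for _ in range(n_datasets):
        D = [(x, f(x)) for x in (random.uniform(-1, 1) for _ in range(n_train))]
        total += E_out(fit_line(D))
    return total / n_datasets

print(E_D_E_out())  # one number: the expected out-of-sample error
```

Note that `E_out(fit_line(D))` is a different number for each draw of `D`; it is exactly that variation which the outer average removes.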
For (2), what is the random variable here and how would it be defined in proper notation? I understand now that $\Bbb E_x$ just refers to $x \in X$ where $X$ has the distribution, but then $\Bbb E_D$ refers to $D \in (?)$ where $(?)$ has a distribution. What is $(?)$ and what is its distribution?
– Oliver G
Nov 24 at 14:49
$D$ is a training data set. Each $(X, y)$ pair has a joint distribution and can be sampled from. For example, if the $(X, y)$ pairs are independent of one another, then $D$ is just a random assemblage of $(X, y)$'s, each drawn independently. If there are dependencies between $(X, y)$ pairs, for example in time series modeling, then you are sampling from the joint distribution of some number of $(X, y)$ pairs. I'm sorry, I don't know what you mean by "proper notation", so I hope that clarifies.
– Matthew Drury
Nov 24 at 17:49
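To make the comment concrete (my own sketch, not from the thread): in the i.i.d. case, drawing a data set $D$ just means drawing $N$ input points from $P(X)$ and pairing each with its label under $f$. Each call below yields a different random $D$; this is the randomness that $\Bbb E_D$ averages over.

```python
import random

random.seed(2)

def f(x):
    return x * x  # the target function from the question

def draw_dataset(n=5):
    # D = {(x, f(x))} with each x drawn independently from P(X) = Uniform[-1, 1]
    return [(x, f(x)) for x in (random.uniform(-1, 1) for _ in range(n))]

D1 = draw_dataset()
D2 = draw_dataset()
print(D1 != D2)  # two draws of D almost surely differ
```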
The $x$ in "$\mathbb{E}_x$" denotes that the expectation is taken over the random variable $x$. In particular, you should view $g^D$ as a deterministic function here, and thus $(g^D - f)^2$ is also a deterministic function. If you plug in a single random variable $x \sim P(X)$, then you can take the expectation with respect to this random variable: this is $\mathbb{E}_x[(g^D(x) - f(x))^2]$.
This defines a number $E_{\text{out}}(g^D)$, under the assumption that $g^D$ is a fixed deterministic function. However, if we reinterpret $g^D$ as a random function that depends on the random data set $D$, then $E_{\text{out}}(g^D)$ becomes a random variable, and you can take the expectation over this random variable: this is $\mathbb{E}_D[E_{\text{out}}(g^D)]$.
answered Nov 23 at 18:52
angryavian
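A quick numerical check of the first point (my own example, not from the answer): fix the deterministic hypothesis $g(x) = 1/3$ against $f(x) = x^2$, with $P(X)$ uniform on $[-1,1]$. Then $\mathbb{E}_x[(g(x)-f(x))^2]$ is a single number, computable both exactly, $\tfrac12\int_{-1}^{1}(\tfrac13 - x^2)^2\,dx = \tfrac{4}{45}$, and by Monte Carlo:

```python
import random

random.seed(1)

f = lambda x: x * x       # target function
g = lambda x: 1.0 / 3.0   # a fixed, deterministic hypothesis

# Monte Carlo estimate of E_x[(g(x) - f(x))^2], x ~ Uniform[-1, 1]
n = 100_000
est = sum((g(x) - f(x)) ** 2 for x in (random.uniform(-1, 1) for _ in range(n))) / n

# exact value of the integral: (1/2) * ∫_{-1}^{1} (1/3 - x^2)^2 dx = 4/45
exact = 4.0 / 45.0
print(est, exact)
```

Only once $g$ is allowed to depend on a random $D$ does this number itself become random.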