What does this expected value notation mean?

From: Learning from Data, section 2.3.1 - Bias and Variance:



Let $f : X \rightarrow Y$, and let $D = \{(x,\, y = f(x)) : x \in A \subseteq X\}$, where each $x \in A$ is chosen independently with distribution $P(X)$. Assume we've chosen some function $g^D : X \rightarrow Y$ to approximate $f$ on $D$ with some error function.

Define $E_{\text{out}}(g^{D}) = \Bbb E_x\!\left[(g^{D}(x)-f(x))^2\right]$.

$(1.)$ What does $\Bbb E_x$ mean?

I understand the definition of the expected value of a random variable, $$E[R] = \sum_{r \in R} r\, P(R=r),$$ or of a vector (length $N$), $$E[\bar R] = \Big[\sum_{r_j \in R_k} r_j\, P(R_k = r_j)\Big]_{k=0,\,1,\,\dots,\,N-1}.$$

But what does the notation $\Bbb E_x$ mean in this context?

$(2.)$ What does $\Bbb E_D[E_{\text{out}}(g^{D})]$ mean?

The book says it's the "expectation with respect to all data sets", but what does that mean? Expectations are operators on random variables. What is the random variable here? And how would I use proper notation to describe this?

As in $\Bbb E_D[E_{\text{out}}(g^{D})] = \sum_{?} ?\, P(?)$

Tags: probability, notation, random-variables, definition, machine-learning

asked Nov 23 at 18:38 by Oliver G

2 Answers

$(1.)$ What does $\Bbb E_x$ mean?

This means the expectation of the quantity in the brackets, with respect to $x \in X$ drawn from the probability distribution $P(X)$. That is, as an integral:

$$ \int_X (g^D(x) - f(x))^2 \, p(x) \, dx $$

where $p(x)$ is the density function of the distribution $P(X)$. This is the quantity often estimated from a sample with the in-sample error of $g^D$:

$$ \frac{1}{n} \sum_i (g^D(x_i) - y_i)^2 $$
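To see $\Bbb E_x$ operationally, here is a minimal Monte Carlo sketch (my own illustration, not the book's): the target $f$, the fixed hypothesis $g^D$, and the choice of $P(X)$ as uniform on $[-1, 1]$ are all assumptions made just to have something concrete to run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumptions for this sketch: a target f, a fixed hypothesis g^D,
# and P(X) taken to be uniform on [-1, 1].
f = np.sin                     # the "true" function f(x) = sin(x)
g = lambda x: 0.8 * x          # some fixed g^D, imagined already learned from D

# Estimate E_out(g^D) = E_x[(g^D(x) - f(x))^2] by averaging the squared
# error over many x drawn from P(X).
x = rng.uniform(-1.0, 1.0, size=100_000)
print(np.mean((g(x) - f(x)) ** 2))
```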



$(2.)$ What does $\Bbb E_D[E_{\text{out}}(g^D)]$ mean?

The data set $D$ is random here. That is, we treat the data set we use to train our predictive model as random, and we are averaging over all the possible training data sets according to their distribution.

$E_{\text{out}}$ is itself an average over randomly drawn data, so there are really two random data sets being averaged over independently in this calculation:

• $D$, the training data set, used to construct $g^D$.

• An unnamed one averaged over in $E_{\text{out}}$, the testing data set.

That notation indicates the average test error, averaged across all possible training data sets. This is the quantity estimated with cross-validation.
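To make the double average concrete, here is a minimal simulation sketch (again my own illustration, not the book's): the target $f$, the distribution of $x$, the training-set size, and least squares as the learning algorithm are all assumptions. It draws many training sets $D$, fits $g^D$ on each, estimates $E_{\text{out}}(g^D)$ on fresh test points, and then averages those values over the training sets.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                                   # assumed target function

def sample_dataset(n=5):
    """Draw a training set D: n points with x ~ Uniform[-1, 1] and y = f(x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, f(x)

def train(x, y):
    """The 'learning algorithm': fit g^D(x) = a*x + b by least squares."""
    a, b = np.polyfit(x, y, deg=1)
    return lambda t: a * t + b

def e_out(g, n_test=10_000):
    """Monte Carlo estimate of E_x[(g(x) - f(x))^2] on fresh test points."""
    x = rng.uniform(-1.0, 1.0, size=n_test)
    return np.mean((g(x) - f(x)) ** 2)

# E_D[E_out(g^D)]: average the out-of-sample error of g^D over many
# independently drawn training sets D.
errors = [e_out(train(*sample_dataset())) for _ in range(2000)]
print(np.mean(errors))
```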






answered Nov 23 at 18:48 by Matthew Drury

• For (2), what is the random variable here and how would it be defined in proper notation? I understand now that $\Bbb E_x$ just refers to $x \in X$ where $X$ has the distribution, but then $\Bbb E_D$ refers to $D \in (?)$ where $(?)$ has a distribution. What is $(?)$ and what is its distribution?
  – Oliver G, Nov 24 at 14:49

• The $D$ is a training data set. Each $(X, y)$ pair has a joint distribution and can be sampled from. For example, if each $(X, y)$ pair is independent of the others, then $D$ is just a random assemblage of $(X, y)$ pairs, each taken independently of one another. If there are dependencies between $(X, y)$ pairs, for example in time series modeling, then you are sampling from the joint distribution of some number of $(X, y)$ pairs. I'm sorry, I don't know what you mean by "proper notation", so I hope that clarifies.
  – Matthew Drury, Nov 24 at 17:49














The $x$ in "$\mathbb{E}_x$" denotes that the expectation is taken over the random variable $x$. In particular, you should view $g^D$ as a deterministic function here, and thus $(g^D - f)^2$ is also a deterministic function. If you plug in a single random variable $x \sim P(X)$, then you can take the expectation with respect to this random variable: this is $\mathbb{E}_x[(g^D(x) - f(x))^2]$.

This defines a number $E_{\text{out}}(g^D)$, under the assumption that $g^D$ is a fixed deterministic function. However, if we reinterpret $g^D$ as a random function that depends on the random dataset $D$, then $E_{\text{out}}(g^D)$ becomes a random variable, and you can take the expectation over this random variable: this is $\mathbb{E}_D[E_{\text{out}}(g^D)]$.
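Written out in the sum notation from the question, under the simplifying assumption that $D$ ranges over a countable collection of possible data sets with probabilities $P(D)$ (in general it would be an integral over the distribution of $D$), this is

$$\mathbb{E}_D\big[E_{\text{out}}(g^D)\big] = \sum_{D} P(D)\, E_{\text{out}}(g^D) = \sum_{D} P(D)\, \mathbb{E}_x\big[(g^D(x) - f(x))^2\big].$$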






answered Nov 23 at 18:52 by angryavian