Reconciling two interpretations of $E(X^2)$

It's been over 25 years since my last course in probability, so this may be obvious or elementary. However, I've unexpectedly had to think deeply about this stuff for a research project, and I cannot reconcile my issue.



For the sake of simplicity/common ground, let's assume Riemann integrals always suffice, there are no convergence issues, and my random variables take values over all of $\mathbb{R}$. All integrals below are $\int_{\mathbb{R}} \,dx$ whether or not the domain is explicitly written.



Given a r.v. $X$ with density function $f(x)$, we define $E(X) = \int x f(x)\,dx$. Linearity is then proven, $E(aX+b)=aE(X)+b$, and this is completely sensible. In deriving the alternative formula for the variance, we bump into $E(X^2)$. In that derivation, we define
$$
E(X^2) = \int x^2 f(x)\,dx
$$

and proceed. This is fine, but here's my issue.



$X^2$ is itself a random variable, so we could ask for its expected value (in reference to "itself," not $X$). To be clearer, we could set $Y=X^2$ and ask for $E(Y)$. This requires us to know the density function for $Y$, and this to me is not clear at all and non-trivial to get your hands on. So, setting $Y=X^2$ with density $h(y)$, why is it true that
$$
\int y\, h(y)\,dy = \int x^2 f(x)\,dx
$$


so that $E(Y) = E(X^2)$? Why do these two very different interpretations agree? I do not think this is as simple as a $u$-substitution.



For example, I can see that this works out fine if $X$ is standard normal. There, $E(X)=0$ and $Y=X^2$ is $\chi^2$ distributed with $1$ degree of freedom. Since $\sigma_X^2=1$ it is clear that $E(X^2)=1$. Chasing the calculations I also see that $E(Y)=E(\chi_1^2)=1$, so they are in fact in agreement. I can even follow the derivations of the $\chi_1^2$ distribution in terms of the $\Gamma$-function and see the connection to the standard normal, but I see no reason for this to play out as nicely no matter the density of $X$. It also seems to get worse when considering $E(X^\alpha)$ in general.
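(For concreteness, the agreement in this normal case can also be checked numerically. The sketch below is only an illustration, not part of the argument; it assumes SciPy's norm and chi2 densities.)

    import numpy as np
    from scipy import integrate, stats

    # E(Y) computed from the density of Y = X^2, i.e. chi-squared with 1 degree of freedom
    E_Y, _ = integrate.quad(lambda y: y * stats.chi2.pdf(y, df=1), 0, np.inf)

    # E(X^2) computed from the density of X, i.e. the standard normal
    E_X2, _ = integrate.quad(lambda x: x**2 * stats.norm.pdf(x), -np.inf, np.inf)

    print(E_Y, E_X2)  # both are very close to 1.0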



Are these two viewpoints in potential disagreement, or is there a piece of theory that says that there is no ambiguity?










probability-theory expected-value

asked Dec 17 '18 at 3:13 – Randall








  • It's the Law of the Unconscious Statistician – Robert Israel, Dec 17 '18 at 3:45










  • @RobertIsrael Thanks, I'd not heard of that, and am relieved that it is a true subtlety. The wiki article makes sense to me. – Randall, Dec 17 '18 at 3:52
















2 Answers
Pretty sure this can be reconciled with a bit of measure theory, but here's a concrete working-out showing that they agree in this particular case.



Let $F$ be the cdf for $X$, so that the probability that $a \le X \le b$ is $F(b)-F(a)$. Then $f=F'$. Now can we use $F$ to work out the cdf for $Y$?

Yes we can! The cdf for $Y$, $H$, will be $H(y) = P(Y\le y)=P(X^2\le y)$. Thus if $y\le 0$, $H(y)=0$. However, if $y > 0$, then $P(X^2 \le y) = P(-\sqrt{y}\le X \le \sqrt{y})=F(\sqrt{y})-F(-\sqrt{y})$.

Taking the derivative, we see that
$$h(y) = \begin{cases} 0 & y \le 0 \\ \frac{f(\sqrt{y})+f(-\sqrt{y})}{2\sqrt{y}} & y > 0.\end{cases}$$

Then
$$\int y\, h(y) \,dy = \int_0^\infty y\bigl(f(\sqrt{y})+f(-\sqrt{y})\bigr)\frac{1}{2\sqrt{y}}\,dy.$$
Letting $u=\sqrt{y}$, we have $du=\frac{1}{2\sqrt{y}}\,dy$,
so
$$\int y\, h(y) \,dy = \int_0^\infty u^2 \bigl(f(u)+f(-u)\bigr)\,du=\int_{-\infty}^\infty u^2 f(u)\,du=E[X^2].$$
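The same computation can be spot-checked numerically for a density other than the normal: build $h$ from $f$ exactly as above and compare the two integrals. A minimal sketch, assuming SciPy and an arbitrarily chosen Laplace density for $X$:

    import numpy as np
    from scipy import integrate, stats

    f = stats.laplace(loc=0.5, scale=1.0).pdf   # density of X; any density would do

    def h(y):
        # density of Y = X^2, from differentiating F(sqrt(y)) - F(-sqrt(y)) as above
        return (f(np.sqrt(y)) + f(-np.sqrt(y))) / (2.0 * np.sqrt(y)) if y > 0 else 0.0

    E_Y, _ = integrate.quad(lambda y: y * h(y), 0, np.inf)            # int y h(y) dy
    E_X2, _ = integrate.quad(lambda x: x**2 * f(x), -np.inf, np.inf)  # int x^2 f(x) dx

    print(E_Y, E_X2)  # both come out to about 2.25, i.e. Var(X) + E(X)^2 = 2 + 0.25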



And the abstract approach



Let $(A,\Omega,\mu)$ be a probability space. A random variable on $A$ is a measurable function $f:A\to \Bbb{R}$. The random variable $Y=X^2$ on $A$ is simply the measurable function $a\mapsto f(a)^2$, the composite of $x\mapsto x^2$ with $f$. However, let's be a little more general. Let $g:\Bbb{R}\to \Bbb{R}$ be any continuous function on $\Bbb{R}$, and we can consider the random variable $Y=g(X)$, which is the function $g\circ f : A\to \Bbb{R}$.

Now we can push $\mu$ forward along $f$ to get a measure $f_*\mu$ on $\Bbb{R}$ defined by $f_*\mu(E) = \mu(f^{-1}(E))$. If $f_*\mu$ is absolutely continuous with respect to Lebesgue measure, then its Radon-Nikodym derivative will be the pdf of $X$, but let's not think about distribution functions for now.

We now have a new measure space, $(\Bbb{R},\mathcal{B},f_*\mu)$. Since we obtained this space by pushing $\mu$ forward along $f$, the identity function on this new space determines the same random variable $X$: the probability that $X$ ends up in some set $B\subseteq\Bbb{R}$ by definition originally was $\mu(f^{-1}(B))$, but this is $f_*\mu(\mathrm{id}^{-1}(B))$.

Similarly, $Y=g(X)$ will be described on this new space by simply the function $g:\Bbb{R}\to\Bbb{R}$.

The expected value of $Y$ is then
$$\int g(x)\,f_*\mu(dx);$$
however, we can then push $f_*\mu$ forward along $g$ to get $g_*f_*\mu$.
This gives that the expected value of $Y$ is
$$\int y\, g_*f_*\mu(dy).$$

Thus, rephrased in abstract language, the statement that you want is that these two values are equal, i.e.
$$\int g(x)\,f_*\mu(dx)=\int y\,g_*f_*\mu(dy).$$

This, however, is just a special case of change of variables, which says that
$$\int j \circ k \,d\nu = \int j \,d(k_*\nu),$$
with $j=\mathrm{id}$, $k=g$, and $\nu=f_*\mu$.
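On a finite set the change-of-variables identity is nothing more than regrouping the terms of a sum, which may make the abstract statement easier to see. A small sketch (the toy measure below stands in for $f_*\mu$ and is made up purely for illustration):

    from collections import defaultdict

    # A toy probability measure nu on a finite subset of R, standing in for f_*mu
    nu = {-2.0: 0.1, -1.0: 0.2, 0.0: 0.3, 1.0: 0.25, 2.0: 0.15}
    g = lambda x: x**2

    # Integrate g against nu directly: int g d(nu)
    lhs = sum(g(x) * p for x, p in nu.items())

    # Push nu forward along g, then integrate the identity: int y d(g_* nu)
    g_nu = defaultdict(float)
    for x, p in nu.items():
        g_nu[g(x)] += p          # (g_* nu)({y}) = nu(g^{-1}({y}))
    rhs = sum(y * p for y, p in g_nu.items())

    print(lhs, rhs)  # identical: the pushforward merely regroups the same terms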






answered Dec 17 '18 at 3:36 (edited Dec 17 '18 at 3:57) – jgon













  • Thanks. Though I am glad that measure theory makes everything work out, the concrete approach is what really showed me the light. – Randall, Dec 17 '18 at 14:40




















It is easier to view expectation in the following way (it is long, but worth the effort of reading, I hope):

Let us fix the set-up. We have a statistical experiment with sample space $S$, a collection of "events" (subsets of $S$) forming a $\sigma$-algebra, and a probability measure on it.

A random variable is a function that provides a numerical measurement for each element $s\in S$ (satisfying a measurability condition). So the best viewpoint is to think of expectation as a concept attached to the various functions on the SAME sample space, within the same statistical experiment and with the same fixed probability measure; this is more useful and closer to the application domain. The phrase "expectation of a random variable" is misleading when it delinks the expectation from the underlying probability measure.

Expectation should NOT be viewed as something taken with respect to the density of a specific random variable. It is the expectation of some function on the sample space with respect to the underlying probability measure. (The various functions could have wildly different distributions and densities, but those should be set aside temporarily. The formula for expectation involves the density of the random variable, but the formula should not be confused with the concept/definition. An algorithm for computing the gcd of two numbers should not be confused with the definition of the gcd.)
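To make this concrete, here is a toy illustration (not from the answer itself): take two fair dice as the sample space, let $X$ be the sum, and compute $E(X^2)$ once by summing over the sample space with the one fixed probability measure, and once via the distribution of $X^2$. The two bookkeeping schemes give the same number:

    from fractions import Fraction
    from collections import defaultdict

    # Sample space S: ordered pairs of dice, each outcome with probability 1/36
    S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
    P = Fraction(1, 36)

    X = lambda s: s[0] + s[1]   # a random variable: the sum of the two dice
    g = lambda x: x**2          # the function of interest, so g(X) = X^2

    # Expectation computed directly on the sample space, with the fixed measure
    E_on_sample_space = sum(g(X(s)) * P for s in S)

    # The same quantity computed from the distribution of the random variable g(X)
    dist = defaultdict(Fraction)
    for s in S:
        dist[g(X(s))] += P
    E_from_distribution = sum(v * p for v, p in dist.items())

    print(E_on_sample_space, E_from_distribution)  # both 329/6 (about 54.83)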






answered Dec 17 '18 at 4:08 – P Vanchinathan












