Derivative of matrix expression $(Y - A\beta)^T W (Y - A\beta)$ with respect to $\beta$.
$begingroup$
$Y$ and $\beta$ are $1 \times n$ matrices and $W$ is a diagonal $n \times n$ matrix.
What is the best way to think about how to simplify this expression and its derivative to get the expression below? What are the simple rules I should remember to get this?
$2A^TWA\beta - 2A^TW^TY$
matrices matrix-equations matrix-calculus
$endgroup$
edited Dec 9 '18 at 16:28
user3408780
asked Dec 8 '18 at 22:41
user3408780
2 Answers
Define the vectors
$$\eqalign{
g &= (Ab-y) &\implies dg=A\,db \cr
h &= (W\!Ab-Wy) &\implies dh=W\!A\,db \cr
}$$
Write the function in terms of these new variables and find its differential and gradient.
$$\eqalign{
f &= g^Th \cr
df &= h^Tdg + g^Tdh \cr
&= (h^TA+g^TWA)\,db \cr
&= (A^Th+A^TW^Tg)^T\,db \cr
\frac{\partial f}{\partial b}
&= A^Th+A^TW^Tg \cr
&= A^T(WAb-Wy)+A^TW^T(Ab-y) \cr
&= A^T(W+W^T)Ab - A^T(W+W^T)y \cr
}$$
If $W=W^T$ this can be simplified to
$$\eqalign{
\frac{\partial f}{\partial b}
&= 2A^TWAb - 2A^TWy \cr
}$$
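As a rough numerical check of the gradient derived above, here is a minimal sketch that compares the closed form $A^T(W+W^T)(Ab-y)$ against central finite differences. It assumes NumPy is available; the sizes `n` and `p`, the random test data, and the helper names `f`, `grad_f` and `numeric_grad` are illustrative choices, not part of the original answer.

```python
import numpy as np

def f(b, A, W, y):
    """Scalar objective (Ab - y)^T W (Ab - y)."""
    g = A @ b - y
    return g @ W @ g

def grad_f(b, A, W, y):
    """Closed-form gradient A^T (W + W^T) (Ab - y) from the derivation."""
    return A.T @ (W + W.T) @ (A @ b - y)

def numeric_grad(func, b, eps=1e-6):
    """Central finite differences, one coordinate of b at a time."""
    out = np.zeros_like(b)
    for k in range(b.size):
        step = np.zeros_like(b)
        step[k] = eps
        out[k] = (func(b + step) - func(b - step)) / (2 * eps)
    return out

rng = np.random.default_rng(0)
n, p = 6, 3                      # illustrative sizes, not from the question
A = rng.normal(size=(n, p))
y = rng.normal(size=n)
b = rng.normal(size=p)

W_general = rng.normal(size=(n, n))          # no symmetry assumed
W_diag = np.diag(rng.uniform(1.0, 2.0, n))   # diagonal, hence symmetric

# General W: compare the closed form with finite differences.
err_general = np.max(np.abs(
    grad_f(b, A, W_general, y)
    - numeric_grad(lambda v: f(v, A, W_general, y), b)))

# Symmetric W: the gradient reduces to 2 A^T W A b - 2 A^T W y.
simplified = 2 * A.T @ W_diag @ A @ b - 2 * A.T @ W_diag @ y
err_diag = np.max(np.abs(
    simplified - numeric_grad(lambda v: f(v, A, W_diag, y), b)))

print(err_general, err_diag)
```

Both printed errors should be small, limited only by floating-point rounding, since the objective is quadratic in $b$ and central differences are exact for quadratics up to rounding.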
answered Dec 9 '18 at 2:40
greg
$begingroup$
Yes, I forgot to mention that $W$ is a diagonal matrix.
$endgroup$
– user3408780
Dec 9 '18 at 15:43
Throughout I implicitly sum over repeated indices. You're differentiating a scalar $Z_i W_{ij}Z_j,\,Z_i:=Y_i-A_{ik}\beta_k$ with respect to a vector $\beta$, giving a vector whose $k$th component is obtained by differentiating with respect to $\beta_k$. Since $\partial_k:=\tfrac{\partial}{\partial\beta_k}\implies\partial_k Z_i=-A_{ik}$, the product rule gives $\partial_k(Z_i W_{ij}Z_j)=-A_{ik}W_{ij}Z_j-Z_i W_{ij}A_{jk}$. Now you can use the rules of matrix multiplication to rewrite this neatly as $-[Z^T(W+W^T)A]_k$, making the derivative $-Z^T(W+W^T)A=(A\beta-Y)^T(W+W^T)A$.
Now for a sanity check, which is worthwhile with any calculation this complex. If $W$ were a scalar instead, we'd have $\partial_k Z^TWZ=W\partial_k (Z^TZ)=-2[Z^TWA]_k$. But when we reinstate $W$'s matrix status, we note its antisymmetric part doesn't even contribute to the scalar we're differentiating, so without loss of generality $W$ should be replaced throughout with its symmetric part $(W+W^T)/2$. That gives $-Z^T(W+W^T)A$ instead, as we've found.
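The index manipulation above can be mirrored almost literally with `numpy.einsum`. The sketch below is only an illustration under assumed random data (the sizes `n`, `p` and all array values are made up): it evaluates the two product-rule terms index by index, compares them with the matrix form $-Z^T(W+W^T)A$, and checks that the antisymmetric part of $W$ contributes nothing to $Z^TWZ$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2                      # illustrative sizes, not from the question
A = rng.normal(size=(n, p))
Y = rng.normal(size=n)
beta = rng.normal(size=p)
W = rng.normal(size=(n, n))      # deliberately non-symmetric

Z = Y - A @ beta                 # Z_i = Y_i - A_ik beta_k

# Product-rule result, written index by index:
# d_k (Z_i W_ij Z_j) = -A_ik W_ij Z_j - Z_i W_ij A_jk
by_index = (-np.einsum("ik,ij,j->k", A, W, Z)
            - np.einsum("i,ij,jk->k", Z, W, A))

# The same quantity in matrix form: -Z^T (W + W^T) A
by_matrix = -Z @ (W + W.T) @ A

# Split W into symmetric and antisymmetric parts.
W_sym = (W + W.T) / 2
W_anti = (W - W.T) / 2
anti_term = Z @ W_anti @ Z                   # should vanish up to rounding
scalar_gap = Z @ W @ Z - Z @ W_sym @ Z       # should vanish up to rounding

print(np.max(np.abs(by_index - by_matrix)), anti_term, scalar_gap)
```

Writing the subscripts out in the `einsum` strings keeps the code in one-to-one correspondence with the summation-convention expressions, which makes transposition mistakes easier to spot.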
answered Dec 8 '18 at 22:59
J.G.