Is it possible to define a Hessian Matrix for a Matrix-valued function?
$begingroup$
So I'm doing a project on optimization (non-negative matrix factorization), which I know is not convex, from this question:
Why does the non-negative matrix factorization problem non-convex?
However this was addressed only for the scalar case which is not my project's focus.
My question is: how am I supposed to define a gradient and a Hessian matrix for more general cases? Is it possible? Is it still called a Hessian matrix, or is it some sort of tensor (which I don't know much about)?
My function:
$$ f = \min_{W, H} \left\lVert X \; - \; WH \right\rVert_{F}^{2} $$
Which is equivalent to:
$$ \min_{W, H} f = \operatorname{tr}\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack = \operatorname{tr}\lbrack (X \; - \; WH)^{T}(X \; - \; WH) \rbrack $$
I know how to calculate the partial derivatives of this function, and I actually have:
$$ \frac{\partial f}{\partial W} = \frac{\partial \operatorname{tr}\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial W} = -2XH^{T} + 2WHH^{T}$$
Which is the same result as equation (24) found in this document:
http://cal.cs.illinois.edu/~johannes/research/matrix%20calculus.pdf
Applying the same idea, I calculated the other partial derivative:
$$ \frac{\partial f}{\partial H} = \frac{\partial \operatorname{tr}\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial H} = -2W^{T}X + 2W^{T}WH$$
If both of these are correct, is it possible to define a vector with entries that are matrices such that:
$$ \nabla f(W, H) = \left(\frac{\partial f}{\partial W} \;\; \frac{\partial f}{\partial H} \right)$$
And what would be the way to compute the Hessian? (If it makes sense).
Please do not mark as duplicate. The question
Defining the Hessian of a function that takes general matrices as an input has no appropriate answer on how to compute these but rather an example to disprove the OP's method and an insult.
matrix-equations matrix-calculus nonlinear-optimization
$endgroup$
$begingroup$
What's your function $f$?
$endgroup$
– user550103
Dec 24 '18 at 11:50
$begingroup$
@user550103 I've added the function.
$endgroup$
– Maganna Dev
Dec 24 '18 at 12:07
$begingroup$
Your gradients look correct.
$endgroup$
– user550103
Dec 24 '18 at 12:19
$begingroup$
@user550103 What troubles me are the dimensions of each partial derivative. If $\frac{\partial f}{\partial W} \in \mathbb{R}^{m \times r}$ and $\frac{\partial f}{\partial H} \in \mathbb{R}^{r \times n}$, does the gradient I want to define exist? Does the Hessian exist?
$endgroup$
– Maganna Dev
Dec 24 '18 at 12:22
$begingroup$
If I am understanding your problem, then the gradients are matrices. Yes, they are matrices, which is as expected...
$endgroup$
– user550103
Dec 24 '18 at 12:25
edited Dec 24 '18 at 12:02
Maganna Dev
asked Dec 24 '18 at 10:08
Maganna Dev
1 Answer
$begingroup$
Define a new matrix $$Y=WH-X$$
Write the function in terms of this new variable
$$f = \|Y\|^2_F = Y:Y$$
where a colon denotes the trace/Frobenius product, i.e. $\,\,A:B={\rm tr}(A^TB)$
Find its differential and gradients.
$$\eqalign{
df &= 2Y:dY \cr
&= 2Y:(dW\,H+W\,dH) \cr
&= 2YH^T:dW + 2W^TY:dH \cr
&= 2(WH-X)H^T:dW + 2W^T(WH-X):dH \cr
\frac{\partial f}{\partial W} &= 2(WH-X)H^T,\quad
\frac{\partial f}{\partial H} = 2W^T(WH-X) \cr
}$$
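As a sanity check on these two gradients, here is a minimal NumPy sketch that compares them with central finite differences. The sizes `m`, `r`, `n`, the random test matrices, and the helper names `f`, `grads` and `numerical_grad` are illustrative assumptions, not part of the question.

```python
import numpy as np

def f(X, W, H):
    """Squared Frobenius norm of the residual X - WH."""
    return np.linalg.norm(X - W @ H, "fro") ** 2

def grads(X, W, H):
    """Closed-form gradients from the differential df = 2Y:dY with Y = WH - X."""
    Y = W @ H - X
    return 2 * Y @ H.T, 2 * W.T @ Y

def numerical_grad(fun, A, eps=1e-6):
    """Entry-by-entry central differences of a scalar function of a matrix."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (fun(A + E) - fun(A - E)) / (2 * eps)
    return G

# Hypothetical small problem: X is m x n, W is m x r, H is r x n.
rng = np.random.default_rng(0)
m, r, n = 4, 2, 3
X = rng.random((m, n))
W = rng.random((m, r))
H = rng.random((r, n))

GW, GH = grads(X, W, H)
GW_num = numerical_grad(lambda A: f(X, A, H), W)
GH_num = numerical_grad(lambda A: f(X, W, A), H)

print(GW.shape, GH.shape)            # gradients have the shapes of W and H
print(np.abs(GW - GW_num).max())     # should be tiny
print(np.abs(GH - GH_num).max())     # should be tiny
```

The gradients come out as an $m\times r$ and an $r\times n$ matrix, i.e. each has the same shape as the variable it differentiates with respect to.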
Since the gradients are themselves matrices, the hessians are 4th order tensors, which cannot be written in standard matrix notation.
One way to approach the hessian is to use vectorization, which flattens matrices into vectors so that each hessian block becomes an ordinary matrix.
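For example, writing $g={\rm vec}(G)$ and $w={\rm vec}(W)$, and using the identity ${\rm vec}(AXB)=(B^T\otimes A)\,{\rm vec}(X)$,
$$\eqalign{
G &= \frac{\partial f}{\partial W} = 2WHH^T - 2XH^T \cr
dG &= 2\,dW\,HH^T \cr
{\rm vec}(dG) &= 2\,{\rm vec}(dW\,HH^T) \cr
dg &= 2\,(HH^T\otimes I)\,dw \cr
\nabla_{ww}f &= 2\,(HH^T\otimes I) \cr
}$$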
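The vectorization step can be checked numerically. The sketch below assumes the column-major (column-stacking) convention for ${\rm vec}$, which is the one the Kronecker identity relies on; the sizes, the random matrices and the helper `vec` are illustrative assumptions.

```python
import numpy as np

def vec(A):
    """Column-stacking vectorization (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
m, r, n = 4, 2, 3
H = rng.random((r, n))
dW = rng.standard_normal((m, r))     # an arbitrary perturbation of W

# Hessian block with respect to w = vec(W): an (m*r) x (m*r) matrix.
H_ww = 2 * np.kron(H @ H.T, np.eye(m))

lhs = vec(2 * dW @ H @ H.T)          # vec(dG) computed directly
rhs = H_ww @ vec(dW)                 # the same thing via the Kronecker form

print(H_ww.shape)
print(np.allclose(lhs, rhs))
```

Because $HH^T$ is positive semidefinite, so is this block; that only says $f$ is convex in $W$ for fixed $H$, not that it is jointly convex.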
Working through the other hessians in the same way gives
$$\eqalign{
\nabla_{hh}f &= 2\,(I\otimes W^TW) \cr
\nabla_{wh}f &= 2(H\otimes W) + 2(I\otimes Y)K \cr
\nabla_{hw}f &= 2(H^T\otimes W^T) + 2(Y^T\otimes I)K \cr
}$$
where $K$ is the Commutation Matrix, which can be used to vectorize the transpose of a matrix
$${\rm vec}(X^T) = K\,{\rm vec}(X)$$
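To see how the four blocks fit together, here is a sketch that stacks ${\rm vec}(W)$ and ${\rm vec}(H)$ into one vector, assembles the full hessian as an ordinary square matrix, and compares it with a finite-difference Jacobian of the stacked gradient. It reads $\nabla_{wh}f$ as the derivative of ${\rm vec}(\partial f/\partial W)$ with respect to ${\rm vec}(H)$, and lets $K$ take the size appropriate to each block; that reading, the column-major ${\rm vec}$, and the helper names (`commutation_matrix`, `full_hessian`, `stacked_grad`) are my assumptions.

```python
import numpy as np

def vec(A):
    """Column-stacking vectorization (column-major order)."""
    return A.reshape(-1, order="F")

def commutation_matrix(p, q):
    """K with K @ vec(A) == vec(A.T) for any A of shape (p, q)."""
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            # A[i, j] is entry i + j*p of vec(A) and entry j + i*q of vec(A.T)
            K[j + i * q, i + j * p] = 1.0
    return K

def full_hessian(X, W, H):
    """Hessian of ||X - WH||_F^2 with respect to the stacked vector [vec(W); vec(H)]."""
    m, r = W.shape
    n = H.shape[1]
    Y = W @ H - X
    H_ww = 2 * np.kron(H @ H.T, np.eye(m))
    H_hh = 2 * np.kron(np.eye(n), W.T @ W)
    H_wh = 2 * np.kron(H, W) + 2 * np.kron(np.eye(r), Y) @ commutation_matrix(r, n)
    H_hw = 2 * np.kron(H.T, W.T) + 2 * np.kron(Y.T, np.eye(r)) @ commutation_matrix(m, r)
    return np.block([[H_ww, H_wh], [H_hw, H_hh]])

def stacked_grad(x, X, m, r, n):
    """Gradient as one long vector [vec(df/dW); vec(df/dH)]."""
    W = x[: m * r].reshape((m, r), order="F")
    H = x[m * r:].reshape((r, n), order="F")
    Y = W @ H - X
    return np.concatenate([vec(2 * Y @ H.T), vec(2 * W.T @ Y)])

rng = np.random.default_rng(2)
m, r, n = 4, 2, 3
X = rng.random((m, n))
W = rng.random((m, r))
H = rng.random((r, n))

Hess = full_hessian(X, W, H)

# Finite-difference Jacobian of the stacked gradient, one column per coordinate.
x0 = np.concatenate([vec(W), vec(H)])
eps = 1e-5
J = np.column_stack([
    (stacked_grad(x0 + eps * e, X, m, r, n) - stacked_grad(x0 - eps * e, X, m, r, n)) / (2 * eps)
    for e in np.eye(x0.size)
])

print(Hess.shape)                    # (m*r + r*n, m*r + r*n)
print(np.allclose(Hess, Hess.T))     # the two cross blocks are transposes of each other
print(np.allclose(Hess, J, atol=1e-5))
```

In this vectorized form the gradient is a single vector of length $mr+rn$ and the hessian a single symmetric matrix, so the usual tools (eigenvalues, Newton steps) apply directly, at the cost of forming a matrix that grows quickly with the problem size.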
$endgroup$
edited Dec 24 '18 at 14:19
answered Dec 24 '18 at 13:40
greg