How can maximizing $x^T A x$ where $A$ is positive semi-definite be reduced to maximizing $trace(x^T A x)$?












0














Suppose $A$ is a given matrix of shape $n times n$, and $x$ is some unknown matrix of shape $n times m$, the objective is



$$
begin{array}{rl}
& max_x x^T A x \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is a $m times m$ matrix.



My question is can it be reduced to



$$
begin{array}{rl}
& max_x tr(x^T A x) \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is now a scalar.



This question came up as I was learning Fisher's discriminant analysis in the multi-class setting, where the objective is to find a matrix $W$ to transform $X$ onto a low-dimensional space so that the between-class covariance is maximized and the within-class covariance is minimized. And most books give the following criterion,



$$
max_W frac{tr(W^T S_b W)}{tr(W^T S_w W)}
$$



where $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter. I believe the scatter matrix is used here instead of the estimate of covariance matrix is because they only differ by a positive constant term.



But I don't understand why we can formulate the criterion using trace when we want to maximize a matrix. In other words, I don't know the proof of this reduction.



UPDATE: What I've tried: In the context of Fisher's discriminant analysis, I think maximizing a covariance matrix directly amounts to maximizing the Frobenius norm of that matrix, because we want everything to covariate as much as possible, every elements in that matrix should be large in absolute value. But this is not rigorous enough to my liking. Even if it's correct, it only lead to



$$
max_x x^T A x iff max_x |x^T A x|_F iff max_x sqrt{tr(x^T A^T x x^T A x)}
$$










share|cite|improve this question




















  • 2




    For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
    – Jean-Claude Arbaut
    Nov 23 at 13:11












  • @Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
    – longtengaa
    Nov 23 at 13:20










  • But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
    – Jean-Claude Arbaut
    Nov 23 at 13:24










  • @Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
    – longtengaa
    Nov 23 at 13:29










  • The question is: How do you maximize a matrix? What's your metric at all?
    – Mostafa Ayaz
    Nov 24 at 12:41
















0














Suppose $A$ is a given matrix of shape $n times n$, and $x$ is some unknown matrix of shape $n times m$, the objective is



$$
begin{array}{rl}
& max_x x^T A x \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is a $m times m$ matrix.



My question is can it be reduced to



$$
begin{array}{rl}
& max_x tr(x^T A x) \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is now a scalar.



This question came up as I was learning Fisher's discriminant analysis in the multi-class setting, where the objective is to find a matrix $W$ to transform $X$ onto a low-dimensional space so that the between-class covariance is maximized and the within-class covariance is minimized. And most books give the following criterion,



$$
max_W frac{tr(W^T S_b W)}{tr(W^T S_w W)}
$$



where $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter. I believe the scatter matrix is used here instead of the estimate of covariance matrix is because they only differ by a positive constant term.



But I don't understand why we can formulate the criterion using trace when we want to maximize a matrix. In other words, I don't know the proof of this reduction.



UPDATE: What I've tried: In the context of Fisher's discriminant analysis, I think maximizing a covariance matrix directly amounts to maximizing the Frobenius norm of that matrix, because we want everything to covariate as much as possible, every elements in that matrix should be large in absolute value. But this is not rigorous enough to my liking. Even if it's correct, it only lead to



$$
max_x x^T A x iff max_x |x^T A x|_F iff max_x sqrt{tr(x^T A^T x x^T A x)}
$$










share|cite|improve this question




















  • 2




    For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
    – Jean-Claude Arbaut
    Nov 23 at 13:11












  • @Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
    – longtengaa
    Nov 23 at 13:20










  • But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
    – Jean-Claude Arbaut
    Nov 23 at 13:24










  • @Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
    – longtengaa
    Nov 23 at 13:29










  • The question is: How do you maximize a matrix? What's your metric at all?
    – Mostafa Ayaz
    Nov 24 at 12:41














0












0








0


0





Suppose $A$ is a given matrix of shape $n times n$, and $x$ is some unknown matrix of shape $n times m$, the objective is



$$
begin{array}{rl}
& max_x x^T A x \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is a $m times m$ matrix.



My question is can it be reduced to



$$
begin{array}{rl}
& max_x tr(x^T A x) \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is now a scalar.



This question came up as I was learning Fisher's discriminant analysis in the multi-class setting, where the objective is to find a matrix $W$ to transform $X$ onto a low-dimensional space so that the between-class covariance is maximized and the within-class covariance is minimized. And most books give the following criterion,



$$
max_W frac{tr(W^T S_b W)}{tr(W^T S_w W)}
$$



where $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter. I believe the scatter matrix is used here instead of the estimate of covariance matrix is because they only differ by a positive constant term.



But I don't understand why we can formulate the criterion using trace when we want to maximize a matrix. In other words, I don't know the proof of this reduction.



UPDATE: What I've tried: In the context of Fisher's discriminant analysis, I think maximizing a covariance matrix directly amounts to maximizing the Frobenius norm of that matrix, because we want everything to covariate as much as possible, every elements in that matrix should be large in absolute value. But this is not rigorous enough to my liking. Even if it's correct, it only lead to



$$
max_x x^T A x iff max_x |x^T A x|_F iff max_x sqrt{tr(x^T A^T x x^T A x)}
$$










share|cite|improve this question















Suppose $A$ is a given matrix of shape $n times n$, and $x$ is some unknown matrix of shape $n times m$, the objective is



$$
begin{array}{rl}
& max_x x^T A x \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is a $m times m$ matrix.



My question is can it be reduced to



$$
begin{array}{rl}
& max_x tr(x^T A x) \
text{subject to } & lambda_i(A) ge 0 text{ for all } 1 le i le n \
& |x_i|_2 = 1 text{where $x_i$ is the $i^{th}$ column of $x$}
end{array}
$$



where the objective is now a scalar.



This question came up as I was learning Fisher's discriminant analysis in the multi-class setting, where the objective is to find a matrix $W$ to transform $X$ onto a low-dimensional space so that the between-class covariance is maximized and the within-class covariance is minimized. And most books give the following criterion,



$$
max_W frac{tr(W^T S_b W)}{tr(W^T S_w W)}
$$



where $S_b$ is the between-class scatter matrix, and $S_w$ is the within-class scatter. I believe the scatter matrix is used here instead of the estimate of covariance matrix is because they only differ by a positive constant term.



But I don't understand why we can formulate the criterion using trace when we want to maximize a matrix. In other words, I don't know the proof of this reduction.



UPDATE: What I've tried: In the context of Fisher's discriminant analysis, I think maximizing a covariance matrix directly amounts to maximizing the Frobenius norm of that matrix, because we want everything to covariate as much as possible, every elements in that matrix should be large in absolute value. But this is not rigorous enough to my liking. Even if it's correct, it only lead to



$$
max_x x^T A x iff max_x |x^T A x|_F iff max_x sqrt{tr(x^T A^T x x^T A x)}
$$







matrices optimization trace






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Nov 23 at 13:36

























asked Nov 23 at 13:06









longtengaa

11




11








  • 2




    For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
    – Jean-Claude Arbaut
    Nov 23 at 13:11












  • @Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
    – longtengaa
    Nov 23 at 13:20










  • But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
    – Jean-Claude Arbaut
    Nov 23 at 13:24










  • @Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
    – longtengaa
    Nov 23 at 13:29










  • The question is: How do you maximize a matrix? What's your metric at all?
    – Mostafa Ayaz
    Nov 24 at 12:41














  • 2




    For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
    – Jean-Claude Arbaut
    Nov 23 at 13:11












  • @Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
    – longtengaa
    Nov 23 at 13:20










  • But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
    – Jean-Claude Arbaut
    Nov 23 at 13:24










  • @Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
    – longtengaa
    Nov 23 at 13:29










  • The question is: How do you maximize a matrix? What's your metric at all?
    – Mostafa Ayaz
    Nov 24 at 12:41








2




2




For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
– Jean-Claude Arbaut
Nov 23 at 13:11






For $X^TAX$ to be defined, if $X$ has shape $ntimes m$ and $A$ has shape $ntimes p$, then you must have $p=n$, that is, $A$ must be square. Moreover, $X^TAX$ has shape $mtimes m$, what does it mean to maxmize it? Let's imagine for a moment that we can give it some meaning, what happens if we multiply $X$ by a constant?
– Jean-Claude Arbaut
Nov 23 at 13:11














@Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
– longtengaa
Nov 23 at 13:20




@Jean-ClaudeArbaut Yes, $A$ is square here since it's positive semidefinite, I'll edit it to say the shape is $n times n$. I think the meaning in the context of Fisher's discriminant analysis should be maximizing the Frobenius norm. Please see my updates.
– longtengaa
Nov 23 at 13:20












But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
– Jean-Claude Arbaut
Nov 23 at 13:24




But then you can make the Frobenius norm arbitrarily large by multiplying $X$ by a large constant (when you maximize with vector $x$, you usually require $||x||=1$ to avoid this, what here?)
– Jean-Claude Arbaut
Nov 23 at 13:24












@Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
– longtengaa
Nov 23 at 13:29




@Jean-ClaudeArbaut oh, that's right. I was trying to formulate a problem in its own right instead of in the context of Fisher's discriminant analysis (FDA). In FDA $|x| = 1$ is not needed because both denominator and nominator have $W$.
– longtengaa
Nov 23 at 13:29












The question is: How do you maximize a matrix? What's your metric at all?
– Mostafa Ayaz
Nov 24 at 12:41




The question is: How do you maximize a matrix? What's your metric at all?
– Mostafa Ayaz
Nov 24 at 12:41















active

oldest

votes











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3010342%2fhow-can-maximizing-xt-a-x-where-a-is-positive-semi-definite-be-reduced-to-m%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Mathematics Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3010342%2fhow-can-maximizing-xt-a-x-where-a-is-positive-semi-definite-be-reduced-to-m%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Plaza Victoria

Brian Clough

Cáceres