Gradient descent versus Lagrange multipliers
I am new to optimization methods and am trying to understand them. I am familiar with two main methods.
Gradient descent, which, as I understand it, means we calculate the next point on the function, one that takes us closer to the min/max:
$$x^{t}=x^{t-1}-\lambda\left(\frac{\partial}{\partial x}f(x^{t-1})\right)$$
where $\lambda$ acts as the step size or learning rate.
Lagrange multipliers, which is a method of finding the ideal point on the function (subject to another function) that will minimize/maximize it:
$$ \mathcal{L}(x,\lambda) = \frac{\partial}{\partial x}f(x)-\lambda\,\frac{\partial}{\partial x}g(x)$$
subject to
$$g(x)=0$$
That means $g(x)$ is the constraint, and $\lambda$ is a scalar which adjusts the amplitude of the normal (so that the normal of $f(x)$ has the same amplitude as the normal of $g(x)$). Solving the two equations together yields the specific $x$ and $\lambda$ that minimize the function $f(x)$.
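As a concrete illustration of the gradient descent update above, here is a minimal sketch; the objective $f(x) = (x-3)^2$, the starting point, and the step size are arbitrary choices for demonstration:

```python
# Gradient descent sketch on the toy objective f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2*(x - 3). All values are illustrative.

def f_prime(x):
    return 2.0 * (x - 3.0)

x = 0.0      # starting point x^0
lam = 0.1    # step size ("learning rate")

for t in range(100):
    x = x - lam * f_prime(x)   # x^t = x^{t-1} - lambda * f'(x^{t-1})

print(x)           # -> 3.0 (approximately): the minimizer
print(f_prime(x))  # -> 0.0 (approximately): the gradient vanishes there
```

Note that the $\lambda$ here is just a step size; despite the shared symbol, it is unrelated to the Lagrange multiplier $\lambda$ discussed below.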
I don't understand some basic points here:
When we use Lagrange multipliers we end up directly with the specific $x$ of the minimum; where is the next step, the iterations leading to the optimal result?
If we end up with the optimal $x$, how does $\lambda$ affect the final result?
Is there a connection between gradient descent and Lagrange multipliers? If so, in what case are they equal? (Is gradient descent a specific case of Lagrange multipliers in which $g(x)$ is a particular function? Do both $\lambda$'s have the same meaning in that case?)
Thank you.
optimization vector-analysis lagrange-multiplier gradient-descent
If $x$ is a minimizer of a smooth function $f:\mathbb{R}^n \to \mathbb{R}$, then $\nabla f(x) = 0$. One way to think about gradient descent is that it is an iterative method for solving the equation $\nabla f(x) = 0$. Analogously, if $x$ is a minimizer of $f$ subject to the constraint that $g_i(x) = 0$ for $i = 1,\dots,m$, where each $g_i$ is smooth, then $x$ satisfies the optimality condition $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$ for some scalars $\lambda_i$. We can't use the gradient descent iteration to find a solution to this optimality condition, but another iteration might work.
– littleO
Nov 15 '17 at 11:38
Thank you very much, that was very enlightening. When we use the Lagrangian, what would be a possible iteration for finding the optimal $x$ and $\lambda$? I understand from your explanation that gradient descent is a method to find the $x$ where $$f(x^t)=f(x^{t-1})$$ How can we apply this idea to Lagrange multipliers?
– optimize
Nov 15 '17 at 20:56
I would explain that last point a little differently. The gradient descent iteration is $x^{k+1} = x^k - t\,\nabla f(x^k)$. If the gradient descent iteration converges to a point $x^\star$, then we should have $x^\star = x^\star - t\,\nabla f(x^\star)$, which implies that $\nabla f(x^\star) = 0$. So, you can look at the gradient descent iteration as an iterative method to solve the optimality condition $\nabla f(x) = 0$.
– littleO
Nov 15 '17 at 21:17
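A quick numeric check of this fixed-point view (a sketch with an arbitrary quadratic in $\mathbb{R}^2$; the data are not from the discussion):

```python
import numpy as np

# Fixed-point check: f(x) = ||x - c||^2, so grad f(x) = 2*(x - c).
# The center c, step size t, and iteration count are illustrative.
c = np.array([1.0, -2.0])
grad_f = lambda x: 2.0 * (x - c)

x = np.zeros(2)
t = 0.1
for k in range(200):
    x = x - t * grad_f(x)  # x^{k+1} = x^k - t * grad f(x^k)

# At the limit x = x - t * grad f(x), so grad f(x) must be 0:
print(x)                          # -> approximately [ 1. -2.]
print(np.linalg.norm(grad_f(x)))  # -> approximately 0.0
```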
There are more sophisticated "primal-dual algorithms" that attempt to iteratively solve the "primal-dual optimality condition" $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$. They are more complicated to describe, and often they make additional assumptions about the problem structure. But the Augmented Lagrangian Method is one such algorithm to be aware of. If you're curious, you can read about this method in section 17.3 (p. 513) of Numerical Optimization by Nocedal and Wright. However, this is unnecessary if your goal is just to understand Lagrange multipliers and gradient descent.
– littleO
Nov 15 '17 at 21:21
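For a flavor of what such an iteration can look like, here is a minimal sketch of one simple primal-dual scheme: gradient descent on $x$ and gradient ascent on $\lambda$, applied to the Lagrangian. This is only a toy illustration of the idea (not the Augmented Lagrangian Method from the book), and the problem data and step sizes are my own choices:

```python
import numpy as np

# Toy primal-dual iteration for:
#   minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 = 0.
# Using the convention L(x, lam) = f(x) + lam * g(x), matching the
# optimality condition grad f + lam * grad g = 0 in the comment above,
# the known solution is x* = (0.5, 0.5) with multiplier lam* = -1.
grad_f = lambda x: 2.0 * x             # gradient of f
g      = lambda x: x[0] + x[1] - 1.0   # constraint function
grad_g = np.array([1.0, 1.0])          # gradient of g (constant here)

x   = np.zeros(2)
lam = 0.0
t, s = 0.1, 0.1                        # primal and dual step sizes

for k in range(500):
    x   = x - t * (grad_f(x) + lam * grad_g)  # descend on L in x
    lam = lam + s * g(x)                      # ascend on L in lambda

print(x, lam)   # -> approximately [0.5 0.5] and -1.0
```

The iterates converge to $x^\star = (0.5, 0.5)$ and $\lambda^\star = -1$, which satisfy both $\nabla f(x^\star) + \lambda^\star \nabla g(x^\star) = 0$ and $g(x^\star) = 0$.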
Yes, that was clear to me. What would the iteration for Lagrange multipliers be? Isn't the result of the Lagrange multiplier method already the optimal $x$?
– optimize
Nov 15 '17 at 21:22
1 Answer
I'm not sure if your understanding of the Lagrange multiplier method is correct. This method involves constructing the Lagrangian:
$$ \mathcal{L}(x,\lambda) = f(x)-\lambda \cdot g(x)$$
and then finding solutions to:
$$
\nabla_{x,\lambda}\, \mathcal{L}(x, \lambda)=0 \tag{1}\label{1}
$$
Please note that this gradient is calculated with respect to both $x$ and $\lambda$:
$$
\nabla_{x,\lambda}\, \mathcal{L}(x, \lambda)=\left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial \lambda} \right)
$$
Solutions of $\eqref{1}$ are of course the critical points of $\mathcal{L}(x, \lambda)$: not necessarily extrema, but possibly saddle points as well. The Lagrange multiplier method tells us that the "$x$" part of some of those points constitutes the minimum of the function $f(x)$ subject to $g(x)=0$.
answered Apr 6 '18 at 13:13
Piotr Suszyński
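To see this concretely, here is a small symbolic check (a worked example added for illustration, using sympy, with an arbitrary $f$ and $g$): solving $\nabla_{x,\lambda}\,\mathcal{L} = 0$ produces the optimal point and the multiplier simultaneously, with no iteration at all, which is why the method hands back a specific $x$ directly.

```python
import sympy as sp

# Worked example (illustrative data): minimize f = x^2 + y^2
# subject to g = x + y - 1 = 0, using the answer's convention
# L = f - lam * g.
x, y, lam = sp.symbols('x y lam')
f = x**2 + y**2
g = x + y - 1
L = f - lam * g

# Stationarity of L in all variables (x, y, lam) at once;
# the lam-equation simply reproduces the constraint g = 0.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)
print(sol)   # -> [{x: 1/2, y: 1/2, lam: 1}]
```

Note that $\lambda$ does not change where the minimum is; it comes out of the solve as the scaling that makes the gradients of $f$ and $g$ parallel at that point, which addresses the second question in the post above.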