Gradient descent versus Lagrange multipliers
I am new to optimization methods and trying to understand them. I am familiar with two main methods.




  1. Gradient descent, which, as I understand it, computes the next point from the current one, taking us closer to the min/max:
    $$x^{t}=x^{t-1}-\lambda\left(\frac{\partial}{\partial x}f(x^{t-1})\right)$$
    where $\lambda$ acts as the step size or learning rate.


  2. Lagrange multipliers, which is a method for finding the optimal point of a function (subject to another function) that minimizes/maximizes it:
    $$ \mathcal{L}(x,\lambda) = \frac{\partial}{\partial x}f(x)-\lambda\frac{\partial}{\partial x}g(x)$$
    while
    $$g(x)=0$$
    Here $g(x)$ is the constraint and $\lambda$ is a scalar that adjusts the amplitude of the normal (so that the normal of $f(x)$ has the same amplitude as the normal of $g(x)$). Solving both equations yields the specific $x$ and $\lambda$ that minimize the function $f(x)$.



I don't understand some basic points here:




  1. When we use Lagrange multipliers we end up with a specific $x$ for the minimum; where is the next step, i.e. the iterations leading to the optimal result?


  2. If we end up with the optimal $x$ directly, how does $\lambda$ affect the final result?


  3. Is there a connection between gradient descent and Lagrange multipliers? If so, in what case are they equal? (Is gradient descent a specific case of Lagrange multipliers in which $g(x)$ is a specific function? Would both $\lambda$'s have the same meaning in that case?)



Thank you.










optimization vector-analysis lagrange-multiplier gradient-descent

asked Nov 15 '17 at 11:25 by optimize · edited Apr 6 '18 at 13:17 by Rodrigo de Azevedo
























  • If $x$ is a minimizer of a smooth function $f:\mathbb R^n \to \mathbb R$, then $\nabla f(x) = 0$. One way to think about gradient descent is that it is an iterative method for solving the equation $\nabla f(x) = 0$. Analogously, if $x$ is a minimizer of $f$ subject to the constraint that $g_i(x) = 0$ for $i = 1,\ldots,m$, where each $g_i$ is smooth, then $x$ satisfies the optimality condition $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$ for some scalars $\lambda_i$. We can't use the gradient descent iteration to find a solution to this optimality condition, but another iteration might work.
    – littleO
    Nov 15 '17 at 11:38










  • Thank you very much, that was very enlightening. When we use the Lagrangian, what would be the possible iteration for finding the optimal $x$ and $\lambda$? I understand from your explanation that gradient descent is a method to find the $x$ where $$f(x^t)=f(x^{t-1})$$ How can we transplant this idea to Lagrange multipliers?
    – optimize
    Nov 15 '17 at 20:56












  • I would explain that last point a little differently. The gradient descent iteration is $x^{k+1} = x^k - t \nabla f(x^k)$. If the gradient descent iteration converges to a point $x^\star$, then we should have $x^\star = x^\star - t \nabla f(x^\star)$, which implies that $\nabla f(x^\star) = 0$. So, you can look at the gradient descent iteration as an iterative method to solve the optimality condition $\nabla f(x) = 0$.
    – littleO
    Nov 15 '17 at 21:17
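
A minimal numerical sketch of this fixed-point view (the quadratic objective, starting point, and step size below are illustrative assumptions, not part of the thread):

    def grad_f(x):
        # Gradient of the toy objective f(x) = (x - 3)^2.
        return 2.0 * (x - 3.0)

    x = 0.0   # arbitrary starting point
    t = 0.1   # step size, assumed small enough for convergence
    for _ in range(200):
        x = x - t * grad_f(x)   # the iteration x^{k+1} = x^k - t * grad_f(x^k)

    print(x, grad_f(x))   # x is close to 3 and grad_f(x) is close to 0

At convergence the iterate stops moving, which is exactly the statement $\nabla f(x^\star) = 0$.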










  • There are more complicated "primal-dual algorithms" that attempt to iteratively solve the "primal-dual optimality condition" $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$. They are more complicated to describe, and often they make additional assumptions about the problem structure. But the Augmented Lagrangian Method is one such algorithm to be aware of. If you're curious, you can read about this method in section 17.3 (p. 513) of Numerical Optimization by Nocedal and Wright. However, this is unnecessary if your goal is just to understand Lagrange multipliers and gradient descent.
    – littleO
    Nov 15 '17 at 21:21
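
For the curious, here is a minimal sketch of one such primal-dual iteration, in the spirit of the Augmented Lagrangian Method mentioned above, on a toy problem of my own choosing (minimize $f(x)=x_1^2+x_2^2$ subject to $g(x)=x_1+x_2-1=0$); the step size, iteration counts, and penalty weight are illustrative assumptions:

    import numpy as np

    f_grad = lambda x: 2.0 * x                # gradient of f(x) = x1^2 + x2^2
    g      = lambda x: x[0] + x[1] - 1.0      # constraint g(x) = 0
    g_grad = lambda x: np.array([1.0, 1.0])   # gradient of g

    x, lam, mu = np.zeros(2), 0.0, 10.0       # primal point, multiplier, penalty weight
    for _ in range(50):
        # Inner loop: gradient descent on the augmented Lagrangian
        # L_A(x) = f(x) + lam * g(x) + (mu / 2) * g(x)**2, with lam held fixed.
        for _ in range(200):
            x = x - 0.01 * (f_grad(x) + (lam + mu * g(x)) * g_grad(x))
        lam = lam + mu * g(x)                 # multiplier (dual ascent) update

    print(x, lam)   # approaches x = (0.5, 0.5) and lam = -1

The limit satisfies exactly the primal-dual condition $\nabla f(x) + \lambda \nabla g(x) = 0$ together with $g(x) = 0$.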












  • Yes, that was clear to me. What would the iteration for Lagrange multipliers be? Isn't the result of the Lagrange multiplier method already the optimal $x$?
    – optimize
    Nov 15 '17 at 21:22

1 Answer
I'm not sure your understanding of the Lagrange multipliers method is correct. This method involves the construction of the Lagrangian:
$$ \mathcal{L}(x,\lambda) = f(x)-\lambda \cdot g(x)$$
and then finding solutions to:
$$
\nabla_{x,\lambda} \mathcal{L}(x, \lambda)=0 \tag{1}\label{1}
$$
Please note that this gradient is calculated with respect to both $x$ and $\lambda$:
$$
\nabla_{x,\lambda} \mathcal{L}(x, \lambda)=\left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial \lambda} \right)
$$
Solutions of $\eqref{1}$ are of course the critical points of $\mathcal{L}(x, \lambda)$, not necessarily extrema; they may also be saddle points. The Lagrange multipliers method tells us that the "$x$" part of some of those points constitutes the minimum of the function $f(x)$ subject to $g(x)=0$.
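
To make this construction concrete, here is a small symbolic sketch (using sympy, on a toy objective and constraint of my own choosing) that builds the Lagrangian and solves $\nabla_{x,\lambda} \mathcal{L} = 0$ directly:

    import sympy as sp

    x, y, lam = sp.symbols('x y lam')
    f = x**2 + y**2            # illustrative objective
    g = x + y - 1              # illustrative constraint, g = 0
    L = f - lam * g            # the Lagrangian L = f - lam * g

    # Stationarity in all variables, including lam; note that
    # dL/dlam = 0 just recovers the constraint g = 0 itself.
    eqs = [sp.diff(L, v) for v in (x, y, lam)]
    print(sp.solve(eqs, (x, y, lam)))   # {x: 1/2, y: 1/2, lam: 1}

Note that there is no iteration here: solving the system hands you the candidate points in one shot, and you then check which of these critical points actually minimizes $f$ on the constraint set.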






answered Apr 6 '18 at 13:13 by Piotr Suszyński




















