Gradient descent versus Lagrange multipliers
I am new to optimization methods and trying to understand them. I am familiar with two main methods.




  1. Gradient descent, which, as I understand it, computes the next point from the current one, taking us closer to the min/max:
    $$x^{t}=x^{t-1}-\lambda\left(\frac{\partial}{\partial x}f(x^{t-1})\right)$$
    where $\lambda$ acts as the step size or learning rate.


  2. Lagrange multipliers, which is a method for finding the optimal point of a function (subject to another function) that minimizes/maximizes it:
    $$ \mathcal{L}(x,\lambda) = \frac{\partial}{\partial x}f(x)-\lambda\frac{\partial}{\partial x}g(x)$$
    while
    $$g(x)=0$$
    Here $g(x)$ is the constraint and $\lambda$ is a scalar that adjusts the amplitude of the normal (so that the normal of $f(x)$ has the same amplitude as the normal of $g(x)$). Solving both equations yields the specific $x$ and $\lambda$ that minimize the function $f(x)$.



I don't understand some basic points here:




  1. When we use Lagrange multipliers we end up with a specific $x$ for the minimum; where is the next step, i.e. the iterations leading to the optimal result?


  2. If we end up with the optimal $x$ directly, how does $\lambda$ affect the final result?


  3. Is there a connection between gradient descent and Lagrange multipliers? If so, in what case are they equal? (Is gradient descent a specific case of Lagrange multipliers in which $g(x)$ is a specific function? Would both $\lambda$'s have the same meaning in that case?)



Thank you.










optimization vector-analysis lagrange-multiplier gradient-descent

asked Nov 15 '17 at 11:25 by optimize · edited Apr 6 '18 at 13:17 by Rodrigo de Azevedo
























  • If $x$ is a minimizer of a smooth function $f:\mathbb R^n \to \mathbb R$, then $\nabla f(x) = 0$. One way to think about gradient descent is that it is an iterative method for solving the equation $\nabla f(x) = 0$. Analogously, if $x$ is a minimizer of $f$ subject to the constraint that $g_i(x) = 0$ for $i = 1,\ldots,m$, where each $g_i$ is smooth, then $x$ satisfies the optimality condition $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$ for some scalars $\lambda_i$. We can't use the gradient descent iteration to find a solution to this optimality condition, but another iteration might work.
    – littleO
    Nov 15 '17 at 11:38










  • Thank you very much, that was very enlightening. When we use the Lagrangian, what would be the possible iteration for finding the optimal $x$ and $\lambda$? I understand from your explanation that gradient descent is a method to find the $x$ where $$f(x^t)=f(x^{t-1})$$ How can we transplant this idea to Lagrange multipliers?
    – optimize
    Nov 15 '17 at 20:56












  • I would explain that last point a little differently. The gradient descent iteration is $x^{k+1} = x^k - t \nabla f(x^k)$. If the gradient descent iteration converges to a point $x^\star$, then we should have $x^\star = x^\star - t \nabla f(x^\star)$, which implies that $\nabla f(x^\star) = 0$. So, you can look at the gradient descent iteration as an iterative method to solve the optimality condition $\nabla f(x) = 0$.
    – littleO
    Nov 15 '17 at 21:17
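
A minimal numerical sketch of this fixed-point view (the quadratic objective, starting point, and step size below are illustrative assumptions, not part of the thread):

    def grad_f(x):
        # Gradient of the toy objective f(x) = (x - 3)^2.
        return 2.0 * (x - 3.0)

    x = 0.0   # arbitrary starting point
    t = 0.1   # step size, assumed small enough for convergence
    for _ in range(200):
        x = x - t * grad_f(x)   # the iteration x^{k+1} = x^k - t * grad_f(x^k)

    print(x, grad_f(x))   # x is close to 3 and grad_f(x) is close to 0

At convergence the iterate stops moving, which is exactly the statement $\nabla f(x^\star) = 0$.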










  • There are more complicated "primal-dual algorithms" that attempt to iteratively solve the "primal-dual optimality condition" $\nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$. They are more complicated to describe, and often they make additional assumptions about the problem structure. But the Augmented Lagrangian Method is one such algorithm to be aware of. If you're curious, you can read about this method in section 17.3 (p. 513) of Numerical Optimization by Nocedal and Wright. However, this is unnecessary if your goal is just to understand Lagrange multipliers and gradient descent.
    – littleO
    Nov 15 '17 at 21:21
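
For the curious, here is a minimal sketch of one such primal-dual iteration, in the spirit of the Augmented Lagrangian Method mentioned above, on a toy problem of my own choosing (minimize $f(x)=x_1^2+x_2^2$ subject to $g(x)=x_1+x_2-1=0$); the step size, iteration counts, and penalty weight are illustrative assumptions:

    import numpy as np

    f_grad = lambda x: 2.0 * x                # gradient of f(x) = x1^2 + x2^2
    g      = lambda x: x[0] + x[1] - 1.0      # constraint g(x) = 0
    g_grad = lambda x: np.array([1.0, 1.0])   # gradient of g

    x, lam, mu = np.zeros(2), 0.0, 10.0       # primal point, multiplier, penalty weight
    for _ in range(50):
        # Inner loop: gradient descent on the augmented Lagrangian
        # L_A(x) = f(x) + lam * g(x) + (mu / 2) * g(x)**2, with lam held fixed.
        for _ in range(200):
            x = x - 0.01 * (f_grad(x) + (lam + mu * g(x)) * g_grad(x))
        lam = lam + mu * g(x)                 # multiplier (dual ascent) update

    print(x, lam)   # approaches x = (0.5, 0.5) and lam = -1

The limit satisfies exactly the primal-dual condition $\nabla f(x) + \lambda \nabla g(x) = 0$ together with $g(x) = 0$.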












  • Yes, that was clear to me. What would the iteration for Lagrange multipliers be? Isn't the result of the Lagrange multiplier method already the optimal $x$?
    – optimize
    Nov 15 '17 at 21:22

1 Answer
I'm not sure your understanding of the Lagrange multipliers method is correct. This method involves the construction of the Lagrangian:
$$ \mathcal{L}(x,\lambda) = f(x)-\lambda \cdot g(x)$$
and then finding solutions to:
$$
\nabla_{x,\lambda} \mathcal{L}(x, \lambda)=0 \tag{1}\label{1}
$$
Please note that this gradient is calculated with respect to both $x$ and $\lambda$:
$$
\nabla_{x,\lambda} \mathcal{L}(x, \lambda)=\left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial \lambda} \right)
$$
Solutions of $\eqref{1}$ are of course the critical points of $\mathcal{L}(x, \lambda)$, not necessarily extrema; they may also be saddle points. The Lagrange multipliers method tells us that the "$x$" part of some of those points constitutes the minimum of the function $f(x)$ subject to $g(x)=0$.
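
To make this construction concrete, here is a small symbolic sketch (using sympy, on a toy objective and constraint of my own choosing) that builds the Lagrangian and solves $\nabla_{x,\lambda} \mathcal{L} = 0$ directly:

    import sympy as sp

    x, y, lam = sp.symbols('x y lam')
    f = x**2 + y**2            # illustrative objective
    g = x + y - 1              # illustrative constraint, g = 0
    L = f - lam * g            # the Lagrangian L = f - lam * g

    # Stationarity in all variables, including lam; note that
    # dL/dlam = 0 just recovers the constraint g = 0 itself.
    eqs = [sp.diff(L, v) for v in (x, y, lam)]
    print(sp.solve(eqs, (x, y, lam)))   # {x: 1/2, y: 1/2, lam: 1}

Note that there is no iteration here: solving the system hands you the candidate points in one shot, and you then check which of these critical points actually minimizes $f$ on the constraint set.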






answered Apr 6 '18 at 13:13 by Piotr Suszyński




















