Convergence of Linear Neural Networks in the Easiest Framework
EDIT: Thanks to the first answer, I'm lightening some assumptions:

I'm trying to understand the basics of machine learning, and I have this theoretical question:

I have a one-layer linear neural network $f: \mathbb{R}^d \rightarrow \mathbb{R}^2$ and two classes to learn, so we are talking about binary classification.

Assume that, during training, I show my network only datapoints sampled from a simple distribution $D$ supported on a circle of radius $r$, and nothing else. Furthermore, assume that these points all have the same label, $0$. After training with SGD, I want my network to perform well just on this circle; I don't care how it behaves on data sampled elsewhere (I assume it would be close to random classification, since it only ever sees data from this circle).

How many iterations will SGD need in order to converge to a good local minimum?

How can I prove it?

My intuition is that the algorithm should converge rather quickly, since a function that achieves 100% accuracy is very simple: it's the boundary of the circle itself, so we just need the network to approximate any closed curve around the circle and output $0$ for every point inside it. But I may be wrong. A minimal simulation of the setup is sketched below.

Thank you very much!
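For concreteness, here is a minimal simulation of the setup, assuming a softmax output with cross-entropy loss and one sample per SGD step; $d = 2$, $r$, the learning rate, and the step count are arbitrary illustrative choices, not part of the question:

    import numpy as np

    # Sketch: a one-layer linear network f(x) = Wx + b with two outputs,
    # trained by SGD with cross-entropy loss on points drawn from a
    # circle of radius r, all labeled 0. This fixes d = 2 so the circle
    # points fill the input space; all hyperparameters are arbitrary.

    rng = np.random.default_rng(0)
    d, r, lr, steps = 2, 1.0, 0.1, 500

    W = rng.normal(scale=0.1, size=(2, d))   # weights: 2 classes x d inputs
    b = np.zeros(2)                          # biases

    def softmax(z):
        z = z - z.max()                      # numerical stability
        e = np.exp(z)
        return e / e.sum()

    for t in range(steps):
        theta = rng.uniform(0, 2 * np.pi)    # one point on the circle
        x = r * np.array([np.cos(theta), np.sin(theta)])
        p = softmax(W @ x + b)               # predicted class probabilities
        g = p - np.array([1.0, 0.0])         # d(cross-entropy)/d(logits), label 0
        W -= lr * np.outer(g, x)             # SGD update
        b -= lr * g

    # After training, the network should put nearly all probability
    # mass on class 0 for points on the circle.
    theta = rng.uniform(0, 2 * np.pi, size=3)
    for x in r * np.stack([np.cos(theta), np.sin(theta)], axis=1):
        print(softmax(W @ x + b))

Because every sample carries label $0$, even the bias term alone can push the class-$0$ logit up, so the loss drops quickly here, which matches my intuition that convergence should be fast in this degenerate setting.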
Tags: convergence, algorithms, machine-learning, neural-networks
edited Nov 29 '18 at 16:25 by Alfred

asked Nov 29 '18 at 15:26 by Alfred (184)
1 Answer
This question cannot be answered as posed. There are several issues you need to be more precise about:

1. Achieving zero training error means that the data is perfectly described by your network. It isn't unreasonable to assume that a data set could be made this way, since neural networks (as you've defined them) are really just cascades of function compositions. To get zero training error, simply take your neural network, completely untrained, and use it to generate your training set; you will immediately get zero training error. (A sketch of this labeling trick appears below.)

2. Presuming you didn't want to do that, the answer would depend far too strongly on the data itself, and therein lies the major issue with this question: if you knew how many neurons you needed to get zero training error, then you'd already know enough about the data not to need to learn anything from it. Zero training error means that you can produce a perfect neural network describing the (training) data. If you could do that in the first place, you'd know so much about your data that learning would be pointless.

I think you need to re-frame your question to be much more specific, and answerable. Also, in general there is no way to know before training how good your training error will be. This is the same as point 2 above: if you knew you could get your training error down to $X$ with your particular neural network, you'd already know enough about your data not to need to learn from it.
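To make point 1 concrete, here is a minimal sketch of that labeling trick, assuming a one-layer linear network like the one in the question; the sizes $d$ and $n$ are arbitrary:

    import numpy as np

    # Label the data with the untrained network itself, so that the
    # same network trivially achieves zero training error.

    rng = np.random.default_rng(1)
    d, n = 5, 100                        # arbitrary illustrative sizes

    W = rng.normal(size=(2, d))          # untrained one-layer linear network
    b = rng.normal(size=2)

    X = rng.normal(size=(n, d))          # any inputs whatsoever
    y = np.argmax(X @ W.T + b, axis=1)   # labels generated BY the network

    preds = np.argmax(X @ W.T + b, axis=1)
    print("training error:", np.mean(preds != y))   # 0.0 by construction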
answered Nov 29 '18 at 15:55 by Michael Stachowsky (1,250)

Ok thank you very much, I'll try to edit it! – Alfred Nov 29 '18 at 16:09