Should computer code be included within publications that present numerical results?












28














Many research papers include numerical results obtained through computation. Most of the time, such computations are performed with software that many mathematicians use, e.g., Maple, Mathematica, or even C/C++ code. Should such code be included in the body of the published paper?



I've heard arguments from both sides:




  • Including such code can greatly decrease the time taken by a referee to replicate the results,

  • The code can be easily modified by further authors who wish to extend the result,

  • The reader does not need to spend time searching the journal website or the Internet for any "auxiliary files" containing the code.


On the other hand,




  • Pages of code degrade the aesthetic nature of the publication,

  • The author might need to spend additional space explaining the coding decisions that were made in the algorithms,

  • It is likely that there exist (much) better ways to write the same algorithms in the given, or any other, language.


So what is the standard in mathematical research papers that present numerical results, either as a main or as a side result? Should code be included within the body of the publication, as an auxiliary file, or not at all?






























  • 17




    Just provide a github.com link; that should suffice, and it helps ensure reproducibility if your code actually runs. That said, ymmv, as some reviewers refuse to "believe" results (this has happened to me more than once!) despite the code being available, which makes one wonder what the point was of working hard to release the code.
    – Suvrit
    Nov 25 '18 at 16:04






  • 8




    @Suvrit Interesting comment. Github is indeed a good idea; however, hosting the code on a separate website removes the stand-alone nature of the publication. What if github.com ceases to exist, or the code becomes "no longer available"?
    – Klangen
    Nov 25 '18 at 16:10






  • 6




    Whatever you do with the code you produce, I think the paper accompanying the code should explain in detail what the code is doing, so that someone else who is interested in your research and who is somewhat proficient at coding could write the appropriate code themselves and verify your results.
    – Sam Hopkins
    Nov 25 '18 at 20:21






  • 1




    @SamHopkins I think you mean "independently confirm" which is stronger than "verify" (simply re-running the original code could "verify" the result, assuming you've eye-balled what the code is doing to achieve its output).
    – literature-searcher
    Nov 26 '18 at 0:04








  • 2




    @Flermat even if github goes away, if the code is anywhere on the internet, a search engine will find it -- as long as the code is findable via a search, it seems that the problem is not really a problem. Github has a lateral benefit too -- if your code is of wider appeal, somebody may fork the repo and carry the work further, and likely contribute bugfixes to your code -- so overall, worth putting it there....
    – Suvrit
    Nov 26 '18 at 1:46
















soft-question na.numerical-analysis journals














asked Nov 25 '18 at 16:00


























community wiki





Klangen









5 Answers


















26














My answer is:




Don't put code in your paper. Do: put pseudocode in your paper, version control your code on Github, and add a link to your Github repository to your paper.





  • The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so a readable representation of your code should be included in your paper. That is exactly why pseudocode was invented.

  • All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in Github allows others to transparently modify your code or apply it to their own ends.)


In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp) so that priority disputes can be settled easily, but the paper can be maintained and updated - no more errata for old papers / textbooks!



The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it, they should follow software engineers' lead. This has two advantages. On the one hand, software engineers face a much more severe version of the problem and will therefore solve it better. On the other hand, as the solutions inevitably change, they will be accompanied by tools and strategies for migrating old code into new frameworks, which is ultimately the best way to ensure that the code survives as long as possible.

























  • 6




    I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arxiv (as an addition to github, not a replacement).
    – Neil Strickland
    Nov 26 '18 at 9:50






  • 1




    Why specifically github? Anyone sharing code must have a github account and use git?
    – Dror Speiser
    Nov 26 '18 at 14:22










  • @Dror you neither need an account nor git to access the source code on github. If github goes down, well, that's another matter.
    – rubenvb
    Nov 26 '18 at 15:43






  • 2




    @rubenvb I believe Dror Speiser was referring more to the restriction of using git/Github specifically, as there are many possible (free) online repository sites to use and multiple version control tools available. Git is the most popular right now, but Mercurial is another viable one, and many people still use SVN. Ultimately, though, you do have to make a choice of which to use if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
    – JAB
    Nov 26 '18 at 16:41








  • 1




    @DrorSpeiser The reason I specifically recommended Github is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when Github falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically the minor differences between the various version control systems are outweighed by the advantages of standardization.
    – Paul Siegel
    Nov 26 '18 at 17:04



















15














At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance this article on SIAM news makes some very compelling arguments.



Unless it's just a few lines, it is quite unusual to have code included verbatim in the publication. It would be cumbersome to copy and paste, for instance. Common solutions are:




  • hosting it on your institutional page

  • offering to share the source code to interested parties via e-mail

  • having a Github repository

  • including it in the arXiv version of your paper as an ancillary file

  • sharing it on Zenodo.


If you are concerned about long-term archival, the last two items in my list are meant to solve this problem, although it could be argued that Github, too, is becoming "too big to fail" these days.
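Whichever of these hosting options is used, a reader who finds a copy of the code years later has to decide whether it is the version behind the published numbers. One possible way to make that checkable is to quote a checksum of the code in the paper. The following is a minimal Python sketch under stated assumptions: the function name `snapshot_digest`, the default `*.py` pattern, and the way paths and contents are combined are illustrative choices, not a convention prescribed by arXiv, Zenodo, or Github.

```python
import hashlib
from pathlib import Path


def snapshot_digest(root, pattern="*.py"):
    """Return one SHA-256 hex digest covering every matching file under root.

    Hypothetical helper for illustration. Files are visited in a fixed
    order (sorted by relative path), and each relative path is hashed
    together with the file's bytes, so editing or renaming any file
    changes the digest.
    """
    base = Path(root)
    files = [p for p in base.rglob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.relative_to(base).as_posix())

    digest = hashlib.sha256()
    for path in files:
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

The resulting 64-character string could be printed in the paper next to the link; anyone holding a copy of the code can recompute it and compare.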





























  • These are all interesting options. But how do they fare within a copyright context? I.e., some journals explicitly forbid the publication of parts of the paper in public. Hosting the code on Github would therefore be in violation of their terms.
    – Klangen
    Nov 25 '18 at 21:31






  • 2




    @Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
    – Federico Poloni
    Nov 25 '18 at 21:33










  • @FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
    – Klangen
    Nov 25 '18 at 21:36






  • 8




    @Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
    – Federico Poloni
    Nov 25 '18 at 21:45



















9














I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community I think we should be thinking harder about this question. Simply making source code available, even via something like the arXiv which will be around "forever", is not a complete solution, because source code may be nearly useless after (say) 50 years because the compilers are no longer readily available, or worse, the code runs only on some proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows if today's proof assistants will be around 50 years from now?



One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps collaborating with government entities such as the Library of Congress.
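Until such an archival plan exists, one partial mitigation is to record exactly which interpreter and library versions produced the results, so that a future reader at least knows what would have to be found or emulated. Here is a minimal sketch, assuming the computation was done in Python; the function name `environment_record` and the package names in the usage example are placeholders, not taken from any particular paper.

```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS, and package versions as a plain dict.

    Hypothetical helper for illustration; `packages` is a list of
    distribution names as they would be given to pip.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" and "sympy" are placeholder package names.
    record = environment_record(["numpy", "sympy"])
    print(json.dumps(record, indent=2, sort_keys=True))
```

Saving this output next to the numerical results does not keep old software alive, but it removes the guesswork about which versions were involved.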

























  • 11




    Even if today's compilers die off in a few decades, there will be emulators and interpreters. Even better, there may be high level translators which will convert the source code to future source code automatically, so that the old ideas can be propagated. Gerhard "See, It's Really About Ideas" Paseman, 2018.11.25.
    – Gerhard Paseman
    Nov 25 '18 at 20:35






  • 3




    Yup - emulation has done a lot to mitigate this problem in the past years. You can even run an IBM PC, or C64 games inside your browser, for instance.
    – Federico Poloni
    Nov 25 '18 at 20:51








  • 9




    Emulators are only a partial solution. Suppose my code requires a specific version of Mathematica or CPLEX or even of Sage (which in turn might require a specific version of Python). Forget about 50 years in the future---I often have trouble running a colleague's code on my machine today.
    – Timothy Chow
    Nov 25 '18 at 21:21






  • 5




    @GerhardPaseman : The dream of automatic conversion to new formats is an old one and has already been shattered. The Library of Congress already has a ton of old electronic media that is effectively inaccessible and lacks the budget to deal with it. It's not just the software but the manpower to perform the conversions on a massive scale. I remember reading about how the LC had to borrow a machine from the Smithsonian to try to read some old electronic media.
    – Timothy Chow
    Nov 25 '18 at 21:25






  • 4




    @AndreiSmolensky : I believe that this is changing. To cite one example I know well: the proof of the q-TSPP conjecture by Kauers, Koutschan and Zeilberger relies crucially on some Mathematica computations. The authors make the Mathematica notebook available but there is still the problem that Mathematica is proprietary. And I am confident that the q-TSPP result will be of interest 50 years from now. By then there may be a shorter proof, but there is no guarantee of that. Or for a more famous example, what about the Kepler conjecture?
    – Timothy Chow
    Nov 25 '18 at 23:07





















3














This is more of a very extended comment than a complete answer.



I tend to find that "should" questions boil down to values as much as anything; "should" in order to achieve what?



Let me suggest that we need to understand a few things:




  • The advantages

  • The disadvantages

  • Is there a real problem with reproducibility that needs fixing?

  • The cost of not doing so or of doing so halfheartedly.

  • The opportunity cost or motivational/funding challenges

  • Variation between subfields

  • Technical challenges, short and long term

  • Expectations or even standards

  • Cultural challenges


I'll try to avoid repeating the observations from the existing answers and comments, but let me add some thoughts:




  • In software engineering, the process for shipping code is very different from that of the typical mathematical program. A key reason for this is quality, of which correctness is the most important element. Correctness is a value of overwhelming importance in any proof, so maybe open code and peer review would be a good thing.

  • Related to that: writing code to be read is different to writing code to just convince oneself; how are mathematicians to learn that?

  • There is a difference between learning enough about programming to get a result and having the skills needed to write good tests, make code readable and convince readers that the code is valid. I'd ask: if you have not done that well, how can you expect your conclusions to be credible?

  • What is the penalty for coding errors as things stand? I would have thought that in maths, publishing results that are subsequently proven false would not do one's career any good. This compares interestingly to other fields in science where to some extent one expects many "results" in papers to be subsequently not borne out. Interesting to hear feedback on this one as to what happens in practice.

  • Do people feel that time spent publishing code would be unproductive?

  • A software engineering style code review is not anonymous (at least usually); is this a problem?

  • There is an argument to use "lowest common denominator" languages that might be old but that proves their longevity and wide accessibility; e.g. 'C'.

  • Timothy Chow noted the use of notebooks; they provide a great way to document code and the overall approach, and I can see them being used more and more. Interestingly, I think this might conflict with "lowest common denominator" languages, as the notebook hosting environment (Jupyter or Mathematica) might have less longevity.





































    3














    There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.



    So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.



    So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").



    Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.
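The advice above about separately implemented programs and intermediate results can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the quantity (the number of primes up to n), the two methods (a sieve and trial division), and all function names are chosen for brevity and are not drawn from any real research code.

```python
import math


def prime_counts_sieve(limit, checkpoints):
    """Count primes <= n for each checkpoint n, via a sieve of Eratosthenes."""
    is_prime = [False, False] + [True] * (limit - 1)  # indices 0..limit
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    wanted = set(checkpoints)
    counts, running = {}, 0
    for n in range(limit + 1):
        running += is_prime[n]
        if n in wanted:
            counts[n] = running  # intermediate result at this checkpoint
    return counts


def prime_counts_trial_division(limit, checkpoints):
    """Same quantity, computed by a deliberately different method."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    wanted = set(checkpoints)
    counts, running = {}, 0
    for n in range(limit + 1):
        running += is_prime(n)
        if n in wanted:
            counts[n] = running
    return counts


def disagreements(limit, checkpoints):
    """Return the checkpoints where the two implementations differ."""
    a = prime_counts_sieve(limit, checkpoints)
    b = prime_counts_trial_division(limit, checkpoints)
    return {n: (a[n], b[n]) for n in sorted(a) if a[n] != b[n]}


if __name__ == "__main__":
    # An empty dict means both programs agree at every checkpoint.
    print(disagreements(1000, [10, 100, 1000]))
```

Comparing at several checkpoints rather than only at the end is the point: a bug that happens to cancel out in the final answer is more likely to show up in an intermediate value.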





























      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "504"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316155%2fshould-computer-code-be-included-within-publications-that-present-numerical-resu%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      26














      My answer is:




      Don't put code in your paper. Instead, put pseudocode in your paper, keep your code under version control on GitHub, and add a link to the GitHub repository to your paper.





      • The purpose of a paper is to be read; the purpose of code is to be executed by a computer. These purposes should not be mixed, so the paper should include a readable representation of your code rather than the code itself. That is exactly why pseudocode was invented.

      • All code intended to be used by more than one person should be version controlled. This balances the two most relevant concerns: the original version of the code is preserved for posterity, but the author retains the ability to update it as bugs or improvements are discovered. (Additionally, the forking mechanism in GitHub allows others to transparently modify your code or apply it to their own ends.)


      In fact, I am willing to make a more general argument: mathematicians should version control their papers as well. The reasons are the same: the original version still exists (with a timestamp), so priority disputes can be settled easily, but the paper can be maintained and updated, with no more errata for old papers and textbooks!



      The underlying premise of this answer is that maintaining and distributing code is a software engineering problem, and to the extent that mathematicians need to solve it they should follow software engineers' lead. This has two advantages. First, software engineers face a much more severe version of the problem and will therefore solve it better. Second, as the solutions inevitably change, they will be accompanied by tools and strategies for migrating old code into new frameworks, which is ultimately the best way to ensure that the code survives as long as possible.















      answered Nov 26 '18 at 1:03 (community wiki) by Paul Siegel









      • 6




        I would suggest augmenting this with a fixed snapshot of the code as auxiliary files on the arXiv (in addition to GitHub, not as a replacement).
        – Neil Strickland
        Nov 26 '18 at 9:50






      • 1




        Why GitHub specifically? Must anyone sharing code have a GitHub account and use Git?
        – Dror Speiser
        Nov 26 '18 at 14:22










      • @Dror you need neither an account nor Git to access the source code on GitHub. If GitHub goes down, well, that's another matter.
        – rubenvb
        Nov 26 '18 at 15:43






      • 2




        @rubenvb I believe Dror Speiser was referring more to the restriction of using Git/GitHub specifically, as there are many (free) online repository sites and multiple version control tools available. Git is the most popular right now, but Mercurial is another viable one, and many people still use SVN. Ultimately, though, you do have to choose one if you want to use web-hosted version control. (A personal server could work as well but may be less reliable and may introduce security risks if you are not used to self-hosting.)
        – JAB
        Nov 26 '18 at 16:41








      • 1




        @DrorSpeiser The reason I specifically recommended GitHub is basically the last paragraph of my answer: mathematicians don't really have specialized needs for distributing and maintaining code (quite the opposite), so they should use whatever the industry standard is. Not only is it easier to learn and maintain, but if/when GitHub falls out of favor the software engineering community will produce a robust method for migrating onto some new system. Basically, the minor differences between the various version control systems are outweighed by the advantages of standardization.
        – Paul Siegel
        Nov 26 '18 at 17:04
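
      One way to tie a paper to the fixed snapshot suggested in the comments above is to quote a checksum of the code in the paper itself, so a reader can check that the files they downloaded are the ones the results came from. The following is only a minimal sketch under stated assumptions: the function name `snapshot_digest`, the default `*.py` pattern, and the choice of SHA-256 are illustrative and are not something proposed in this thread.

```python
import hashlib
from pathlib import Path


def snapshot_digest(root: str, pattern: str = "*.py") -> str:
    """Return one SHA-256 hex digest covering every matching file under root.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    digest = hashlib.sha256()
    root_path = Path(root)
    # Sort by relative POSIX path so the result does not depend on
    # the order in which the operating system lists directory entries.
    files = sorted(
        (p for p in root_path.rglob(pattern) if p.is_file()),
        key=lambda p: p.relative_to(root_path).as_posix(),
    )
    for path in files:
        # Hash the relative path as well as the contents,
        # so that renaming a file also changes the digest.
        digest.update(path.relative_to(root_path).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

      Calling `snapshot_digest("path/to/code")` on the directory that produced the numerical results gives a 64-character string that could be printed next to the repository link; the path here is a placeholder.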

























      15














      At least in my field (numerical linear algebra), the current standard is that including the full source code is not mandatory for a publication. That said, there are many reasons why sharing your code is a good idea; for instance, this article in SIAM News makes some very compelling arguments.



      Unless it's just a few lines, it is quite unusual to include code verbatim in a publication. It would be cumbersome to copy and paste, for instance. Common solutions are:




      • hosting it on your institutional page

      • offering to share the source code with interested parties via e-mail

      • having a GitHub repository

      • including it in the arXiv version of your paper as an ancillary file

      • sharing it on Zenodo.


      If you are concerned about long-term archival, the last two items in my list are meant to solve this problem, although it could be argued that GitHub too is becoming "too big to fail" these days.















      answered Nov 25 '18 at 16:48 (community wiki) by Federico Poloni













      • These are all interesting options. But how do they fare within a copyright context? For example, some journals explicitly forbid making parts of the paper public. Hosting the code on GitHub would therefore be in violation of their terms.
        – Klangen
        Nov 25 '18 at 21:31






      • 2




        @Flermat If you don't include the source code in the body of the paper, then the publishers cannot have any copyright claim on it.
        – Federico Poloni
        Nov 25 '18 at 21:33










      • @FredericoPoloni Yet it is part of the "work", however the journal decides to interpret that term.
        – Klangen
        Nov 25 '18 at 21:36






      • 8




        @Flermat They can claim they have copyright on the code, but (1) I don't think they ever did in practice, (2) I don't think it's going to hold up in court anyway, and (3) the journals would have to fear some serious backlash from the mathematical community if they tried doing that.
        – Federico Poloni
        Nov 25 '18 at 21:45





























      9














      I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community we should be thinking harder about this question. Simply making source code available, even via something like the arXiv, which will be around "forever", is not a complete solution: after (say) 50 years the source code may be nearly useless because the compilers are no longer readily available or, worse, because the code runs only on proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows whether today's proof assistants will be around 50 years from now?



      One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps collaborating with government entities such as the Library of Congress.

























      • 11




        Even if today's compilers die off in a few decades, there will be emulators and interpreters. Even better, there may be high level translators which will convert the source code to future source code automatically, so that the old ideas can be propagated. Gerhard "See, It's Really About Ideas" Paseman, 2018.11.25.
        – Gerhard Paseman
        Nov 25 '18 at 20:35






      • 3




        Yup - emulation has done a lot to mitigate this problem in the past years. You can even run an IBM PC, or C64 games inside your browser, for instance.
        – Federico Poloni
        Nov 25 '18 at 20:51








      • 9




        Emulators are only a partial solution. Suppose my code requires a specific version of Mathematica or CPLEX or even of Sage (which in turn might require a specific version of Python). Forget about 50 years in the future: I often have trouble running a colleague's code on my machine today.
        – Timothy Chow
        Nov 25 '18 at 21:21






      • 5




        @GerhardPaseman : The dream of automatic conversion to new formats is an old one and has already been shattered. The Library of Congress already has a ton of old electronic media that is effectively inaccessible and lacks the budget to deal with it. It's not just the software but the manpower to perform the conversions on a massive scale. I remember reading about how the LC had to borrow a machine from the Smithsonian to try to read some old electronic media.
        – Timothy Chow
        Nov 25 '18 at 21:25






      • 4




        @AndreiSmolensky : I believe that this is changing. To cite one example I know well: the proof of the q-TSPP conjecture by Kauers, Koutschan and Zeilberger relies crucially on some Mathematica computations. The authors make the Mathematica notebook available but there is still the problem that Mathematica is proprietary. And I am confident that the q-TSPP result will be of interest 50 years from now. By then there may be a shorter proof, but there is no guarantee of that. Or for a more famous example, what about the Kepler conjecture?
        – Timothy Chow
        Nov 25 '18 at 23:07
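On the version-dependence point raised in the comments above (code that needs a specific version of Mathematica, Sage or Python), one modest mitigation is to store a description of the software environment next to each published number. The following is only a minimal sketch under stated assumptions: the function name `provenance_record` and the field names are hypothetical, it covers only the Python interpreter and not any third-party packages, and it does nothing to keep old software runnable.

```python
import hashlib
import json
import platform


def provenance_record(source_text, result):
    """Bundle a result with the interpreter details and a hash of the code.

    `source_text` is the program source as a string; `result` is any
    JSON-serialisable value. Both names are illustrative assumptions.
    """
    return {
        # Interpreter version and flavour, e.g. CPython vs PyPy.
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        # Operating system and hardware description.
        "platform": platform.platform(),
        # Fingerprint of the exact source that produced the result.
        "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
        "result": result,
    }


if __name__ == "__main__":
    record = provenance_record("print(6 * 7)", 42)
    print(json.dumps(record, indent=2))
```

A record like this does not solve the archival problem, but it at least tells a future reader which environment they would need to reconstruct.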


















      9














      I think that Federico Poloni's answer gives good advice as of 2018, but as a mathematical community we should be thinking harder about this question. Simply making source code available, even via something like the arXiv which will be around "forever", is not a complete solution: after (say) 50 years the source code may be nearly useless because the compilers are no longer readily available or, worse, because the code runs only on some proprietary software that no longer exists. This concern applies even if the computation has been formalized in a proof assistant, since who knows if today's proof assistants will be around 50 years from now?



      One idea would be for professional societies such as the American Mathematical Society to develop a long-term archival plan, perhaps collaborating with government entities such as the Library of Congress.














      answered Nov 25 '18 at 20:16
      community wiki
      Timothy Chow






















      3














      This is more of a very extended comment than a complete answer.



      I tend to find that "should" questions boil down to values as much as anything: "should" in order to achieve what?



      Let me suggest that we need to understand a few things:




      • The advantages

      • The disadvantages

      • Whether there is a real problem with reproducibility that needs fixing

      • The cost of not publishing code, or of doing so halfheartedly

      • The opportunity cost, and the motivational/funding challenges

      • Variation between subfields

      • Technical challenges, short and long term

      • Expectations or even standards

      • Cultural challenges


      I'll try to avoid repeating the observations from the existing answers and comments, but let me add some thoughts:




      • In software engineering, the process for shipping code is very different from the way a typical mathematical program gets written. A key reason for this is quality, and correctness is the most important element of quality. Correctness is also of overwhelming importance in any proof, so maybe open code and peer review would be a good thing.

      • Related to that: writing code to be read is different to writing code to just convince oneself; how are mathematicians to learn that?

      • There is a difference between learning enough about programming to get a result and having the skills needed to write good tests, make code readable and convince readers that the code is valid. I'd ask: if you have not done that well, how can you expect your conclusions to be credible?

      • What is the penalty for coding errors as things stand? I would have thought that in maths, publishing results that are subsequently proven false would not do one's career any good. This compares interestingly to other fields in science, where to some extent one expects many "results" in papers to be subsequently not borne out. It would be interesting to hear what happens in practice.

      • Do people feel that time spent publishing code would be unproductive?

      • A software engineering style code review is not anonymous (at least usually); is this a problem?

      • There is an argument for using "lowest common denominator" languages such as C, which may be old but whose age demonstrates their longevity and wide accessibility.

      • Timothy Chow noted the use of notebooks; they provide a great way to document code and the overall approach, and I can see them becoming more and more widely used. Interestingly, I think this might conflict with "lowest common denominator" languages, as the notebook environment (Jupyter or Mathematica) might have less longevity.














          answered Nov 25 '18 at 23:53
          community wiki
          Keith
























              3














              There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.



              So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.



              So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").
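As a toy illustration of the advice above (separately implemented programs, different methods, comparison of intermediate results), here is a minimal sketch in Python. The choice of problem, counting integer partitions, and all function names are assumptions made for the example; real research code would be far larger, and ideally the two implementations would be written by different people.

```python
def partitions_dp(limit):
    """Partition numbers p(0..limit), built up one allowed part size at a time."""
    counts = [1] + [0] * limit
    for part in range(1, limit + 1):
        for total in range(part, limit + 1):
            counts[total] += counts[total - part]
    return counts


def partitions_pentagonal(limit):
    """Partition numbers p(0..limit) via Euler's pentagonal number recurrence."""
    counts = [1] + [0] * limit
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal number for +k
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * counts[n - g1]
            g2 = k * (3 * k + 1) // 2  # generalized pentagonal number for -k
            if g2 <= n:
                total += sign * counts[n - g2]
            k += 1
        counts[n] = total
    return counts


def first_disagreement(a, b):
    """Index of the first entry where two tables differ, or None if they agree."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    table_a = partitions_dp(200)
    table_b = partitions_pentagonal(200)
    # Compare every intermediate value, not only the final one.
    print("first disagreement:", first_disagreement(table_a, table_b))
```

Comparing the whole table rather than just the last entry is the point: a single final number can agree by accident, while a disagreement at a specific index tells you where to start looking.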



              Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.






              share|cite|improve this answer




























                3














                There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.



                So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.



                So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").



                Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.






                share|cite|improve this answer


























                  3












                  3








                  3






                  There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.



                  So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.



                  So what to do? If you are an author, get a coauthor and aim for separately implemented programs that get the same result, hopefully using different methods. (An axiom of software engineering is that programmers solving the same problem using the same method tend to make the same mistakes.) Intermediate results are very useful for checking, especially when the final answer has low entropy (like "yes" or "empty set").



                  Another fact is that problems which needed very tricky programming and bulk computer time 20 years ago can now be solved in a reasonable time using simpler programs. Presumably that trend will continue. Any computational result that is important enough will eventually be replicated independently without so much effort.






                  share|cite|improve this answer














                  There are some issues that are not emphasised enough in the previous comments and answers. Having the source code used by an author does not let you check that the author's theorems are correct. It only lets you check that the program does what the author claims. Transcribing the program output to the published paper is the step where an error is least likely to have occurred. Much more likely is an error in the program.



                  So, can you eyeball the program to check if it is correct? Not unless it is a very short simple program. I publish articles that rely on tens of thousands of lines of code that took me and others months of hard work to write and debug. Your chances of looking at it and checking its correctness in a reasonable amount of time are next to zero. One day there will be programs that can check correctness for you; the beginnings exist today but generally useful checkers are still a long way off.


















                  answered Nov 26 '18 at 10:56


























                  community wiki





                  Brendan McKay






























