Why does non-parametric bootstrap not return the same sample over and over again?






My notes say:

Assume data $X_1,\dots,X_n$.

Sample the data with replacement to produce $X_1^{(p)},\dots,X_n^{(p)}$.

Since both samples have length $n$, why doesn't this always produce the same sample? I'm missing something.










asked Nov 21 at 10:36 by mavavilj






















                3 Answers

Each member of the bootstrap sample is drawn at random, with replacement, from the data set. If we sampled without replacement, every bootstrap sample would simply be a reordering of the same data. Because of replacement, however, bootstrap samples differ in how many times they include each data point (once, several times, or not at all). On average, about 63% of the data points appear at least once in a given bootstrap sample: a particular point is missed by all $n$ draws with probability $(1-1/n)^n$, so the expected fraction included is $1-(1-1/n)^n$, which approaches $1-1/e \approx 0.632$ as $n$ grows.
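
A minimal R sketch of the two points above, assuming a hypothetical sample size `n = 1000`, `B = 500` bootstrap samples and an arbitrary seed: it checks the ~63% figure by simulation and shows, for a tiny 5-point example, how the multiplicities vary while always summing to the sample size. Resampling indices rather than values is a choice made here so that tied data values are still counted as separate points.

```r
# Sketch with hypothetical n and B: what fraction of the original data
# points shows up at least once in a single bootstrap sample?
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1000     # hypothetical sample size
B <- 500      # hypothetical number of bootstrap samples

# Resampling indices with replacement is equivalent to resampling the data;
# counting distinct indices also treats tied data values as separate points.
frac_distinct <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)
  length(unique(idx)) / n
})

mean(frac_distinct)   # Monte Carlo average over the B bootstrap samples
1 - (1 - 1/n)^n       # exact expected fraction for this n
1 - exp(-1)           # large-n limit, roughly 0.632

# Tiny example: each of 5 points is drawn 0, 1, 2, ... times,
# but the counts always add up to the sample size.
counts <- tabulate(sample.int(5, 5, replace = TRUE), nbins = 5)
counts
sum(counts)           # always 5
```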






answered Nov 21 at 12:05 by user20160
























@user20160's explanation is fine. Here's an example of 10 bootstrap samples of the sequence from 1 to 5 (`x <- 1:5; t(replicate(10, sort(sample(x, replace = TRUE))))`), showing that some values are represented more than once and others are not represented at all:

          [,1] [,2] [,3] [,4] [,5]
     [1,]    2    2    4    4    5
     [2,]    1    1    1    2    4
     [3,]    3    3    3    5    5
     [4,]    1    1    1    2    3
     [5,]    1    1    2    3    3
     [6,]    1    2    3    4    4
     [7,]    2    2    3    4    5
     [8,]    3    3    3    4    4
     [9,]    1    1    2    3    5
    [10,]    1    1    2    4    4





answered Nov 21 at 17:42 by Ben Bolker






















Just to confirm the answers here: the key misunderstanding is that the questioner is assuming sampling without replacement. If there are 10 elements, 10 random draws and 2 replications, then without replacement each replication contains exactly the same elements as the other; only the order differs. Without replacement, the number of draws can never exceed the original sample size.
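
A minimal R sketch of this contrast, using a hypothetical 5-element vector `x` and an arbitrary seed: without replacement every resample, once sorted, is identical to the sorted data, while with replacement the resamples differ.

```r
set.seed(2)                        # arbitrary seed, for reproducibility
x <- c(4.1, 5.3, 2.2, 7.8, 6.0)    # hypothetical data, n = 5

# Without replacement: drawing n of n values is only a permutation,
# so after sorting, every column equals the sorted original data.
no_repl <- replicate(3, sort(sample(x, replace = FALSE)))
all(no_repl == sort(x))            # TRUE

# With replacement: columns differ in which values repeat or are missing.
with_repl <- replicate(3, sort(sample(x, replace = TRUE)))
with_repl
```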



With replacement, however, the number of draws could in theory exceed the number of elements, so the original sample size could be inflated to any number. In practice this would be an error, because it artificially lowers the variance of the statistic (which is a no-no; for the sample mean, the bootstrap variance scales as $1/m$ for $m$ draws instead of $1/n$), although the mean would remain the same.
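
A rough R illustration, assuming hypothetical exponential data with `n = 50`, `B = 4000` replications and a helper `boot_means_of_size()` introduced only for this sketch: drawing `m = 4n` points per resample roughly halves the bootstrap standard error of the mean (overconfident), while the centre of the bootstrap distribution barely moves. The reference value `sigma_hat / sqrt(m)` uses the plug-in (divide-by-n) standard deviation.

```r
set.seed(3)          # arbitrary seed
n <- 50
x <- rexp(n)         # hypothetical skewed data
B <- 4000            # bootstrap replications

# Hypothetical helper: bootstrap distribution of the mean when each
# resample has m draws (the usual choice is m = n).
boot_means_of_size <- function(m) {
  replicate(B, mean(sample(x, m, replace = TRUE)))
}

means_n  <- boot_means_of_size(n)      # standard bootstrap, m = n
means_4n <- boot_means_of_size(4 * n)  # oversampled, m = 4n

# Plug-in SD of the data: the ideal bootstrap SE of the mean is sigma_hat / sqrt(m).
sigma_hat <- sqrt(mean((x - mean(x))^2))

c(sd(means_n),  sigma_hat / sqrt(n))       # close to each other
c(sd(means_4n), sigma_hat / sqrt(4 * n))   # about half as large: too optimistic
c(mean(means_n), mean(means_4n), mean(x))  # centre essentially unchanged
```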



Just to clarify: increasing the number of replications (the number of bootstrap samples), not the size of each sample, is the correct way to stabilise the estimates of both the mean and the variance. I'll refrain from elaborating.
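
A small R sketch of that point, assuming hypothetical normal data (`n = 30`), an arbitrary seed and a helper `se_spread()` written only for this illustration: it repeats the whole bootstrap 20 times for a small and a large number of replications `B` and compares how much the estimated standard error of the mean moves from run to run.

```r
set.seed(4)                        # arbitrary seed
x <- rnorm(30, mean = 10, sd = 2)  # hypothetical data
n <- length(x)

# Ideal (infinite-B) bootstrap SE of the mean: plug-in SD over sqrt(n).
ideal_se <- sqrt(mean((x - mean(x))^2)) / sqrt(n)

# Hypothetical helper: run the bootstrap `runs` times with B replications
# each and return the run-to-run SD of the estimated standard error.
se_spread <- function(B, runs = 20) {
  ses <- replicate(runs, sd(replicate(B, mean(sample(x, replace = TRUE)))))
  sd(ses)
}

spread_small <- se_spread(50)    # few replications: the SE estimate wobbles
spread_large <- se_spread(2000)  # many replications: much more stable
c(spread_small, spread_large, ideal_se)
```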



Just to waffle: nonparametric bootstrapping is handy when you have no idea how to derive a 95% confidence interval for the mean analytically (sort the bootstrap estimates and discard the lowest and highest 2.5%; this is the percentile interval). The technique has its critics, however.
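
A minimal R sketch of the percentile interval just described, assuming hypothetical exponential data (40 points, true mean 2), `B = 5000` replications and an arbitrary seed; `quantile()` is shown only as a cross-check, since it interpolates between order statistics and so can differ very slightly.

```r
set.seed(5)                  # arbitrary seed
x <- rexp(40, rate = 0.5)    # hypothetical skewed data (true mean 2)
B <- 5000

boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile interval: sort the bootstrap means and drop the lowest and
# highest 2.5% (k = 125 values from each end when B = 5000).
sorted <- sort(boot_means)
k <- round(0.025 * B)
ci_percentile <- c(lower = sorted[k + 1], upper = sorted[B - k])
ci_percentile

# Cross-check with quantile(), which interpolates between order statistics.
quantile(boot_means, c(0.025, 0.975))
```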






answered Nov 22 at 0:59 by Michael G.



