Allowing multiple simultaneous rsyncs to play nice together



























Each day I need to copy N files from a source location to a mirror at a specific time (where N is very large). Let's say I tell multiple CPUs to each run an rsync simultaneously on a subset of the files (network and disk bandwidth are not an issue). Ideally each CPU would be responsible for a disjoint subset of the N files, but in practice this is sometimes hard to guarantee. (Some of the source files might be "claimed" by more than one CPU.) As a result, sometimes rsync I and rsync J will both try to copy file F at the same time.



Using rsync -avz --delete --temp-dir=/tmp remote:/path/to/source/ /path/to/dest/, let's say rsyncs I and J both see this situation to start:



/path/to/source/:
FileA
FileB
FileC

/path/to/dest/:
FileA


Each rsync thinks it needs to copy files B and C, and each one starts doing so, first to /tmp/name_of_source_file.temp_suffix. Let's say I finishes first and moves its temporary file to /path/to/dest/FileB. Now the situation is:



/path/to/dest/:
FileA
FileB

/tmp/:
FileB.rsyncJsuffix


Now rsync J finishes copying but generates an error when it tries to move its version of FileB to /path/to/dest/ because there's already another FileB there that it didn't see when it started.



Does one of rsync's many options somehow handle this situation? Ideally I'd like an option that tells rsync, "Believe in yourself. You can do no wrong. Feel free to overwrite anything your little heart desires." so that it wouldn't complain about the FileB that has suddenly appeared mid-execution.



Thoughts?























migrated from stackoverflow.com Jun 2 '11 at 3:48


This question came from our site for professional and enthusiast programmers.























      linux rsync
















      asked Jun 1 '11 at 21:38









      dg99

























          2 Answers
          I don't know why you are running rsyncs the way you are running them, but if I were you I'd seriously consider other ways to solve the problem that don't involve having multiple rsyncs writing to the same file tree at the same time.



          This is from the rsync man page in the --temp-dir section:




          If you are using this option for reasons other than a shortage
          of disk space, you may wish to combine it with the
          --delay-updates option, which will ensure that all copied files
          get put into subdirectories in the destination hierarchy,
          awaiting the end of the transfer. If you don't have enough room
          to duplicate all the arriving files on the destination
          partition, another way to tell rsync that you aren't overly
          concerned about disk space is to use the --partial-dir option
          with a relative path; because this tells rsync that it is OK to
          stash off a copy of a single file in a subdir in the destination
          hierarchy, rsync will use the partial-dir as a staging area to
          bring over the copied file, and then rename it into place from
          there. (Specifying a --partial-dir with an absolute path does
          not have this side-effect.)
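
A minimal sketch of how the option named in the quoted passage might be combined with the command from the question. The host and paths are the question's placeholders, and the helper name build_rsync_cmd is hypothetical; it only prints the command so it can be inspected before being run. This changes where files are staged, and it does not by itself coordinate two separate rsync processes.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rsync command that stages
# updated files and renames them into place at the end of the transfer.
build_rsync_cmd() {
    src=$1
    dest=$2
    # --delay-updates is the rsync option named in the quoted man page text.
    printf 'rsync -avz --delete --temp-dir=/tmp --delay-updates %s %s\n' "$src" "$dest"
}

build_rsync_cmd remote:/path/to/source/ /path/to/dest/
```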


















          answered Jun 1 '11 at 22:49







          user780279




















          • Thanks for the note. Unfortunately waiting until the completion of the rsync before moving the files to their final destination still doesn't resolve the problem, since it's two separate rsyncs that are competing, and one will finish first for each file. I do recognize that this is a strange use of rsync, but unfortunately necessary in the environment I'm using ...

            – dg99
            Jun 2 '11 at 14:15
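
Since the conflict comes from two workers picking the same file, one possible workaround outside rsync itself is to have each worker claim a file before copying it. A minimal sketch, assuming all workers can see one shared lock directory on a local filesystem. LOCK_ROOT and claim_file are hypothetical names, and mkdir is used because creating a directory either succeeds or fails as a single step.

```shell
#!/bin/sh
# Hypothetical shared directory where workers record their claims.
LOCK_ROOT=${LOCK_ROOT:-/tmp/rsync-claims}
mkdir -p "$LOCK_ROOT"

# claim_file NAME: succeed (exit status 0) only for the first caller.
# mkdir without -p fails if the directory already exists, so two workers
# cannot both claim the same name.
claim_file() {
    # Replace slashes so a nested path maps to a single lock name.
    # (Sketch only: "a/b" and "a_b" would collide.)
    key=$(printf '%s' "$1" | tr '/' '_')
    mkdir "$LOCK_ROOT/$key" 2>/dev/null
}

# Usage sketch: copy a file only if this worker won the claim.
# if claim_file "FileB"; then
#     rsync -avz remote:/path/to/source/FileB /path/to/dest/
# fi
```

The claims would need to be cleared (for example by removing the lock directory) before the next day's run.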





















          Assuming you have a directory structure containing some files and some empty directories, and you want an archival copy of it, what I would try is running rsync under GNU parallel:

          1. Recreate the same directory structure (run find from inside the source directory so that the paths are relative):

          cd /source/dir && find . -type f | parallel mkdir -p /dest/dir/{//}

          2. rsync the files:

          cd /source/dir && find . -type f | parallel rsync -a {} /dest/dir/{}

          3. Then run one rsync to pick up the empty directories and make sure all is good:

          rsync -av /source/dir/ /dest/dir/
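
Another way to get disjoint subsets, sketched here under assumptions: list the files once, deal the list out round-robin into N chunk files, and give each rsync its own chunk through rsync's --files-from option. The helper name split_list and the chunk file naming are hypothetical, and the rsync calls are left commented out because the paths are placeholders.

```shell
#!/bin/sh
# split_list LIST N PREFIX: deal the lines of LIST round-robin into
# PREFIX.0 .. PREFIX.(N-1), so every line lands in exactly one chunk.
split_list() {
    list=$1
    n=$2
    prefix=$3
    i=0
    # Start from empty chunk files so reruns do not append to old data.
    while [ "$i" -lt "$n" ]; do
        : > "$prefix.$i"
        i=$((i + 1))
    done
    awk -v n="$n" -v p="$prefix" '{ print > (p "." ((NR - 1) % n)) }' "$list"
}

# Usage sketch (paths are placeholders):
# (cd /source/dir && find . -type f) > /tmp/all-files
# split_list /tmp/all-files 4 /tmp/chunk
# for c in /tmp/chunk.*; do
#     rsync -a --files-from="$c" /source/dir/ /dest/dir/ &
# done
# wait
```

Because each path appears in exactly one chunk, no two rsync processes are asked to copy the same file.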
















          edited Jan 10 '15 at 1:40

























          answered Jan 9 '15 at 21:47









          dimus














          • This proposes an alternate approach towards the goal, rather than answering the question asked. Thanks, anyway.

            – dg99
            Jan 10 '15 at 0:29











          • "Does one of rsync's many options somehow handle this situation?" Not to my knowledge ;) Maybe you are looking for a tool like moo.nac.uci.edu/~hjm/parsync

            – dimus
            Jan 12 '15 at 16:23


















