Allowing multiple simultaneous rsyncs to play nice together
Each day I need to copy N files from a source location to a mirror at a specific time (where N is very large). Let's say I tell multiple CPUs to each run an rsync simultaneously on a subset of the files (network and disk bandwidth are not an issue). Ideally each CPU would be responsible for a disjoint subset of the N files, but in practice this is sometimes hard to guarantee. (Some of the source files might be "claimed" by more than one CPU.) As a result, sometimes rsync I and rsync J will both try to copy file F at the same time.
Using rsync -avz --delete --temp-dir=/tmp remote:/path/to/source/ /path/to/dest/, let's say rsyncs I and J both see this situation to start:
/path/to/source/:
FileA
FileB
FileC
/path/to/dest/:
FileA
Each rsync thinks it needs to copy files B and C, and each one starts doing so, first to /tmp/name_of_source_file.temp_suffix. Let's say I finishes first and moves its temporary file to /path/to/dest/FileB. Now the situation is:
/path/to/dest/:
FileA
FileB
/tmp/:
FileB.rsyncJsuffix
Now rsync J finishes copying but generates an error when it tries to move its version of FileB to /path/to/dest/ because there's already another FileB there that it didn't see when it started.
Does one of rsync's many options somehow handle this situation? Ideally I'd like an option that tells rsync, "Believe in yourself. You can do no wrong. Feel free to overwrite anything your little heart desires." so that it wouldn't complain about the FileB that has suddenly appeared mid-execution.
Thoughts?
linux rsync
asked Jun 1 '11 at 21:38
dg99
migrated from stackoverflow.com Jun 2 '11 at 3:48
This question came from our site for professional and enthusiast programmers.
2 Answers
I don't know why you are running rsyncs the way you are running them, but if I were you I'd seriously consider other ways to solve the problem that don't involve having multiple rsyncs writing to the same file tree at the same time.
This is from the rsync man page in the --temp-dir section:
If you are using this option for reasons other than a shortage
of disk space, you may wish to combine it with the
--delay-updates option, which will ensure that all copied files
get put into subdirectories in the destination hierarchy,
awaiting the end of the transfer. If you don't have enough room
to duplicate all the arriving files on the destination
partition, another way to tell rsync that you aren't overly
concerned about disk space is to use the --partial-dir option
with a relative path; because this tells rsync that it is OK to
stash off a copy of a single file in a subdir in the destination
hierarchy, rsync will use the partial-dir as a staging area to
bring over the copied file, and then rename it into place from
there. (Specifying a --partial-dir with an absolute path does
not have this side-effect.)
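To make the "staging area, then rename into place" idea from the man page excerpt concrete, here is a minimal sketch that imitates it with plain mv rather than rsync itself. It assumes the staging subdirectory (the name .staging is hypothetical) lives on the same filesystem as the destination, so the final step is a rename that replaces whatever is already there. It does not show what rsync does internally, and it is not a claim that this removes the error described in the question.

```shell
#!/bin/sh
# Sketch only: imitates "stage inside the destination, then rename into place"
# with plain mv. The .staging directory name is a hypothetical choice.
set -eu

dest=$(mktemp -d)
mkdir "$dest/.staging"

# Two writers each stage their own copy of FileB under a unique name.
printf 'copy from I\n' > "$dest/.staging/FileB.I"
printf 'copy from J\n' > "$dest/.staging/FileB.J"

# Writer I renames first; writer J renames afterwards and replaces I's copy.
mv -f "$dest/.staging/FileB.I" "$dest/FileB"
mv -f "$dest/.staging/FileB.J" "$dest/FileB"

cat "$dest/FileB"   # prints "copy from J"
```

Because both staged copies sit on the destination filesystem, the second mv is a same-filesystem rename and the last writer wins. With --temp-dir=/tmp on a different filesystem, the final move cannot be a simple rename.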
answered Jun 1 '11 at 22:49
user780279
Thanks for the note. Unfortunately waiting until the completion of the rsync before moving the files to their final destination still doesn't resolve the problem, since it's two separate rsyncs that are competing, and one will finish first for each file. I do recognize that this is a strange use of rsync, but unfortunately necessary in the environment I'm using ...
– dg99
Jun 2 '11 at 14:15
Given that you have a directory structure with some empty dirs and some files, and you want an archival copy of it, what I would try is to run rsync with GNU parallel:
- recreate the same directory structure (run from inside the source dir so that the paths are relative):
cd /source/dir && find . -type f | parallel mkdir -p /dest/dir/{//}
- rsync the files:
cd /source/dir && find . -type f | parallel rsync -a {} /dest/dir/{}
- then run one rsync to pick up empty dirs and make sure all is good (trailing slashes so that the contents land directly in /dest/dir):
rsync -av /source/dir/ /dest/dir/
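Since the trouble in the question comes from two workers being handed the same file, one possible way to avoid the overlap is to build a single deduplicated list and deal it out into disjoint chunks, one per worker. The sketch below is an illustration under assumptions: the helper name partition_list, the chunk file naming, and the sample file names are all hypothetical. It only prints the rsync command each worker would run (using rsync's --files-from option) instead of executing it.

```shell
#!/bin/sh
# Sketch: split a file list into disjoint chunks so that no two rsync
# workers are ever handed the same path.
set -eu

# partition_list LIST_FILE WORKERS OUT_PREFIX   (hypothetical helper)
# Deduplicates the paths in LIST_FILE, then deals them out round-robin into
# OUT_PREFIX.0 .. OUT_PREFIX.(WORKERS-1). Each path lands in exactly one chunk.
partition_list() {
    list_file=$1
    workers=$2
    out_prefix=$3
    i=0
    while [ "$i" -lt "$workers" ]; do
        : > "$out_prefix.$i"        # make sure every chunk file exists
        i=$((i + 1))
    done
    sort -u "$list_file" |
        awk -v n="$workers" -v p="$out_prefix" '{ print > (p "." ((NR - 1) % n)) }'
}

# Demo with a made-up list in which FileB was "claimed" twice.
workdir=$(mktemp -d)
printf '%s\n' FileA FileB FileC FileB FileD > "$workdir/all.txt"
partition_list "$workdir/all.txt" 2 "$workdir/chunk"

# Print (rather than run) the command each worker would execute.
for chunk in "$workdir"/chunk.*; do
    echo "rsync -az --files-from=$chunk remote:/path/to/source/ /path/to/dest/"
done
```

Note that --delete is left out of the per-worker commands here on purpose: deletions would be handled by a single final pass, such as the last step above, so that the workers never touch each other's files.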
edited Jan 10 '15 at 1:40
answered Jan 9 '15 at 21:47
dimus
This proposes an alternate approach towards the goal, rather than answering the question asked. Thanks, anyway.
– dg99
Jan 10 '15 at 0:29
"Does one of rsync's many options somehow handle this situation?" Not to my knowledge ;) Maybe you are looking for a tool like moo.nac.uci.edu/~hjm/parsync
– dimus
Jan 12 '15 at 16:23