How to compare parts of files by hash?
I have one successfully downloaded file and another failed download (only the first 100 MB of a large file) which I suspect is the same file.
To verify this, I'd like to check their hashes, but since I only have a part of the unsuccessfully downloaded file, I only want to hash the first few megabytes or so.
How do I do this?
The OS is Windows, but I have Cygwin and MinGW installed.
bash hashing
Efficiently comparing one file on a local computer with another file on a distant computer is a key part of rsync, which compares parts of the files with a special hash function.
– David Cary
Dec 8 at 0:19
@DavidCary In my case, I do not have shell access to the remote computer, but thanks for the hint; I will read the man page.
– sinned
Dec 8 at 11:18
Question asked Dec 6 at 9:49 by sinned, edited Dec 6 at 10:00.
7 Answers
Creating hashes to compare files makes sense if you compare one file against many, or when comparing many files against each other.
It does not make sense when comparing two files only once: The effort to compute the hashes is at least as high as walking over the files and comparing them directly.
An efficient file comparison tool is cmp:
cmp --bytes $((100 * 1024 * 1024)) file1 file2 && echo "File fragments are identical"
You can also combine it with dd to compare arbitrary parts (not necessarily from the beginning) of two files, e.g.:
cmp \
    <(dd if=file1 bs=100M count=1 skip=1 2>/dev/null) \
    <(dd if=file2 bs=100M count=1 skip=1 2>/dev/null) \
    && echo "File fragments are identical"
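Applied to the situation in the question, the number of bytes to compare can be taken from the size of the partial download, so the same command works whatever size the failed download happens to have. A minimal sketch, assuming GNU cmp (diffutils, as shipped with Cygwin and MinGW/MSYS); the function name is_prefix is hypothetical:

```shell
#!/usr/bin/env bash
# is_prefix PARTIAL FULL
# Hypothetical helper: succeeds (exit 0) if every byte of PARTIAL matches
# the beginning of FULL. Assumes GNU cmp, which understands --bytes.
is_prefix() {
    local partial=$1 full=$2 n
    # Arithmetic expansion strips any padding some wc implementations add.
    n=$(( $(wc -c < "$partial") ))
    # Compare only the first n bytes; --silent suppresses output, keeps the exit status.
    # If FULL is shorter than n bytes, cmp reports EOF and exits 1 (not a prefix).
    cmp --silent --bytes="$n" -- "$partial" "$full"
}

# Example usage with hypothetical file names:
# is_prefix failed_download.zip complete_download.zip && echo "File fragments are identical"
```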
Note: creating hashes to compare files also makes sense if you want to avoid reading two files at the same time.
– Kamil Maciorowski
Dec 6 at 12:45
@KamilMaciorowski Yes, true. But this method will still usually be faster than comparing hashes in the pairwise case.
– Konrad Rudolph
Dec 6 at 12:54
This is the go-to solution. cmp is 99.99% certain to be already installed if you have bash running, and it does the job. Indeed, cmp -n 131072 one.zip two.zip will do the job, too. Fewest characters to type, and fastest execution. Calculating a hash is nonsensical: it requires the entire 100MB file to be read, plus a 100MB portion of the complete file, which is pointless. If they're zip files and they're different, there will be a difference within the first few hundred bytes. Readahead delivers 128k by default, though, so you may as well compare 128k (same cost as comparing 1 byte).
– Damon
Dec 6 at 13:47
The --bytes option only complicates the task. Just run cmp without this option and it will show you the first byte that differs between the files. If all the bytes are the same, it will report EOF on the shorter file. This gives you more information than your example: how many bytes are correct.
– pabouk
Dec 6 at 14:13
If you have GNU cmp (and I think pretty much everybody does), you can use the --ignore-initial and --bytes arguments instead of complicating things with invocations of dd.
– Christopher Schultz
Dec 7 at 14:27
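For illustration, the dd-based example in the answer (the second 100 MiB block of each file) could be written with GNU cmp options alone. A sketch assuming GNU cmp, where --ignore-initial skips the given number of bytes in both inputs and --bytes limits how many bytes are compared; the function name same_block is hypothetical:

```shell
#!/usr/bin/env bash
# same_block FILE1 FILE2 OFFSET LENGTH
# Hypothetical helper: succeeds if LENGTH bytes starting at byte OFFSET
# are identical in both files. Assumes GNU cmp (--ignore-initial, --bytes).
same_block() {
    cmp --silent --ignore-initial="$3" --bytes="$4" -- "$1" "$2"
}

# Roughly the same comparison as the dd example: the second 100 MiB block.
# same_block file1 file2 $((100 * 1024 * 1024)) $((100 * 1024 * 1024)) \
#     && echo "File fragments are identical"
```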
Sorry, I can't test this exactly, but this approach should work:
dd if=yourfile.zip of=first100mb1.dat bs=100M count=1
dd if=yourotherfile.zip of=first100mb2.dat bs=100M count=1
This gets you the first 100 MB of both files.
Now get the hashes:
sha256sum first100mb1.dat && sha256sum first100mb2.dat
You can also pipe dd directly into sha256sum, without intermediate files:
dd if=yourfile.zip bs=100M count=1 | sha256sum
dd if=yourotherfile.zip bs=100M count=1 | sha256sum
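One caveat with bs=100M count=1: the two hashes can only match if both inputs contain exactly the same number of bytes, so if the failed download is not exactly 100 MiB long, the block read from the complete file has to be cut to the partial file's size. A minimal sketch that does this with head -c instead of dd, assuming sha256sum from GNU coreutils (available in Cygwin and MSYS); the function names are hypothetical. Hashing each file separately also means the two files never have to be read at the same time.

```shell
#!/usr/bin/env bash
# prefix_sha256 FILE N
# Hypothetical helper: prints the SHA-256 (hex) of the first N bytes of FILE.
prefix_sha256() {
    head -c "$2" -- "$1" | sha256sum | awk '{ print $1 }'
}

# same_prefix_hash PARTIAL FULL
# Hypothetical helper: succeeds if PARTIAL has the same hash as the
# equally long beginning of FULL.
same_prefix_hash() {
    local n
    n=$(( $(wc -c < "$1") ))   # size of the partial download in bytes
    [ "$(prefix_sha256 "$1" "$n")" = "$(prefix_sha256 "$2" "$n")" ]
}

# Example usage with hypothetical file names:
# same_prefix_hash yourotherfile.zip yourfile.zip && echo "Hashes match"
```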
Is there a way to pipe dd somehow into sha256sum without the intermediate file?
– sinned
Dec 6 at 10:10
I added another way according to your request
– davidbaumann
Dec 6 at 10:15
Why create the hashes? That’s much less efficient than just comparing the file fragments directly (using cmp).
– Konrad Rudolph
Dec 6 at 12:34
In your middle code sample you say first100mb1.dat twice. Did you mean first100mb2.dat for the second one?
– doppelgreener
Dec 6 at 14:39
@KonradRudolph, "Why create the hashes?" Your solution (using cmp) is a winner without a doubt. But this way of solving the problem (using hashes) also has a right to exist as long as it actually solves the problem (:
– VL-80
Dec 8 at 0:49
You could just compare the files directly with a binary/hex diff program like vbindiff. It quickly compares files up to 4 GB on Linux and Windows.
It looks something like this, only with the difference (1B vs 1C) highlighted in red:
one
0000 0000: 30 5C 72 A7 1B 6D FB FC 08 00 00 00 00 00 00 00 0r..m.. ........
0000 0010: 00 00 00 00 ....
0000 0020:
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080:
0000 0090:
0000 00A0:
two
0000 0000: 30 5C 72 A7 1C 6D FB FC 08 00 00 00 00 00 00 00 0r..m.. ........
0000 0010: 00 00 00 00 ....
0000 0020:
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080:
0000 0090:
0000 00A0:
┌──────────────────────────────────────────────────────────────────────────────┐
│Arrow keys move F find RET next difference ESC quit T move top │
│C ASCII/EBCDIC E edit file G goto position Q quit B move bottom │
└──────────────────────────────────────────────────────────────────────────────┘
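If a non-interactive answer is enough, cmp -l (also spelled --verbose in GNU cmp) prints one line per differing byte: its 1-based position in decimal, followed by the two byte values in octal. A small sketch with a hypothetical function name that reports only the first differing position; for the two files shown above, the 1B/1C byte is the fifth byte, so it would print 5.

```shell
#!/usr/bin/env bash
# first_difference FILE1 FILE2
# Hypothetical helper: prints the 1-based position of the first byte that
# differs, or nothing if the files agree over their common length.
# cmp -l writes "position octal1 octal2" lines to stdout; the "EOF on ..."
# notice for files of unequal length goes to stderr and is discarded here.
first_difference() {
    cmp -l -- "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1; exit }'
}
```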
In my case, the files are zip archives, so no meaningful text in there. Comparing the hash value should be faster and less error prone.
– sinned
Dec 6 at 12:44
If you mean ASCII text, then that's irrelevant. vbindiff (and Konrad's cmp) compares binary data, byte for byte. In fact, hash values are much more likely to experience collisions.
– Xen2050
Dec 6 at 13:12
* Meant "In fact HASH values are much more likely to experience collisions" in the above comment, missed the h!
– Xen2050
Dec 7 at 7:47
Everybody seems to take the Unix/Linux route here, but comparing two files can easily be done with standard Windows commands: FC /B file1 file2
FC is present in every Windows NT version ever made and, if I recall correctly, was also present in DOS.
It is a bit slow, but that doesn't matter for a one-time use.
I know the question is tagged bash, but the OP also states that they are on Windows. For anyone who wants or needs a Windows solution, there's a free hex editor called HxD that can compare two files. If the files differ in size, it will tell you whether the overlapping parts are the same. If need be, it can also compute checksums for whatever is currently selected. It can be downloaded from the HxD website. I have no connection to the author(s); I've just been using it for years.
cmp will tell you when two files are identical up to the length of the smaller file:
$ dd if=/dev/random bs=8192 count=8192 > a
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.514571 secs (130417197 bytes/sec)
$ cp a b
$ dd if=/dev/random bs=8192 count=8192 >> b
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.512228 secs (131013601 bytes/sec)
$ cmp a b
cmp: EOF on a
cmp is telling you that the comparison encountered an EOF on file a before it detected any difference between the two files.
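In a script, that distinction can be checked automatically. cmp exits with status 1 both when the files differ and when one of them ends early, so the sketch below also inspects cmp's message. It assumes the message contains the text "EOF on", as in the output above and in GNU cmp; the function name compare_files is hypothetical.

```shell
#!/usr/bin/env bash
# compare_files FILE1 FILE2
# Hypothetical helper: prints "identical", "prefix" (the shorter file matches
# the start of the longer one) or "different". Returns 2 if cmp itself fails.
compare_files() {
    local out status
    # Capture cmp's messages (stdout and stderr) together with its exit status.
    out=$(cmp -- "$1" "$2" 2>&1) && status=0 || status=$?
    if [ "$status" -eq 0 ]; then
        echo identical
    elif [ "$status" -eq 1 ] && printf '%s\n' "$out" | grep -q 'EOF on'; then
        echo prefix
    elif [ "$status" -eq 1 ]; then
        echo different
    else
        printf '%s\n' "$out" >&2   # e.g. missing or unreadable file
        return 2
    fi
}
```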
Good point. If you haven't seen it, this is what pabouk already commented on the accepted answer.
– sinned
Dec 13 at 13:50
If you can open a shell session on the remote system, you can break the source file into pieces using the split command. To split a big file into (binary) pieces of at most one million bytes each:
split -b 1000000 bigfile.tgz
will create pieces xaa, xab, etc. From there it is trivial to concatenate the pieces to reconstruct the file:
cat x?? > reconstructed_bigfile.tgz
Of course you have control over the names of the file components. I am just illustrating using the defaults.
No, the zip download is from an unreliable ticket system, to which I do not have shell access.
– sinned
Dec 7 at 13:37
-1. The question is "how to compare parts of files?" So how to compare? For now your "answer" is just a comment on how to get parts of files.
– Kamil Maciorowski
Dec 7 at 13:41
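To turn the split idea into an actual comparison of two local files, each file can be cut into fixed-size chunks and the chunks hashed one by one; equal hashes at the same position indicate equal chunks. This is only a sketch, assuming bash (for process substitution), split and sha256sum from GNU coreutils, and enough free disk space for a temporary copy of each file's pieces; the function names are hypothetical. Unless the truncated download ends exactly on a chunk boundary, its last chunk is shorter and will be reported as different.

```shell
#!/usr/bin/env bash
# chunk_hashes FILE SIZE
# Hypothetical helper: prints one SHA-256 per SIZE-byte piece of FILE, in order.
chunk_hashes() {
    local dir
    dir=$(mktemp -d)
    # split names the pieces part.aa, part.ab, ...; the glob below sorts them in order.
    split -b "$2" -- "$1" "$dir/part."
    (cd "$dir" && sha256sum part.*) | awk '{ print $1 }'
    rm -rf -- "$dir"
}

# compare_chunks FILE1 FILE2 SIZE
# Hypothetical helper: for every chunk present in both files, reports
# whether its hash is the same. Lines with only one hash are skipped.
compare_chunks() {
    paste -d ' ' <(chunk_hashes "$1" "$3") <(chunk_hashes "$2" "$3") |
        awk 'NF == 2 { print "chunk " NR ": " ($1 == $2 ? "same" : "different") }'
}

# Example usage with hypothetical file names and 1 MB chunks:
# compare_chunks bigfile.tgz partial_bigfile.tgz 1000000
```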
Answer by Konrad Rudolph (the cmp answer): answered Dec 6 at 12:42, edited Dec 6 at 14:15.
Answer by davidbaumann (the dd/sha256sum answer): answered Dec 6 at 10:04, edited Dec 6 at 14:58.
1,842722
1,842722
1
Is there a way to pipe dd somehow into sha256sum without the intermediate file?
– sinned
Dec 6 at 10:10
1
I added another way according to your request
– davidbaumann
Dec 6 at 10:15
8
Why create the hashes? That’s much less efficient than just comparing the file fragments directly (usingcmp
).
– Konrad Rudolph
Dec 6 at 12:34
In your middle code sample you say first100mb1.dat twice. Did you mean first100mb2.dat for the second one?
– doppelgreener
Dec 6 at 14:39
@KonradRudolph, "Why create the hashes?" Your solution (usingcmp
) is a winner without a doubt. But this way of solving the problem (using hashes) also has right to exist as long as it actually solves the problem (:
– VL-80
Dec 8 at 0:49
add a comment |
1
Is there a way to pipe dd somehow into sha256sum without the intermediate file?
– sinned
Dec 6 at 10:10
1
I added another way according to your request
– davidbaumann
Dec 6 at 10:15
8
Why create the hashes? That’s much less efficient than just comparing the file fragments directly (usingcmp
).
– Konrad Rudolph
Dec 6 at 12:34
In your middle code sample you say first100mb1.dat twice. Did you mean first100mb2.dat for the second one?
– doppelgreener
Dec 6 at 14:39
@KonradRudolph, "Why create the hashes?" Your solution (usingcmp
) is a winner without a doubt. But this way of solving the problem (using hashes) also has right to exist as long as it actually solves the problem (:
– VL-80
Dec 8 at 0:49
1
1
Is there a way to pipe dd somehow into sha256sum without the intermediate file?
– sinned
Dec 6 at 10:10
Is there a way to pipe dd somehow into sha256sum without the intermediate file?
– sinned
Dec 6 at 10:10
1
1
I added another way according to your request
– davidbaumann
Dec 6 at 10:15
I added another way according to your request
– davidbaumann
Dec 6 at 10:15
8
8
Why create the hashes? That’s much less efficient than just comparing the file fragments directly (using
cmp
).– Konrad Rudolph
Dec 6 at 12:34
Why create the hashes? That’s much less efficient than just comparing the file fragments directly (using
cmp
).– Konrad Rudolph
Dec 6 at 12:34
In your middle code sample you say first100mb1.dat twice. Did you mean first100mb2.dat for the second one?
– doppelgreener
Dec 6 at 14:39
In your middle code sample you say first100mb1.dat twice. Did you mean first100mb2.dat for the second one?
– doppelgreener
Dec 6 at 14:39
@KonradRudolph, "Why create the hashes?" Your solution (using
cmp
) is a winner without a doubt. But this way of solving the problem (using hashes) also has right to exist as long as it actually solves the problem (:– VL-80
Dec 8 at 0:49
@KonradRudolph, "Why create the hashes?" Your solution (using
cmp
) is a winner without a doubt. But this way of solving the problem (using hashes) also has right to exist as long as it actually solves the problem (:– VL-80
Dec 8 at 0:49
add a comment |
You could just directly compare the files, with a binary / hex diff program like vbindiff
. It quickly compares files up to 4GB on Linux & Windows.
Looks something like this, only with the difference highlighted in red (1B vs 1C):
one
0000 0000: 30 5C 72 A7 1B 6D FB FC 08 00 00 00 00 00 00 00 0r..m.. ........
0000 0010: 00 00 00 00 ....
0000 0020:
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080:
0000 0090:
0000 00A0:
two
0000 0000: 30 5C 72 A7 1C 6D FB FC 08 00 00 00 00 00 00 00 0r..m.. ........
0000 0010: 00 00 00 00 ....
0000 0020:
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080:
0000 0090:
0000 00A0:
┌──────────────────────────────────────────────────────────────────────────────┐
│Arrow keys move F find RET next difference ESC quit T move top │
│C ASCII/EBCDIC E edit file G goto position Q quit B move bottom │
└──────────────────────────────────────────────────────────────────────────────┘
answered Dec 6 at 11:24
Xen2050
In my case, the files are zip archives, so no meaningful text in there. Comparing the hash value should be faster and less error prone.
– sinned
Dec 6 at 12:44
If you mean ASCII text, then that's irrelevant. vbindiff (and Konrad's cmp) compares binary data, byte for byte. In fact, hash values are much more likely to experience collisions.
– Xen2050
Dec 6 at 13:12
Everybody seems to go the Unix/Linux route with this, but just comparing two files can easily be done with standard Windows commands: FC /B file1 file2
FC is present in every Windows NT version ever made and, if I recall correctly, was also present in DOS.
It is a bit slow, but that doesn't matter for a one-time use.
answered Dec 7 at 14:20
Tonny
I know the question is tagged bash, but the OP also says they have Windows. For anyone who wants or needs a Windows solution, there's a hex editor called HxD that can compare two files. If the files are different sizes, it will tell you whether the overlapping parts are the same. If need be, it can also compute checksums of whatever is currently selected. It's free and can be downloaded from the HxD website. I don't have any connection to the author(s); I've just been using it for years.
answered Dec 8 at 2:11
Blerg
cmp will tell you when two files are identical up to the length of the smaller file:
$ dd if=/dev/random bs=8192 count=8192 > a
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.514571 secs (130417197 bytes/sec)
$ cp a b
$ dd if=/dev/random bs=8192 count=8192 >> b
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.512228 secs (131013601 bytes/sec)
$ cmp a b
cmp: EOF on a
cmp is telling you that the comparison encountered an EOF on file a before it detected any difference between the two files.
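To turn this into a plain yes/no answer for a partial download, here is a minimal sketch that limits the comparison to the size of the partial file. It assumes GNU cmp (as shipped with Cygwin), whose -n option caps the number of bytes compared and whose -s option suppresses all output. The helper name prefix_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if PARTIAL is byte-for-byte
# identical to the first N bytes of FULL, where N is the size of PARTIAL.
prefix_matches() {
    partial=$1
    full=$2
    # Size of the partial file in bytes; $(( )) drops any padding wc may add.
    size=$(( $(wc -c < "$partial") ))
    # -n limits the comparison to $size bytes; -s prints nothing, so only the
    # exit status remains (0 = identical, 1 = different or FULL too short).
    cmp -s -n "$size" "$partial" "$full"
}

# Usage (hypothetical file names):
# if prefix_matches partial.zip full.zip; then echo "same start"; else echo "different"; fi
```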
answered Dec 12 at 23:14
Jim L.
Good point. If you haven't seen it, this is what pabouk already commented on the accepted answer.
– sinned
Dec 13 at 13:50
If you can open a shell session on the remote system, then you can break the source file up into pieces using the split command. To split a big file into (binary) pieces of at most one million bytes each:
split -b 1000000 bigfile.tgz
will create pieces xaa, xab, etc. From there it is trivial to concatenate the pieces to reconstruct the file:
cat x?? > reconstructed_bigfile.tgz
Of course you have control over the names of the file components; I am just illustrating using the defaults.
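To actually compare the pieces, a minimal sketch, assuming both files are available locally and GNU coreutils (split, head -c, sha256sum, mktemp) as in Cygwin: split both files the same way and check each piece of the partial file against the matching stretch of the complete one. The helper name compare_pieces is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: split a partial and a complete file into pieces of the
# same size and report, for each piece of the partial file, whether it matches
# the corresponding stretch of the complete file. Assumes GNU coreutils.
compare_pieces() {
    partial=$1
    full=$2
    size=${3:-1000000}            # piece size in bytes
    dir=$(mktemp -d) || return 1
    # Same piece size and naming scheme for both: p_aa, p_ab, ... and f_aa, f_ab, ...
    split -b "$size" "$partial" "$dir/p_"
    split -b "$size" "$full" "$dir/f_"
    for p in "$dir"/p_*; do
        [ -e "$p" ] || continue   # no pieces at all (empty partial file)
        suffix=${p##*/p_}
        f=$dir/f_$suffix
        # The last piece of a truncated download is usually short, so hash only
        # as many bytes of the complete file's piece as the partial piece holds.
        n=$(( $(wc -c < "$p") ))
        if [ -e "$f" ] && [ "$(sha256sum < "$p")" = "$(head -c "$n" "$f" | sha256sum)" ]; then
            echo "$suffix match"
        else
            echo "$suffix DIFFER"
        fi
    done
    rm -rf "$dir"
}

# Usage (hypothetical file names):
# compare_pieces partial.zip full.zip 1000000
```

Splitting writes a temporary copy of both files, so it needs that much free disk space; pieces beyond the end of the partial file are simply not checked.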
answered Dec 7 at 13:36
user48918
No, the zip download is from an unreliable ticket system to which I do not have shell access.
– sinned
Dec 7 at 13:37
-1. The question is "how to compare parts of files?" So how to compare? For now your "answer" is just a comment on how to get parts of files.
– Kamil Maciorowski
Dec 7 at 13:41