How to fetch files by mime type with wget
Some URLs look like this:
/foo/bar
That is, they have no file extension, unlike this:
/foo/bar.txt
If there is an extension, it's easy:
wget -r -A .txt http://asdf.com
But if there isn't, I'm not sure how to fetch the files. Some files, such as PDFs, live at a path like `/0du8qj8quqjc9` with no extension, or maybe even `/download.php?pdf=124u0cje8u`. The question is how to download a file only if it matches a given MIME type. For example, something like:
wget -r --accept-mime text/plain,application/pdf http://asdf.com
Is there anything that works like that?
wget mime-types
asked Nov 8 at 3:21
Lance Pollard
1 Answer
Wget2 already has this feature :-)
--filter-mime-type    Specify a list of mime types to be saved or ignored
### `--filter-mime-type=list`
Specify a comma-separated list of MIME types that will be downloaded. Elements of list may contain wildcards.
If a MIME type starts with the character '!' it won't be downloaded; this is useful when trying to download
something with exceptions. For example, download everything except images:
wget2 -r https://<site>/<document> --filter-mime-type='*,!image/*'
It is also useful to download files that are compatible with an application on your system. For instance,
download every file that is compatible with LibreOffice Writer from a website using the recursive mode:
wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
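To see what the `sed` expression in that command produces, here is a small sketch that runs it against a made-up desktop entry instead of the real LibreOffice file. The sample entry and the variable names `desktop_entry` and `mime_list` are assumptions for illustration only.

```shell
# Made-up stand-in for a .desktop file (not the real LibreOffice entry).
desktop_entry='[Desktop Entry]
Name=Example Writer
MimeType=application/vnd.oasis.opendocument.text;application/msword;'

# Same sed program as in the command above: delete every line that is not
# a MimeType= line, strip the key, then turn the ';' separators into ','.
mime_list=$(printf '%s\n' "$desktop_entry" |
  sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g')

echo "$mime_list"
# prints: application/vnd.oasis.opendocument.text,application/msword,
```

Note the trailing comma: desktop entries end the list with `;`, which becomes `,`. The answer does not say how wget2 treats the resulting empty element; if it turns out to matter, appending `;s/,$//` to the sed program would strip it.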
Wget2 has not been released as of today, but will be soon. Debian unstable already has an alpha version shipped.
Look at https://gitlab.com/gnuwget/wget2 for more info. You can post questions/comments directly to bug-wget@gnu.org.
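Applied to the question's own example, a minimal sketch might look like the following. It assumes wget2 is installed and that the server reports an accurate MIME type for each file; `http://asdf.com` is the question's placeholder site, and `join_mime_types` is a hypothetical helper, not part of wget2. The command is only printed here as a dry run; drop the `echo` to actually run it.

```shell
# Hypothetical helper (not part of wget2): join its arguments with commas.
join_mime_types() (
  IFS=,
  printf '%s\n' "$*"
)

# MIME types from the question: plain text and PDF.
filter=$(join_mime_types text/plain application/pdf)

# Dry run: print the command instead of contacting the server.
echo wget2 -r "--filter-mime-type=$filter" http://asdf.com
```

Keeping the list inside quotes is a deliberate choice: it protects wildcard or negated entries such as `image/*` or `!image/*` from being expanded by the shell before wget2 sees them.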
edited Nov 14 at 8:30
answered Nov 14 at 8:24
Tim Ruehsen rockdaboot