How to fetch files by mime type with wget











Some URLs have no file extension, like this:

/foo/bar

rather than ending in one, like this:

/foo/bar.txt


If there is an extension it's easy:



wget -r -A .txt http://asdf.com


But if there isn't, I'm not sure how to fetch the files. Some files, such as PDFs, live at a path like /0du8qj8quqjc9 with no extension, or even at /download.php?pdf=124u0cje8u. How can I download such files only if they match a given MIME type? For example, with something like this (a made-up option):



wget -r --accept-mime text/plain,application/pdf http://asdf.com


Is there a way to do something like that?










asked Nov 8 at 3:21 by Lance Pollard

          1 Answer

          Wget2 already has this feature :-)



          --filter-mime-type    Specify a list of MIME types to be saved or ignored

          ### `--filter-mime-type=list`

          Specify a comma-separated list of MIME types that will be downloaded. Elements of the list may contain wildcards.
          If a MIME type starts with the character '!', it won't be downloaded; this is useful when trying to download
          something with exceptions. For example, download everything except images:

          wget2 -r https://<site>/<document> --filter-mime-type='*,!image/*'
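
To make the filter semantics concrete, here is a minimal sketch in POSIX shell of how a list such as `*,!image/*` could be evaluated. It assumes that entries are checked left to right, that the last matching entry decides, and that a type matching nothing is skipped. None of that is confirmed by the text above, and `mime_allowed` is a made-up helper, not part of Wget2.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MIME type $1 passes the filter list $2.
# Assumptions (not taken from the Wget2 source): entries are checked left to
# right, the last matching entry wins, and no match at all means "skip".
mime_allowed() {
    mime=$1
    filter=$2
    verdict=reject
    old_ifs=$IFS
    IFS=,
    set -f                      # keep '*' in the list from expanding to file names
    for pattern in $filter; do
        case $pattern in
            '!'*)
                # Negated entry: drop the leading '!' and glob-match the rest.
                case $mime in ${pattern#?}) verdict=reject ;; esac
                ;;
            *)
                case $mime in $pattern) verdict=accept ;; esac
                ;;
        esac
    done
    set +f
    IFS=$old_ifs
    [ "$verdict" = accept ]
}
```

With these assumptions, `mime_allowed text/html '*,!image/*'` succeeds and `mime_allowed image/png '*,!image/*'` fails, which matches the "everything except images" reading of the example.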

          It is also useful for downloading files that are compatible with an application on your system. For instance,
          download every file that is compatible with LibreOffice Writer from a website using recursive mode:

          wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
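
The `sed` program in that command can be tried on its own. The sketch below runs it on a made-up desktop entry; the `Name`, `Exec` and `MimeType` values are placeholders for illustration, not the contents of the real LibreOffice file.

```shell
#!/bin/sh
# Hypothetical sample standing in for a real .desktop file.
sample_desktop='[Desktop Entry]
Name=Example Writer
MimeType=application/vnd.oasis.opendocument.text;application/msword;
Exec=example-writer %U'

# Same sed program as in the command above:
#   /^MimeType=/!d   delete every line that does not start with "MimeType="
#   s/^MimeType=//   strip the key, leaving only the value
#   s/;/,/g          turn the ';' separators into ','
mime_list=$(printf '%s\n' "$sample_desktop" |
    sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g')

printf '%s\n' "$mime_list"
# prints: application/vnd.oasis.opendocument.text,application/msword,
```

Because the sample value ends with `;`, the result ends with a comma. If that turns out to be a problem, appending `;s/,$//` to the sed program would strip it.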


          As of this writing, Wget2 has not been officially released, but a release is expected soon. Debian unstable already ships an alpha version.



          Look at https://gitlab.com/gnuwget/wget2 for more info. You can post questions/comments directly to bug-wget@gnu.org.
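
Where no Wget2 build is at hand, a possible stopgap with classic wget is to ask for the headers first and download only when the reported type is wanted. This is a rough sketch for single URLs, not a replacement for `--filter-mime-type`: it does not recurse, it costs one extra request per URL, and it trusts whatever `Content-Type` the server reports. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the bare MIME type from the last Content-Type header read on stdin,
# e.g. "  Content-Type: Text/HTML; charset=utf-8" -> "text/html".
# Taking the last header lets the final response after redirects decide.
content_type_from_headers() {
    tr -d '\r' |
        tr '[:upper:]' '[:lower:]' |
        sed -n 's/^[[:space:]]*content-type:[[:space:]]*\([^;[:space:]]*\).*/\1/p' |
        tail -n 1
}

# Hypothetical helper: download URL $1 only if its reported MIME type equals
# one of the remaining arguments.
fetch_if_mime() {
    url=$1
    shift
    # --spider: do not fetch the body; -S: print the server headers (on stderr)
    mime=$(wget --spider -S "$url" 2>&1 | content_type_from_headers)
    for wanted in "$@"; do
        if [ "$mime" = "$wanted" ]; then
            wget "$url"
            return
        fi
    done
    echo "skipping $url (type: ${mime:-unknown})" >&2
    return 1
}
```

For the paths in the question, a call might look like `fetch_if_mime http://asdf.com/0du8qj8quqjc9 text/plain application/pdf`.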






answered Nov 14 at 8:24 by Tim Ruehsen rockdaboot, edited Nov 14 at 8:30
