wget decides not to load because of black list
I'm trying to make a full copy of a web site; e.g.,



http://vfilesarchive.bgmod.com/files/


I'm running



wget -r --level=inf -R "index.html*" --debug http://vfilesarchive.bgmod.com/files/


and getting, for example



Deciding whether to enqueue "http://vfilesarchive.bgmod.com/files/Half-Life%D0%92%D0%86/".
Already on the black list.
Decided NOT to load it.


What is happening? 
What does wget mean by "black list",
why is it downloading only parts of what is there,
and what should I do to get the entire web site?



The version of wget is



GNU Wget 1.20 built on mingw32


(running on Windows 10 x64).



P.S. I think I've managed to solve this with



wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>


even though the filenames come out slightly mangled
due to special characters in the URLs.
Is there a better solution?
  • Welcome to Super User, and kudos for solving the problem. The site's Q&A format relies on questions being just questions, and solutions being in answer posts. With your clarification, the question has been taken off hold. Please move your solution to an answer (you can answer your own question). Two days after posting the question, you can accept your own answer by clicking the checkmark there. That will indicate that the problem has been solved.

    – fixer1234
    Jan 27 at 20:52











  • @fixer1234: When you posted the above comment, I was in the process of editing the question into a broader “why?” / “what does it mean?” query.

    – Scott
    Jan 27 at 21:06
Tags: download wget web-crawler
edited Jan 27 at 21:00 by Scott

asked Jan 27 at 3:38 by McUrgd
1 Answer
I think I've managed to solve this with



wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>


even though the filenames come out slightly mangled due to special characters in the URLs.
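For anyone hitting the same messages: wget's "black list" is simply its internal record of URLs it has already queued or fetched, kept so that recursion does not loop. With IRI processing enabled, a non-ASCII URL can be re-encoded into a form that is checked against that list incorrectly, which appears to be why whole directories were being skipped here. A sketch of what each flag in the command above does, per the wget manual (worth verifying against your build; `<target url>` is a placeholder for the site root):

```shell
# Flag roles, per `man wget`:
#   -m                               mirror: shorthand for -r -N -l inf
#                                    --no-remove-listing
#   --restrict-file-names=nocontrol  do not escape multibyte (UTF-8) bytes
#                                    when building local file names
#   --no-iri                         turn off IRI/charset re-encoding of URLs,
#                                    so each URL keeps one canonical form for
#                                    the blacklist check
#   -R "index.html*"                 reject the auto-generated directory
#                                    listing pages
wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>
```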
answered Jan 28 at 14:30 by McUrgd