wget decides not to load because of black list
I'm trying to make a full copy of a web site; e.g.,
http://vfilesarchive.bgmod.com/files/
I'm running
wget -r --level=inf -R "index.html*" --debug http://vfilesarchive.bgmod.com/files/
and getting, for example
Deciding whether to enqueue "http://vfilesarchive.bgmod.com/files/Half-Life%D0%92%D0%86/".
Already on the black list.
Decided NOT to load it.
What is happening? What does wget mean by "black list", why is it downloading only part of what is there, and what should I do to get the entire web site?
The version of wget is
GNU Wget 1.20 built on mingw32
(running on Windows 10 x64).
P.S. I think I've managed to solve this with
wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>
even though the filenames come out slightly mangled due to special characters in the URLs. Is there a better solution?
download wget web-crawler
Welcome to Super User, and kudos for solving the problem. The site's Q&A format relies on questions being just questions, and solutions being in answer posts. With your clarification, the question has been taken off hold. Please move your solution to an answer (you can answer your own question). Two days after posting the question, you can accept your own answer by clicking the checkmark there. That will indicate that the problem has been solved.
– fixer1234
Jan 27 at 20:52
@fixer1234: When you posted the above comment, I was in the process of editing the question into a broader “why?” / “what does it mean?” query.
– Scott
Jan 27 at 21:06
edited Jan 27 at 21:00 by Scott
asked Jan 27 at 3:38 by McUrgd
1 Answer
I think I've managed to solve this with
wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>
even though the filenames come out slightly mangled due to special characters in the URLs.
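For what it's worth, the "special characters" here are percent-encoded UTF-8 bytes in the directory names. A quick check (illustrative, not from the original post) shows what the sequence from wget's debug output decodes to, and hence why `--no-iri` leaves the saved filenames looking mangled:

```python
from urllib.parse import unquote

# The directory name as it appears in the URL from wget's debug output.
encoded = "Half-Life%D0%92%D0%86"

# %D0%92 and %D0%86 are the UTF-8 encodings of the Cyrillic letters
# U+0412 (В) and U+0406 (І). With --no-iri, wget keeps the raw encoded
# form instead of these decoded characters when naming files.
decoded = unquote(encoded)
print(decoded)  # -> Half-LifeВІ
```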
answered Jan 28 at 14:30 by McUrgd