wget decides not to load because of black list
I'm trying to make a full copy of a web site; e.g.,
http://vfilesarchive.bgmod.com/files/
I'm running
wget -r --level=inf -R "index.html*" --debug http://vfilesarchive.bgmod.com/files/
and getting, for example
Deciding whether to enqueue "http://vfilesarchive.bgmod.com/files/Half-Life%D0%92%D0%86/".
Already on the black list.
Decided NOT to load it.
What is happening? What does wget mean by "black list", why is it downloading only part of what is there, and what should I do to get the entire web site?
The version of wget is
GNU Wget 1.20 built on mingw32
(running on Windows 10 x64).
P.S. I think I've managed to solve this with
wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>
even though the filenames come out slightly mangled due to special characters in the URLs. Is there a better solution?
download wget web-crawler
Welcome to Super User, and kudos for solving the problem. The site's Q&A format relies on questions being just questions, and solutions being in answer posts. With your clarification, the question has been taken off hold. Please move your solution to an answer (you can answer your own question). Two days after posting the question, you can accept your own answer by clicking the checkmark there. That will indicate that the problem has been solved.
– fixer1234
Jan 27 at 20:52
@fixer1234: When you posted the above comment, I was in the process of editing the question into a broader “why?” / “what does it mean?” query.
– Scott
Jan 27 at 21:06
edited Jan 27 at 21:00 by Scott
asked Jan 27 at 3:38 by McUrgd
1 Answer
I think I've managed to solve this with
wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" <target url>
even though the filenames come out slightly mangled due to special characters in the URLs.
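For what it's worth, the "special characters" here are percent-encoded UTF-8 bytes in the directory names. A quick check (illustrative, not from the original post) shows what the sequence from wget's debug output decodes to, and hence why `--no-iri` leaves the saved filenames looking mangled:

```python
from urllib.parse import unquote

# The directory name as it appears in the URL from wget's debug output.
encoded = "Half-Life%D0%92%D0%86"

# %D0%92 and %D0%86 are the UTF-8 encodings of the Cyrillic letters
# U+0412 (В) and U+0406 (І). With --no-iri, wget keeps the raw encoded
# form instead of these decoded characters when naming files.
decoded = unquote(encoded)
print(decoded)  # -> Half-LifeВІ
```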
answered Jan 28 at 14:30 by McUrgd