Hypothesis testing of binomially distributed data
I haven't done any hypothesis testing in the years since I left school, and I wanted to refresh my memory of it.
The hypothesis is stated as follows: assume that the average high school student stays in school with probability 70%. The alternative hypothesis is that this probability is less than 70%. Since a student can only either stay in school or leave, we can model the number of students who stay in a sample with a binomial distribution.
Thus, we have:
Suppose $\theta$ is the probability that a student stays in school.
Then
$$H_0 : \theta = 0.7 \quad \text{vs.} \quad H_a : \theta < 0.7.$$
The test statistic we will use is based on the binomial distribution. If $X$ is the number of students out of $n$ who stayed in school, then $$X \mid H_0 \sim \operatorname{Binomial}(n, \theta = 0.7).$$
Then I sample, say, $100$ students and count how many of them actually stayed in school. We observe that $57$ of them stayed in school. The $p$-value would then be
$$p = \Pr[X \le 57 \mid H_0] = 0.00396779$$
Can we use a normal approximation in this case? Also, what type of test would I need to use then? Left-sided?
Thanks!
probability hypothesis-testing
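The exact left-tailed p-value in the question can be checked by summing the binomial probabilities directly. The following is a minimal sketch using only the Python standard library; the function name `binom_cdf` and the variable names are illustrative choices, not something taken from the question.

```python
from math import comb


def binom_cdf(k: int, n: int, theta: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, theta) by summing the pmf."""
    return sum(
        comb(n, i) * theta**i * (1 - theta) ** (n - i)
        for i in range(k + 1)
    )


# Values from the question: n = 100 students, 57 stayed, H0: theta = 0.7.
n, theta0, observed = 100, 0.7, 57

# Left-tailed test: small counts are evidence for H_a: theta < 0.7.
p_value = binom_cdf(observed, n, theta0)
print(f"exact left-tailed p-value: {p_value:.8f}")
```

The printed value can be compared with the figure quoted in the question. Direct summation like this is fine for $n = 100$; for much larger $n$ the terms are better computed in log space to avoid overflow.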
$p(H_0)=0$; the right variant is $H_0: \theta \geq 0.7$.
– Yuri Negometyanov
Dec 13 '18 at 1:54
asked Dec 9 '18 at 15:53
Zakkery
1 Answer
If we were to apply a normal approximation, we would model $X$ as
$$X \mid H_{0} \sim \mathcal{N}\left(n\theta,\, n\theta\left(1-\theta\right)\right)$$
$$X \mid H_{0} \sim \mathcal{N}(70, 21)$$
$$\sigma = \sqrt{21}$$
Thus, our p-value under the normal approximation would be $\mathbb{P}\left(Z < \frac{57-70}{\sqrt{21}}\right)$.
This is a one-sided (left-tailed) Z-test, and we get a p-value of about 0.0023, which as you can see is quite a bit lower than yours (but is a nice quick order-of-magnitude approximation if you're away from a computer). It is well known that normal approximations to the binomial give p-values that are smaller than they should be when we are deep into the tails. With only 100 people, you really don't need to apply a normal approximation.
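The Z-test calculation above can be reproduced with the standard library alone. This is a sketch under stated assumptions: the helper `normal_cdf` is an illustrative name, and the continuity-corrected variant (using 57.5 instead of 57) is not part of the answer; it is a common refinement added here only for comparison.

```python
from math import erfc, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2))


# Values from the question: n = 100, H0: theta = 0.7, 57 students stayed.
n, theta0, observed = 100, 0.7, 57

mean = n * theta0                      # n * theta = 70
sd = sqrt(n * theta0 * (1 - theta0))   # sqrt(n * theta * (1 - theta)) = sqrt(21)

# Plain normal approximation, as in the answer: P(Z < (57 - 70) / sqrt(21)).
z_plain = (observed - mean) / sd
p_plain = normal_cdf(z_plain)

# Assumption (not in the answer): continuity correction, P(Z < (57.5 - 70) / sqrt(21)).
z_cc = (observed + 0.5 - mean) / sd
p_cc = normal_cdf(z_cc)

print(f"z = {z_plain:.4f}, p (plain) = {p_plain:.6f}")
print(f"z = {z_cc:.4f}, p (continuity-corrected) = {p_cc:.6f}")
```

Both approximate p-values come out below the exact binomial figure quoted in the question, with the continuity-corrected one somewhat closer to it.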
What if I have, let's say, 10,000 people? What is an appropriate condition for applying the normal approximation?
– Zakkery
Dec 12 '18 at 19:20
A Stirling approximation would be better, $$\binom{n}{i} \approx \frac{n^{i}}{i!}$$ $$\mathbb{P}\left(X \le m\right) = \sum_{i=0}^{m} \binom{n}{i} p^{i}\left(1-p\right)^{n-i} \approx \sum_{i=0}^{m} \frac{n^{i}}{i!} p^{i}\left(1-p\right)^{n-i}$$ This should be a little bit easier to compute than the binomial (the approximation of the coefficient is accurate when $i \ll n$). A normal won't be bad as long as you aren't too far into the tail; just keep in mind that if you have a p-value of something like $10^{-6}$, you're probably underestimating it by using a normal approximation.
– Alex
Dec 12 '18 at 19:49
Also, to answer the question of when the normal approximation is good: with a sample size of 10,000 it will be good until very deep into the tails, and it is more accurate when $\theta$ is closer to 0.5, as the distribution will be more symmetric.
– Alex
Dec 12 '18 at 23:20
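One way to see how the normal approximation behaves for $n = 10{,}000$ is to compare it with the exact binomial tail at cutoffs roughly 1 to 4 standard deviations below the mean. The sketch below is illustrative only: the function names are hypothetical, the pmf is evaluated in log space with `lgamma` to avoid overflow, and the normal tail uses a continuity correction, which is an assumption not made in the comments above.

```python
from math import erfc, exp, lgamma, log, sqrt


def binom_cdf(k: int, n: int, theta: float) -> float:
    """Exact P[X <= k] for X ~ Binomial(n, theta), with each term built in log space."""
    def log_pmf(i: int) -> float:
        log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_coeff + i * log(theta) + (n - i) * log(1 - theta)

    return sum(exp(log_pmf(i)) for i in range(k + 1))


def normal_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * erfc(-z / sqrt(2))


def compare(n: int, theta: float, z_target: float):
    """Return (cutoff k, exact tail, normal tail) for a cutoff near z_target sd below the mean."""
    mean = n * theta
    sd = sqrt(n * theta * (1 - theta))
    k = round(mean + z_target * sd)
    exact = binom_cdf(k, n, theta)
    approx = normal_cdf((k + 0.5 - mean) / sd)  # continuity-corrected (assumption)
    return k, exact, approx


for z_target in (-1.0, -2.0, -3.0, -4.0):
    k, exact, approx = compare(10_000, 0.7, z_target)
    print(f"k={k}: exact={exact:.3e}, normal={approx:.3e}, ratio={approx / exact:.3f}")
```

The ratio column shows how far the approximation drifts from the exact value as the cutoff moves further into the tail. Changing `theta` to 0.5 in the loop gives a symmetric distribution for comparison.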
answered Dec 12 '18 at 17:21
Alex