Hypothesis testing of binomially distributed data












I haven't done any hypothesis testing for years since I left school and I just wanted to refresh my memory of it.



The hypothesis is stated as follows: assume that 70% of high school students stay in school (a retention rate of 70%). The alternative hypothesis is that the retention rate is less than 70%. Since a student can only either stay in school or leave, we can model the number who stay in a sample with a binomial distribution.



Thus, we have:



Suppose $\theta$ is the probability that a student stays in school.
Then

$$H_0 : \theta = 0.7 \quad \text{vs.} \quad H_a : \theta < 0.7.$$



The test statistic we will use is based on the binomial distribution. If $X$ is the number of students out of $n$ sampled who stayed in school, then $$X \mid H_0 \sim \operatorname{Binomial}(n, \theta = 0.7).$$



Then I sample, say, $100$ students and count how many of them actually stayed in school. Suppose we observe that $57$ of them stayed. Then the $p$-value would be
$$p = \Pr[X \le 57 \mid H_0] = 0.00396779.$$
Can we use a normal approximation in this case? Also, what type of test would I need to use then? Left-sided?
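For reference, the exact left-tail probability can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the helper name `binom_cdf` is made up for illustration and is not from any particular package.

```python
from math import comb


def binom_cdf(k: int, n: int, theta: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, theta) by summing the pmf."""
    return sum(
        comb(n, i) * theta**i * (1 - theta) ** (n - i)
        for i in range(k + 1)
    )


# Left-sided test of H0: theta = 0.7, with 57 of 100 sampled students staying.
p_value = binom_cdf(57, 100, 0.7)
print(f"exact left-tail p-value: {p_value:.6f}")
```

Direct summation like this is fine for $n = 100$; for much larger $n$ the individual terms underflow in floating point and a log-space sum is the safer choice.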



Thanks!







probability hypothesis-testing
















asked Dec 9 '18 at 15:53









Zakkery












  • $p(H_0)=0$; the right variant is $H_0: \theta \geq 0.7$.
    – Yuri Negometyanov
    Dec 13 '18 at 1:54




























1 Answer
If we were to apply a normal approximation, we would model $X$ as

$$X \mid H_{0} \sim \mathcal{N}\left(n\theta,\, n\theta\left(1-\theta\right)\right)$$

$$X \mid H_{0} \sim \mathcal{N}(70, 21)$$

$$\sigma = \sqrt{21}$$

Thus, our p-value, under the normal approximation, would be $\mathbb{P}\left(Z < \frac{57-70}{\sqrt{21}}\right)$.



This is a one-sided (left-tailed) Z-test with $z \approx -2.84$, and we get a p-value of about 0.0023, which as you can see is quite a bit lower than yours (but is a nice quick order-of-magnitude approximation if you're away from a computer). It is well known that normal approximations to the binomial can give p-values that are smaller than they should be when we are deep in the tails. With only 100 people, you really don't need to apply a normal approximation; the exact binomial calculation is cheap.
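The normal-approximation calculation can be sketched in Python with only the standard library. The function names here are illustrative, and the continuity-corrected variant (evaluating the normal CDF at $57.5$ instead of $57$) is an addition of mine that the answer itself does not use; it is shown only because it is a standard refinement.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_approx_p(k: int, n: int, theta: float, continuity: bool = False) -> float:
    """Approximate P[X <= k] for X ~ Binomial(n, theta) with a normal CDF."""
    mean = n * theta
    sd = sqrt(n * theta * (1 - theta))
    # Optional continuity correction: treat the integer k as the interval end k + 0.5.
    x = k + 0.5 if continuity else k
    return normal_cdf((x - mean) / sd)


p_plain = normal_approx_p(57, 100, 0.7)                # z = (57 - 70) / sqrt(21)
p_cc = normal_approx_p(57, 100, 0.7, continuity=True)  # z = (57.5 - 70) / sqrt(21)
print(f"normal approximation:       {p_plain:.5f}")
print(f"with continuity correction: {p_cc:.5f}")
```

The corrected value lands closer to the exact binomial tail than the plain one, though both remain below it here.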

















answered Dec 12 '18 at 17:21









Alex












  • What if I have, let's say, 10,000 people? What is an appropriate condition for applying the normal approximation?
    – Zakkery
    Dec 12 '18 at 19:20










  • A Stirling-type approximation would be better: for $i \ll n$, $$\binom{n}{i} \approx \frac{n^{i}}{i!},$$ so that $$\mathbb{P}\left(X \le m\right) = \sum_{i=0}^{m} \binom{n}{i} p^{i}\left(1-p\right)^{n-i} \approx \sum_{i=0}^{m} \frac{n^{i}}{i!} p^{i}\left(1-p\right)^{n-i}.$$ This should be a little bit easier to compute than the exact binomial. A normal approximation won't be bad as long as you aren't too far into the tail; just keep in mind that if you have something like $p=10^{-6}$, you're probably underestimating the p-value by using a normal approximation.
    – Alex
    Dec 12 '18 at 19:49












  • Also, to answer the question of when the normal approximation is good: with a sample size of 10,000 it will be good until very deep into the tails, and it is more accurate when $\theta$ is closer to 0.5, as the distribution will be more symmetric.
    – Alex
    Dec 12 '18 at 23:20
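To get a feel for how the approximation improves with sample size, one can compare the normal tail with the exact tail at the same z-score for $n = 100$ and $n = 10{,}000$. The sketch below is an illustration under my own choices: the cut-offs $57$ and $6870$ are picked so that both sit about $2.84$ standard deviations below the mean, the function names are hypothetical, and the exact tail is summed in log space (via `math.lgamma`) to avoid floating-point underflow. A common textbook rule of thumb, not stated in this thread, is to want both $n\theta$ and $n(1-\theta)$ to be at least about 10 before trusting the approximation.

```python
from math import erf, exp, lgamma, log, sqrt


def log_binom_pmf(i: int, n: int, theta: float) -> float:
    """log P[X = i] for X ~ Binomial(n, theta), computed with log-gamma."""
    log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_comb + i * log(theta) + (n - i) * log(1 - theta)


def exact_left_tail(k: int, n: int, theta: float) -> float:
    """P[X <= k], summed in log space so large n does not underflow."""
    logs = [log_binom_pmf(i, n, theta) for i in range(k + 1)]
    peak = max(logs)
    return exp(peak) * sum(exp(v - peak) for v in logs)


def normal_left_tail(k: int, n: int, theta: float) -> float:
    """Normal approximation to P[X <= k], without continuity correction."""
    z = (k - n * theta) / sqrt(n * theta * (1 - theta))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ratio(k: int, n: int, theta: float = 0.7) -> float:
    """Normal tail divided by exact tail; 1.0 would mean perfect agreement."""
    return normal_left_tail(k, n, theta) / exact_left_tail(k, n, theta)


ratio_small = ratio(57, 100)       # 13 below the mean of 70, sd = sqrt(21)
ratio_large = ratio(6870, 10_000)  # 130 below the mean of 7000, sd = sqrt(2100)
print(f"n = 100:    normal / exact = {ratio_small:.3f}")
print(f"n = 10000:  normal / exact = {ratio_large:.3f}")
```

Both ratios come out below 1, meaning the normal approximation underestimates the p-value in each case, but the larger sample sits much closer to 1.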



































