Hypothesis testing of binomially distributed data

I haven't done any hypothesis testing for years since I left school and I just wanted to refresh my memory of it.

The hypothesis is stated as follows: assume that the average high school student stays in school with probability 70%. The alternative hypothesis is that this probability is less than 70%. Since a student can only either stay in school or leave, we can model the number of students who stay in a sample using a binomial distribution.

Thus, we have:

Suppose $\theta$ is the probability that a student stays in school.
Then

$$H_0 : \theta = 0.7 \quad \text{vs.} \quad H_a : \theta < 0.7.$$

The test statistic we will use is based on the binomial distribution. If $X$ is the number of students out of $n$ who stayed in school, then $$X \mid H_0 \sim \operatorname{Binomial}(n, \theta = 0.7).$$

Then I sample, say, $100$ students and count how many of them actually stayed in school. Suppose we observe that $57$ of them stayed in school. Then the $p$-value would be
$$p = \Pr[X \le 57 \mid H_0] = 0.00396779$$
Can we use a normal approximation in this case? Also, what type of test would I need to use then? Left-sided?
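
The exact tail probability quoted above can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `binom_cdf` is an illustrative choice, not something from the original post.

```python
import math


def binom_cdf(k: int, n: int, theta: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, theta) by summing the pmf."""
    return sum(
        math.comb(n, i) * theta**i * (1 - theta) ** (n - i)
        for i in range(k + 1)
    )


# Left-tailed p-value for 57 students staying out of 100 under H_0: theta = 0.7
p_value = binom_cdf(57, 100, 0.7)
print(f"exact p-value: {p_value:.8f}")
```

The lower tail is summed because the alternative is $\theta < 0.7$, so small values of $X$ count as evidence against $H_0$.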

Thanks!

asked Dec 9 '18 at 15:53

Zakkery

$p(H_0)=0$; the right variant is $H_0: \theta \geq 0.7$.
– Yuri Negometyanov
Dec 13 '18 at 1:54

probability hypothesis-testing

1 Answer

If we were to apply a normal approximation, we would model $X$ as

$$X \mid H_{0} \sim \mathcal{N}\left(n\theta, n\theta\left(1-\theta\right)\right)$$

$$X \mid H_{0} \sim \mathcal{N}(70, 21)$$

$$\sigma = \sqrt{21}$$

Thus, our p-value, under the normal approximation, would be $\mathbb{P}\left(Z < \frac{57-70}{\sqrt{21}}\right)$.

This is a one-sided (left-tailed) Z-test, and we get a p-value of about 0.0023, which as you can see is quite a bit lower than yours (but is a nice quick order-of-magnitude approximation if you're away from a computer). It is well known that normal approximations to the binomial give p-values that are smaller than they should be when we are deep into the tails. With only 100 people, you really don't need to apply a normal approximation.
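
To see the size of the gap, the exact binomial tail and the normal approximation can be computed side by side. This is a rough sketch using only Python's standard library; the function names are illustrative, and the continuity-corrected variant (evaluating at 57.5 instead of 57) is an addition not discussed in this answer.

```python
import math


def binom_cdf(k: int, n: int, theta: float) -> float:
    """Return P[X <= k] for X ~ Binomial(n, theta) by summing the pmf."""
    return sum(
        math.comb(n, i) * theta**i * (1 - theta) ** (n - i)
        for i in range(k + 1)
    )


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


n, theta, k = 100, 0.7, 57
mean = n * theta                         # 70
sd = math.sqrt(n * theta * (1 - theta))  # sqrt(21)

p_exact = binom_cdf(k, n, theta)                 # exact left tail
p_normal = normal_cdf((k - mean) / sd)           # plain Z-test, as in the answer
p_normal_cc = normal_cdf((k + 0.5 - mean) / sd)  # assumption: continuity correction

print(f"exact binomial:        {p_exact:.6f}")
print(f"normal approximation:  {p_normal:.6f}")
print(f"normal, with +0.5:     {p_normal_cc:.6f}")
```

In this example both normal variants come out below the exact tail, with the continuity-corrected one closer to it.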

answered Dec 12 '18 at 17:21

Alex

What if I have, let's say, 10000 people? What is an appropriate condition for applying the normal approximation?
– Zakkery
Dec 12 '18 at 19:20

A Stirling approximation would be better, $$\binom{n}{i} \approx \frac{n^{i}}{i!}$$ $$\mathbb{P}\left(X \le m\right) = \sum_{i=0}^{m} \binom{n}{i} p^{i}\left(1-p\right)^{n-i} \approx \sum_{i=0}^{m} \frac{n^{i}}{i!} p^{i}\left(1-p\right)^{n-i}$$ This should be a little bit easier than the binomial to compute. A normal won't be bad as long as you aren't too far in the tail; just keep in mind that if you have something like $p=10^{-6}$, you're probably underestimating the p-value by using a normal approximation.
– Alex
Dec 12 '18 at 19:49

Also, to answer the question of when the normal approximation is good: with a sample size of 10,000 it will be good up to very deep into the tails, and it is more accurate when $\theta$ is closer to 0.5, as the distribution will be more symmetric.
– Alex
Dec 12 '18 at 23:20
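
The effect of sample size described in these comments can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original thread: it fixes a cutoff about 2.8 standard deviations below the mean, applies a continuity correction, and builds each binomial term in log space with `math.lgamma` so that $n = 10000$ does not underflow. All function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, theta: float) -> float:
    """P[X <= k] for X ~ Binomial(n, theta), each pmf term computed in log space."""
    log_t, log_q = math.log(theta), math.log1p(-theta)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_t + (n - i) * log_q
        )
        total += math.exp(log_pmf)  # very small terms underflow harmlessly to 0.0
    return total


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def relative_error(n: int, theta: float, z: float = -2.8) -> float:
    """Relative error of the continuity-corrected normal tail near z sd from the mean."""
    mean = n * theta
    sd = math.sqrt(n * theta * (1 - theta))
    k = math.floor(mean + z * sd)  # integer cutoff closest below the chosen z
    exact = binom_cdf(k, n, theta)
    approx = normal_cdf((k + 0.5 - mean) / sd)
    return abs(approx - exact) / exact


for n in (100, 10_000):
    print(f"n = {n:>6}: relative error = {relative_error(n, 0.7):.3f}")
```

With $\theta = 0.7$ held fixed, the relative error at the same standardized cutoff shrinks as $n$ grows, which matches the comment that the approximation holds up further into the tails for large samples.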
