Reproducing t-test in R gives different result than built-in function

up vote
5
down vote

favorite

I'm trying to reproduce how a t-test works using the example from page 211 in The Art Of Computer System Performance Analysis by Raj Jain.

The calculation is as follows:

# system A sample and statistics

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

x_a <- mean(a)

s2_a <- var(a)

n_a <- length(a)



# system B sample and statistics

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)

x_b <- mean(b)

s2_b <- var(b)

n_b <- length(b)



# computation of t-test

s <- (s2_a/n_a + s2_b/n_b)^(1/2)

v <- ((s2_a/n_a + s2_b/n_b)^2)/((1/(n_a - 1))*((s2_a/n_a)^2) + (1/(n_b - 1))*((s2_b/n_b)^2)) - 2



# 90% confidence interval

(x_a - x_b) + c(1, -1) * qt(c(0.95), v) * s

The output of the last computation is (6.55, -7.22) which matches the result given in the book (see erratas here: https://www.cse.wustl.edu/~jain/books/ftp/errors_all.pdf).

However, the built-in t-test gives a different result:

> t.test(a, b, conf.level = 0.9)



    Welch Two Sample t-test



data:  a and b

t = -0.09015, df = 9.9434, p-value = 0.93

alternative hypothesis: true difference in means is not equal to 0

90 percent confidence interval:

 -7.038828  6.372161

sample estimates:

mean of x mean of y 

 5.310000  5.643333

The built-in function gives a confidence interval of (-7.04, 6.37). I'm unable to reproduce this interval. What is the cause of the difference? Is my calculation wrong?

Update: The different result is due to a different value for $v$, indicating the degrees of freedom. As, Ben Bolker points out, R uses 9.94 whereas the manual calculation in the book uses $9.94 - 2 = 7.94$ from it. The book does not say why it subtracts 2. It only assumes that the observations are independent and unpaired but is silent about the variance.

The answer by Neeraj reproduces the calculation of the degrees of freedom that is given in the Wikipedia article to the Welch's t-test.

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

1

Haven't dug into the calculations, but R gives df=9.94 whereas your manual calculation gives df=7.94. deparse(body(stats:::t.test.default))[73:77] will show you R's internal df calculations ...
– Ben Bolker
Nov 19 at 13:10

add a comment |

up vote
5
down vote

favorite

I'm trying to reproduce how a t-test works using the example from page 211 in The Art Of Computer System Performance Analysis by Raj Jain.

The calculation is as follows:

# system A sample and statistics

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

x_a <- mean(a)

s2_a <- var(a)

n_a <- length(a)



# system B sample and statistics

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)

x_b <- mean(b)

s2_b <- var(b)

n_b <- length(b)



# computation of t-test

s <- (s2_a/n_a + s2_b/n_b)^(1/2)

v <- ((s2_a/n_a + s2_b/n_b)^2)/((1/(n_a - 1))*((s2_a/n_a)^2) + (1/(n_b - 1))*((s2_b/n_b)^2)) - 2



# 90% confidence interval

(x_a - x_b) + c(1, -1) * qt(c(0.95), v) * s

The output of the last computation is (6.55, -7.22) which matches the result given in the book (see erratas here: https://www.cse.wustl.edu/~jain/books/ftp/errors_all.pdf).

However, the built-in t-test gives a different result:

> t.test(a, b, conf.level = 0.9)



    Welch Two Sample t-test



data:  a and b

t = -0.09015, df = 9.9434, p-value = 0.93

alternative hypothesis: true difference in means is not equal to 0

90 percent confidence interval:

 -7.038828  6.372161

sample estimates:

mean of x mean of y 

 5.310000  5.643333

The built-in function gives a confidence interval of (-7.04, 6.37). I'm unable to reproduce this interval. What is the cause of the difference? Is my calculation wrong?

The answer by Neeraj reproduces the calculation of the degrees of freedom that is given in the Wikipedia article to the Welch's t-test.

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

1

Haven't dug into the calculations, but R gives df=9.94 whereas your manual calculation gives df=7.94. deparse(body(stats:::t.test.default))[73:77] will show you R's internal df calculations ...
– Ben Bolker
Nov 19 at 13:10

add a comment |

up vote
5
down vote

favorite

I'm trying to reproduce how a t-test works using the example from page 211 in The Art Of Computer System Performance Analysis by Raj Jain.

The calculation is as follows:

# system A sample and statistics

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

x_a <- mean(a)

s2_a <- var(a)

n_a <- length(a)



# system B sample and statistics

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)

x_b <- mean(b)

s2_b <- var(b)

n_b <- length(b)



# computation of t-test

s <- (s2_a/n_a + s2_b/n_b)^(1/2)

v <- ((s2_a/n_a + s2_b/n_b)^2)/((1/(n_a - 1))*((s2_a/n_a)^2) + (1/(n_b - 1))*((s2_b/n_b)^2)) - 2



# 90% confidence interval

(x_a - x_b) + c(1, -1) * qt(c(0.95), v) * s

The output of the last computation is (6.55, -7.22) which matches the result given in the book (see erratas here: https://www.cse.wustl.edu/~jain/books/ftp/errors_all.pdf).

However, the built-in t-test gives a different result:

> t.test(a, b, conf.level = 0.9)



    Welch Two Sample t-test



data:  a and b

t = -0.09015, df = 9.9434, p-value = 0.93

alternative hypothesis: true difference in means is not equal to 0

90 percent confidence interval:

 -7.038828  6.372161

sample estimates:

mean of x mean of y 

 5.310000  5.643333

The built-in function gives a confidence interval of (-7.04, 6.37). I'm unable to reproduce this interval. What is the cause of the difference? Is my calculation wrong?

The answer by Neeraj reproduces the calculation of the degrees of freedom that is given in the Wikipedia article to the Welch's t-test.

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

I'm trying to reproduce how a t-test works using the example from page 211 in The Art Of Computer System Performance Analysis by Raj Jain.

The calculation is as follows:

# system A sample and statistics

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

x_a <- mean(a)

s2_a <- var(a)

n_a <- length(a)



# system B sample and statistics

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)

x_b <- mean(b)

s2_b <- var(b)

n_b <- length(b)



# computation of t-test

s <- (s2_a/n_a + s2_b/n_b)^(1/2)

v <- ((s2_a/n_a + s2_b/n_b)^2)/((1/(n_a - 1))*((s2_a/n_a)^2) + (1/(n_b - 1))*((s2_b/n_b)^2)) - 2



# 90% confidence interval

(x_a - x_b) + c(1, -1) * qt(c(0.95), v) * s

The output of the last computation is (6.55, -7.22) which matches the result given in the book (see erratas here: https://www.cse.wustl.edu/~jain/books/ftp/errors_all.pdf).

However, the built-in t-test gives a different result:

> t.test(a, b, conf.level = 0.9)



    Welch Two Sample t-test



data:  a and b

t = -0.09015, df = 9.9434, p-value = 0.93

alternative hypothesis: true difference in means is not equal to 0

90 percent confidence interval:

 -7.038828  6.372161

sample estimates:

mean of x mean of y 

 5.310000  5.643333

The built-in function gives a confidence interval of (-7.04, 6.37). I'm unable to reproduce this interval. What is the cause of the difference? Is my calculation wrong?

The answer by Neeraj reproduces the calculation of the degrees of freedom that is given in the Wikipedia article to the Welch's t-test.

r t-test

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

edited Nov 20 at 13:30

asked Nov 19 at 12:46

Viktor Rosenfeld

283

asked Nov 19 at 12:46

Viktor Rosenfeld

283

asked Nov 19 at 12:46

Viktor Rosenfeld

283

1

Haven't dug into the calculations, but R gives df=9.94 whereas your manual calculation gives df=7.94. deparse(body(stats:::t.test.default))[73:77] will show you R's internal df calculations ...
– Ben Bolker
Nov 19 at 13:10

add a comment |

1

Haven't dug into the calculations, but R gives df=9.94 whereas your manual calculation gives df=7.94. deparse(body(stats:::t.test.default))[73:77] will show you R's internal df calculations ...
– Ben Bolker
Nov 19 at 13:10

Haven't dug into the calculations, but R gives df=9.94 whereas your manual calculation gives df=7.94. deparse(body(stats:::t.test.default))[73:77] will show you R's internal df calculations ...
– Ben Bolker
Nov 19 at 13:10

add a comment |

1 Answer
1

active

oldest

votes

up vote
7
down vote

accepted

You are making mistake in calculating your degree of freedom.

Here is the code, that exactly reproduce the R t.test results.

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)



v1 <- var(a)

v2 <- var(b)



n1 <- length(a)

n2 <- length(b)



se <- sqrt(v1/n1 + v2/n2)



nu <- se^4 / ((v1^2 /(n1^2*(n1 -1))) + (v2^2/(n2^2*(n2-1)))) #degree of freedom



#Confidence Interval

mean(a) - mean(b) + c(1, -1)* qt(.95, nu)*se



> 6.372161 -7.038828

It exactly matches with t.test results

t.test(a, b, conf.level = 0.9)

answered Nov 19 at 13:14

Neeraj

1,135619

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "65"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f377759%2freproducing-t-test-in-r-gives-different-result-than-built-in-function%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
7
down vote

accepted

You are making mistake in calculating your degree of freedom.

Here is the code, that exactly reproduce the R t.test results.

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)



v1 <- var(a)

v2 <- var(b)



n1 <- length(a)

n2 <- length(b)



se <- sqrt(v1/n1 + v2/n2)



nu <- se^4 / ((v1^2 /(n1^2*(n1 -1))) + (v2^2/(n2^2*(n2-1)))) #degree of freedom



#Confidence Interval

mean(a) - mean(b) + c(1, -1)* qt(.95, nu)*se



> 6.372161 -7.038828

It exactly matches with t.test results

t.test(a, b, conf.level = 0.9)

answered Nov 19 at 13:14

Neeraj

1,135619

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

add a comment |

up vote
7
down vote

accepted

You are making mistake in calculating your degree of freedom.

Here is the code, that exactly reproduce the R t.test results.

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)



v1 <- var(a)

v2 <- var(b)



n1 <- length(a)

n2 <- length(b)



se <- sqrt(v1/n1 + v2/n2)



nu <- se^4 / ((v1^2 /(n1^2*(n1 -1))) + (v2^2/(n2^2*(n2-1)))) #degree of freedom



#Confidence Interval

mean(a) - mean(b) + c(1, -1)* qt(.95, nu)*se



> 6.372161 -7.038828

It exactly matches with t.test results

t.test(a, b, conf.level = 0.9)

answered Nov 19 at 13:14

Neeraj

1,135619

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

add a comment |

up vote
7
down vote

accepted

You are making mistake in calculating your degree of freedom.

Here is the code, that exactly reproduce the R t.test results.

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)



v1 <- var(a)

v2 <- var(b)



n1 <- length(a)

n2 <- length(b)



se <- sqrt(v1/n1 + v2/n2)



nu <- se^4 / ((v1^2 /(n1^2*(n1 -1))) + (v2^2/(n2^2*(n2-1)))) #degree of freedom



#Confidence Interval

mean(a) - mean(b) + c(1, -1)* qt(.95, nu)*se



> 6.372161 -7.038828

It exactly matches with t.test results

t.test(a, b, conf.level = 0.9)

answered Nov 19 at 13:14

Neeraj

1,135619

You are making mistake in calculating your degree of freedom.

Here is the code, that exactly reproduce the R t.test results.

a <- c(5.36, 16.57, 0.62, 1.41, 0.64, 7.26)

b <- c(19.12, 3.52, 3.38, 2.5, 3.6, 1.74)



v1 <- var(a)

v2 <- var(b)



n1 <- length(a)

n2 <- length(b)



se <- sqrt(v1/n1 + v2/n2)



nu <- se^4 / ((v1^2 /(n1^2*(n1 -1))) + (v2^2/(n2^2*(n2-1)))) #degree of freedom



#Confidence Interval

mean(a) - mean(b) + c(1, -1)* qt(.95, nu)*se



> 6.372161 -7.038828

It exactly matches with t.test results

t.test(a, b, conf.level = 0.9)

answered Nov 19 at 13:14

Neeraj

1,135619

answered Nov 19 at 13:14

Neeraj

1,135619

answered Nov 19 at 13:14

Neeraj

1,135619

answered Nov 19 at 13:14

Neeraj

1,135619

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

add a comment |

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

at first glance, Wikipedia's Welch test df formula appears to be different. I wonder if there are different variants ... ???
– Ben Bolker
Nov 19 at 13:16

I used wikipedia formula, and it exactly replicate the R result.
– Neeraj
Nov 19 at 13:17

This is Welch's t-test, which does not assume variance are equal for both the group.
– Neeraj
Nov 19 at 13:18

I knew it was Welch, but I may have been looking too quickly at the formula.
– Ben Bolker
Nov 19 at 13:22

@Ben There are indeed variants of Welch-Satterthwaite.
– Glen_b♦
Nov 20 at 1:54

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Cross Validated!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Csdrhrt