How to interpret interaction dummies of multiple categories and main effect

I have a cross-country panel regression with the following structure, where $y$ is a country's drug addiction rate, $x$ its number of homeless people, and $m$ its HIV infection rate. I group the countries into four world regions, which I code as dummies $D_1$, $D_2$, $D_3$, with the fourth region as the reference category:

$y = b_1x + b_2m + b_3D_1m + b_4D_2m + b_5D_3m$ (1)

When I change my base category, every coefficient and significance value except $b_1$ changes.

When I change my regression to:

$y = b_1x + b_3D_1m + b_4D_2m + b_5D_3m + b_6D_4m$ (2)

each interaction coefficient in (2) is the same as the $b_2$ I get in regression (1) when the corresponding region is the reference category, with the same significance value.

Now I don't understand what I am seeing. Is the main-effect coefficient $b_2$ the effect for the reference category, and not the mean effect of the HIV infection rate? What does my main-effect coefficient $b_2$ tell me? In regression (1), why do the significance values of $b_3$, $b_4$, and $b_5$ change when I change my reference category, and what does the significance of $b_3$, $b_4$, and $b_5$ mean with regard to my main effect $b_2$? I am completely confused right now.

Best regards,
Rub_n

edited Dec 20 '18 at 23:38 by StatsStudent

asked Dec 20 '18 at 23:30 by Rub_n

How are you modelling the error terms? What kind of model is this? Ordinary least squares? Logistic regression?
– StatsStudent
Dec 20 '18 at 23:33

Do you really have cross-country data, or is this supposed to be cross-sectional?
– StatsStudent
Dec 20 '18 at 23:34

I use an OLS regression with group and time fixed effects. Yes, I have cross-country data.
– Rub_n
Dec 21 '18 at 0:10

regression mean interpretation categorical-encoding

1 Answer

Consider a model with only 3 regions and hence two dummies, $D_1$ and $D_2$. Assume the data are cross-country, so $i = 1, \dots, n$ indexes countries and $t$ indexes time periods. Let the model equation be

$$y_{it} = b_1 x_{it} + b_2 m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it} + \epsilon_{it}$$

implying that the conditional expected rate of drug addiction is

$$\mathbb{E}[y \mid \text{data}] = b_1 x_{it} + b_2 m_{it} + b_3 D_1 m_{it} + b_4 D_2 m_{it}$$

Hence the model allows different regions to have different marginal effects of the HIV infection rate $m$ on the drug addiction rate $y$, so their drug addiction rate may respond differently to a change in the HIV infection rate than it does in the reference region.

For the reference region, $D_1 = D_2 = 0$, the conditional expectation reduces to

$$\mathbb{E}[y \mid \text{data}] = b_1 x_{it} + b_2 m_{it}$$

Differentiating with respect to $m_{it}$ gives

$$\frac{\partial \mathbb{E}[y \mid \text{data}]}{\partial m_{it}} = b_2$$

which is the marginal effect of the HIV infection rate $m$ on the drug addiction rate $y$ for countries in the reference region. An increase of one unit in the HIV infection rate of a country $i$ from the reference region results in a change of $b_2$ units in the drug addiction rate of country $i$.

For countries from the region defined by $D_1 = 1$ and $D_2 = 0$, the conditional expectation is

$$\mathbb{E}[y \mid \text{data}] = b_1 x_{it} + b_2 m_{it} + b_3 m_{it}$$

and the marginal effect is

$$\frac{\partial \mathbb{E}[y \mid \text{data}]}{\partial m_{it}} = b_2 + b_3$$

Hence $b_3$ is the difference in the marginal effect of the HIV infection rate $m$ on the drug addiction rate $y$ for countries in region $D_1 = 1$ compared to the reference region, for which the marginal effect was simply $b_2$. So if $b_3$ is positive (and $b_2$ is positive as well), the drug addiction rate of countries in region $D_1 = 1$ reacts more strongly to changes in the HIV infection rate than it does in the reference region.
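As a quick numerical check of this derivation, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the three region-specific slopes, the sample size, the noise level, and the use of plain least squares via NumPy without the group and time fixed effects mentioned in the comments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
region = rng.integers(0, 3, size=n)        # 0 = reference, 1 -> D_1, 2 -> D_2
x = rng.normal(size=n)
m = rng.normal(size=n)
true_slopes = np.array([0.5, 1.5, -1.0])   # hypothetical effect of m in each region
y = 2.0 * x + true_slopes[region] * m + rng.normal(scale=0.1, size=n)

# Design matrix of the three-region model: x, m, D_1*m, D_2*m
D1 = (region == 1).astype(float)
D2 = (region == 2).astype(float)
X = np.column_stack([x, m, D1 * m, D2 * m])
b1, b2, b3, b4 = np.linalg.lstsq(X, y, rcond=None)[0]

print("slope in reference region (b2):", b2)
print("slope in region D_1 (b2 + b3): ", b2 + b3)
print("slope in region D_2 (b2 + b4): ", b2 + b4)
```

In this sketch the estimate of $b_2$ should land near the reference-region slope, while $b_2 + b_3$ and $b_2 + b_4$ should land near the slopes of the other two regions.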

So $b_2$ measures the change in the drug addiction rate that results from a 1-unit increase in the HIV infection rate $m$ for the countries in the reference region. The value of $b_3$ changes when you change the reference category because it is the difference in the marginal effect between some region (here $D_1 = 1$) and the reference region, and of course that difference depends on which region it is compared to. The significance of $b_3$ means that you can reject the null hypothesis that countries from region $D_1 = 1$ have the same marginal effect as countries from the reference region.
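To see why only the parametrization changes when the reference category changes, the sketch below fits the same simulated data twice with different reference regions. The helper `fit_with_reference` and all data-generating values are hypothetical and exist only for this illustration. Fixed effects are again omitted.

```python
import numpy as np

def fit_with_reference(y, x, m, region, reference):
    """OLS of y on x, m and D_k*m for every region k except `reference`."""
    others = [int(k) for k in np.unique(region) if k != reference]
    X = np.column_stack([x, m] + [(region == k) * m for k in others])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    # Region-specific slopes implied by the fit: main effect plus interaction
    slopes = {reference: coef[1]}
    for j, k in enumerate(others):
        slopes[k] = coef[1] + coef[2 + j]
    return coef, slopes, X @ coef

rng = np.random.default_rng(1)
n = 600
region = rng.integers(0, 3, size=n)
x, m = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x + np.array([0.5, 1.5, -1.0])[region] * m + rng.normal(size=n)

coef_a, slopes_a, fitted_a = fit_with_reference(y, x, m, region, reference=0)
coef_b, slopes_b, fitted_b = fit_with_reference(y, x, m, region, reference=1)

print("b1 with reference 0 / 1:", coef_a[0], coef_b[0])   # unchanged
print("main effect of m with reference 0 / 1:", coef_a[1], coef_b[1])
print("implied slopes, reference 0:", slopes_a)
print("implied slopes, reference 1:", slopes_b)
```

The fitted values, the coefficient on $x$, and the implied slope of every region are the same in both fits. Only the main effect and the interaction coefficients are relabelled: the main effect becomes the slope of the new reference region, and each interaction becomes a difference from that new baseline.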

In the second model there is no reference category, so the coefficients $b_3$, $b_4$, $b_5$ and $b_6$ are region-specific marginal effects (not differences in the marginal effect). The purpose of this model is that it lets you test whether the marginal effect of the HIV infection rate on the drug addiction rate is significant in each region, simply by testing the significance of the coefficients. To test for differences between regions in this model you have to test differences in coefficients, for example $H_0: b_3 = b_4$, which can easily be done with a Wald test. In model (1), by contrast, the comparison of each region with the reference region in the responsiveness of the drug addiction rate to the HIV infection rate is performed simply by testing the significance of a coefficient.
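The link between the two models can be sketched numerically. The example below uses three simulated regions and the classical homoskedastic OLS covariance matrix, which is a simplifying assumption: with real panel data and fixed effects you would likely want clustered standard errors. The helper `ols` and all numbers are hypothetical. It computes the Wald statistic for equal slopes in two regions under model (2) and compares it with the squared t-statistic of the matching interaction coefficient under model (1).

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the classical (homoskedastic) covariance matrix."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return coef, sigma2 * np.linalg.inv(X.T @ X)

rng = np.random.default_rng(2)
n = 900
region = rng.integers(0, 3, size=n)
x, m = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x + np.array([0.5, 1.5, -1.0])[region] * m + rng.normal(size=n)
D = [(region == k).astype(float) for k in range(3)]

# Model (2): one slope per region, no reference category
X2 = np.column_stack([x, D[0] * m, D[1] * m, D[2] * m])
coef2, cov2 = ols(X2, y)
R = np.array([0.0, -1.0, 1.0, 0.0])   # slope(region 1) - slope(region 0)
wald = (R @ coef2) ** 2 / (R @ cov2 @ R)

# Model (1): region 0 is the reference, so the D_1*m coefficient is that difference
X1 = np.column_stack([x, m, D[1] * m, D[2] * m])
coef1, cov1 = ols(X1, y)
t_squared = coef1[2] ** 2 / cov1[2, 2]

print("Wald statistic in model (2):  ", wald)
print("squared t-stat in model (1):  ", t_squared)
```

Under these assumptions the two statistics coincide, because the two design matrices span the same column space. Model (1) reports the comparison with the reference region directly, while model (2) reports the region-specific slopes and leaves the comparison to a separate test.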

edited Dec 21 '18 at 0:53

answered Dec 21 '18 at 0:10 by Jesper Hybel

Oh man, that really helps! So the significance of b2 is the significance of the effect of region 3 on my drug addiction rate?
– Rub_n
Dec 21 '18 at 0:19

$b_2$ measures the effect on the drug addiction rate of a 1-unit increase in the HIV infection rate for countries belonging to the reference region. Its significance means it is significantly different from 0, so you can reject the null hypothesis that the HIV infection rate does not affect the drug addiction rate in countries from this region. (I don't know what you define as region 3?)
– Jesper Hybel
Dec 21 '18 at 0:25

Perfect, thank you so much. So do I gain anything from using regression (2) instead of running four separate regressions, one per region, other than a bigger sample size for the effect of x?
– Rub_n
Dec 21 '18 at 0:31

See the edit of my response, last two paragraphs.
– Jesper Hybel
Dec 21 '18 at 0:46

Please accept and upvote if you think the answer was helpful :)
– Jesper Hybel
Dec 21 '18 at 0:54
