Random Forest and Decision Tree Algorithm


A random forest is a collection of decision trees built using the bagging concept. When we move from one decision tree to the next, how does the information learned by the previous tree carry forward to the next one?

As I understand it, no trained model is saved for each decision tree and then loaded before the next tree starts learning from the misclassification errors.

So how does it work?

edited Nov 21 at 12:16 by Peter Flom

asked Nov 20 at 1:55 by Abhay Raj Singh

"When we move from one decision tree to the next decision tree". This suggests a linear process. We've built parallel implementations where we worked on one tree per CPU core; this works perfectly fine unless you use a separate random number generator per CPU core in training and all of them share the same seed. In that case you can end up with lots of identical trees.
– MSalters
Nov 21 at 14:25
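
The seeding pitfall described in this comment can be illustrated with a minimal sketch. It uses only Python's standard `random` module; the helper `bootstrap_indices`, the seed value 42 and the worker count are assumptions chosen for illustration, not taken from any particular implementation.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement using the given generator."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

n_rows, n_workers = 8, 3

# Pitfall: each worker creates its own generator, but all use the same seed,
# so every worker draws exactly the same bootstrap sample.
same_seed = [bootstrap_indices(n_rows, random.Random(42))
             for _ in range(n_workers)]

# One possible remedy (an assumption, not the commenter's fix):
# give each worker a distinct seed.
distinct_seed = [bootstrap_indices(n_rows, random.Random(42 + worker))
                 for worker in range(n_workers)]

print(same_seed[0] == same_seed[1] == same_seed[2])  # True: identical samples
```

Identical bootstrap samples combined with identical feature draws would produce identical trees, which is the failure mode the comment warns about.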


machine-learning random-forest cart bagging


4 Answers

No information is passed between trees. In a random forest, all of the trees are iid, conditional on the training data. They are iid because every tree is grown using the same randomization strategy: first, take a bootstrap sample of the data, and then grow the tree, choosing each split from a randomly chosen subset of the features. This happens for each tree individually, without regard to any other tree in the ensemble.

You might find it helpful to read an introduction to random forests from a high-quality text. One is "Random Forests" by Leo Breiman. There's also a chapter in The Elements of Statistical Learning by Hastie et al.

It's possible that you've confused random forests with boosting methods such as AdaBoost or gradient-boosted trees. Boosting methods work differently: they use information about the misfit from previous boosting rounds to inform the next round.

edited Nov 20 at 21:17

answered Nov 20 at 1:59 by Sycorax


By iid do you mean independent and identically distributed? I wasn't familiar with this abbreviation.
– nekomatic
Nov 21 at 11:52


@nekomatic It's safe to assume that that was the intended meaning. It's a pretty common abbrev. in statistics.
– JAD
Nov 21 at 14:00


A random forest is a collection of multiple decision trees which are trained independently of one another. So there is no notion of sequentially dependent training (which is the case in boosting algorithms). As a result, as mentioned in another answer, the trees can be trained in parallel.

You might like to know where the "random" in random forest comes from: randomness is injected into the process of learning the trees in two ways. The first is the random selection of data points used for training each tree, and the second is the random selection of features considered when building each tree. A single decision tree usually tends to overfit the data, so injecting randomness in this way yields a set of trees, each of which has good accuracy (and possibly overfits) on a different subset of the available training data. Therefore, when we average the predictions made by all the trees, we observe a reduction in overfitting (compared to training one single decision tree on all the available data).

To better understand this, here is a rough sketch of the training process assuming all the data points are stored in a set denoted by $M$ and the number of trees in the forest is $N$:

1. $i = 1$

2. Take a bootstrap sample of $M$ (i.e. sampling with replacement, with the same size as $M$), denoted by $S_i$.

3. Train the $i$-th tree, denoted by $T_i$, using $S_i$ as input data.
   - The training process is the same as for an ordinary decision tree, except that at each node only a random selection of features is considered for the split at that node.

4. $i = i + 1$

5. If $i \le N$, go to step 2; otherwise all the trees have been trained, so random forest training is finished.
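
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names, the use of Gini impurity as the split criterion, the `max_depth` limit, the parameter `n_sub` (number of features sampled per node) and the toy data are all hypothetical choices not given in the answer. Class labels must not be tuples, because tuples are used to represent inner nodes.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among `features`, or None if no split helps."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, n_sub, rng, depth=0, max_depth=8):
    """Grow one tree; a leaf is a label, an inner node is (f, t, left, right)."""
    split = None
    if depth < max_depth:
        # Random subset of features, drawn afresh at this node only.
        features = rng.sample(range(len(rows[0])), n_sub)
        split = best_split(rows, labels, features)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left],
                      n_sub, rng, depth + 1, max_depth),
            grow_tree([rows[i] for i in right], [labels[i] for i in right],
                      n_sub, rng, depth + 1, max_depth))

def train_forest(M_rows, M_labels, N, n_sub, seed=0):
    """Train N trees; no tree receives any information from another tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(N):
        # Bootstrap sample S_i: same size as M, drawn with replacement.
        idx = [rng.randrange(len(M_rows)) for _ in M_rows]
        forest.append(grow_tree([M_rows[j] for j in idx],
                                [M_labels[j] for j in idx], n_sub, rng))
    return forest

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy data: one feature, class 1 whenever the feature is at least 10.
rows = [[x] for x in range(20)]
labels = [int(x >= 10) for x in range(20)]
forest = train_forest(rows, labels, N=5, n_sub=1)
```

The only inputs to each call of `grow_tree` are that tree's own bootstrap sample and the random generator, which is why the loop body could equally be run on separate workers.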

Note that I described the algorithm sequentially, but since the trees are trained independently of each other, you can also train them in parallel. For the prediction step, first make a prediction with every tree in the forest (i.e. $T_1$, $T_2$, ..., $T_N$) and then:

- For a regression task, take the average of the predictions as the final prediction of the random forest.

- For a classification task, use a soft voting strategy: average the probabilities predicted by the trees for each class, then declare the class with the highest average probability as the final prediction of the random forest.
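
The two aggregation rules can be sketched as follows. The function names and the example numbers are assumptions made up for illustration; the per-tree outputs are passed in directly so the sketch does not depend on how the trees were trained.

```python
def forest_regression(tree_predictions):
    """Average the numeric predictions of the individual trees."""
    return sum(tree_predictions) / len(tree_predictions)

def forest_soft_vote(tree_probabilities):
    """Average per-class probabilities over trees; return (class index, averages)."""
    n_trees = len(tree_probabilities)
    n_classes = len(tree_probabilities[0])
    averages = [sum(p[k] for p in tree_probabilities) / n_trees
                for k in range(n_classes)]
    winner = max(range(n_classes), key=lambda k: averages[k])
    return winner, averages

# Three trees, two classes: one tree is very confident in class 0,
# two trees lean mildly towards class 1.
probabilities = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
winner, averages = forest_soft_vote(probabilities)
print(winner)                               # 0 (average for class 0 is about 0.57)
print(forest_regression([1.0, 2.0, 3.0]))   # 2.0
```

In this example a simple majority vote would pick class 1 (two trees against one), while soft voting picks class 0 because it weighs how confident each tree is.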

Further, it is worth mentioning that it is possible to train the trees in a sequentially dependent manner; that is exactly what the gradient boosted trees algorithm does, which is a totally different method from random forests.
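
For contrast, here is a bare-bones sketch of the sequential dependence in gradient boosting, assuming squared-error loss (so each round fits the current residuals) and one-feature decision stumps as the base learner. The names `fit_stump`, `boost`, `mse`, the learning rate of 0.5 and the toy data are assumptions for illustration only.

```python
def fit_stump(xs, targets):
    """Least-squares decision stump on 1-D inputs: one threshold, two means."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds, learning_rate=0.5):
    """Fit stumps one after another; each one sees the errors left so far."""
    predictions = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # The training target of this round depends on ALL previous stumps.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return stumps, predictions

def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

xs = [0, 1, 2, 3, 4, 5]
ys = [x * x for x in xs]
_, after_1 = boost(xs, ys, rounds=1)
_, after_10 = boost(xs, ys, rounds=10)
print(mse(ys, after_1), mse(ys, after_10))  # the error shrinks with more rounds
```

Because each round needs the residuals of the previous ones, this loop cannot be split across workers in the way random forest training can.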

edited Nov 20 at 17:00

answered Nov 20 at 7:13 by today


Random forest is a bagging algorithm rather than a boosting algorithm.

Random forest constructs each tree independently, using a random sample of the data. A parallel implementation is possible.

You might like to check out gradient boosting, where trees are built sequentially and each new tree tries to correct the mistakes made by the previous ones.

answered Nov 20 at 2:06 by Siong Thye Goh


So how does it work?

Random Forest is a collection of decision trees. The trees are constructed independently. Each tree is trained on a subset of the features and on a sample of the data drawn with replacement.

When predicting, say for classification, the input is given to each tree in the forest and each tree "votes" on the classification; the label with the most votes wins.
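
The voting rule just described can be sketched in a few lines. The function name and the example labels are assumptions for illustration; how ties are broken is not specified in the answer, and here a tie simply goes to whichever tied label was seen first.

```python
from collections import Counter

def forest_hard_vote(tree_labels):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_labels).most_common(1)[0][0]

print(forest_hard_vote(["spam", "ham", "spam", "spam", "ham"]))  # spam
```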

Why use a random forest over a single decision tree? The bias/variance trade-off. Each tree in a random forest is more constrained than a single decision tree trained on all the data and features. Generally, random forests provide a big reduction in error due to variance at the cost of a small increase in error due to bias.
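
The variance part of this trade-off can be made concrete with a standard identity, stated here as a sketch under assumptions that are not in the answer: the forest averages $N$ tree predictions $T_1(x), \dots, T_N(x)$ that are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$.

$$\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} T_i(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2$$

As $N$ grows, the second term shrinks towards zero, so the remaining variance is governed by $\rho\,\sigma^2$. Sampling the data and the features at random is what keeps the correlation $\rho$ between trees low.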

answered Nov 20 at 5:23 by Akavall

If we are choosing different features for every decision tree, then how does the learning from a set of features in the previous decision tree improve when we send the misclassified values ahead, given that the next decision tree has a totally new set of features?
– Abhay Raj Singh
Nov 20 at 6:50


@AbhayRajSingh - you do not "send the misclassified values ahead" in Random Forest. As Akavall says, "The trees are constructed independently"
– Henry
Nov 20 at 10:16

