Create a bagged tree and tune the ensemble size
2.8.6 Exercises for Section 2.5.7
The question asks: "Create a bagged tree of the following tree and tune the ensemble size. What is the optimal number of trees?"
How are we supposed to tackle this question? Do we start with the code on pages 121-123 of the book to create the bagged trees? What are we
supposed to have as an output for tuning the ensemble size?
For the optimal number of trees, should we use the code on page 129, which creates the model with the optimal ensemble size?
I am really confused about the steps we should take in this problem.
Thank you very much for all of your help, I greatly appreciate it.
Answers and follow-up questions
Answer or follow-up question 1
The code on page 129 is code for a random forest. I don't think it will really help us here.
My thoughts are that you would run the code he gives in the homework problem and create a bagged tree as we did in the code for the bagged
tree section. From here, I think we are supposed to find the ensemble size (number of trees) that gives the highest AUC value. However, if
you build this code and rerun it, the AUC changes drastically each time, so this probably isn't valid.
However, in class, he used the AUC method to find an appropriate ensemble size, so I assume this method is what he is looking for.
Answer or follow-up question 2
I gave you code in the book to create a bagged tree. I think we used 10 trees. The question now is:
optimize the number of trees. Which number of trees gives you the best ensemble performance in terms of AUC?
It is true that this can change with each run, but what I am looking for is the code to do that.
Answer or follow-up question 3
Dear Dr. Ballings,
Thank you for your reply. I am still very confused about how to do this problem, though. What steps are we supposed to take? I just don't know
where to start.
Answer or follow-up question 4
Don't try to overthink this.
What I am asking you to do is to create an ensemble of 1 tree, an ensemble of 2 trees, an ensemble of 3 trees, ..., an ensemble of 100 trees, and
each time see how well it performs. If the ensemble with 50 trees has the best performance, you know that 50 is the optimal ensemble size.
Operationally, you would create one ensemble with 100 trees, and predict with the first tree, the first two trees, the first three
trees, ..., all trees,
and each time note the AUC.
In terms of coding, this is nothing more than adding a loop over the ensemble size (i.e., number of trees).
Michel Ballings
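For anyone who wants to see the loop described in the last answer written out, here is a minimal sketch in Python. It is not the code from the book. It assumes synthetic data, uses one-split trees (stumps) in place of full decision trees to stay short, and every name in it (make_data, fit_stump, predict_stump, auc, best_size) is made up for illustration. The idea is the same: build one ensemble of 100 bagged trees, predict on a validation set with the first 1, 2, ..., 100 trees, and note the AUC each time.

```python
import random

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half.
    Assumes both classes are present in labels."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_stump(X, y):
    """Fit a one-split tree by minimizing squared error; returns (feature, threshold, p_left, p_right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((yi - pl) ** 2 for yi in left) + sum((yi - pr) ** 2 for yi in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, pl, pr)
    if best is None:  # no valid split found: predict the base rate everywhere
        p = sum(y) / len(y)
        return (0, float("inf"), p, p)
    return best[1:]

def predict_stump(stump, row):
    j, t, pl, pr = stump
    return pl if row[j] <= t else pr

def make_data(n, rng):
    """Synthetic data (an assumption for this sketch): two features, noisy linear boundary."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if a + b + rng.gauss(0, 1) > 0 else 0 for a, b in X]
    return X, y

rng = random.Random(1)  # fixed seed so reruns give the same result
X_train, y_train = make_data(100, rng)
X_val, y_val = make_data(100, rng)
max_trees = 100

# Build ONE ensemble of max_trees trees, each fit on a bootstrap sample.
trees = []
for _ in range(max_trees):
    idx = [rng.randrange(len(X_train)) for _ in range(len(X_train))]
    trees.append(fit_stump([X_train[i] for i in idx], [y_train[i] for i in idx]))

# Loop over the ensemble size: predict with the first k trees and note the AUC.
running = [0.0] * len(X_val)
aucs = []
for k, tree in enumerate(trees, start=1):
    running = [r + predict_stump(tree, row) for r, row in zip(running, X_val)]
    aucs.append(auc(y_val, [r / k for r in running]))

# Smallest ensemble size that reaches the highest validation AUC.
best_size = max(range(1, max_trees + 1), key=lambda k: aucs[k - 1])
print(best_size, round(aucs[best_size - 1], 3))
```

Keeping a running sum of predictions means each tree is scored only once, so the whole curve of AUC against ensemble size costs about the same as predicting with the full ensemble. Without the fixed seed the best size will move from run to run, as noted earlier in the thread.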