How do we structure a data frame for stacking models?

Dr. Ballings,

It is my understanding that stacking is an ensemble where the predictions of multiple
models become the inputs of another model (typically logistic regression).

I was wanting to give this a try but have a question regarding structuring the problem.
Say we train four models on a training set that consists of 3000 observations, and
we make predictions on a test set consisting of 100 observations. We can then store
these predictions as four column vectors in a data frame, but then how do we include
our target variable in this new data frame? Since this new data frame of predictions
would have only 100 observations, our original target vector would not match in
dimension. What am I missing here?


Answers and follow-up questions

Answer or follow-up question 1

Dear Todd,

Your understanding is correct.

Instead of making predictions on a test set, you would make predictions on a validation set (for which you have the target variable).

Then you would have a set of four predictors (your four predictions) and one dependent variable on which you can create your stacking

Michel Ballings

Sign in to be able to add an answer or mark this question as resolved.