Question



Deploying the model given by the calibrate function on unbinned or binned data?

I do not understand how to properly use the random forest model returned by the calibrate function. First I will explain my understanding of how the random forest model is created, so you can confirm that this process is correct; my question follows after that.

1. The calibrate function takes predictions from another model (which I'll call A's), bins them, and takes the bin midpoints as X's.
2. To calculate Y's, it takes the observed/actual responses of the rows in each bin (which I'll call B's) and averages them.
3. It trains a random forest model on these X's & Y's.
4. Finally, the number of bins is chosen by cross-validation.
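Steps 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's calibrate function: it assumes equal-width bins over the range of A and skips empty bins, and the helper name `bin_predictions` and its `n_bins` parameter are hypothetical. The random forest fit (step 3) and the bin-count cross-validation (step 4) are not shown.

```python
def bin_predictions(a, b, n_bins):
    """Bin raw predictions a (A's) and average the observed responses b (B's) per bin.

    Hypothetical helper: assumes equal-width bins over [min(a), max(a)].
    Returns (xs, ys): bin midpoints (X's) and mean observed responses (Y's).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and the same length")
    lo, hi = min(a), max(a)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all a are equal
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pred, resp in zip(a, b):
        # Index of the bin containing pred; the maximum value goes in the last bin.
        i = min(int((pred - lo) / width), n_bins - 1)
        sums[i] += resp
        counts[i] += 1
    xs, ys = [], []
    for i in range(n_bins):
        if counts[i]:  # skip empty bins
            xs.append(lo + (i + 0.5) * width)  # bin midpoint -> X
            ys.append(sums[i] / counts[i])     # mean observed response -> Y
    return xs, ys
```

For example, `bin_predictions([0.1, 0.2, 0.35, 0.8, 0.9], [0, 0, 1, 1, 1], 2)` gives X's of about 0.3 and 0.7 and Y's of 1/3 and 1.0: five (A, B) rows collapse into two (X, Y) points.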

Is this random forest model meant to be run on A's & B's (original predictions) or X's & Y's (binned predictions and responses)?

If it is meant to be run on A's & B's, how should we plot uncalibrated vs. calibrated predictions, given that the calibrated version will have many more points than the uncalibrated one (which is still binned)?

If it is meant to be run on X's & Y's, should the number of bins also be validated on X's & Y's? (I currently validate it with A's & B's.)






Answers and follow-up questions





Answer or follow-up question 1

Dear student,

"Is this random forest model meant to be run on A's & B's (original predictions) or X's & Y's (binned predictions and responses)?"

On the X's & Y's (binned predictions and responses).

"If it is meant to run on X's & Y's then when validating the number of bins should it [be] validated on X's & Y's too (I currently have mine validated with A's & B's)?"

I don't understand this question; could you reword it? The prediction calibration function will be applied to A.
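A minimal sketch of the workflow described above (fit on the binned X's & Y's, then apply to A). It assumes scikit-learn's `RandomForestRegressor` behaves like the model returned by the calibrate function, which may not be exactly the case; the helper name `fit_and_apply` and its `n_estimators`/`seed` defaults are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_and_apply(xs, ys, a, n_estimators=100, seed=0):
    """Fit a random forest on binned (X, Y) pairs, then calibrate raw predictions a.

    Hypothetical helper: assumes the calibration model acts like scikit-learn's
    RandomForestRegressor with a single input feature (the raw prediction).
    """
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    # scikit-learn expects a 2-D feature matrix, here with one column.
    model.fit(np.asarray(xs, dtype=float).reshape(-1, 1), np.asarray(ys, dtype=float))
    # Apply the fitted calibrator to every raw prediction in A (one output per row).
    return model.predict(np.asarray(a, dtype=float).reshape(-1, 1))
```

The output has one calibrated value per row of A, so it lines up row by row with the observed responses B.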

Michel
