Question



Optimizing bins for calibration

When creating the calibrate function, I am having trouble understanding how to create bins for the data.
I understand that we bin based on the scores; however, I am not sure how to approach it.

Do we optimize over fixed ranges of score values? (.00-.09, .10-.19, .20-.29, etc.)
Do we only check bin sizes that give every bin the same number of observations? ( which(nrow(data.frame) %% binsize == 0) )
Or do we try all possible numbers of bins after sorting? ( for (i in 1:nrow(data.frame)) { split(data.frame, (seq(nrow(data.frame)) - 1) %/% i) } )

I understand we are supposed to find the best AUROC, but are there constraints on what can be a bin?
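
For concreteness, the three options above can be sketched in base R. This is only an illustration on made-up data: the vector scores, the bin width of 0.1, and the bin size of 20 are assumptions chosen for the example, not values from the assignment.

```r
# Hypothetical scores; in practice these would be the model's predicted scores.
set.seed(1)
scores <- runif(100)

# Option 1: equal-width bins on the score scale (0.0-0.1, 0.1-0.2, ...).
width_bins <- cut(scores, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Option 2: equal-frequency bins. Rank the scores, then give each consecutive
# block of `binsize` observations the same bin id.
binsize <- 20
freq_bins <- integer(length(scores))
freq_bins[order(scores)] <- (seq_along(scores) - 1) %/% binsize + 1

# Option 3: every bin size that divides the number of observations evenly,
# i.e. the candidates one could loop over.
n <- length(scores)
candidate_sizes <- which(n %% seq_len(n) == 0)

table(width_bins)  # observations per equal-width bin
table(freq_bins)   # observations per equal-frequency bin
candidate_sizes
```

Equal-width bins can end up with very different counts, while equal-frequency bins have uneven score ranges; which trade-off works better is exactly what the optimization would have to decide.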





Answers and follow-up questions





Answer or follow-up question 1

Dear students,

There are no constraints.

The best advice I can give you is this:
Think about what you want your calibration basetable to look like.

It is essentially a dataset with translation (if-then-else) rules.
Because there are gaps in the table, we need to build a model on it to extrapolate.

Michel Ballings
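
One possible reading of this advice, as a minimal sketch in base R: build a table with one row per score bin (the translation rules), then fit something on that table so that scores falling between the rows still get a calibrated value. Everything here is an assumption made for illustration: the simulated score and y, the ten equal-width bins, and the use of linear interpolation via approxfun as the "model". The answer does not say which model or binning to use.

```r
# Simulated data: the true event probability is score^2, so the raw
# scores are deliberately miscalibrated (hypothetical example data).
set.seed(42)
n <- 1000
score <- runif(n)
y <- rbinom(n, 1, score^2)

# Calibration basetable: one row per bin, holding the mean score in the bin
# and the observed event rate in the bin ("if score is about X, then P is Y").
bins <- cut(score, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
calib <- data.frame(
  mean_score = as.numeric(tapply(score, bins, mean)),
  event_rate = as.numeric(tapply(y, bins, mean))
)

# Model on the table to fill the gaps between rows. Linear interpolation is
# used here as a simple stand-in; rule = 2 holds the end values constant
# for scores outside the range covered by the table.
calibrate <- approxfun(calib$mean_score, calib$event_rate, rule = 2)

# Translate a few raw scores into calibrated probabilities.
calibrate(c(0.05, 0.33, 0.97))
```

A smoother or a regression fitted on calib would serve the same purpose; the point is only that the table alone covers a finite set of score values and the fitted function covers the rest.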


