## Question

Optimizing bins for calibration

When creating the calibrate function, I am having trouble understanding how to create bins for the data.

I understand that we bin based on the scores; however, I am not sure how to approach it.

Do we optimize by different sets of values? (0.00-0.09, 0.10-0.19, 0.20-0.29, etc.)

Do we only check bin sizes that are all equal in number of observations? (`which(nrow(data.frame) %% binsize == 0)`)

Or do we try all possible numbers of bins after sorting? (`for (i in 1:nrow(data.frame)) { split(data.frame, (seq(nrow(data.frame)) - 1) %/% i) }`)

I understand we are supposed to find the best AUROC, but are there constraints on what can be a bin?
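For illustration, the first two options in the question (fixed score ranges versus bins with an equal number of observations) can be sketched in R as follows. This is only a sketch under assumptions: the simulated `score` and `y` vectors, the choice of 10 bins, and all variable names are hypothetical and not part of the assignment.

```r
# Hypothetical data (an assumption for this sketch):
# 'score' is a predicted probability, 'y' the observed 0/1 outcome.
set.seed(1)
n <- 1000
score <- runif(n)
y <- rbinom(n, size = 1, prob = score^2)  # deliberately miscalibrated scores

# Option 1: equal-width bins on the score scale (0.0-0.1, 0.1-0.2, ...).
width_bins <- cut(score, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Option 2: equal-frequency bins, i.e. the same number of observations per bin.
# Ranking the scores first means each bin gets n / n_bins observations.
n_bins <- 10
freq_bins <- ceiling(rank(score, ties.method = "first") * n_bins / n)

# Compare how many observations land in each bin under the two schemes.
table(width_bins)
table(freq_bins)
```

The third option in the question amounts to repeating the equal-frequency step for several values of `n_bins` and comparing the results.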

## Answers and follow-up questions

**Answer or follow-up question 1**

Dear students,

There are no constraints.

The best advice I can give you is this:

Think about what you want your calibration basetable to look like.

It is essentially a dataset with translation (if-then-else) rules.

Because there are gaps in the table, we need to build a model on it to extrapolate.

Michel Ballings
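One possible reading of the answer above, sketched in R: the calibration table holds one row per bin (mean score and observed event rate), and a model fitted on that table fills the gaps between rows. Everything here is an assumption made for illustration: the simulated data, the 10 equal-frequency bins, the column names, and the use of linear interpolation via `approxfun` as the "model". The answer does not say which model is intended, and a regression on the table would serve the same purpose.

```r
# Hypothetical data (an assumption for this sketch):
# 'score' is a predicted probability, 'y' the observed 0/1 outcome.
set.seed(1)
n <- 1000
score <- runif(n)
y <- rbinom(n, size = 1, prob = score^2)

# Assign each observation to one of n_bins equal-frequency bins.
n_bins <- 10
bin <- ceiling(rank(score, ties.method = "first") * n_bins / n)

# Calibration table: one translation rule per bin
# ("if the score is around mean_score, then the probability is observed_rate").
calib_table <- data.frame(
  mean_score    = as.numeric(tapply(score, bin, mean)),
  observed_rate = as.numeric(tapply(y, bin, mean))
)

# Model on the table: linear interpolation between the rows.
# rule = 2 holds the first/last observed rate constant outside the table's range.
calibrate <- approxfun(calib_table$mean_score, calib_table$observed_rate, rule = 2)

# Translate new scores, including ones that fall between table rows.
calibrate(c(0.05, 0.5, 0.95))
```

Repeating this for different binning choices and comparing the resulting AUROC on held-out data would be one way to pick among them.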
