Question



Cross Validation Optimal Parameter Consistency

Dr. Ballings,

When we cross-validate, we find the optimal parameter for a given algorithm based on the best AUC on the validation set. From there, we often re-train the model with that optimal parameter on trainBIG (the combined training and validation sets) so that we have more data to base our predictions on.
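(Editor's note: a minimal sketch of the workflow described above, assuming scikit-learn, a random forest with one tunable parameter, and illustrative split names (X_train, X_val, X_test); none of these specifics come from the original question.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative data: split into train / validation / test
X, y = make_classification(n_samples=3000, random_state=1)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=1)

# 1) Tune: pick the parameter value with the best AUC on the validation set
param_grid = [50, 100, 200, 400]  # candidate values for n_estimators (assumed grid)
val_aucs = []
for p in param_grid:
    model = RandomForestClassifier(n_estimators=p, random_state=1).fit(X_train, y_train)
    val_aucs.append(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
best_p = param_grid[int(np.argmax(val_aucs))]

# 2) Re-train with the chosen parameter on trainBIG = train + validation
X_trainbig = np.vstack([X_train, X_val])
y_trainbig = np.concatenate([y_train, y_val])
final_model = RandomForestClassifier(n_estimators=best_p, random_state=1).fit(X_trainbig, y_trainbig)

# 3) Predict on the test set
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
print(f"best n_estimators={best_p}, test AUC={test_auc:.3f}")
```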

My question is: how do we know that the optimal parameter we found based solely on the training set will remain the best when we combine training and validation to make predictions for the test set? My dilemma is that it seems we are changing the data, so there is no guarantee that the best parameter from validation stays the best.

Thanks,

Jon Bockman





Answers and follow-up questions





Answer or follow-up question 1

Dear Jon,

There is no guarantee that a parameter value is still optimal if you change the data set on which you predict (i.e., go from validation to
test).

There is no guarantee that a parameter value is still optimal if you change the data set on which you train (i.e., go from train to
trainBIG).

In my experience, changing the data you predict on is almost always *unfavorable*, and increasing the training data to include both the train and validation data is almost always *favorable*.

When in doubt, as always in data mining, try both and see which one wins.
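(Editor's note: a minimal sketch of "try both", reusing the assumed variables from the sketch under the question above (X_train, X_trainbig, best_p, etc.): fit the tuned parameter on train only and on trainBIG, then compare test AUC.)

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Option A: keep the model trained on train only
model_train = RandomForestClassifier(n_estimators=best_p, random_state=1).fit(X_train, y_train)
# Option B: re-train the same parameter setting on trainBIG
model_trainbig = RandomForestClassifier(n_estimators=best_p, random_state=1).fit(X_trainbig, y_trainbig)

# Compare both options on the test set and keep the winner
auc_train = roc_auc_score(y_test, model_train.predict_proba(X_test)[:, 1])
auc_trainbig = roc_auc_score(y_test, model_trainbig.predict_proba(X_test)[:, 1])
print(f"train only: {auc_train:.3f} | trainBIG: {auc_trainbig:.3f}")
```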

Michel Ballings


