Question



Random Forest error on calibration function

Dr. Ballings,

On the Calibrate assignment where we build two functions, I'm having issues running the Random Forest model. I have created a data frame
with my bins. This data frame contains the scores and probabilities for each bin. When I try to run the random forest model based on this
new data set (data=bins) I get the following error. I have included the structure of my bins data frame below also. I have also tried it
with multiple size bins. I also made sure there were no missing values in the data frame. If you have any suggestions or advice they
would be much appreciated.

> rFmodel <- randomForest(x=score,
+ y=prob,
+ ntree=1000,
+ data=bins,
+ importance=TRUE)
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero
In addition: Warning message:
In randomForest.default(x = score, y = prob, ntree = 1000, data = bins, :
The response has five or fewer unique values. Are you sure you want to do regression?

> str(bins)
'data.frame': 500 obs. of 2 variables:
$ score: num 0.001 0.003 0.005 0.007 0.009 0.011 0.013 0.015 0.017 0.019 ...
$ prob : num 0.0268 0.0282 0 0 0.069 ...

Thank you,

Bethany





Answers and follow-up questions





Answer or follow-up question 1

Dear Bethany,

randomForest complains if you have five or fewer unique values in your response variable.
You will get this if you have five or fewer bins, or if you have more bins but some of them have the same proportion of ones.
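
As a rough sketch of how one might check this in R: the snippet below counts the distinct values in the response. It also shows that a bare vector has no rows, which may be why the call in the question stops with "argument is of length zero". That reading is an assumption based on the error text, since x=score passes a vector rather than a data frame. The bins data frame here is a made-up stand-in with the same shape as the one in the question, and the formula call at the end is just one possible way to have data=bins actually used.

```r
# Hypothetical stand-in for the bins data frame from the question.
# The real prob values are not fully shown, so these are made up.
set.seed(1)
bins <- data.frame(score = seq(0.001, by = 0.002, length.out = 500),
                   prob  = round(runif(500), 4))

# Number of distinct response values; the warning appears at five or fewer.
n_unique <- length(unique(bins$prob))

# A bare numeric vector has no row dimension, so nrow() returns NULL,
# whereas a one-column data frame keeps its rows.
vec_rows <- nrow(bins$score)
df_rows  <- nrow(bins[, "score", drop = FALSE])

# Only fit the model if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  # The formula interface looks up prob and score inside `data`.
  rFmodel <- randomForest::randomForest(prob ~ score, data = bins,
                                        ntree = 1000, importance = TRUE)
}
```

If n_unique comes back as five or fewer, the warning from the question is to be expected.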

Michel Ballings


