## Question

Random Forest error on calibration function

Dr. Ballings,

On the Calibrate assignment where we build two functions, I'm having issues running the Random Forest model. I have created a data frame with my bins. This data frame contains the scores and probabilities for each bin. When I try to run the random forest model on this new data set (data=bins), I get the following error. I have also included the structure of my bins data frame below. I have tried it with multiple bin sizes, and I made sure there were no missing values in the data frame. If you have any suggestions or advice, they would be much appreciated.

    > rFmodel <- randomForest(x=score,
    + y=prob,
    + ntree=1000,
    + data=bins,
    + importance=TRUE)
    Error in if (n == 0) stop("data (x) has 0 rows") :
      argument is of length zero
    In addition: Warning message:
    In randomForest.default(x = score, y = prob, ntree = 1000, data = bins, :
      The response has five or fewer unique values. Are you sure you want to do regression?
    > str(bins)
    'data.frame': 500 obs. of 2 variables:
     $ score: num 0.001 0.003 0.005 0.007 0.009 0.011 0.013 0.015 0.017 0.019 ...
     $ prob : num 0.0268 0.0282 0 0 0.069 ...

Thank you,

Bethany

## Answers and follow-up questions

**Answer or follow-up question 1**

Dear Bethany,

randomForest warns if your response variable has five or fewer unique values.

You will get this if you have only a handful of bins (five or fewer), or if you have more bins but so many of them share the same proportion of ones that five or fewer distinct values remain.
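One quick way to check this is to count the distinct values in the response column. The following is a minimal sketch in base R. The `bins` data frame here is a made-up stand-in, not the data from the question, and its proportions are deliberately limited to three distinct values.

```r
# Hypothetical stand-in for the `bins` data frame from the question:
# 500 bins, but the proportions take only three distinct values.
bins <- data.frame(
  score = (2 * seq_len(500) - 1) / 1000,          # 0.001, 0.003, ..., 0.999
  prob  = rep(c(0, 0.03, 0.07), length.out = 500)
)

# Number of distinct response values the model would see.
n_unique <- length(unique(bins$prob))
print(n_unique)

# The warning quoted in the question mentions "five or fewer" unique values.
triggers_warning <- n_unique <= 5
print(triggers_warning)

# How many bins share each proportion.
print(table(bins$prob))
```

If the count comes out at five or fewer even with many bins, the bins are probably sharing proportions, and it may be worth checking how they are built.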

Michel Ballings
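A note on the error itself, as distinct from the warning: the message `argument is of length zero` in `if (n == 0)` suggests that the row count of `x` came back empty. One plausible cause, offered as an assumption and not a confirmed diagnosis, is that `x=score` was picked up as a bare numeric vector, which has no row dimension. The `x`/`y` form of `randomForest` does not take columns from `data`; only the formula form does. The sketch below uses a made-up `bins` data frame and fits a model only if the randomForest package is installed.

```r
# Hypothetical stand-in for the `bins` data frame from the question.
bins <- data.frame(
  score = (2 * seq_len(500) - 1) / 1000,
  prob  = seq_len(500) / 1000
)

# A bare numeric vector has no row dimension, so nrow() returns NULL.
x_vector <- bins$score
print(is.null(nrow(x_vector)))

# Comparing NULL with 0 gives a zero-length logical, which if() rejects
# with "argument is of length zero".
print(length(NULL == 0))

# Keeping the predictor as a one-column data frame preserves the row count.
x_frame <- bins["score"]
print(nrow(x_frame))

# Optional: two ways to call the model, run only if the package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  # x/y interface: pass the columns explicitly, with x as a data frame.
  fit_xy <- randomForest::randomForest(
    x = x_frame, y = bins$prob, ntree = 1000, importance = TRUE
  )
  # Formula interface: column names are looked up in `data`.
  fit_formula <- randomForest::randomForest(
    prob ~ score, data = bins, ntree = 1000, importance = TRUE
  )
}
```

Either call form should avoid the empty row count. The warning about unique values is a separate matter and depends only on the response column.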
