Question



AUC ROC calculation explored

Dr. Ballings,
I ran some tests on AUC using roc as shown below. It appears to treat anything >= .51 as a 1 and anything < .51 as a 0.
I'm confused because I thought the scores our models produce are not probabilities. But it looks like here AUC using roc treats them
as probabilities: anything < .51 is considered a 0, and anything >= .51 is considered a 1. Can you please elaborate?

library(AUC)

y <- as.factor(c(rep(1,1000),rep(0,1000)))
pred <- as.numeric(c(rep(.51,1000),rep(.51,1000)))
AUC::auc(roc(pred,y))
#0.5
#1000/2000 = 0.5


y <- as.factor(c(rep(1,1000),rep(0,1000)))
pred <- as.numeric(c(rep(.51,1000),rep(.5,1000)))
AUC::auc(roc(pred,y))
#1
#2000/2000 = 1.0
#it counts .51 as predicting a 1, and .50 as predicting a 0?

y <- as.factor(c(rep(1,1000),rep(0,1000)))
pred <- as.numeric(c(rep(.51,1500),rep(.50,500)))
AUC::auc(roc(pred,y))
#0.75
#1500/2000 = 0.75

y <- as.factor(c(rep(1,1000),rep(0,1000)))
pred <- as.numeric(c(rep(1,1000),rep(0,1000)))
AUC::auc(roc(pred,y))
#1
#2000/2000 = 1.0

y <- as.factor(c(rep(1,1000),rep(0,1000)))
pred <- as.numeric(c(rep(1,1800),rep(0,200)))
AUC::auc(roc(pred,y))
#0.6
#1200/2000 = 0.6

Thanks,
Michael






Answers and follow-up questions





Answer or follow-up question 1

Dear Michael,

"It appears that it considers anything >= .51 as a 1, and anything <.51 as a 0."

No, it does not. Your examples do not support that.

"I'm confused because I thought the scores our models produce are not probabilities? "

Correct, they are scores and not probabilities.

"But it looks like here AUC using ROC depicts them
as probabilities. anything <.51 is considered a 0, and anything >=.51 is considered a 1. Can you please elaborate?"

Your examples do not suggest that.
Yes, if you have only two unique values A and B (with A < B), then A behaves like 0 and B behaves like 1. A can be anything you want, including 0.50 or 0.51.

For example, the following are equivalent in terms of AUC:

> y <- as.factor(c(rep(1,1000),rep(0,1000)))
> pred <- as.numeric(c(rep(.51,1000),rep(.5,1000)))
> auc(roc(pred,y))
[1] 1
>
> pred <- as.numeric(c(rep(0.676,1000),rep(0.41435,1000)))
> auc(roc(pred,y))
[1] 1
>
> pred <- as.numeric(c(rep(1,1000),rep(0,1000)))
> auc(roc(pred,y))
[1] 1
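
The reason is that the ROC curve, and therefore the AUC, depends only on how the scores rank the observations, not on the score values themselves. As a quick sketch of that point (assuming the same y as above and the AUC package attached), any strictly increasing transformation of the predictions gives the same AUC:

pred <- as.numeric(c(rep(.51,1000),rep(.5,1000)))
auc(roc(pred, y))        # 1
auc(roc(pred * 100, y))  # 1: rescaling preserves the ordering
auc(roc(log(pred), y))   # 1: so does any strictly increasing transformation
auc(roc(rank(pred), y))  # 1: and so does replacing the scores by their ranks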

Note that you get a perfect AUC because your predictions perfectly separate the labels. See what happens if the separation is not perfect
anymore:

> pred <- as.numeric(c(rep(1,1001),rep(0,999)))
> auc(roc(pred,y))
[1] 0.9995
>
> pred <- as.numeric(c(rep(1,1010),rep(0,990)))
> auc(roc(pred,y))
[1] 0.995
>
> pred <- as.numeric(c(rep(1,1100),rep(0,900)))
> auc(roc(pred,y))
[1] 0.95
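
These numbers follow from what the AUC measures: the fraction of (positive, negative) pairs in which the positive observation gets the higher score, with ties counted as one half. Here is a minimal sketch that computes this fraction directly for the last example (again assuming the AUC package is attached) and matches auc(roc(pred, y)):

y    <- as.factor(c(rep(1,1000), rep(0,1000)))
pred <- as.numeric(c(rep(1,1100), rep(0,900)))

pos <- pred[y == "1"]  # scores of the actual positives
neg <- pred[y == "0"]  # scores of the actual negatives

# A positive scoring above a negative counts as 1, a tie counts as 0.5
wins <- sum(outer(pos, neg, ">"))
ties <- sum(outer(pos, neg, "=="))
(wins + 0.5 * ties) / (length(pos) * length(neg))  # 0.95
auc(roc(pred, y))                                  # 0.95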

Michel Ballings


