AUC ROC calculation explored

Dr. Ballings,

I ran some tests on AUC using roc as shown below. It appears that it considers anything >= .51 as a 1, and anything < .51 as a 0.

I'm confused because I thought the scores our models produce are not probabilities? But it looks like here AUC using ROC depicts them as probabilities: anything < .51 is considered a 0, and anything >= .51 is considered a 1. Can you please elaborate?

y <- as.factor(c(rep(1,1000),rep(0,1000)))

pred <- as.numeric(c(rep(.51,1000),rep(.51,1000)))

AUC::auc(roc(pred,y))

#0.5

#1000/2000 = 0.5

y <- as.factor(c(rep(1,1000),rep(0,1000)))

pred <- as.numeric(c(rep(.51,1000),rep(.5,1000)))

AUC::auc(roc(pred,y))

#1

#2000/2000 = 1.0

#it counts .51 as predicting a 1, and .50 as predicting a 0?

y <- as.factor(c(rep(1,1000),rep(0,1000)))

pred <- as.numeric(c(rep(.51,1500),rep(.50,500)))

AUC::auc(roc(pred,y))

#0.75

#1500/2000 = 0.75

y <- as.factor(c(rep(1,1000),rep(0,1000)))

pred <- as.numeric(c(rep(1,1000),rep(0,1000)))

AUC::auc(roc(pred,y))

#1

#2000/2000 = 1.0

y <- as.factor(c(rep(1,1000),rep(0,1000)))

pred <- as.numeric(c(rep(1,1800),rep(0,200)))

AUC::auc(roc(pred,y))

#0.6

#1200/2000 = 0.6

Thanks,

Michael

Dear Michael,

"It appears that it considers anything >= .51 as a 1, and anything <.51 as a 0."

No, it does not. Your examples do not support that.

"I'm confused because I thought the scores our models produce are not probabilities? "

Correct, they are scores and not probabilities.

"But it looks like here AUC using ROC depicts them as probabilities: anything < .51 is considered a 0, and anything >= .51 is considered a 1. Can you please elaborate?"

Your examples do not suggest that.

Yes, if your predictions take only two unique values, A and B with A < B, then the AUC is the same as if those values were 0 and 1: AUC depends only on the ranking of the scores, not on their magnitude. A can be anything you want, including 0.50 or 0.51.

For example, the following are equivalent in terms of AUC:

> y <- as.factor(c(rep(1,1000),rep(0,1000)))

> pred <- as.numeric(c(rep(.51,1000),rep(.5,1000)))

> auc(roc(pred,y))

[1] 1

>

> pred <- as.numeric(c(rep(0.676,1000),rep(0.41435,1000)))

> auc(roc(pred,y))

[1] 1

>

> pred <- as.numeric(c(rep(1,1000),rep(0,1000)))

> auc(roc(pred,y))

[1] 1

Note that you get a perfect AUC because your predictions perfectly separate the labels. See what happens if the separation is no longer perfect:

> pred <- as.numeric(c(rep(1,1001),rep(0,999)))

> auc(roc(pred,y))

[1] 0.9995

>

> pred <- as.numeric(c(rep(1,1010),rep(0,990)))

> auc(roc(pred,y))

[1] 0.995

>

> pred <- as.numeric(c(rep(1,1100),rep(0,900)))

> auc(roc(pred,y))

[1] 0.95
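To make the ranking interpretation concrete, here is a small base-R sketch (no AUC package required; `auc_rank` is a name I made up for illustration) that computes the AUC from the Mann-Whitney rank formula: AUC = P(a randomly chosen positive outscores a randomly chosen negative), with ties counting one half. It reproduces the values in the examples above.

```r
# Rank-based (Mann-Whitney) AUC in base R: the probability that a randomly
# chosen positive outscores a randomly chosen negative, ties counting 1/2.
auc_rank <- function(pred, y) {
  y  <- as.numeric(as.character(y))   # factor "0"/"1" -> numeric 0/1
  r  <- rank(pred)                    # midranks handle tied scores
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

y <- as.factor(c(rep(1, 1000), rep(0, 1000)))

# Only the ordering matters, so any two distinct values give the same AUC:
auc_rank(c(rep(.51,   1000), rep(.5,      1000)), y)  # 1
auc_rank(c(rep(0.676, 1000), rep(0.41435, 1000)), y)  # 1

# Imperfect separation, as in the examples above:
auc_rank(c(rep(1, 1001), rep(0, 999)), y)  # 0.9995
auc_rank(c(rep(1, 1100), rep(0, 900)), y)  # 0.95
```

Because the formula uses only ranks, any strictly increasing transformation of the scores leaves the AUC unchanged, which is why scores need not be probabilities.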

Michel Ballings
