Error running LR with glmnet

Professor,

I took this code from the textbook and applied it to my group's NYSE data:

LR <- glmnet(x=data.matrix(BasetableTRAIN),y=yTRAIN,family="binomial")

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :

one multinomial or binomial class has 1 or 0 observations; not allowed

I am confused about how we are supposed to run logistic regression on our data. The NYSE Volume column is made up of 0s and 1s.

How are we supposed to run glmnet on this column?

Thanks

For Option1 I also tried

LR <- glm(yTRAINbig ˜.,data=BasetableTRAINbig,family=binomial("logit"))

but received this error

Error: unexpected input in "LR <- glm(yTRAINbig ˜"

Dear student,

Have you looked at your data?

Can you post the output of

str(BasetableTRAIN)

and

str(yTRAIN)

> str(BasetableTRAIN)

'data.frame': 116057 obs. of 7 variables:

$ Date : Factor w/ 261 levels "01-Jan-2001",..: 1 1 1 1 1 1 1 1 1 1 ...

$ OpenOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...

$ HighOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...

$ LowOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...

$ CloseOfTheDay: num 54.8 33.5 20.2 12.6 15.3 ...

$ Volume : int 0 0 0 0 0 0 0 0 0 0 ...

$ DV : num 1 1 1 1 1 1 1 1 1 1 ...

> str(yTRAIN)

num [1:116057] 0.944 1.142 1.024 0.962 0.982 ...

Dear student,

Whenever you get an error when using a function, take the following approach:

Step 1:

?functionname

Step 2:

Read the documentation of the arguments that you are using

Step 3:

Make sure the values you pass to the arguments conform to the documentation.
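As a rough illustration of these three steps, here is a minimal sketch that uses glm from base R so it runs without extra packages. The vector y_example holds the first few values from the str(yTRAIN) output above and is only a stand-in, not the real data.

```r
# Step 1: open the help page (interactive sessions only); same as ?glm
# help("glm")

# Step 2: list the arguments the function accepts
arg_names <- names(formals(stats::glm))
print(arg_names)

# Step 3: compare what you pass with what the documentation expects
y_example <- c(0.944, 1.142, 1.024)  # stand-in values, for illustration only
print(class(y_example))              # the class of the object being passed

# A binomial response documented as "a factor with two levels" can be checked like this
is_two_level_factor <- is.factor(y_example) && nlevels(y_example) == 2
print(is_two_level_factor)
```

The same pattern should carry over to glmnet: read the entry for each argument you use, then check the class of the object you pass to it.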

In this case the documentation tells us:

y: "For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class)"

This means that you have to change your code as follows:

LR <- glmnet(x=data.matrix(BasetableTRAIN),y=as.factor(yTRAIN),family="binomial")

or make sure yTRAIN is a factor.
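One caveat worth checking: as.factor() on a continuous numeric vector creates one level per distinct value, so the result may not have the two levels that family="binomial" expects. The sketch below uses the first few values from the str(yTRAIN) output above. The rule that codes values above 1 as class "1" is purely a hypothetical assumption for illustration; the right definition of the target depends on the assignment.

```r
# First few values from the str(yTRAIN) output above, for illustration only
y_raw <- c(0.944, 1.142, 1.024, 0.962, 0.982)

# as.factor() on continuous values yields one level per distinct value
y_naive <- as.factor(y_raw)
print(nlevels(y_naive))

# Hypothetical binarization rule (an assumption, not from this thread):
# class "1" when the value is above 1, class "0" otherwise
y_binary <- factor(as.integer(y_raw > 1), levels = c(0, 1))
print(table(y_binary))

# A binomial response needs exactly two levels,
# each with more than one observation
ok <- nlevels(y_binary) == 2 && all(table(y_binary) > 1)
print(ok)
```

If ok is FALSE on the real data, the response likely needs to be recoded before it is passed to glmnet.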

Michel Ballings

Thanks professor, I got the LR to run by changing my variables to factors.

I ran the following code

#Option 1: Logistic regression with stepwise variable selection

#Logistic Regression

LR <- glm(yTRAINbig ~ .,data=BasetableTRAINbig,family=binomial("logit"))

#Warning message:

# glm.fit: fitted probabilities numerically 0 or 1 occurred

#Stepwise Variable Selection

LRstep <- step(LR, direction="both", trace = FALSE)

#There were 50 or more warnings (use warnings() to see the first 50)

#Use the model to make a prediction on test data.

predLRstep <- predict(LRstep, newdata=BasetableTEST, type="response")

#Assess the performance of the model

AUC::auc(roc(predLRstep,yTEST))

and received this error at the end

Error in roc(predLRstep, yTEST) :

Not enough distinct predictions to compute area under the ROC curve.

Dear student,

Can you post the output of str(predLRstep) and str(yTEST)?

Thanks,

Michel Ballings

> str(predLRstep)

Named num [1:116057] 1 1 1 1 1 ...

- attr(*, "names")= chr [1:116057] "232115" "232116" "232117" "232118" ...

> str(yTEST)

Factor w/ 60238 levels "0.632688927943761",..: 43070 51587 17226 51539 49024 13849 12133 48577 16806 9268 ...

I also tried converting predLRstep to a factor with

predLRstep <- as.factor(predLRstep)

> str(predLRstep)

Factor w/ 14158 levels "0.999977111573698",..: 4165 3947 3558 3972 1849 2415 2736 3143 4392 4537 ...

- attr(*, "names")= chr [1:116057] "232115" "232116" "232117" "232118" ...

but it output the same error

> AUC::auc(roc(predLRstep,yTRAIN))

Error in roc(predLRstep, yTRAIN) :

Not enough distinct predictions to compute area under the ROC curve.

Professor,

I am still having the same issue. Are my outcomes of str(predLRstep) and str(yTEST) incorrect?

I'm having the same problem, and this is my str(predLRstep):

'data.frame': 231199 obs. of 6 variables:

$ OpenOfTheDay : num 2.37 13.31 48.5 65.2 21.5 ...

$ HighOfTheDay : num 2.37 13.62 50.31 66.09 21.53 ...

$ LowOfTheDay : num 2.36 13.25 45.5 64.62 21 ...

$ CloseOfTheDay: num 2.36 13.62 50.03 65.07 21.15 ...

$ Volume : int 1900 46200 2008400 4336400 129600 64400 33150 293600 17600 23800 ...

$ DV : num 0.996 1.023 1.032 0.998 0.984 ...

Dear student,

You can get this error:

Error in roc(predLRstep, yTRAIN) :

Not enough distinct predictions to compute area under the ROC curve.

whenever:

1) predLRstep and yTRAIN are not of the same length.

2) you have NAs in either of those two vectors.

3) yTRAIN is not a factor.

Can you check that?
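The three checks can be sketched in base R as follows. The vectors pred and labels are made-up stand-ins for predLRstep and yTRAIN. The last two checks (exactly two classes, and more than one distinct prediction) are additions I am assuming are useful given the error text; they are not part of the list above.

```r
# Made-up stand-ins for predLRstep and yTRAIN, for illustration only
pred <- c(0.91, 0.15, 0.78, 0.40, 0.66)
labels <- factor(c(1, 0, 1, 0, 1))

checks <- c(
  same_length    = length(pred) == length(labels),   # condition 1
  no_missing     = !anyNA(pred) && !anyNA(labels),   # condition 2
  labels_factor  = is.factor(labels),                # condition 3
  two_classes    = nlevels(labels) == 2,             # extra check (assumption)
  distinct_preds = length(unique(pred)) > 1          # extra check (assumption)
)
print(checks)
```

Any entry that prints FALSE points to the input that needs fixing before calling roc.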

Michel Ballings
