Question



Error running LR with glmnet

Professor,

I took this code from the textbook and applied it to my group's NYSE data:

LR <- glmnet(x=data.matrix(BasetableTRAIN),y=yTRAIN,family="binomial")

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
one multinomial or binomial class has 1 or 0 observations; not allowed

I am confused about how we are supposed to run logistic regression on our data. The NYSE Volume column is made up of 0s and 1s.
How are we supposed to run glmnet on this column?

Thanks
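The error message says that one class of the response has 0 or 1 observations. A quick way to see what glmnet is objecting to is to tabulate the response before fitting. The sketch below uses only base R. check_binomial_target() is a hypothetical helper, and its rule (exactly two classes, each with more than one observation) is a simplification based on the error text and on what family = "binomial" expects. It is not glmnet's own code.

```r
# Minimal sketch in base R (no glmnet needed): count observations per class.
# check_binomial_target() is a hypothetical helper written for this example.
check_binomial_target <- function(y) {
  counts <- table(as.factor(y))   # number of observations in each class
  list(
    n_classes = length(counts),
    min_count = min(counts),
    # usable for family = "binomial": exactly two classes, each with > 1 obs
    ok = length(counts) == 2 && min(counts) > 1
  )
}

# Both classes well represented: passes.
str(check_binomial_target(c(0, 1, 0, 0, 1, 1, 0, 1)))

# Only one observation in class "1": the situation the error message names.
str(check_binomial_target(c(0, 0, 0, 0, 0, 1)))
```

With the real data, the call would be check_binomial_target(yTRAIN).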





Answers and follow-up questions





Answer or follow-up question 1

For Option 1, I also tried:

LR <- glm(yTRAINbig ˜.,data=BasetableTRAINbig,family=binomial("logit"))

but received this error:

Error: unexpected input in "LR <- glm(yTRAINbig ˜"
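The "unexpected input" error is most likely caused by the character before the dot. It is a small tilde (˜, Unicode U+02DC), which often appears when code is copied from a PDF. R's formula syntax needs the ASCII tilde ~ instead. The sketch below locates non-ASCII characters in a line of code and swaps the small tilde for the ASCII one. The variable names are only for illustration.

```r
# The failing call as a string; "\u02DC" is the small tilde from the post.
bad_line <- "LR <- glm(yTRAINbig \u02DC.,data=BasetableTRAINbig,family=binomial(\"logit\"))"

# Code points above 127 are non-ASCII characters; this gives their positions.
which(utf8ToInt(bad_line) > 127)

# Replace the small tilde with the ASCII tilde that formulas require.
good_line <- gsub("\u02DC", "~", bad_line, fixed = TRUE)
good_line
```

For a whole script file, tools::showNonASCIIfile() reports the lines that contain non-ASCII characters.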



Answer or follow-up question 2

Dear student,

Have you looked at your data?

Can you post the output of

str(BasetableTRAIN)
and
str(yTRAIN)


Answer or follow-up question 3

> str(BasetableTRAIN)
'data.frame': 116057 obs. of 7 variables:
$ Date : Factor w/ 261 levels "01-Jan-2001",..: 1 1 1 1 1 1 1 1 1 1 ...
$ OpenOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...
$ HighOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...
$ LowOfTheDay : num 54.8 33.5 20.2 12.6 15.3 ...
$ CloseOfTheDay: num 54.8 33.5 20.2 12.6 15.3 ...
$ Volume : int 0 0 0 0 0 0 0 0 0 0 ...
$ DV : num 1 1 1 1 1 1 1 1 1 1 ...

> str(yTRAIN)
num [1:116057] 0.944 1.142 1.024 0.962 0.982 ...
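In this str() output, yTRAIN is numeric with values such as 0.944 and 1.142, so it appears to be a continuous variable rather than a 0/1 target. The sketch below checks this using only the first five values shown. The full vector has 116057 elements, and y_sample is a hypothetical stand-in for it.

```r
# First five values of yTRAIN as printed by str() (stand-in for the full vector).
y_sample <- c(0.944, 1.142, 1.024, 0.962, 0.982)

length(unique(y_sample))        # number of distinct values
nlevels(as.factor(y_sample))    # levels a factor conversion would create
all(y_sample %in% c(0, 1))      # is it a 0/1 variable?
```

On the real data, length(unique(yTRAIN)) should be 2 for a binary target.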


Answer or follow-up question 4

Dear student,

Whenever you get an error when using a function, take the following approach:

Step 1:
?functionname

Step 2:
Read the documentation of the arguments that you are using

Step 3:
Make sure the values you pass to the arguments conform to the documentation.

In this case the documentation tells us:
y: "For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class)."

This means that you have to change your code as follows:
LR <- glmnet(x=data.matrix(BasetableTRAIN),y=as.factor(yTRAIN),family="binomial")
or make sure yTRAIN is a factor.
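The quoted documentation says that, for a factor, the last level in alphabetical order is the target class. The base-R sketch below shows which level that is after converting a 0/1 response, and how to set the order explicitly. The vectors are made-up examples.

```r
# A 0/1 response converted to a factor: levels are sorted, so "1" is last.
y <- c(0, 1, 1, 0, 1)
y_factor <- as.factor(y)
levels(y_factor)            # "0" "1"
tail(levels(y_factor), 1)   # "1": the target class per the quoted documentation

# Character labels are also sorted alphabetically, so "yes" comes last.
levels(factor(c("yes", "no", "yes")))   # "no" "yes"

# To make a different class the target, set the level order explicitly.
y_flipped <- factor(y, levels = c(1, 0))
levels(y_flipped)           # "1" "0"
```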

Michel Ballings


Answer or follow-up question 5

Thanks, Professor. I got the LR to run by changing my variables to factors.

I ran the following code:

#Option 1: Logistic regression with stepwise variable selection
#Logistic Regression
LR <- glm(yTRAINbig ~ .,data=BasetableTRAINbig,family=binomial("logit"))
#Warning message:
# glm.fit: fitted probabilities numerically 0 or 1 occurred

#Stepwise Variable Selection
LRstep <- step(LR, direction="both", trace = FALSE)
#There were 50 or more warnings (use warnings() to see the first 50)

#Use the model to make a prediction on test data.
predLRstep <- predict(LRstep, newdata=BasetableTEST, type="response")

#Assess the performance of the model
AUC::auc(roc(predLRstep,yTEST))

and received this error at the end:

Error in roc(predLRstep, yTEST) :
Not enough distinct predictions to compute area under the ROC curve.
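To see why the number of distinct predictions matters, here is a base-R sketch of the AUC computed with the rank (Mann-Whitney) formula. This is a swapped-in technique for illustration, not how the AUC package implements roc() or auc(), and auc_rank() is a hypothetical helper. When every prediction has the same value, all positive/negative pairs are ties and the value collapses to 0.5, which carries no information. The error quoted above suggests the AUC package refuses such inputs instead.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case scores higher than a randomly chosen negative case,
# with ties counted as one half. Hypothetical helper for illustration.
auc_rank <- function(scores, labels) {
  labels <- as.factor(labels)
  stopifnot(is.numeric(scores), nlevels(labels) == 2,
            length(scores) == length(labels),
            !anyNA(scores), !anyNA(labels))
  is_pos <- labels == levels(labels)[2]   # treat the last level as positive
  r <- rank(scores)                       # tied scores get their average rank
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y_demo <- factor(c(0, 0, 1, 1))
auc_rank(c(0.10, 0.40, 0.35, 0.80), y_demo)   # 0.75: 3 of 4 pairs ordered correctly
auc_rank(c(1, 1, 1, 1), y_demo)               # 0.5: constant predictions, all ties
```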



Answer or follow-up question 6

Dear student,

Can you post the output of str(predLRstep) and str(yTEST)?

Thanks,
Michel Ballings


Answer or follow-up question 7

> str(predLRstep)
Named num [1:116057] 1 1 1 1 1 ...
- attr(*, "names")= chr [1:116057] "232115" "232116" "232117" "232118" ...
> str(yTEST)
Factor w/ 60238 levels "0.632688927943761",..: 43070 51587 17226 51539 49024 13849 12133 48577 16806 9268 ...

I also tried converting predLRstep to a factor with:
predLRstep <- as.factor(predLRstep)
> str(predLRstep)
Factor w/ 14158 levels "0.999977111573698",..: 4165 3947 3558 3972 1849 2415 2736 3143 4392 4537 ...
- attr(*, "names")= chr [1:116057] "232115" "232116" "232117" "232118" ...

but it gave the same error:
> AUC::auc(roc(predLRstep,yTRAIN))
Error in roc(predLRstep, yTRAIN) :
Not enough distinct predictions to compute area under the ROC curve.


Answer or follow-up question 8

Professor,

I am still having the same issue. Is something wrong with my str(predLRstep) and str(yTEST) output?


Answer or follow-up question 9

I'm having the same problem, and this is my str(predLRstep):
'data.frame': 231199 obs. of 6 variables:
$ OpenOfTheDay : num 2.37 13.31 48.5 65.2 21.5 ...
$ HighOfTheDay : num 2.37 13.62 50.31 66.09 21.53 ...
$ LowOfTheDay : num 2.36 13.25 45.5 64.62 21 ...
$ CloseOfTheDay: num 2.36 13.62 50.03 65.07 21.15 ...
$ Volume : int 1900 46200 2008400 4336400 129600 64400 33150 293600 17600 23800 ...
$ DV : num 0.996 1.023 1.032 0.998 0.984 ...


Answer or follow-up question 10

Dear student,

You can get this error:
Error in roc(predLRstep, yTRAIN) :
Not enough distinct predictions to compute area under the ROC curve.

whenever:
1) predLRstep and yTRAIN are not of the same length.
2) there are NAs in either of those two vectors.
3) yTRAIN is not a factor.

Can you check that?

Michel Ballings


