Question



NAs in Dataset from Lead Variable

Dear Dr. Ballings,

Last week, I had an issue in which many NAs in my data caused my AUC function not to run, along with a few other problems.
I have been running through the code and checking with is.na after each line, and it turns out that the code that shifts the
dependent variable values by one (lead) is consistently inserting 1372 NA values into the data. Could you offer any insight? The code used is
below:
stocks.imp$DV_Lead <- unlist(tapply(stocks.imp[, "DV"], stocks.imp[, "tickerSymbol"],
                                    function(x) c(x[-1], NA)))  ## This right here is adding NAs
stocks.imp$DV_Lead <- as.factor(stocks.imp$DV_Lead)





Answers and follow-up questions





Answer or follow-up question 1

Dear student,

Yes, that is normal. If you lead your dependent variable (c(x[-1],NA)), you get one NA per stock, on the last day of that stock.
If you get 1372 NAs, that means you have 1372 companies.
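
As a minimal sketch of why the counts match, the toy data below (hypothetical tickers and values, not the data from the question) applies the same lead within each ticker and counts the NAs. It swaps in `ave()` for `tapply()` plus `unlist()`; that substitution is an assumption of this sketch, chosen because `ave()` returns values in the original row order, whereas `tapply()` returns groups in factor-level order and so lines up with the rows only when the data is sorted by ticker.

```r
# Toy data: three hypothetical tickers with 3, 2 and 4 trading days
toy <- data.frame(
  tickerSymbol = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "CCC", "CCC"),
  DV = c(0, 1, 1, 1, 0, 0, 0, 1, 0)
)

# Lead DV by one row within each ticker.
# The last row of each ticker has no following value, so it becomes NA.
toy$DV_Lead <- ave(toy$DV, toy$tickerSymbol, FUN = function(x) c(x[-1], NA))

# Count the NAs and the distinct tickers; the two numbers should agree
n_na <- sum(is.na(toy$DV_Lead))
n_tickers <- length(unique(toy$tickerSymbol))
```

Comparing `n_na` with `n_tickers` on the real data is a quick way to confirm that the NAs come only from the lead and not from some other step.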

Your data has no value for the day after each stock's last day, so that NA is exactly what you should get.
Just remove those rows:

yourdata <- yourdata[!is.na(yourdata$DV_Lead),]
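
The filter above can be checked on a small example. The sketch below uses a hypothetical data frame (made-up tickers and values) in which `DV_Lead` is already NA on the last day of each ticker, and assumes the factor conversion is done after the rows are removed.

```r
# Hypothetical toy data with DV_Lead already computed (NA on each ticker's last day)
yourdata <- data.frame(
  tickerSymbol = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  DV_Lead = c(1, 0, NA, 1, NA)
)

rows_before <- nrow(yourdata)

# Keep only the rows where the lead value is known
yourdata <- yourdata[!is.na(yourdata$DV_Lead), ]

# One row per ticker should have been dropped
rows_dropped <- rows_before - nrow(yourdata)

# Convert to a factor once the NA rows are gone
yourdata$DV_Lead <- as.factor(yourdata$DV_Lead)
```

If `rows_dropped` differs from the number of tickers in the real data, some NAs are coming from somewhere other than the lead.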

Michel Ballings


