Question



Is it better to delete rows with NA's or impute the mode?

Dr. Ballings,

When I lead my dependent variable for the stock project, it leaves NA's for the last date (12-31) in the dependent variable column. For KNN,
your data cannot contain NA's. Would it therefore be better to omit all NA's, which would delete every row dated 12-31 (since they all
contain NA's)? Or would it work better to impute the mode for each stock on 12-31 and still include those rows in our dataset (in which
case mine would be all 0's for the dependent variable on 12-31)?

Thanks,
John





Answers and follow-up questions





Answer or follow-up question 1

Dear John,

That last row is the data you would use to make your prediction to submit to the website.
Its outcome is not yet known, so you cannot include it in the training set.

Exclude it from the training set and do not impute it.
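
To illustrate the split described above, here is a minimal sketch in Python with pandas. It is not the course's own code: the toy data, the column names (ticker, date, ret, up, up_next), and the definition of the dependent variable (1 if the return is positive) are all assumptions made up for the example. It leads the dependent variable within each stock, then sends the rows whose led value is NA to a prediction set instead of imputing them.

```python
import pandas as pd

# Hypothetical toy data: two stocks, three dates each (names and values are made up).
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "date": pd.to_datetime(["2015-12-29", "2015-12-30", "2015-12-31"] * 2),
    "ret": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01],
})
df = df.sort_values(["ticker", "date"]).reset_index(drop=True)

# Assumed dependent variable: 1 if the return is positive, else 0.
df["up"] = (df["ret"] > 0).astype(int)

# Lead the dependent variable by one period within each stock.
# The last date of every stock has no next value, so it becomes NA.
df["up_next"] = df.groupby("ticker")["up"].shift(-1)

# Rows with an NA target are the ones to predict; they stay out of training.
to_predict = df[df["up_next"].isna()]
train = df[df["up_next"].notna()]

print(train[["ticker", "date", "up_next"]])
print(to_predict[["ticker", "date"]])
```

In this sketch the 12-31 rows are not thrown away: they are kept separately so a model fitted on train (KNN or otherwise) can be applied to their predictor columns.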

Michel Ballings



