## Question

Lead variable for predictions

Dear Dr. Ballings,

I ran the following code to make a lag variable:

```r
# Lag: per Symbol, drop the last value of DV and prepend NA
DATA1$lag <- unlist(tapply(DATA1[,"DV"],DATA1[,"Symbol"],
                           function(x) c(NA,x[-length(x)])))
```

It worked on my data set.
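A minimal sketch of what the lag code does, using hypothetical toy data in place of DATA1 (only the column names `DV` and `Symbol` come from the question). Within each Symbol, `x[-length(x)]` drops the last value and `c(NA, ...)` prepends an NA, so each group keeps its original length and every value moves down one row. Note that `tapply` returns the groups in sorted factor-level order, so assigning the unlisted result back to the data frame only lines up if the rows are already sorted by Symbol (and by time within each Symbol).

```r
# Hypothetical toy data standing in for DATA1
DATA1 <- data.frame(
  Symbol = c("A", "A", "A", "B", "B"),
  DV     = c(1, 2, 3, 10, 20)
)

# Per Symbol: drop the last value and prepend NA (same length as the group)
DATA1$lag <- unlist(tapply(DATA1[, "DV"], DATA1[, "Symbol"],
                           function(x) c(NA, x[-length(x)])))
print(DATA1)
```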

Then I tried the following code to make a lead variable. It changes `c(NA,x[-length(x)])` to `c(NA,x[+length(x)])`:

```r
DATA1$lead <- unlist(tapply(DATA1[,"DV"],DATA1[,"Symbol"],
                            function(x) c(NA,x[+length(x)])))
```

The error message says:

```
Error in `$<-.data.frame`(`*tmp*`, "lead", value = c(NA, 0, NA, 0, NA, :
  replacement has 2744 rows, data has 348171
```

I looked at numerous code examples on Google, and it appears that they all require me to install a new package to create a lead variable. Why does the function from the book not work when you simply change the "-" to a "+"? It seems like changing "-" to "+" would turn a lag variable into a lead variable.

Is there a way to create an effective lead variable without installing a new package?

Thanks for your help.

## Answers and follow-up questions

**Answer or follow-up question 1**

Dear student,

I won't give you the solution, as this is part of the assignment, but I will give you a strong hint.

This part is incorrect:

```r
c(NA,x[+length(x)])
```

What you are doing is:

Take all values of DV by Symbol. Per Symbol, apply the following function:

```r
c(NA,x[+length(x)])
```

That function prepends an NA to a vector of length 1 containing only the last element of `x`. The unary `+` does nothing, so `x[+length(x)]` is the same as `x[length(x)]`, i.e., you select the element at the last position of the vector.

This is clearly not what you want.
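A small illustration of the point above, using hypothetical toy values rather than the real data. Because `c(NA, x[+length(x)])` always has length 2, every Symbol contributes exactly two values to the unlisted result, no matter how many rows it has. That is why the replacement is much shorter than the data frame; 2744 rows would be consistent with 2 values for each of 1,372 Symbols.

```r
# Hypothetical toy vector for a single Symbol
x <- c(10, 20, 30, 40)

# Unary plus leaves the value unchanged: +length(x) is just length(x), here 4
x[+length(x)]            # 40: only the last element
c(NA, x[+length(x)])     # NA 40: always length 2, whatever length(x) is

# Applied per Symbol, each group therefore returns exactly 2 values
DV     <- c(1, 2, 3, 10, 20)
Symbol <- c("A", "A", "A", "B", "B")
res <- unlist(tapply(DV, Symbol, function(x) c(NA, x[+length(x)])))
length(res)              # 4 = 2 * number of Symbols, not 5 = number of rows
```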

Michel Ballings
