Question



Should I subtract colMeans from my DTM matrix when I am recreating it in the predict function?

In your code for text mining and creating an SVD, you first "center the data" by subtracting the column means of reviews_mat before
performing the SVD. I did the same in my best-algorithm creation function. In the prediction function, when I recreate
the DTM for just this day's stories, do I still need to subtract the column means from the DTM? Wouldn't I need to save the column means
from the first matrix and subtract them from the new one before I use the "reviews_mat %*% s$v %*% solve(diag(s$d))" code to create my variables?


Thanks.





Answers and follow-up questions





Answer or follow-up question 1

Dear student,

"Wouldn't I need to save the column means from the first matrix and subtract them from this before I use the
"reviews_mat %*% s$v %*% solve(diag(s$d))" code to create my variables?"

Indeed.
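
A minimal sketch of what this could look like, not the course's own code. The names train_mat, new_mat, train_means and k are hypothetical, and the toy matrices are randomly generated stand-ins for the real DTMs. It assumes the new DTM has exactly the same terms (columns), in the same order, as the training DTM.

```r
# Hypothetical toy data standing in for the training DTM and the new day's DTM.
set.seed(1)
train_mat <- matrix(rpois(40, 2), nrow = 8, ncol = 5)
new_mat   <- matrix(rpois(15, 2), nrow = 3, ncol = 5)

# Training: save the column means, center the data, then run the SVD.
train_means    <- colMeans(train_mat)
train_centered <- sweep(train_mat, 2, train_means)  # subtracts the column means
s <- svd(train_centered)

# Keep the first k dimensions (k = 3 is an arbitrary choice for this sketch).
k     <- 3
v_k   <- s$v[, 1:k, drop = FALSE]
d_inv <- solve(diag(s$d[1:k]))

# Prediction: center the new DTM with the saved TRAINING means,
# not with its own column means, then project it into the same space.
new_centered <- sweep(new_mat, 2, train_means)
new_scores   <- new_centered %*% v_k %*% d_inv
```

Using the training means keeps the new documents in the same coordinate system as the documents the model was fitted on. As a sanity check, applying the same projection to train_centered reproduces s$u[, 1:k].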

Michel Ballings


