HW#2: Calculating Second Half of SUBS dataset
My question concerns whether we need to clean up the second half of SUBS, using impute(), as.matrix(), etc., before putting it into
"u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"
I got different answers for u, one with the data cleanup and one without it. I checked the result by running
"svd(second_half)", and it does not match my calculation.
Dr. Ballings, can you clarify whether I need to repeat the data cleaning I did on the first half of the SUBS data for the second half
before I put it into "u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"?
Thanks, and please advise.
Answers and follow-up questions
Answer or follow-up question 1
Yes, you need to use compute() and impute() on the first half, and then
impute() on the second half using the output of compute() from the first half.
Use as.matrix() on both.
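
As a rough illustration of that workflow, here is a minimal base-R sketch. It does not call the course's compute() and impute() functions. Instead, a median lookup and a hypothetical helper named fill_na stand in for them, and the two small data frames are made-up stand-ins for the halves of SUBS. The point it tries to show is only that the fill-in values are learned from the first half and then reused, unchanged, on the second half.

```r
# Toy data standing in for the two halves of SUBS (hypothetical values).
first_half  <- data.frame(a = c(1, NA, 3, 4), b = c(2, 5, NA, 8), c = c(9, 7, 6, NA))
second_half <- data.frame(a = c(NA, 2, 6),    b = c(4, NA, 1),    c = c(3, 8, NA))

# "compute" step (stand-in): learn one fill-in value per column
# from the first half only.
fill_values <- vapply(first_half, median, numeric(1), na.rm = TRUE)

# "impute" step (stand-in): replace NAs column by column with the
# supplied values. fill_na is a hypothetical helper, not a library function.
fill_na <- function(df, values) {
  for (col in names(df)) {
    df[[col]][is.na(df[[col]])] <- values[[col]]
  }
  df
}

# Both halves are imputed with the first half's values, then converted
# to matrices so they can be used with %*%.
first_mat  <- as.matrix(fill_na(first_half,  fill_values))
second_mat <- as.matrix(fill_na(second_half, fill_values))
```

With the real functions, compute() would play the role of the fill_values line and impute() the role of fill_na; the exact arguments depend on the package used in the course.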
Michel Ballings
Answer or follow-up question 2
Thank you, sir. I have two other questions:
1) Do you want us to add the plot of the variance (the one with ylab="% variance explained") at the end to see whether we need to eliminate any columns?
2) Because we took the mean out earlier with "reviews_mat <- t(t(reviews_mat)-colMeans(reviews_mat))", do we have to add the mean back to every
observation when we calculate the second half of the dataset?
Answer or follow-up question 3
No and no.
Michel Ballings
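
For readers following the thread, here is a small sketch of the projection step with the mean left out. It rests on an assumption the thread does not state: that the second half is centered with the first half's column means. The matrices are random, hypothetical stand-ins for the cleaned halves, and sweep() is used in place of the t(t(...)) idiom from the question.

```r
set.seed(1)
first_mat  <- matrix(rnorm(40), nrow = 8, ncol = 5)  # hypothetical cleaned first half
second_mat <- matrix(rnorm(30), nrow = 6, ncol = 5)  # hypothetical cleaned second half

# Center both halves with the FIRST half's column means (assumption).
first_means     <- colMeans(first_mat)
first_centered  <- sweep(first_mat,  2, first_means)
second_centered <- sweep(second_mat, 2, first_means)

# SVD is fitted on the first half only.
s <- svd(first_centered)

# Sanity check: the formula from the question reproduces s$u on the first half.
u_first_check <- first_centered %*% s$v %*% solve(diag(s$d))

# Projection of the second half onto the first half's singular vectors.
# The means are not added back afterwards.
u_second <- second_centered %*% s$v %*% solve(diag(s$d))

# A separate SVD of the second half uses a different basis, so its u
# is not expected to match the projection above.
u_separate <- svd(second_centered)$u
```

This may also explain the mismatch reported in the original question: svd(second_half) fits new singular vectors to the second half, whereas the formula projects the second half onto the vectors fitted on the first half. Note too that wrapping the expression in head() keeps only the first six rows of u.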