Question



HW#2: Calculating Second Half of SUBS dataset

My question is whether we need to clean up the second half of SUBS, using impute(), as.matrix(), etc., before plugging it into the
expression:

"u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"

I got different results for u, one with data cleanup and one without. I checked the result by running
"svd(second_half)", and its output does not match my calculation.

Dr. Ballings, can you clarify whether I need to repeat the data cleaning I did on the first half of SUBS for the second half of the
data before I put it into "u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"?

Thanks, and please advise.

Jian
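
For readers following along, here is a minimal sketch of what the expression in the question computes. The matrices are random, hypothetical stand-ins for the two halves (not the SUBS data), and centering is left out to keep it short. On the matrix that produced s, the expression reproduces s$u; on any other matrix it gives coordinates in that first matrix's basis, so it would not generally be expected to match a fresh svd() of the second half.

```r
set.seed(1)

# Hypothetical stand-ins for the two halves (assumption: no missing values)
first_half  <- matrix(rnorm(40), nrow = 8, ncol = 5)
second_half <- matrix(rnorm(40), nrow = 8, ncol = 5)

s <- svd(first_half)

# On the matrix that produced s, the expression recovers s$u
u_first <- first_half %*% s$v %*% solve(diag(s$d))
same_basis <- isTRUE(all.equal(u_first, s$u))

# On the second half it yields coordinates in the first half's basis,
# which generally differ from the u of a separate svd(second_half)
u_second <- second_half %*% s$v %*% solve(diag(s$d))
differs <- !isTRUE(all.equal(abs(u_second), abs(svd(second_half)$u)))
```

The abs() calls only guard against sign flips of the singular vectors when comparing.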





Answers and follow-up questions





Answer or follow-up question 1

Dear Jian,

Yes, you need to use compute() and impute() on the first half, and then
impute() on the second half using the output of compute() from the first half.

Use as.matrix() on both.

Michel Ballings
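
The pattern described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frames are tiny made-up examples, the column median is an assumed fill-in rule, and fill_na() is a hypothetical helper standing in for the impute step, not the function used in the course. Centering is omitted for brevity.

```r
# Hypothetical data frames standing in for the two halves of SUBS
first_half  <- data.frame(a = c(1, NA, 3, 5), b = c(2, 4, NA, 8))
second_half <- data.frame(a = c(NA, 7), b = c(6, NA))

# "compute" step: learn the fill-in values from the first half only
# (assumption: column medians)
fill <- vapply(first_half, median, numeric(1), na.rm = TRUE)

# "impute" step: apply those same values to any data frame
# (fill_na is a hypothetical helper written for this sketch)
fill_na <- function(df, values) {
  for (col in names(values)) {
    df[[col]][is.na(df[[col]])] <- values[[col]]
  }
  df
}

# as.matrix() on both halves, after imputation
first_mat  <- as.matrix(fill_na(first_half, fill))
second_mat <- as.matrix(fill_na(second_half, fill))

# SVD on the first half, then project the second half onto it
s <- svd(first_mat)
u_second <- second_mat %*% s$v %*% solve(diag(s$d))
```

The point of the design is that the second half never contributes to the fill-in values: it is cleaned with statistics learned from the first half alone.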


Answer or follow-up question 2

Thank you, sir. I have two other questions:

1) Do you want us to add the plot of the variance at the end to see if we need to eliminate any columns?

plot(s$d^2/sum(s$d^2),
type="b",
ylab="% variance explained",
xlab="Singular vectors",
xaxt="n"
)
axis(1,at=1:length(s$d^2/sum(s$d^2)))

2) Because we subtracted the column means earlier with "reviews_mat <- t(t(reviews_mat)-colMeans(reviews_mat))", do we have to add the mean
back to every observation when we compute u for the second half of the dataset?

Please advise.

Jian




Answer or follow-up question 3

Jian,

No and no.

Michel Ballings


