## Question

HW#2: Calculating Second Half of SUBS dataset

My question is whether we need to clean up the second half of SUBS (using impute(), as.matrix(), etc.) before putting it into the expression:

"u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"

I got different results for u depending on whether I cleaned the data first. I checked the result by running "svd(second_half)", and it does not match my calculation.

Dr. Ballings, can you clarify whether I need to repeat the data cleaning I did on the first half of SUBS for the second half before I put it into "u <- head(reviews_mat %*% s$v %*% solve(diag(s$d)))"?

Thanks, and please advise,

Jian
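
As a side note on the expression in the question, here is a minimal sketch of what it computes. The matrices below are random, hypothetical stand-ins (not the SUBS data), and centering is left out for brevity. It illustrates that multiplying by `s$v` and the inverse of `diag(s$d)` reproduces `s$u` on the matrix the SVD was computed from, and that applying the same expression to new rows projects them onto that first-half basis.

```r
set.seed(1)

# Hypothetical stand-ins for the two halves (random numbers, not the SUBS data).
first_half  <- matrix(rnorm(40), nrow = 10, ncol = 4)
second_half <- matrix(rnorm(40), nrow = 10, ncol = 4)

# SVD of the first half only.
s <- svd(first_half)

# On the matrix the SVD came from, X %*% V %*% D^-1 reproduces s$u.
u_first <- first_half %*% s$v %*% solve(diag(s$d))
print(all.equal(u_first, s$u))

# The same expression applied to the second half projects its rows
# onto the first half's singular vectors.
u_second <- second_half %*% s$v %*% solve(diag(s$d))
print(head(u_second))

# svd(second_half) is a separate decomposition with its own basis,
# so its u is in general not the same as u_second.
print(isTRUE(all.equal(u_second, svd(second_half)$u)))
```

Because `svd(second_half)` builds a new basis from the second half alone, comparing its `u` with the projected values is not expected to produce a match.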

## Answers and follow-up questions

**Answer or follow-up question 1**

Dear Jian,

Yes, you need to use compute() and impute() on the first half, and then impute() on the second half using the output of compute() from the first half.

Use as.matrix() on both.

Michel Ballings
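
The pattern described in this answer can be sketched in base R as follows. This is only an illustration under assumptions: the toy data frames, the helper `fill_na()`, and the choice of column medians are all hypothetical, and the course's compute() and impute() functions may use a different imputation method. The point it shows is that the fill values are learned from the first half only and then reused on the second half.

```r
# Hypothetical toy data; the real SUBS columns are not shown in this thread.
first_half  <- data.frame(a = c(1, NA, 3, 5), b = c(10, 20, NA, 40))
second_half <- data.frame(a = c(NA, 2),       b = c(30, NA))

# Step 1: learn the fill values from the first half only
# (column medians here, as an assumed imputation rule).
fill_values <- sapply(first_half, median, na.rm = TRUE)

# Hypothetical helper: replace the NAs in each column with the supplied value.
fill_na <- function(df, values) {
  for (col in names(values)) {
    df[[col]][is.na(df[[col]])] <- values[[col]]
  }
  df
}

# Step 2: apply the same first-half fill values to both halves,
# then convert both to matrices for the matrix products.
first_mat  <- as.matrix(fill_na(first_half,  fill_values))
second_mat <- as.matrix(fill_na(second_half, fill_values))

print(second_mat)
```

Reusing the first-half values keeps the second half on the same footing as the data the SVD was fitted on, instead of letting the second half define its own fill values.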

**Answer or follow-up question 2**

Thank you, sir. I have two other questions:

1) Do you want us to add the plot of the variance at the end to see if we need to eliminate any columns?

    plot(s$d^2/sum(s$d^2),
         type="b",
         ylab="% variance explained",
         xlab="Singular vectors",
         xaxt="n")
    axis(1, at=1:length(s$d^2/sum(s$d^2)))

2) Because we subtracted the column means earlier with "reviews_mat <- t(t(reviews_mat)-colMeans(reviews_mat))", do we have to add the mean back to every observation when we calculate the second half of the dataset?

Please advise.

Jian

**Answer or follow-up question 3**

Jian,

No and no.

Michel Ballings
