Why do we sometimes use the transpose function back to back, and how do we recognize when it is appropriate to do so?
In some lines of code, I've noticed the transpose function applied twice in a row. For example:
trainKNNbig <- data.frame(t((t(trainKNNbig)-means)/stdev))
My question is: why do we do this, and how do we recognize when it is the right method to use? It seems like transposing an object twice would just cancel out the original transpose.
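As a minimal sketch, using a made-up matrix `m` and vector `v` that are not from the question, the two t() calls cancel only when nothing happens between them. An operation in between changes how R lines the vector up with the data:

```r
# Hypothetical 3 x 2 matrix and a length-2 vector (one value per column)
m <- matrix(c(1, 1, 1, 2, 2, 2), nrow = 3)
v <- c(1, 2)

# Transposing twice with nothing in between returns the original matrix
identical(t(t(m)), m)

# With a subtraction in between, the result differs from plain m - v
direct  <- m - v         # v is recycled down the columns of m
wrapped <- t(t(m) - v)   # v is recycled down the columns of t(m), i.e. across the rows of m
print(direct)
print(wrapped)
```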
Answer:
Consider the following code, which shows the result with and without the transpose trick.
The goal is to center the columns: subtract the respective column mean from each column.
Since each column is constant, the result should be all zeros.
> (df <- data.frame(a=c(1,1,1),b=c(2,2,2)))
  a b
1 1 2
2 1 2
3 1 2
> (means <- colMeans(df))
a b
1 2
> df - means
   a b
1  0 0
2 -1 1
3  0 0
> t(t(df) - means)
     a b
[1,] 0 0
[2,] 0 0
[3,] 0 0
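What happened is that R recycles the vector of means cell by cell down the columns (column-major order), so the means end up being subtracted row-wise, while we want each column to have its own mean subtracted.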
Specifically: what happens without the transpose is:
subtract 1 from df[1,1], subtract 2 from df[2,1], subtract 1 from df[3,1], ....
What we want is:
subtract 1 from df[1,1], subtract 1 from df[2,1], subtract 1 from df[3,1]
subtract 2 from df[1,2], subtract 2 from df[2,2], subtract 2 from df[3,2]
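The two subtraction patterns above can be made explicit by building the matrix of values that actually gets subtracted in each case. This sketch recreates the `df` and `means` from the console session; the names `without_t` and `with_t` are illustrative only:

```r
df <- data.frame(a = c(1, 1, 1), b = c(2, 2, 2))
means <- colMeans(df)
n_cells <- nrow(df) * ncol(df)

# What df - means subtracts: means recycled in column-major order over all cells,
# giving 1, 2, 1 down column a and 2, 1, 2 down column b
without_t <- matrix(rep_len(means, n_cells), nrow = nrow(df))

# What t(t(df) - means) subtracts: means recycled down the columns of the
# transposed (2 x 3) layout, then flipped back, so each column gets its own mean
with_t <- t(matrix(rep_len(means, n_cells), nrow = ncol(df)))

print(without_t)
print(with_t)
```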
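The trick is to transpose df first, subtract the means, and then transpose back. After t(df), each column of df has become a row, so when R recycles means down the columns of the transposed matrix, the first mean lines up with the first row (originally column a), the second with the second row (originally column b), and so on; the final t() restores the original orientation. Note that t() returns a matrix, which is why the line in the question wraps the result in data.frame(). In general, the pattern is appropriate whenever you have a vector with one value per column (such as means or standard deviations) that should be applied to every row, which is exactly what the standardization line in the question does.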
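Here is a sketch of the line from the question run on a small hypothetical data frame `x` standing in for trainKNNbig. The question does not show how `means` and `stdev` were computed, so this assumes they are the per-column mean and standard deviation. For comparison, it also uses base R's sweep(), which applies a per-column statistic directly (MARGIN = 2), and scale(), which by default centers each column by its mean and divides by its standard deviation:

```r
# Hypothetical numeric data frame standing in for trainKNNbig
x <- data.frame(p = c(1, 4, 7, 10), q = c(2, 2, 5, 9))

# Assumption: means and stdev are the per-column mean and standard deviation
means <- colMeans(x)
stdev <- apply(x, 2, sd)

# Double-transpose version, as in the question
z_t <- data.frame(t((t(x) - means) / stdev))

# sweep() with MARGIN = 2 applies one statistic per column, no transposing needed
z_sweep <- sweep(sweep(as.matrix(x), 2, means, "-"), 2, stdev, "/")

# scale() centers by column means and divides by column sd (n - 1 denominator)
z_scale <- scale(x)

# Compare values only; scale() attaches extra attributes such as "scaled:center"
all.equal(as.matrix(z_t), z_sweep, check.attributes = FALSE)
all.equal(z_sweep, z_scale, check.attributes = FALSE)
```

Under that assumption about stdev, all three versions agree; sweep() and scale() just state the column-wise intent explicitly instead of relying on the recycling order.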