Question



Euclidean Distance for K-Nearest Neighbors

If, to identify the 5 nearest neighbors manually, one must calculate the distance from every new instance to every point in the
training set, then with 7 new instances and 7 training instances, would that entail calculating 49 distances (and then choosing the
five smallest distances for each new instance)? Am I thinking about this right?






Answers and follow-up questions





Answer or follow-up question 1

Dear student,

Yes. That's right.

Michel Ballings


Answer or follow-up question 2

Should we be manually computing the Euclidean distance for the 7 observations in both the training and new data sets? For example,
dissimilarity for observation one = sqrt((x1_new - x1_train)^2 + (x2_new - x2_train)^2), where x1 is tenure (column 3) and x2 is spend
(column 2). If so, wouldn't this give 7 distances instead of 49, since there are only 7 observations in each of the training and new sets?
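A small R sketch of the distinction being asked about: the formula above gives the distance between one new observation and one training observation. Pairing new observation k only with training observation k yields just 7 such distances, whereas k-NN needs every new-to-training pair. The data values and the helper euclid() are hypothetical, made up for illustration; like the code later in this thread, the sketch assumes spend is column 2 and tenure is column 3.

```r
# Hypothetical toy data (made-up values, not the course data).
# Assumes spend is column 2 and tenure is column 3, as in the thread's code.
df_train <- data.frame(id = 1:7,
                       spend  = c(2, 5, 3, 8, 1, 6, 4),
                       tenure = c(3, 1, 4, 2, 6, 5, 7))
df_new <- data.frame(id = 8:14,
                     spend  = c(2, 7, 4, 1, 5, 3, 6),
                     tenure = c(3, 2, 5, 4, 1, 6, 2))

# Hypothetical helper: Euclidean distance between new row j and training row i
euclid <- function(j, i) {
  sqrt((df_new[j, 3] - df_train[i, 3])^2 + (df_new[j, 2] - df_train[i, 2])^2)
}

euclid(1, 1)   # one new observation vs. one training observation

# Pairing each new row only with the training row of the same index
# gives 7 distances, not the 7 x 7 = 49 that k-NN needs
same_index <- sapply(1:7, function(k) euclid(k, k))
length(same_index)
```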


Answer or follow-up question 3

Dear student,

For each new instance, you compute the distance to every training instance.

If there are 7 new instances and 7 training instances, then you get 49 distances.

Michel Ballings
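A minimal R sketch of what this answer describes, using made-up toy data (spend assumed in column 2, tenure in column 3) rather than the course data set. outer() evaluates the distance for every (new, training) pair, producing a 7 x 7 matrix of 49 distances; order() then picks the 5 smallest in each row, i.e. the 5 nearest training instances for each new instance. This is one possible approach, not necessarily the one expected in the course.

```r
# Hypothetical toy data (made-up values); spend = column 2, tenure = column 3
df_train <- data.frame(id = 1:7,
                       spend  = c(2, 5, 3, 8, 1, 6, 4),
                       tenure = c(3, 1, 4, 2, 6, 5, 7))
df_new <- data.frame(id = 8:14,
                     spend  = c(2, 7, 4, 1, 5, 3, 6),
                     tenure = c(3, 2, 5, 4, 1, 6, 2))

# d[j, i] = Euclidean distance from new instance j to training instance i.
# outer() passes every combination of j and i as vectors, so the function
# must be vectorized (data frame indexing with a vector of rows is).
d <- outer(seq_len(nrow(df_new)), seq_len(nrow(df_train)),
           function(j, i) sqrt((df_new[j, 3] - df_train[i, 3])^2 +
                               (df_new[j, 2] - df_train[i, 2])^2))
dim(d)      # 7 rows (new) by 7 columns (training)
length(d)   # 49 distances in total

# For each new instance (row of d), the indices of its k nearest training rows
k <- 5
nearest <- t(apply(d, 1, function(row) order(row)[1:k]))
nearest
```

Row j of nearest lists the training rows to use when classifying new instance j.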


Answer or follow-up question 4

Would you mind helping me comb through my for loop to calculate the 49 distances? I'd appreciate that!

df_distance <- c()
for (i in 1:7) {
  for (j in 1:7) {
    append(df_distance,((df_new[3,j]-df_train[3,i])^2+(df_new[2,j]-df_train[2,i])^2))
  }
}


Answer or follow-up question 5

Dear student,

You're close.

Here is a hint:

The first position in the square brackets is for the rows and the second position is for the columns.

Michel Ballings
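To illustrate the hint with a hypothetical three-row data frame (names and values made up): df[2, 3] selects row 2, column 3, so inside a loop over observations the loop index belongs in the first position, as in df_new[j, 3] rather than df_new[3, j].

```r
# Hypothetical 3-row data frame (made-up values) to show df[row, column]
df <- data.frame(id = 1:3, spend = c(10, 20, 30), tenure = c(4, 5, 6))

df[2, 3]   # row 2, column 3: tenure of the 2nd observation
df[3, 2]   # row 3, column 2: spend of the 3rd observation

# When looping over observations, the loop index goes in the row position
for (j in 1:3) {
  print(df[j, 3])   # tenure of observation j
}
```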


Answer or follow-up question 6

Ah, of course. I've swapped the indices, but df_distance is still an empty (NULL) vector. Could append be the wrong function to use?


Answer or follow-up question 7

There is no reason to use append.

If distance is a numeric vector, just store each result as distance[ii], and add ii <- ii + 1 in the inner loop before that (starting from ii <- 0 before the loops).

Michel Ballings
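Putting the last two answers together, a sketch under the same assumptions as before (made-up toy data, spend in column 2, tenure in column 3). It first shows why df_distance stayed empty: append() returns a new vector and leaves its argument unchanged, so its result has to be assigned. It then fills a preallocated numeric vector through the counter ii, as suggested above. Unlike the original loop, it takes sqrt(); leaving it out gives squared distances, which rank the neighbors in the same order.

```r
# Hypothetical toy data (made-up values); spend = column 2, tenure = column 3
df_train <- data.frame(id = 1:7,
                       spend  = c(2, 5, 3, 8, 1, 6, 4),
                       tenure = c(3, 1, 4, 2, 6, 5, 7))
df_new <- data.frame(id = 8:14,
                     spend  = c(2, 7, 4, 1, 5, 3, 6),
                     tenure = c(3, 2, 5, 4, 1, 6, 2))

# append() returns a new vector; it does not modify its argument in place
v <- c()
append(v, 1)
length(v)   # still 0, because the result of append() was never assigned

# Preallocate one slot per (new, training) pair and fill it via a counter
n_new    <- nrow(df_new)
n_train  <- nrow(df_train)
distance <- numeric(n_new * n_train)
ii <- 0
for (j in 1:n_new) {        # j indexes new instances (rows of df_new)
  for (i in 1:n_train) {    # i indexes training instances (rows of df_train)
    ii <- ii + 1
    distance[ii] <- sqrt((df_new[j, 3] - df_train[i, 3])^2 +
                         (df_new[j, 2] - df_train[i, 2])^2)
  }
}
length(distance)   # 49

# Reshape so that row j holds the 7 distances for new instance j
d <- matrix(distance, nrow = n_new, byrow = TRUE)
```

Assigning the result, as in df_distance <- append(df_distance, ...), would also work, but growing a vector inside a loop copies it on every iteration, whereas preallocating does not.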


