## Question

Euclidean Distance for K-Nearest Neighbors

If, in order to manually identify the 5 nearest neighbors, one must calculate the distance from every new instance to every point in the training set, then for 7 new instances and 7 values in the training data, would that entail calculating 49 distances (after which one would choose the five smallest distances)? Am I thinking about this right?

## Answers and follow-up questions

**Answer or follow-up question 1**

Dear student,

Yes. That's right.

Michel Ballings

**Answer or follow-up question 2**

Should we be manually entering the Euclidean distance function for the 7 observations in both the training and new data sets? For example: dissimilarity observation one = sqrt((tenure;x1new(3) - x1train(3))^2 + (spend;x2new(2) - x2train(3))^2). If so, would this not give you 7 distances instead of 49, since there are only 7 observations in each of the training and new sets?

**Answer or follow-up question 3**

Dear student,

For each new instance, you compute the distance with all training instances.

If there are 7 new instances and 7 training instances, then you get 49 distances.

Michel Ballings
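To make the count in the answer above concrete, here is a minimal sketch in R. It assumes two hypothetical data frames, `df_train` and `df_new`, each with 7 rows and two numeric features named `tenure` and `spend`; the values are made up for illustration and are not the course data. Each new instance is compared with every training instance, which fills a 7 x 7 matrix (49 distances). The five nearest neighbors are then taken per new instance, that is, per row.

```r
# Hypothetical data: 7 training and 7 new instances with two features.
# The values are made up for illustration only.
df_train <- data.frame(tenure = c(1, 2, 3, 4, 5, 6, 7),
                       spend  = c(10, 20, 30, 40, 50, 60, 70))
df_new   <- data.frame(tenure = c(2, 3, 4, 5, 6, 7, 8),
                       spend  = c(15, 25, 35, 45, 55, 65, 75))

# One row per new instance, one column per training instance.
dist_mat <- matrix(NA_real_, nrow = nrow(df_new), ncol = nrow(df_train))

for (i in seq_len(nrow(df_new))) {
  for (j in seq_len(nrow(df_train))) {
    # Euclidean distance between new instance i and training instance j.
    dist_mat[i, j] <- sqrt((df_new$tenure[i] - df_train$tenure[j])^2 +
                           (df_new$spend[i]  - df_train$spend[j])^2)
  }
}

length(dist_mat)  # 7 * 7 = 49 distances

# For each new instance (row), the indices of its 5 nearest training instances.
k <- 5
nearest <- t(apply(dist_mat, 1, function(d) order(d)[1:k]))
dim(nearest)      # 7 rows (new instances) by 5 columns (neighbors)
```

Storing the distances in a matrix rather than a flat vector keeps track of which new instance each distance belongs to, which is what the "five smallest" step needs.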

**Answer or follow-up question 4**

Would you mind helping me comb through my for loop to calculate the 49 distances? I'd appreciate that!

    df_distance <- c()
    for (i in 1:7) {
      for (j in 1:7) {
        append(df_distance,((df_new[3,j]-df_train[3,i])^2+(df_new[2,j]-df_train[2,i])^2))
      }
    }

**Answer or follow-up question 5**

Dear student,

You're close.

Here is a hint:

The first position in the square brackets is for the rows and the second position is for the columns.

Michel Ballings
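As a small illustration of the hint above, the following sketch shows how R's `[row, column]` indexing behaves on a data frame. The data frame `df` and its values are hypothetical and serve only to show which position selects what.

```r
# Hypothetical 3-row data frame; values are made up for illustration.
df <- data.frame(id = 1:3, spend = c(10, 20, 30), tenure = c(5, 6, 7))

df[2, 3]   # row 2, column 3: tenure of the second observation
df[3, 2]   # row 3, column 2: spend of the third observation
df[2, ]    # all columns of row 2
df[, 3]    # all rows of column 3
```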

**Answer or follow-up question 6**

Ah, of course. I switched the dimensions, but I am still left with a null vector in df_distance. Could append be the wrong function to use?

**Answer or follow-up question 7**

There is no reason to use append.

If distance is a numeric vector, just store each result in distance[ii], and add ii <- ii + 1 in the inner loop before that assignment.

Michel Ballings
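The suggestion above can be sketched as follows. This is one possible reading, not the official solution: the data frames are hypothetical (made-up values, with the two features placed in columns 2 and 3 as the loop in this thread implies), and `ii` is initialized to 0 because it is incremented before each assignment. As for the empty result: `append()` returns a new vector rather than modifying its argument, so calling it without assigning the result leaves `df_distance` unchanged.

```r
# Hypothetical data: columns 2 and 3 hold the two features. Values are made up.
df_train <- data.frame(id = 1:7,
                       spend  = c(10, 20, 30, 40, 50, 60, 70),
                       tenure = c(1, 2, 3, 4, 5, 6, 7))
df_new   <- data.frame(id = 1:7,
                       spend  = c(15, 25, 35, 45, 55, 65, 75),
                       tenure = c(2, 3, 4, 5, 6, 7, 8))

distance <- numeric(49)  # preallocate one slot per (new, train) pair
ii <- 0                  # counter; starts at 0 because it is incremented first

for (i in 1:7) {         # i indexes training rows
  for (j in 1:7) {       # j indexes new rows
    ii <- ii + 1
    # Rows come first in the brackets, columns second.
    distance[ii] <- sqrt((df_new[j, 3] - df_train[i, 3])^2 +
                         (df_new[j, 2] - df_train[i, 2])^2)
  }
}

length(distance)  # 49
```

With this loop order, positions 1 to 7 of `distance` hold the distances from training row 1 to each new row, positions 8 to 14 those from training row 2, and so on.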
