Euclidean Distance for K-Nearest Neighbors

To manually identify the 5 nearest neighbors, one must calculate the distance from each new instance to every point in the training set.
With 7 new instances and 7 observations in the training data, would that entail calculating 49 distances (and then, for each new instance,
choosing the five smallest distances)? Am I thinking about this right?

Answers and follow-up questions

Answer or follow-up question 1

Dear student,

Yes. That's right.

Michel Ballings

Answer or follow-up question 2

Should we be manually entering the Euclidean distance function for the 7 observations in both the training and new data sets? For example,
dissimilarity for observation one = sqrt((x1new(3) - x1train(3))^2 + (x2new(2) - x2train(3))^2), where x1 is tenure and x2 is spend. If so,
would this not give 7 distances instead of 49, since there are only 7 observations in each of the training and new sets?

Answer or follow-up question 3

Dear student,

For each new instance, you compute the distance with all training instances.

If there are 7 new instances and 7 training instances, then you get 49 distances.

Michel Ballings
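
To illustrate the count, here is a minimal sketch in R. The data frames train and new, their column names tenure and spend, and all values are made up for illustration; they are not the course data. Each new instance is compared with every training instance, so the result is a 7 x 7 matrix holding 49 distances, and the 5 nearest neighbors are chosen separately within each row.

```r
# Hypothetical data: 7 training and 7 new customers (values are made up).
train <- data.frame(tenure = c(3, 5, 1, 8, 2, 7, 4),
                    spend  = c(3, 6, 2, 9, 1, 5, 4))
new   <- data.frame(tenure = c(3, 6, 2, 7, 1, 5, 4),
                    spend  = c(2, 5, 3, 8, 2, 6, 3))

# One row per new instance, one column per training instance.
dist_mat <- matrix(NA_real_, nrow = nrow(new), ncol = nrow(train))
for (i in seq_len(nrow(new))) {
  for (j in seq_len(nrow(train))) {
    dist_mat[i, j] <- sqrt((new$tenure[i] - train$tenure[j])^2 +
                           (new$spend[i]  - train$spend[j])^2)
  }
}

# Total number of distances: 7 new instances times 7 training instances.
length(dist_mat)

# Indices of the 5 nearest training instances for each new instance.
# Each column of `nearest` corresponds to one new instance.
nearest <- apply(dist_mat, 1, function(d) order(d)[1:5])
```

Pairing observation one only with observation one, two with two, and so on would fill just the diagonal of this matrix, which is why that approach yields 7 distances rather than 49.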

Answer or follow-up question 4

Would you mind helping me comb through my for loop to calculate the 49 distances? I'd appreciate that!

df_distance <- c()
for (i in 1:7) {
for (j in 1:7) {

Answer or follow-up question 5

Dear student,

You're close.

Here is a hint:

The first position in the square brackets is for the rows and the second position is for the columns.

Michel Ballings
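
As a small illustration of the indexing hint, the sketch below uses a made-up data frame (the name train, its columns, and its values are assumptions, not the course data). It shows that the position before the comma selects rows and the position after the comma selects columns.

```r
# Hypothetical training data with two columns (values are made up).
train <- data.frame(tenure = c(3, 5, 1),
                    spend  = c(3, 6, 2))

second_tenure <- train[2, 1]        # row 2, column 1 (tenure)
second_spend  <- train[2, "spend"]  # row 2, column named "spend"
second_row    <- train[2, ]         # row 2, all columns (a one-row data frame)
spend_column  <- train[, 2]         # all rows, column 2 (a numeric vector)
```

Leaving a position empty, as in train[2, ] or train[, 2], selects everything along that dimension.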

Answer or follow-up question 6

Ah, of course. I switched the dimensions, but I am still left with a null vector in df_distance. Could append be the wrong function to use?

Answer or follow-up question 7

There is no reason to use append.

If distance is a numeric vector, just store each value as distance[ii], and add ii <- ii + 1 in the inner loop before that assignment (with ii set to 0 before the loops).

Michel Ballings
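
The counter-based approach described above could look roughly like the sketch below. It is not the student's original loop, which is not shown in full. The data frames train and new, their columns, and their values are made up for illustration, and starting the counter ii at 0 before the loops is an assumption that follows from incrementing it before each assignment.

```r
# Hypothetical data: 7 training and 7 new customers (values are made up).
train <- data.frame(tenure = c(3, 5, 1, 8, 2, 7, 4),
                    spend  = c(3, 6, 2, 9, 1, 5, 4))
new   <- data.frame(tenure = c(3, 6, 2, 7, 1, 5, 4),
                    spend  = c(2, 5, 3, 8, 2, 6, 3))

# Preallocate a numeric vector with one slot per (new, train) pair.
distance <- numeric(nrow(new) * nrow(train))
ii <- 0  # running position in `distance`

for (i in 1:7) {      # i indexes rows of the new data
  for (j in 1:7) {    # j indexes rows of the training data
    ii <- ii + 1      # move to the next slot before storing
    distance[ii] <- sqrt((new[i, "tenure"] - train[j, "tenure"])^2 +
                         (new[i, "spend"]  - train[j, "spend"])^2)
  }
}

# Number of stored distances.
length(distance)
```

With this ordering, the first 7 entries of distance belong to the first new instance, the next 7 to the second, and so on.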
