k-Nearest Neighbors algorithm

K-Nearest Neighbors (KNN) classification divides data into a test set and a training set. For each row of the test set, the K nearest (in Euclidean distance) training set objects are found, and the classification is determined by majority vote with ties broken at random. If there are ties for the Kth nearest vector, all candidates are included in the vote.

Pros High accuracy, insensitive to outliers, no assumptions about data
Cons Computationally expensive, requires a lot of memory
Works with Numeric values, nominal values

In the following diagram, Blue dots are already planted on a Cartesian Coordinates system and we want to know the neighbors of new fired red dot




from numpy import array, tile  # for createDataSetForGroups() functions
import operator

def createDataSetForGroups():
    group = array([[ 1 , 4], [ 2 , 5 ], [ 3 , 1 ], [ 4 , 2]])
    return group

def createDataSetForLabels():
    labels = ['A', 'A', 'B', 'B']
    return labels

def classify0(inX, dataSet, labels, k):
    #-----------calculating distance start-------------
    dataSetSize = dataSet.shape[0]  # size of input data set
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet  # subtracting distance
    sqDiffMat = diffMat ** 2  # powers calculating
    sqDistances = sqDiffMat.sum(axis=1) 
    distances = sqDistances ** 0.5  # square root   
    sortedDistIndicies = distances.argsort()  # sorting distance
    #-----------calculating distance end-------------
    classCount = {}  # dictionary
    # sorting distances
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # return first element

grp = createDataSetForGroups()
lbls = createDataSetForLabels()

print classify0([4, 5], grp, lbls, 3)