Coding is secondary here; you need an algorithm first. There is some good advice
here.
For calculating a similarity measure, there are, to mention just two options, the simpler Euclidean-distance-based approach and the more complicated Gaussian-function-based approach; which one fits depends on the problem and the nature of the data set. You should try both and compare their accuracies on the same data set. Search online for more information on this area.
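To make the two measures concrete, here is a minimal Python sketch. It assumes the items being compared are equal-length numeric feature vectors. The function names, the 1 / (1 + d) mapping from distance to similarity, and the sigma width parameter are illustrative choices of mine, not something the question prescribes; the Gaussian version is the common exp(-d^2 / (2 * sigma^2)) form.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Map distance to a similarity in (0, 1]; 1.0 means identical.

    The 1 / (1 + d) mapping is one common convention (an assumption here).
    """
    return 1.0 / (1.0 + euclidean_distance(a, b))


def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian (RBF-style) similarity in (0, 1]; 1.0 means identical.

    sigma is a hypothetical tuning parameter that controls how quickly
    similarity falls off with distance; it has to be chosen per data set.
    """
    d = euclidean_distance(a, b)
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    p, q = [0.0, 0.0], [3.0, 4.0]
    print(euclidean_distance(p, q))
    print(euclidean_similarity(p, q))
    print(gaussian_similarity(p, q, sigma=5.0))
```

Both functions return 1.0 for identical vectors and approach 0 as the vectors move apart, so their results can be compared on the same data set. The Gaussian version needs sigma tuned to the scale of the data, which is part of what makes it the more involved option.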