
Data Clustering Simulation in Python and PyGame

14 Jun 2012, CPOL, 2 min read
In this article, you will see how to cluster 2D data with K-Means in Python and watch the process in a PyGame simulation.

[Figures: unclustered data (left) and the same data grouped into 10 clusters (right)]

Introduction

In this article, I explain an implementation of the K-Means algorithm, which is widely used in machine learning. In the figures above, the left image shows unclustered data and the right image shows the same data grouped into 10 clusters. The implementation consists of two files:

  1. pyDataCluster.py
  2. clusterSimulation.py

File 1 contains a class that implements K-Means, and File 2 is a simulation written with PyGame (a game library for Python). The pyDataCluster class returns the clustered data, so the result can also be viewed in the console.

Background

Machine learning is a branch of AI. Instead of hand-crafting a complex algorithm, it applies simple algorithms to large amounts of existing data to get optimized results. Clustering is the process of grouping data into classes, and different parameters can drive the grouping depending on the situation. K-Means groups data around the mean value of each cluster: it starts from the raw data and repeatedly reassigns points and recomputes the means until the means stop changing.

Basic Workflow

The basic workflow is as follows:

  1. Get the data.
  2. Set the number of clusters you want.
  3. Create an empty 2D array to store the clustered data.
  4. For each cluster, pick a random point to serve as its initial mean.
  5. For each point, calculate its distance to every mean.
  6. Put the point in the cluster whose mean is closest.
  7. Recalculate the mean of every cluster and update the means.
  8. Go back to step 5 with the updated means, and repeat until the means from two consecutive passes are equal.
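
The steps above can be sketched as one self-contained function. This is only an illustration under stated assumptions, not the pyDataCluster class itself: the name `k_means` and the parameters `max_iter` and `seed` are hypothetical, the distance is squared Euclidean, and the means are kept as floats.

```python
import random

def k_means(points, k, max_iter=100, seed=None):
    """Cluster 2D points into k groups; returns (clusters, means)."""
    rng = random.Random(seed)
    # Step 4: pick k data points as the initial means.
    means = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(max_iter):
        # Step 3: start every pass with k empty clusters.
        clusters = [[] for _ in range(k)]
        # Steps 5-6: assign each point to the nearest mean.
        for p in points:
            dists = [(p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means]
            clusters[dists.index(min(dists))].append(p)
        # Step 7: recompute each mean; an empty cluster keeps its old mean.
        new_means = []
        for old, members in zip(means, clusters):
            if members:
                new_means.append([sum(p[0] for p in members) / len(members),
                                  sum(p[1] for p in members) / len(members)])
            else:
                new_means.append(old)
        # Step 8: stop once the means no longer change.
        if new_means == means:
            break
        means = new_means
    return clusters, means
```

The `max_iter` cap is a safety net so the loop always ends, even if the means keep moving by tiny amounts.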

Using the Code

Let's look at the code, starting with the clustering class. To use this class in your own code, do this:

Python
import random

from pyDataCluster import *

# 5000 random points in a 500x500 area, to be split into 10 groups
data=[]
groups=10
for i in range(5000):
    data.append([random.randint(1,500),random.randint(1,500)])

cluster = pyDataCluster(groups,data)

This creates 5000 random points and an object named cluster that will divide them into 10 groups.

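Then either run the algorithm to completion or advance it one pass at a time:

Python
finalCluster = cluster.finalCluster() # repeat until the means are stable and return the final clusters
clus = cluster.createCluster() # run a single pass and return the intermediate clusters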
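
Because the result is a plain nested list, it can be inspected in the console. A minimal sketch, assuming the returned value is a list of clusters where each cluster is a list of [x, y] points; the helper name `summarize_clusters` is hypothetical and not part of pyDataCluster.

```python
def summarize_clusters(clusters):
    """Return one summary line per cluster: index, size, and centre."""
    lines = []
    for index, members in enumerate(clusters):
        if members:
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            lines.append("cluster %d: %d points, centre (%.1f, %.1f)"
                         % (index, len(members), cx, cy))
        else:
            lines.append("cluster %d: empty" % index)
    return lines

# Stand-in for the value returned by cluster.finalCluster()
example = [[[1, 2], [3, 4]], [], [[10, 10]]]
for line in summarize_clusters(example):
    print(line)
```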

Initialization

The class constructor initializes the instance variables.

Python
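def __init__(self,numberOfCluster,Data,initialPoints=None):
        '''
        Store the settings and choose the initial means.
        '''
        self.Kgroups=numberOfCluster
        self.Data=Data
        self.Cluster=[]
        # Copy the caller's points; a mutable default ([]) would be shared between instances
        self.Kmeans=list(initialPoints) if initialPoints else []
        self.initialMeanPositions()
        self.terminat=True  # stays True until setMeans finds the means unchanged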

Either pass the initial points or omit them, in which case initialMeanPositions() will choose them for you.
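
The body of initialMeanPositions() is not shown in this article. As a rough idea of what such an initializer might do, here is a standalone sketch; the function name `initial_mean_positions` and its parameters are assumptions for illustration, and the real method may work differently.

```python
import random

def initial_mean_positions(data, k, initial_points=None):
    """Return k starting means: the caller's points, or k random data points."""
    if initial_points:
        if len(initial_points) != k:
            raise ValueError("expected %d initial points" % k)
        # Copy so later updates never modify the caller's list.
        return [list(p) for p in initial_points]
    # random.sample picks k different positions in data without repetition.
    return [list(p) for p in random.sample(data, k)]
```

Sampling from the data itself, rather than from arbitrary coordinates, means every starting mean lies where at least one point exists.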

Create Cluster

Python
def createCluster(self):
        self.clusterSpace()
        for i in self.Data:
            point=[i[0],i[1]]
            group=self.getClusterGroup(point)
            self.Cluster[group].append(i)
        self.setMeans()
        return(self.Cluster)

This function is the workhorse of the class. It assigns every data point to the cluster with the nearest current mean and then updates the means. Calling it repeatedly on the same data refines the clusters.
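
One way to check whether another pass improved the clusters is to measure how tightly the points sit around their cluster means. A small sketch, assuming clusters are lists of [x, y] points; the helper name `total_distance` is hypothetical, and it uses the same Manhattan distance as getClusterGroup.

```python
def total_distance(clusters):
    """Sum of Manhattan distances from each point to its cluster's mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        mx = sum(p[0] for p in members) / len(members)
        my = sum(p[1] for p in members) / len(members)
        total += sum(abs(p[0] - mx) + abs(p[1] - my) for p in members)
    return total
```

Calling it on the result of each createCluster() pass gives a single number to compare; a smaller value means tighter clusters.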

Final Cluster

To run the algorithm to completion and get the final clusters, use this function:

Python
def finalCluster(self):
       clus=self.Cluster  # returned unchanged if the means are already stable
       while self.terminat:
           clus=self.createCluster()
       return(clus)

This function simply loops until setMeans signals termination by setting terminat to False.

setMeans

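This function recalculates the means, as described in step 7 of the basic workflow, and signals termination once they stop changing:

Python
def setMeans(self):
        means=[]
        for index,i in enumerate(self.Cluster):
            if not i:
                # An empty cluster has no mean; keep the previous one to avoid dividing by zero
                means.append(self.Kmeans[index])
                continue
            x=0
            y=0
            for j in i:
                x=x+j[0]
                y=y+j[1]
            means.append([math.floor(x/len(i)),math.floor(y/len(i))])
        # Unchanged means signal that clustering is finished
        if(self.Kmeans==means):
            self.terminat=False
        self.Kmeans=means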
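
setMeans compares the floored means for exact equality. A looser stopping rule, which may be useful if the means are kept as floats and tiny movements are not worth another pass, is to compare within a tolerance. The names `means_converged` and `tol` below are hypothetical.

```python
def means_converged(old_means, new_means, tol=1e-6):
    """True when every mean moved by at most tol along each axis."""
    if len(old_means) != len(new_means):
        return False
    return all(abs(o[0] - n[0]) <= tol and abs(o[1] - n[1]) <= tol
               for o, n in zip(old_means, new_means))
```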

Assigning the Cluster Group

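This function returns the index of the group whose mean is closest to a given point. Distance is measured as the Manhattan distance (the sum of the absolute differences in x and y):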

Python
def getClusterGroup(self,point):
        dist=[]
        for i in self.Kmeans:
            dist.append(math.fabs(point[0]-i[0])+math.fabs(point[1]-i[1]))
        minIndex = dist.index(min(dist))
        return minIndex
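
K-Means is more commonly described with Euclidean distance. For comparison, here is a sketch of the same lookup using squared Euclidean distance; the standalone function `nearest_mean_index` is hypothetical and takes the means as an argument instead of reading self.Kmeans.

```python
def nearest_mean_index(point, means):
    """Index of the mean with the smallest squared Euclidean distance to point."""
    best_index = 0
    best_dist = float("inf")
    for index, mean in enumerate(means):
        dist = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
        if dist < best_dist:  # strict '<' keeps the first mean on ties
            best_index, best_dist = index, dist
    return best_index
```

The two measures can disagree. For the point [0, 0] and the means [5, 0] and [3, 3], Manhattan distance prefers the first mean (5 against 6), while squared Euclidean distance prefers the second (25 against 18).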

Empty Cluster

Every pass needs an empty set of clusters. This function discards any old values and creates a new empty list for each group:

Python
def clusterSpace(self):
       self.Cluster=[]
       for i in range(self.Kgroups):
           self.Cluster.append([])

That completes the clustering class. Next comes the simulation.

clusterSimulation

This requires the PyGame library, which can be downloaded from the PyGame website.

Python
import pygame, sys, time, random
from pygame.locals import *
from pyDataCluster import *

data=[]
groups=10
for i in range(5000):
    data.append([random.randint(1,500),random.randint(1,500)])
    
cluster = pyDataCluster(groups,data)
Color=[]
for i in range(groups):
    
    while True:
        cl=((random.randint(0,255)),(random.randint(0,255)),(random.randint(0,255)))
        if cl not in Color:   
            Color.append(cl)
            break
pygame.init()
WINDOWWIDTH = 500
WINDOWHEIGHT = 500
BASICFONT = pygame.font.Font('freesansbold.ttf',50)
windowSurface = pygame.display.set_mode((WINDOWWIDTH, WINDOWHEIGHT), 0, 32)
pygame.display.set_caption('Cluster Simulation')

BLACK = (0, 0, 0)
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)
WHITE=(255,255,255)

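# Run one clustering pass per frame until the means are stable
while cluster.terminat:
    pygame.event.pump()  # keep the window responsive while clustering
    points=[]
    clus=cluster.createCluster()
    for a,i in enumerate(clus):
        # every point in cluster a is drawn in that cluster's color
        for j in i:
            points.append({'rect':pygame.Rect(j[0],j[1],4,4),'color':Color[a]})
    for p in points:
        pygame.draw.rect(windowSurface, p['color'], p['rect'])
    pygame.display.update()
    #time.sleep(0.05)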

while True:
    # check for the QUIT event
    for event in pygame.event.get():
        if event.type == QUIT:
            pygame.quit()
            sys.exit()  

Try changing the amount of data and the number of groups to see the effect.
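
Uniformly random points have no natural groups, so the effect can be easier to see on data that is already clumped. A sketch of one way to generate such data, staying in the same 1 to 500 range as the random points above; the function `make_blobs` and its parameters are hypothetical.

```python
import random

def make_blobs(centres, points_per_blob, spread=20, size=500, seed=None):
    """Generate 2D integer points scattered around each centre."""
    rng = random.Random(seed)
    data = []
    for cx, cy in centres:
        for _ in range(points_per_blob):
            x = int(rng.gauss(cx, spread))
            y = int(rng.gauss(cy, spread))
            # Clamp to 1..size so every point stays in the original data range.
            data.append([min(max(x, 1), size), min(max(y, 1), size)])
    return data

data = make_blobs([(100, 100), (400, 150), (250, 400)], 500, seed=42)
```

The resulting list has the same shape as the random data, so it could be passed to pyDataCluster in its place, for example with groups set to 3.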

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)