Developing a fully online data stream clustering algorithm.
The rapid development of information technology (IT) has led to the generation of massive amounts of data, or “big data”. Big data are generated daily from heterogeneous sources at an unprecedented rate. A huge, unbounded series of data points that arrive continuously is referred to as a data stream. Compared with traditional static datasets, a data stream poses three additional constraints: volume, velocity and variety. Clustering is one of the vital techniques in the field of data stream mining. Traditional clustering algorithms are designed to run once over persistent datasets that are held reliably in storage. However, many modern applications generate data streams continuously. Because of the volume of a data stream, it is impractical to store the entire stream in memory for analysis. Each data point in the stream passes only once, so multiple scans are not feasible. Low processing time is a further requirement, needed to enable real-time processing. Given the unprecedented amount of data that will be produced, collected and stored in the coming years, one of the technology industry's great challenges is how to benefit from it. Data analysts look for techniques that can extract the hidden knowledge in these data streams and so help solve social problems, contributing to a more comfortable life and a better world. Mining data streams is one such knowledge extraction technique that has attracted researchers, and clustering is a significant part of it. Increasingly, clustering has become a useful, ubiquitous and essential tool in data stream analysis.
The main concern of this research is to design a fully online density-based clustering algorithm that handles the challenges of data streams efficiently.
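To make the constraints stated above concrete (a single pass over the data, bounded memory, and low per-point cost), the following is a minimal illustrative sketch. It is not the algorithm developed in this research. It assumes a generic micro-cluster scheme with exponential fading, and the names `MicroCluster`, `process_stream`, `radius`, `lam` and `max_clusters` are hypothetical.

```python
import math


class MicroCluster:
    """Fixed-size summary of nearby points: a decayed weight and a centre."""

    def __init__(self, point, t):
        self.center = list(point)
        self.weight = 1.0
        self.last_update = t

    def decay(self, t, lam):
        # Exponential fading (assumed): the weight halves every 1/lam time steps.
        self.weight *= 2.0 ** (-lam * (t - self.last_update))
        self.last_update = t

    def absorb(self, point, t, lam):
        self.decay(t, lam)
        self.weight += 1.0
        # Shift the centre toward the new point by that point's share of the weight.
        for i, x in enumerate(point):
            self.center[i] += (x - self.center[i]) / self.weight


def process_stream(stream, radius=1.0, lam=0.01, max_clusters=100):
    """Read each point exactly once and keep at most max_clusters summaries."""
    clusters = []
    for t, point in enumerate(stream):
        nearest = min(
            clusters, key=lambda c: math.dist(c.center, point), default=None
        )
        if nearest is not None and math.dist(nearest.center, point) <= radius:
            nearest.absorb(point, t, lam)
            continue
        if len(clusters) >= max_clusters:
            # Memory bound reached: fade every summary, then evict the lightest.
            for c in clusters:
                c.decay(t, lam)
            clusters.remove(min(clusters, key=lambda c: c.weight))
        clusters.append(MicroCluster(point, t))
    return clusters
```

In this sketch the raw points are never stored. Memory depends only on `max_clusters` and the dimensionality of the data, and the decayed weight of a summary serves as a rough density estimate for the region it covers.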