Fuzzy partition technique for clustering Big Urban dataset | IEEE Conference Publication | IEEE Xplore

Abstract:

Smart cities collect and produce massive amounts of data from various sources such as local weather stations, LIDAR data, mobile phone sensors and the Internet of Things (IoT). To use such large volumes of data for potential benefits, it is important to store and analyse the data using efficient and effective big data algorithms. However, this can be problematic due to many challenges. This article explores some of these challenges and tests the performance of two partition algorithms for clustering such Big Urban Datasets. Two handy clustering algorithms, K-Means and Fuzzy c-Means (FCM), were put to the test. The purpose of clustering urban data is to categorise it into homogeneous groups according to specific attributes. Clustering Big Urban Data into a compact format that represents the information of the whole data set can help researchers deal with the reorganised data much more efficiently. To this end, the two techniques were applied to a large set of LIDAR data to show how they perform on the same hardware set-up. Our experiments conclude that FCM outperformed K-Means when presented with this type of dataset; however, the latter is less demanding on hardware utilisation.
Date of Conference: 13-15 July 2016
Date Added to IEEE Xplore: 01 September 2016
Conference Location: London, UK
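
The difference between the two partition techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or experimental set-up: the synthetic 3-D points (a stand-in for LIDAR returns), the helper names, the fuzzifier m = 2 and the hand-picked starting centres are all assumptions made for illustration. K-Means assigns each point to exactly one cluster, whereas FCM gives each point a degree of membership in every cluster.

```python
import numpy as np


def sq_dists(points, centers):
    """Squared Euclidean distance from every point to every centre, shape (n, c)."""
    return ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(points, init_centers, n_iter=20):
    """Hard partition: each point belongs to exactly one cluster."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels = sq_dists(points, centers).argmin(axis=1)
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, sq_dists(points, centers).argmin(axis=1)


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=20):
    """Fuzzy partition: each point gets a membership in [0, 1] per cluster."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        d2 = np.maximum(sq_dists(points, centers), 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Centre update: mean of the points weighted by u^m.
        w = u ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
    return centers, u


# Synthetic stand-in for LIDAR points: two blobs in (x, y, z). Assumed data.
rng = np.random.default_rng(42)
blob_a = rng.normal([0.0, 0.0, 0.0], 0.5, size=(200, 3))
blob_b = rng.normal([10.0, 10.0, 8.0], 0.5, size=(200, 3))
points = np.vstack([blob_a, blob_b])

# Both algorithms start from the same centres so the comparison is like for like.
# Taking the first and last point is a shortcut that only suits this toy data.
init = points[[0, len(points) - 1]]

km_centers, km_labels = kmeans(points, init)
fcm_centers, fcm_u = fuzzy_c_means(points, init)

print("K-Means cluster sizes:", np.bincount(km_labels))
print("FCM memberships of the first point:", fcm_u[0])
```

On data with overlapping groups, the rows of `fcm_u` would show how strongly a point belongs to each cluster, which a hard K-Means label cannot express. The extra membership matrix is also one reason FCM needs more memory and computation per iteration.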

I. Introduction

Ongoing and recent research and development in computation and data storage technologies have contributed to the emergence of the Big Data phenomenon. The challenges of Big Data stem from the 5 V's: Volume, Velocity, Variety, Veracity and the Value to be gained from analysing Big Data [1]. The literature shows agreement among data scientists on the general attributes that characterise the 5 V's of Big Data, which can be summarised as follows:

Very large data, mainly terabytes, petabytes or exabytes of data (Volume).

Data can be found in structured, unstructured and semi-structured forms (Variety).

Data are often incomplete and inaccessible.

Data sets should be extracted from reliable and verified sources.

Data can be streaming at very high speed (Velocity).

Data can be very complex with interrelationships and high dimensionality.

Data may contain a few complex interrelationships between different elements.
