A Disparateness-Aware Scheduling using K-Centroids Clustering and PSO Techniques in Hadoop Cluster



International Journal of Advanced Network, Monitoring and Controls

Xi'an Technological University

Subject: Computer Science, Software Engineering


eISSN: 2470-8038






Volume 1, Issue 2 (December 2016)


E. Laxmi Lydia * / M. Ben Swarup

Keywords: K-Centroids Clustering, Big data, Hadoop Cluster, data access locality, data replication, system reliability, particle swarm optimization

Citation Information : International Journal of Advanced Network, Monitoring and Controls. Volume 1, Issue 2, Pages 34-46, DOI: https://doi.org/10.21307/ijanmc-2016-014

License : (CC BY 4.0)

Published Online: 02-April-2018



Big data storage management is one of the most challenging issues in Hadoop cluster environments, since a large number of data-intensive applications involve a high degree of data access locality. In traditional approaches, high-performance computing relies on dedicated servers for data storage and data replication. To address the disparateness among jobs and resources, a “Disparateness-Aware Scheduling algorithm” is proposed for the cluster environment. This work presents a K-Centroids clustering mechanism for big data in a Hadoop cluster. The approach focuses mainly on energy consumption in the Hadoop cluster, which helps to increase system reliability. The resources of the cluster are categorized with the K-Centroids clustering algorithm in order to minimize scheduling delay. A novel provisioning mechanism is introduced that considers load, energy, and network time. These three parameters are integrated into an optimized fitness function, which Particle Swarm Optimization (PSO) uses to select the computing node. Failures may still occur after successful execution in the network. To improve fault tolerance, migration targets the specific failed node: the node selection is recomputed by PSO and the corresponding optimal node is predicted. The experimental results show better scheduling length, scheduling delay, speedup, failure ratio, and energy consumption than existing systems.
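The node-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` fields, the weighted-sum form of the fitness function, the weights, and the PSO coefficients are all assumptions chosen for illustration, since the abstract only states that load, energy, and network time are integrated into a fitness function that PSO minimizes over candidate nodes.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node metrics, each assumed normalised to [0, 1]."""
    load: float
    energy: float
    network_time: float


def fitness(node, w_load=0.4, w_energy=0.3, w_net=0.3):
    """Assumed fitness: weighted sum of the three parameters (lower is better)."""
    return w_load * node.load + w_energy * node.energy + w_net * node.network_time


def pso_select_node(nodes, n_particles=20, n_iters=50, seed=0,
                    inertia=0.7, c1=1.5, c2=1.5):
    """Return the index of the node the swarm settles on."""
    if not nodes:
        raise ValueError("at least one candidate node is required")
    rng = random.Random(seed)
    upper = len(nodes) - 1e-9  # keeps int(position) inside the valid index range

    def cost(position):
        # A continuous position is mapped to a node index by truncation.
        return fitness(nodes[int(position)])

    positions = [rng.uniform(0.0, upper) for _ in range(n_particles)]
    velocities = [0.0] * n_particles
    personal_best = positions[:]
    global_best = min(positions, key=cost)

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Standard velocity update: inertia + cognitive + social terms.
            velocities[i] = (inertia * velocities[i]
                             + c1 * r1 * (personal_best[i] - positions[i])
                             + c2 * r2 * (global_best - positions[i]))
            # Clamp the new position to the valid index range.
            positions[i] = min(max(positions[i] + velocities[i], 0.0), upper)
            if cost(positions[i]) < cost(personal_best[i]):
                personal_best[i] = positions[i]
                if cost(positions[i]) < cost(global_best):
                    global_best = positions[i]
    return int(global_best)


# Illustrative metrics only; these values are not taken from the paper.
nodes = [Node(0.9, 0.8, 0.7), Node(0.2, 0.3, 0.1), Node(0.5, 0.5, 0.5)]
best = pso_select_node(nodes)
print(best, fitness(nodes[best]))
```

Under the same assumptions, recovery from a node failure could be sketched by removing the failed node from `nodes` and calling `pso_select_node` again on the remaining candidates.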



