Clustering the Wireless Sensor Networks: A Meta-Heuristic Approach

Lifetime is one of the most critical indexes of the Wireless Sensor Network (WSN). In this paper, we propose a clustering protocol based on the meta-heuristic approach (CPMA). CPMA takes the network lifetime as the primary consideration and consists of two parts. The first part focuses on the online cluster head selection and network communication coordination. The selection is based on the Harmony Search (HS) algorithm, which aims to reduce the total energy dissipation and smooth the energy distribution throughout the network. Currently, most clustering protocols cannot automatically tune their parameters to the diversity of different WSNs. To solve this issue, the second part of CPMA uses the Artificial Bee Colony (ABC) algorithm to optimize its crucial parameters. The optimization is offline and is executed only once before the network starts operating. We make a detailed comparison of CPMA with classical clustering protocols. The results show that CPMA can better prolong the network lifetime and improve the network throughput under almost all network conditions. Furthermore, our simulation also shows that CPMA has good adaptability and performs well under different network lifetime definitions. These results indicate that CPMA is suitable and efficient for a wide range of WSN applications.


I. INTRODUCTION
The rapid advances in the field of integrated sensors, low-power wireless transceivers, and microcontroller units have made low-cost, multi-functional, tiny sensing platforms available [1] in recent decades. As the precursor of and the technological foundation for the future Internet of Things (IoT) [2], wireless sensor networks play an essential role in many application areas such as security guarding, transport controlling, and health and environmental monitoring [3].
Sensing is usually a long-term task. The lifetime is one of the most critical indexes of WSNs [4]. Typically, the WSN is composed of one base station (BS) and plenty of sensors. Sensors are battery-powered and are usually deployed randomly throughout the target area. When the distance between the sensor and the BS is large, the substantial transmission energy cost depletes the sensor battery rapidly. Replacing the battery of the sensor is hard or even impossible in some harsh scenarios. Hence, effective routing protocols are requisite to avoid the sensor depletion and prolong the network lifetime [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Fakhrul Alam.
Based on the structure of WSNs, routing protocols can be divided into flat and hierarchical ones [6]. Flat routing protocols treat all sensors equally. To save the transmission energy cost, the data produced by sensors may be routed over multiple hops to the BS. Nevertheless, unlike traditional wireless networks, the WSN has a many-to-one character, which means that the sensors close to the BS consume more energy relaying data for others [7]. The unbalanced energy dissipation rates make it hard for flat routing protocols to avoid the energy-hole problem. Clustering, as a hierarchical approach, was first used in cellular networks, where mobile phones communicate with the fixed infrastructures to realize the information exchange. Clustering enables bandwidth reuse and can improve the system capacity [8]. The hierarchical feature of clustering can make the network more stable under the movement of nodes [9]. Thus, many clustering protocols were proposed for ad-hoc networks to overcome mobility and enable point-to-point connectivity. Those protocols remove the need for fixed infrastructures and instead dynamically select the coordinators among the nodes to react to network topology changes [10].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The main application scenario of clustering protocols has changed from ad-hoc networks to WSNs in recent years. Clustering segments the WSN into different clusters [11]. Each cluster consists of one cluster head (CH) and several cluster members (CMs). The network communication is divided into rounds. In each round, the CH coordinates its members to upload their sensing data in a TDMA schedule. Since nearby nodes usually have relatively high correlation, the CH can perform local data aggregation to reduce the energy wasted by redundant data transmission. Unlike the clustering protocols used in ad-hoc networks, which try to avoid re-clustering, clustering protocols for WSNs actively reselect the CHs every round to balance the energy distribution across the network. Moreover, with the implementation of power control, the CHs can flexibly adjust the cluster size. Meanwhile, the CMs can adjust their transmission power to reduce energy dissipation and minimize the interference between clusters [8].
Despite the advantages of clustering, clustering protocols in WSNs usually face two challenges. First, CH-selection is an NP-hard problem. Thus, getting the optimal cluster category under a particular fitness function is not easy. More importantly, the subfunctions included in the fitness function have a distinct influence on the final system performance. Clustering protocols have to construct the fitness function reasonably. Second, clustering protocols are typically sensitive to the key parameters such as the cluster head ratio and the weight factors between the different subfunctions. A clustering protocol shall have the ability to tune the vital parameters properly according to the different application specifications.
In this paper, we propose a clustering protocol based on the meta-heuristic approach (CPMA). The protocol aims to prolong the valid lifetime of the wireless sensor network. The structure of CPMA can be split into two parts. The first part mainly focuses on the dynamic cluster construction strategy. We use the Harmony Search (HS) algorithm to find the proper CH set every round. The fitness function of HS takes both the energy used every round (f_1) and the predicted energy distribution ratio (f_2) into consideration. A weight factor α is introduced to tune the importance of f_1 and f_2 according to the different application specifications. As pointed out above, the critical parameters usually have a significant influence on the final system performance. Thus, in the second part of CPMA, we use the Artificial Bee Colony (ABC) algorithm to optimize the cluster head ratio P and the weight factor α jointly. The contributions of our work can be summarized as follows:
• We introduce the meta-heuristic intelligence as the primary method and propose an efficient clustering protocol used in WSNs.
• We develop an online cluster head selection strategy based on the Harmony Search Algorithm. The strategy aims to decrease the network total energy dissipation as well as balance the energy distribution throughout the network.
• We use the Artificial Bee Colony algorithm to conduct an offline optimization of the cluster head ratio P and the weight factor α jointly. The optimization can automatically tune the corresponding parameters according to the network specifications and improve the network performance.
• We perform a detailed simulation of the proposed protocol CPMA. The results show that CPMA can prolong the network lifetime and improve the throughput compared to other protocols in almost all the application scenarios. The simulation also shows that CPMA has good adaptability to different network lifetime definitions. These advantages demonstrate that CPMA is a high-performance and widely applicable clustering protocol.
The rest of the paper is organized as follows: In Section II, we give a brief review of the existing clustering-based routing protocols. Section III describes the model we use in this paper. In Section IV, we give a detailed introduction to the proposed clustering protocol CPMA. The simulation results are shown in Section V. Finally, Section VI summarizes the paper and points out future work.

II. RELATED WORKS
In recent years, a large number of clustering protocols have been proposed to improve the network performance. Based on how the CH is selected, we classify those protocols into the classical approach and the computational intelligence approach [12]. In this section, we summarize some of the most typical clustering protocols.

A. CLASSICAL APPROACH
Leach [13] is one of the earliest clustering protocols. Designed for periodical data-gathering application scenarios, Leach divides the network communication into rounds. Every round consists of two phases, namely the setup phase and the working phase. Leach is distributed. In the setup phase, it uses the stochastic rotation of CHs to guarantee the expected cluster head ratio P in the network. Each sensor is expected to be the CH once every 1/P rounds. Thus, Leach can spread the energy consumption among sensors. In the working phase, each CM uploads data to its CH in a TDMA manner. The CH first processes the received data and performs the data aggregation, and then transmits the gathered data to the BS directly. Compared to flat routing protocols, Leach can effectively improve the network lifetime. However, it has some disadvantages. First, the distribution of cluster heads in Leach may be non-uniform in the sensing area. According to EECS [14], the non-uniform distribution will not only increase the energy cost of both intra-cluster and inter-cluster communication but also increase the imbalance of energy dissipation. Second, Leach does not consider the residual energy of nodes. When a low-energy node becomes the CH, it may die rapidly, limiting the network lifetime.
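The stochastic rotation used by Leach is commonly described with a per-round election threshold T = P / (1 - P · (r mod 1/P)) for nodes that have not yet served as CH in the current epoch of 1/P rounds. The following minimal Python sketch illustrates that rule; the function names are hypothetical, and the threshold is taken from the usual description of Leach rather than from this paper.

```python
import random

def leach_threshold(p, round_idx):
    """Election threshold for a node that has not yet been CH in the current epoch."""
    epoch_len = round(1 / p)  # every node is expected to serve once per 1/P rounds
    return p / (1 - p * (round_idx % epoch_len))

def elect_cluster_heads(eligible, p, round_idx, rng=random):
    """Each eligible node independently becomes CH if its random draw is below the threshold."""
    t = leach_threshold(p, round_idx)
    return [node for node in eligible if rng.random() < t]

# In the last round of an epoch the threshold reaches 1, so every node
# that has not served yet is elected.
remaining = elect_cluster_heads(list(range(20)), p=0.1, round_idx=9)
```

Because the threshold only depends on P and the round index, two weaknesses noted above are visible directly in the sketch: neither node positions nor residual energy influence the election.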
Based on the Leach protocol, EECS [14] takes the residual energy of nodes into account. In the setup phase, the candidate CH will compare the residual energy with others within the competition radius. Candidate CH will be the final CH only if it has the highest residual energy. Normal sensors use a cost function to choose the CH they join. The cost function consists of two parameters. One is the distance from the sensor to the CH, and the other is the distance from the CH to the BS. EECS can improve the lifetime compared to Leach. However, the optimal weight factor between the parameters depends on the specific network scale. The authors did not give a feasible solution.
Designed for large-scale wireless sensor networks, HEED [15] is a multi-hop clustering protocol that considers two parameters to guide the cluster construction process. HEED first uses the residual energy of nodes to probabilistically generate the initial set of CHs. Then the intra-cluster communication cost is considered to guide the nodes that fall within the coverage range of more than one CH. These nodes choose the CH with the minimum intra-cluster communication cost in order to balance the workload between CHs. HEED performs well. However, the cluster formation of HEED is based on iterations, which brings relatively high overhead from the exchange of control messages.
UCR [16] is a popular unequal size multi-hop clustering protocol. UCR points out that the multi-hop inter-cluster communication will make the CHs near the BS consume more energy relaying the packets for other CHs. Unlike EECS and HEED, the competition radius of the CH in UCR increases with the distance growth of the CH to the BS. UCR can achieve a good performance. However, it does not concern the node density, which may lead to the swift death of some sensors located in the high-density area.

B. COMPUTATIONAL INTELLIGENCE APPROACH
1) FUZZY LOGIC APPROACH
Fuzzy logic systems have an inherent ability to draw conclusions and make decisions in a complicated situation [17]. It is suitable to use the fuzzy logic to conduct the cluster head selection, which usually needs to consider many parameters [18], [19].
CHEF [20] considers two variables, the residual energy and the local distance, as the fuzzy system input. The fuzzy system output is the chance. The tentative CH that has a higher chance than the others within its radius R will become the final CH. CHEF uses the fuzzy system to make a sensor with more residual energy and less local distance more likely to become the final CH. However, CHEF does not consider the distance from the sensor to the BS. The CHs in EAUCF [21] use multi-hop inter-cluster communication to deliver packets. Unlike CHEF, EAUCF uses the residual energy and the distance to the BS as the input of the fuzzy system. The fuzzy system output is the competition radius. A CH that has more residual energy and is farther from the BS will have a bigger competition radius to coordinate more CMs. EAUCF has a better performance than CHEF, especially for large-scale networks. Nevertheless, like UCR introduced above, EAUCF does not consider the node density and may not make a proper decision if the sensors are not uniformly deployed in the network.
MOFCA [22] is a multi-objective fuzzy clustering protocol which has been shown to be energy efficient as well as robust to the sensor distribution and movement. MOFCA can be used in both stationary and evolving networks. It addresses the shortcomings of both CHEF and EAUCF by taking three parameters into consideration: the residual energy, the distance to the BS, and the node density. The fuzzy logic system output is the competition radius of the tentative CH, which may have more than 27 different linguistic results caused by the various combinations of fuzzy input variables. The simulation shows that MOFCA can achieve a better result than CHEF and EAUCF. Moreover, it has relatively good scalability.
ECH [23] is a novel fuzzy-based clustering protocol. It aims to maximize the network lifetime. To minimize the data redundancy, ECH introduces a sleeping-waking mechanism for overlapping and neighboring nodes. In every round, the protocol will divide the sensors into the sleeping group and the waking group to reduce the energy waste as well as keep the network working normally. Only the working nodes will participate in the following cluster head selection which is distributed and fuzzy-logic based. ECH shows a good performance through simulation and is quite suitable for the application where sensors are highly correlated.
Fuzzy logic can handle the uncertain nature of WSNs and has relatively low complexity [24]. It can be easily implemented on conventional sensors. Thus, clustering protocols based on Fuzzy Logic are suitable for the distributed networks [25], [26]. However, the performance of such protocols is highly sensitive to the setting of the fuzzy system. There have been some papers [27], [28] proposed to use the metaheuristic algorithms to optimize the details of fuzzy rules. However, those papers only consider the heterogeneous network where the data aggregation is not supported.

2) META-HEURISTIC APPROACH
In recent years, meta-heuristic algorithms have been widely used to solve NP-hard optimization problems. Researchers have proposed many clustering protocols [29]-[34] that use meta-heuristic approaches to construct the clusters.
The Base Station in Leach-C [31] uses the Simulated Annealing algorithm to choose the set of CHs. The optimization process tries to decrease the total squared distance between all CMs and their associated CHs. The CH selection strategy of Leach-C aims to minimize the intra-cluster communication cost. Hence, the overall network energy consumption is reduced.
Researchers in [32] proposed an energy-efficient clustering protocol based on the Particle Swarm Optimization named PSO-C. The fitness function of PSO-C is formulated based on two considerations. One is the maximum average intra-cluster distance of different clusters, and the other is the ratio of the network total residual energy to the total residual energy of the selected CHs. The core idea of PSO-C is to reduce the transmission cost, meanwhile, avoid the sharp energy decrease of some particular nodes. The cluster head selection of HSACP [33] is based on the HS algorithm. HSACP quantifies the intra-cluster cost similar to PSO-C but has a different quantification of the network energy balance state. Simulation shows the effectiveness of the above three meta-heuristic based clustering protocols. However, owing to the neglect of the distance from the CH to the BS, the performance of these protocols may suffer a discount in the networks where the data aggregation is not ideal.
CRHS [34] is a novel clustering protocol based on the HS algorithm. The fitness function of CRHS is constructed from four parts: the residual energy, the node degree, the average intra-cluster distance, and the distance between the CH and the BS. Different from traditional clustering protocols, where sensors usually choose the closest CH to join, CRHS proposes a novel potential function that helps sensors find their associated CHs. Thus, the clusters are more balanced throughout the network. CRHS achieves a good result in simulation. However, the complexity of the setup phase in CRHS is much higher than in other protocols. The protocol may be challenging to implement.
The main advantages of CPMA over current clustering protocols are: Firstly, instead of the various considerations which different protocols take, CPMA only uses the total energy cost and the predicted energy distribution ratio to guide the CH selection. We think those two items can give the most direct judgement of the network clustering quality. Secondly, CPMA can optimize the cluster head ratio and the weight factor α according to different application specifications. It has a better performance than the existing protocols and is widely applicable.

III. SYSTEM MODELS
A. NETWORK MODEL
In this paper, we focus on a wireless sensor network which is based on the single-hop clustering strategy. The network structure is shown in Fig.1. We declare some underlying assumptions as follows:
• The Base Station is energy unlimited and has enough computing power. It always has sufficient knowledge of the entire network.
• All the sensors are with the same initial energy. The network deploys the sensors randomly and does not allow the movement of sensors once deployed.
• All the sensors can transmit their data to the BS directly and are competent for adjusting their transmitting power according to the communication distance.
• All network links are always symmetric. Sensors can calculate the link distance according to the received signal strength indication (RSSI) if needed.
• The network can be homogeneous or heterogeneous. The data aggregation at the cluster head can be perfect, unavailable, or moderate.

B. ENERGY MODEL
The transceiver of sensors usually consumes the most energy. To describe the energy consumption, we divide the transceiver into three parts: the transmitter power amplifier, the transmitter radio electronics, and the receiver radio electronics. We utilize both the free space (d^2 power loss) and the multipath fading (d^4 power loss) channel models to depict the energy decrease through transmission. Define the packet size as l bits. Then the energy cost of transmitting (E_{tx}) and receiving (E_{rx}) can be modeled as follows:
E_{tx}(l, d) = \begin{cases} l \cdot E_{elec} + l \cdot E_{fs} \cdot d^{2}, & d < d_0 \\ l \cdot E_{elec} + l \cdot E_{mp} \cdot d^{4}, & d \ge d_0 \end{cases} \quad (1)
E_{rx}(l) = l \cdot E_{elec} \quad (2)
where E_{elec} is the energy cost per bit of the radio electronics of both transmitter and receiver; E_{fs} and E_{mp} depend upon the characteristics of the power amplifier; d_0 is the threshold distance defined as:
d_0 = \sqrt{E_{fs} / E_{mp}} \quad (3)
If the data aggregation is allowed, the cluster head will consume E_{da} energy to handle one bit of data.
The communication timeline of CPMA is shown in Fig.2. CPMA divides the communication process into rounds. Each round consists of the setup phase and the working phase. The protocol is centralized. At the beginning of the setup phase, the BS uses the HS algorithm to choose the set of CHs and broadcasts a CH_Selection_Message, which contains the result, to all the sensors. Each CH then broadcasts a CH_Advice_Message to help sensors choose their CHs. Sensors use the received signal power to calculate the distances to the different CHs and send the Join_Message to the nearest CH. To coordinate the CMs, the CHs send the Scheduling_Message to their members at the end of the setup phase. All the communications in the setup phase are based on the CSMA strategy to avoid collisions.
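As an illustration of the radio energy model of Section III-B (free-space loss below the threshold distance d_0, multipath loss above it), the following Python sketch computes the transmit and receive cost of one packet. The numerical constants are commonly used illustrative values and are an assumption, not parameters reported in this paper; the function names are likewise hypothetical.

```python
import math

# Assumed, commonly used radio constants (not taken from this paper).
E_ELEC = 50e-9      # J/bit, radio electronics (transmitter or receiver)
E_FS = 10e-12       # J/bit/m^2, amplifier coefficient, free-space model
E_MP = 0.0013e-12   # J/bit/m^4, amplifier coefficient, multipath model
D0 = math.sqrt(E_FS / E_MP)  # threshold distance between the two models

def tx_energy(bits, distance):
    """Energy to transmit `bits` over `distance` metres."""
    if distance < D0:
        return bits * (E_ELEC + E_FS * distance ** 2)
    return bits * (E_ELEC + E_MP * distance ** 4)

def rx_energy(bits):
    """Energy to receive `bits`."""
    return bits * E_ELEC

# Cost of one 4000-bit packet sent over 50 m (below D0, so free-space loss applies).
cost = tx_energy(4000, 50.0)
```

With these assumed constants d_0 is roughly 88 m, and the two branches of the transmit cost agree at d = d_0, so the model is continuous at the threshold.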
Cluster building is relatively energy-consuming because of the exchange of control messages. To save energy, CPMA combines several frames into one working phase. Since the CH is the local control center of its cluster, it uses the TDMA strategy to schedule the CMs, which eliminates the interference between CMs and saves energy. In each frame, a CM is only awake to send its data packet to the CH in its associated slot according to the Scheduling_Message. After receiving all the packets, the CH first processes the data and then accesses the channel by CSMA to send the data to the BS directly.
Since wireless communication has a broadcast nature, CPMA uses the direct-sequence spread spectrum (DSSS) to reduce the inter-cluster interference. Assuming the network has K clusters, the BS assigns the network a common spreading code and assigns each cluster a unique spreading code. All the 1 + K spreading codes are different and are embedded in the CH_Selection_Message. The common spreading code is used for exchanging control messages in the setup phase and for the CHs sending packets to the BS in the working phase, whereas the unique spreading codes are used by the CMs to upload their data to the CHs in the working phase.
Time synchronization is necessary for CPMA. In this paper, we assume that all the sensors are synchronized and start the setup phase at the same time by receiving the synchronization pulses sent by the BS. According to the pulses, sensors can adjust their clocks to eliminate the clock offset. Moreover, the BS can adjust the network operating cycle by altering the interval between two pulses.

Step1 (Derivation of the Fitness Function):
To prolong the lifetime of WSNs, the fitness function of the CH selection considers two critical factors. One is the total energy cost (f_1), which reflects the energy consumption of both CHs and CMs in the coming round. By decreasing f_1, we aim to slow down the energy dissipation of the entire network. The other is the predicted energy distribution ratio (f_2), which represents the energy distribution state after the coming round. By decreasing f_2, we try to smooth the energy imbalance between different sensors.

a: THE TOTAL ENERGY COST (f_1)
Assume that the network has N sensors and the cluster head ratio is P. Then the number of CHs is K = N · P. We split the energy cost of the ith CH in one frame (Energy_{CH_i}) into Data Aggregation (Energy_{da}), Data Receiving (Energy_{rx}), and Data Transmitting (Energy_{tx}). Let N_i (with \sum_{i=1}^{K} N_i = N) be the number of sensors (including both the CH and CMs) in the ith cluster. We can get:
Energy_{CH_i} = Energy_{da} + Energy_{rx} + Energy_{tx} \quad (7)
where l is the data packet size, ρ is the data aggregation ratio, and d_{toBS_i} represents the distance between the ith CH and the BS. We define d_{toCH_ij} as the distance between CM j and its corresponding CH i. The one-frame energy cost of the CMs in the ith cluster (Energy_{CM_i}) is given by (8). Combining (7) and (8) gives the energy cost of the ith cluster in one frame (Energy_{cluster_i}) in (9). For convenience of expression, we assume that all the distances between CMs and their associated CHs are less than d_0, and that all the distances between CHs and the BS are larger than d_0. The network total energy cost in one frame (Energy_{total}) can then be expressed as (10). Note that the first three terms of (10) are constant. Thus, we define f_1 in (11), where γ is the scaling factor.

b: THE PREDICTED ENERGY DISTRIBUTION RATIO (f_2)
Define Energy_{re_ij} as the residual energy of CM j in the ith cluster, and let j = 0 denote the CH itself. The predicted residual energy after the coming round (Energy_{pre_ij}) is given by (12), which treats the CH (j = 0) and the CMs separately and in which Q is the number of frames in one round. For simplicity, (12) only considers the data packet transmission, which dominates the energy consumption of sensors. To smooth the energy dissipation throughout the network, we want the difference between the energy states of the sensors to be as small as possible. Hence, we define f_2 in (13). Combining (11) and (13), the fitness function of the cluster head selection optimization is expressed in (14), where α is the weight factor between the two considerations and takes a value from 0 to 1.
Step2 (Initialize the Harmony Memory):
Inspired by the music harmony tuning process, the authors in [35] proposed the Harmony Search algorithm, which has been shown to be efficient in handling the cluster head selection problem. The Harmony Memory (HM) can be treated as a matrix that consists of several solution vectors. In CPMA, a solution vector represents a selection of the CHs. To improve the quality of the initial Harmony Memory, the BS first collects the 2 · K sensors with the highest residual energy and then randomly chooses K of them to form an initial solution vector. Let the Harmony Memory Size (HMS) be the number of vectors. We can describe the HM as:
HM = \begin{bmatrix} S_1^1 & S_2^1 & \cdots & S_K^1 \\ S_1^2 & S_2^2 & \cdots & S_K^2 \\ \vdots & \vdots & \ddots & \vdots \\ S_1^{HMS} & S_2^{HMS} & \cdots & S_K^{HMS} \end{bmatrix} \quad (15)
where S_i^j means the ID of the ith CH in the jth solution vector.
Step3 (New Harmony Improvisation):
After generating the initial HM, the BS improvises the new harmony S^{new} = (S_1^{new}, S_2^{new}, \cdots, S_K^{new}), in which S_i^{new} means the ID of the ith cluster head in the new harmony and is generated in the following way:
S_i^{new} \in \begin{cases} HM(:, i) \setminus Selected, & \text{if } rand < HMCR \\ \{\text{alive sensors}\} \setminus Selected, & \text{otherwise} \end{cases} \quad (16)
where HMCR is the Harmony Memory Considering Rate, HM(:, i) means the ith column of the HM matrix, and Selected means the set of CHs already selected in the new harmony. To generate the ith CH of the new harmony, the BS compares a random number in the range [0, 1] with the HMCR. If the random number is smaller than HMCR, the ith CH is chosen randomly from the ith column of the current HM. Otherwise, the BS picks the ith CH stochastically from all the alive sensors. To avoid repetition, the chosen CH shall be different from all the CHs which have already been selected.
To escape local optima, if S_i^{new} is generated from the HM, it may be further mutated according to the Pitch Adjusting Rate (PAR):
S_i^{new} \leftarrow \begin{cases} S_i^{mutate}, & \text{if } rand < PAR \\ S_i^{new}, & \text{otherwise} \end{cases} \quad (17)
where S_i^{mutate} is the sensor closest to S_i^{new} whose energy is above the median. It replaces S_i^{new} as the final ith CH in the new harmony with probability PAR.
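The initialization and improvisation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sensors are identified by integer IDs, `mutate` is a hypothetical helper that returns the pitch-adjusted candidate (the closest sensor with above-median energy), and falling back to a random alive sensor when a column of the memory is exhausted is a design choice of this sketch, not something the paper specifies.

```python
import random

def init_harmony_memory(energy, k, hms, rng):
    """energy: dict sensor_id -> residual energy. Returns `hms` vectors of k CH ids."""
    # Candidate pool: the 2*k sensors with the highest residual energy.
    pool = sorted(energy, key=energy.get, reverse=True)[:2 * k]
    return [rng.sample(pool, k) for _ in range(hms)]

def improvise(hm, alive, mutate, hmcr, par, rng):
    """Build one new harmony (a list of k distinct CH ids)."""
    k = len(hm[0])
    new = []
    for i in range(k):
        # i-th column of the memory, minus the CHs already selected.
        column = [h[i] for h in hm if h[i] not in new]
        if column and rng.random() < hmcr:
            s = rng.choice(column)
            if rng.random() < par:        # pitch adjustment
                m = mutate(s)
                if m not in new:
                    s = m
        else:                             # random pick among the alive sensors
            s = rng.choice([a for a in alive if a not in new])
        new.append(s)
    return new

rng = random.Random(0)
energy = {i: float(i) for i in range(20)}   # toy residual energies
hm = init_harmony_memory(energy, k=3, hms=5, rng=rng)
new_harmony = improvise(hm, list(range(20)), lambda s: (s + 1) % 20, 0.9, 0.3, rng)
```

The toy `mutate` in the usage lines simply maps a sensor to the next ID; in a real deployment it would use the node positions and energies known to the BS.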
Step4 (Updating the HM): To evaluate the generated new harmony, the BS will calculate its fitness function value and compare the result with other harmonies in the HM . If the fitness of the new harmony is better than the fitness of the worst harmony in the HM , the worst harmony will be replaced by the new one, and then the HM will be updated.
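The update rule of Step4 can be sketched as follows. Since both f_1 and f_2 are to be decreased, the sketch assumes that a lower fitness value is better; the function names and the toy fitness used in the usage lines are hypothetical.

```python
def update_harmony_memory(hm, fitness, new_harmony):
    """Replace the worst stored harmony if the new one has a lower (better) fitness.

    Returns True when the memory was changed.
    """
    scores = [fitness(h) for h in hm]
    worst = max(range(len(hm)), key=scores.__getitem__)
    if fitness(new_harmony) < scores[worst]:
        hm[worst] = new_harmony
        return True
    return False

def best_harmony(hm, fitness):
    """Harmony with the lowest fitness value, broadcast by the BS after NI iterations."""
    return min(hm, key=fitness)

# Toy usage: the "fitness" of a CH set is just the sum of its ids.
memory = [[1, 2], [5, 6], [3, 4]]
changed = update_harmony_memory(memory, sum, [0, 1])   # replaces [5, 6]
rejected = update_harmony_memory(memory, sum, [9, 9])  # worse than every stored harmony
```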
The BS continuously updates the HM based on the above Step3 and Step4 until the maximum iteration number (NI) is reached. After that, the BS chooses the best harmony and broadcasts the result to all the sensors in the network. The sensors then form the clusters. The clustering process of CPMA is summarized in Algorithm 1.

Algorithm 1 The Clustering Process of CPMA
STEP1: Initialize the Harmony Memory
STEP2: Improvise and update the harmonies:
while the iteration number is less than NI do
    for i = 1 to K = N · P do
        if Random < HMCR then
            Randomly pick S_i^{new} from HM(:, i)
            if Random < PAR then
                Replace S_i^{new} with S_i^{mutate}
            end if
        else
            Randomly pick S_i^{new} from the alive sensors
        end if
    end for
    Evaluate the fitness function value of S^{new}
    if S^{new} is better than the worst harmony in HM then
        Replace the worst harmony with S^{new}
    end if
end while
STEP3: Form the clusters:
The Base Station chooses the best harmony and then broadcasts the CH_Selection_Message
if the sensor is a cluster head then
    Broadcasts the CH_Advice_Message
    On receiving the Join_Message
    Broadcasts the Scheduling_Message to all its members
    EXIT
else
    On receiving the CH_Advice_Message
    Chooses the closest CH to join and sends the Join_Message
    Receives the Scheduling_Message from its CH
end if

The performance of clustering protocols is usually sensitive to the key parameters. Various WSN applications usually have different features such as the sensor and base station deployment, the data aggregation ratio, and even the definition of the network lifetime. To guarantee robustness, clustering protocols have to tune their parameters accordingly. Currently, most protocols do not have such an ability but use fixed parameters, which may degrade the network performance. To verify the influence of the protocol parameters on the network performance, we constructed a network scenario where the network size was 200m × 200m with 100 sensors randomly deployed. The data aggregation [...] (0, 200). The rounds of first node dying (FND) under different P and α were simulated. We first fixed the weight factor α to 0.5 and varied the cluster head ratio P from 0.02 to 0.20. Then we fixed the cluster head ratio P to 0.10 and varied the weight factor α from 0 to 1. The simulation results are shown in Fig.3. From the results, we can draw two conclusions. The first is that different parameters can clearly affect the network performance. The second is that different network features call for different parameter settings. Finding the optimized protocol parameters is also an NP-hard problem. Hence, in this paper, we utilize the Artificial Bee Colony (ABC) algorithm to tune the clustering parameters automatically.

Compared with other meta-heuristic optimization algorithms, ABC [36] is simple to implement and has an excellent ability for both local and global searching. CPMA uses ABC to automatically tune the cluster head ratio (P) and the weight factor (α) in the fitness function of the online cluster head selection. For each specific application, the optimization is offline and is executed only once before the network starts operating.

2) PARAMETER OPTIMIZATION OVER ABC
Step1 (Derivation of the Fitness Function):
CPMA takes the lifetime of WSNs as the most critical consideration. The differences among various application specifications make the definition of the network lifetime diverse. For homogeneous networks, where sensors work together to monitor the same or a similar phenomenon, the lifetime may be defined as the round at which a certain percentage of nodes have died, whereas for heterogeneous networks the lifetime may be defined as the round of the first node dying. The fitness function of the parameter optimization and the corresponding constraints are defined in (18) and (19), where FND, HND, and LND represent the rounds at which the first node, half of the nodes, and the last node die, and β_1, β_2, and β_3 reflect the importance of the different items in the fitness function.
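To make the three lifetime milestones concrete, the sketch below derives FND, HND, and LND from the round in which each sensor dies and combines them with the weights β_1, β_2, β_3. The weighted-sum form, the rounding convention used for "half of the nodes", and all function names are assumptions made for illustration only.

```python
import math

def lifetime_metrics(death_rounds):
    """death_rounds[i] is the round in which sensor i died. Returns (FND, HND, LND)."""
    ordered = sorted(death_rounds)
    n = len(ordered)
    fnd = ordered[0]                       # first node dies
    hnd = ordered[math.ceil(n / 2) - 1]    # round by which half of the nodes are dead
    lnd = ordered[-1]                      # last node dies
    return fnd, hnd, lnd

def lifetime_fitness(death_rounds, betas):
    """Assumed linear combination beta1*FND + beta2*HND + beta3*LND."""
    return sum(b * m for b, m in zip(betas, lifetime_metrics(death_rounds)))

deaths = [120, 300, 250, 410]
metrics = lifetime_metrics(deaths)                          # (120, 250, 410)
fnd_only = lifetime_fitness(deaths, (1.0, 0.0, 0.0))        # heterogeneous-style definition
mixed = lifetime_fitness(deaths, (0.5, 0.5, 0.0))
```

Setting one weight to 1 and the others to 0 recovers a single-milestone lifetime definition, which is how the same optimizer can serve both the homogeneous and the heterogeneous cases discussed above.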
Step2 (Generate the Initial Food Source):
A food source of ABC represents a possible solution vector. Let SN be the size of the bee population. The ABC optimization randomly initializes SN food sources in the solution space:
X_i = X_{min} + r \cdot (X_{max} - X_{min}) \quad (20)
where X_i is the ith food source, X_{i,1} and X_{i,2} represent the cluster head ratio P and the weight factor α of the ith food source, respectively, X_{min} and X_{max} are the lower and upper limit vectors, and r is a random number that varies from 0 to 1.
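A minimal Python sketch of this initialization is given below. Drawing an independent random number for each component is an assumption of the sketch, and the bounds in the usage lines are simply borrowed from the ranges of P and α explored in the sensitivity experiment; they are not stated as the limits used by the optimizer.

```python
import random

def init_food_sources(sn, x_min, x_max, rng):
    """Draw `sn` food sources uniformly inside the box [x_min, x_max]."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(x_min, x_max)]
        for _ in range(sn)
    ]

# Each food source is a pair (P, alpha).
sources = init_food_sources(10, [0.02, 0.0], [0.20, 1.0], random.Random(1))
```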

Step3 (Population Updating):
The ABC algorithm has three types of bees: the employed bee, the onlooker bee, and the scout bee. The employed bees and the onlooker bees have the same population SN. At the beginning of each iteration, each employed bee is placed at a food source and exploits a candidate food source V_i = (V_{i,1}, V_{i,2}) using the following equation:
V_{i,j} = X_{i,j} + \theta \cdot (X_{i,j} - X_{q,j}) \quad (21)
where q ∈ {1, 2, ..., SN} and j ∈ {1, 2} are randomly chosen indexes and θ is a random number in the range [0, 1]. If the fitness of the candidate food source is better, the corresponding employed bee abandons the current food source and moves to the new one. Otherwise, it stays at the current food source. After exploitation, the employed bees share the information of the food sources with the onlooker bees. Each onlooker bee randomly chooses a food source with the following probability:
p_i = \frac{fitness_i}{\sum_{n=1}^{SN} fitness_n} \quad (22)
where p_i is the probability of choosing the ith food source and fitness_i is the fitness function value of the ith food source. After all the onlooker bees have arrived at their corresponding food sources, they carry out the exploitation in the same manner as the employed bees. To increase the global searching ability and avoid local optima, a food source that cannot be improved within limit trials is abandoned. The related employed bee becomes a scout bee and randomly chooses a new food source from the whole solution space. We summarize the flowchart of the parameter tuning process in Fig.4.
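The employed, onlooker, and scout phases can be sketched as a compact Python loop. This is an illustration, not the authors' implementation: it follows the textbook ABC scheme, draws the perturbation coefficient from [-1, 1] as in classic ABC (an assumption of this sketch), clips candidates to the search box, assumes positive fitness values to be maximized, and uses a cheap toy fitness in place of the lifetime simulation. All names are hypothetical.

```python
import random

def abc_optimize(fitness, x_min, x_max, sn=10, iterations=50, limit=5, seed=0):
    """Maximize `fitness` over the box [x_min, x_max]; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(x_min)

    def random_source():
        return [lo + rng.random() * (hi - lo) for lo, hi in zip(x_min, x_max)]

    foods = [random_source() for _ in range(sn)]
    fits = [fitness(x) for x in foods]
    trials = [0] * sn
    best_x, best_f = max(zip(foods, fits), key=lambda t: t[1])
    best_x = list(best_x)

    def exploit(i):
        # Perturb one coordinate of source i relative to another random source q.
        q = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        theta = rng.uniform(-1.0, 1.0)  # assumption: classic ABC range
        cand = list(foods[i])
        cand[j] += theta * (foods[i][j] - foods[q][j])
        cand[j] = min(max(cand[j], x_min[j]), x_max[j])
        f = fitness(cand)
        if f > fits[i]:                 # greedy replacement
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(sn):                        # employed bees
            exploit(i)
        total = sum(fits)
        probs = [f / total for f in fits]          # fitness-proportional selection
        for _ in range(sn):                        # onlooker bees
            exploit(rng.choices(range(sn), weights=probs)[0])
        for i in range(sn):                        # remember the best source so far
            if fits[i] > best_f:
                best_x, best_f = list(foods[i]), fits[i]
        for i in range(sn):                        # scout bees
            if trials[i] >= limit:
                foods[i] = random_source()
                fits[i] = fitness(foods[i])
                trials[i] = 0
    return best_x, best_f

# Toy stand-in for the lifetime simulation, peaked at P = 0.1 and alpha = 0.5.
def toy_fitness(x):
    return 1.0 / (1.0 + (x[0] - 0.1) ** 2 + (x[1] - 0.5) ** 2)

best_x, best_f = abc_optimize(toy_fitness, [0.02, 0.0], [0.20, 1.0])
```

In CPMA each fitness evaluation would instead run a full network simulation, which is why the number of evaluations (roughly 2 · SN per iteration) dominates the offline cost.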

C. COMPLEXITY ANALYSIS OF THE CPMA PROTOCOL
CPMA is mainly composed of two parts, namely the online cluster formation and the offline parameter optimization. For the online cluster formation, we focus on the control message complexity, which reflects the network communication overhead. During every round of the cluster formation, each CH broadcasts a CH_Advice_Message and a Scheduling_Message. The total number of control messages generated by CHs is therefore $2K = 2NP$. The total number of control messages generated by CMs is $N - K = N - NP$, because each CM only needs to send a Join_Message. The total number of network control messages in every round is thus $2NP + N - NP = N(1 + P)$. The control message complexity of the online cluster formation is $O(N)$, which is lightweight.
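The message count above can be checked with a few lines of code. This is a small sketch; the function name is hypothetical, and the value P = 0.05 in the usage note is only an illustrative cluster head ratio, not a tuned value from the paper.

```python
def control_messages_per_round(n, p):
    """Count the control messages of one cluster-formation round."""
    k = round(n * p)   # number of cluster heads, K = N * P
    ch_messages = 2 * k  # CH_Advice_Message + Scheduling_Message per CH
    cm_messages = n - k  # one Join_Message per cluster member
    return ch_messages + cm_messages
```

For N = 100 and P = 0.05 this gives 10 + 95 = 105 messages, which matches N · (1 + P).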
For the offline parameter optimization, we focus on the time complexity, which reflects the computational burden of the BS. CPMA uses the ABC algorithm to tune the corresponding parameters under the guideline of (18). Given $NI_{ABC}$ as the number of iterations of the ABC algorithm and SN as the number of food sources, the time complexity of the offline parameter optimization can be expressed as $O(2SN \cdot NI_{ABC} \cdot TC_{sim})$, where $TC_{sim}$ represents the complexity of evaluating the fitness function of one food source, which requires simulating the network until the LND. To get the detailed $TC_{sim}$, we break the network simulation into rounds. In every round, the major computing task is to choose the proper cluster set through $NI_{HS}$ iterations of the HS algorithm. For each iteration, the BS first constructs the clusters and evaluates the candidate cluster set, which gives a per-round complexity of $O(N^2 \cdot P \cdot (1 - P) \cdot NI_{HS})$. Hence, $TC_{sim}$ is $O(N^2 \cdot P \cdot (1 - P) \cdot NI_{HS} \cdot LND)$. Therefore, the time complexity of the offline parameter tuning is $O(2SN \cdot NI_{ABC} \cdot N^2 \cdot P \cdot (1 - P) \cdot NI_{HS} \cdot LND)$. Notice that the parameter tuning is performed only once, before the network starts working. It does not introduce any overhead to the CPMA online operation, but it helps CPMA adapt to different network specifications.
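To get a feeling for the size of this bound, the product can be evaluated for concrete values. The sketch below simply multiplies the factors of the expression; the function name is hypothetical, and the values of P and LND in the usage note are placeholders chosen for illustration, since both depend on the tuned parameters and the scenario.

```python
def offline_tuning_cost(sn, ni_abc, n, p, ni_hs, lnd):
    """Evaluate 2*SN*NI_ABC * N^2*P*(1-P) * NI_HS * LND as an operation estimate."""
    per_round = n ** 2 * p * (1 - p) * ni_hs  # HS cluster selection in one round
    tc_sim = per_round * lnd                  # one full simulation until the LND
    return 2 * sn * ni_abc * tc_sim           # employed + onlooker evaluations
```

With SN = 10, 20 ABC iterations, N = 100, and 200 HS iterations, and with the placeholder values P = 0.05 and LND = 1000, the estimate is about 3.8 × 10^10 elementary steps, which illustrates why the tuning is run offline at the BS.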

V. CPMA PERFORMANCE EVALUATION

A. SIMULATION SETTING
In this section, we compare the performance of CPMA with four well-known clustering protocols: Leach, Leach-C, HSACP, and PSO-C. The WSN covers a 200m × 200m area of interest with 100 nodes, which are randomly deployed in the sensing area. To evaluate the performance comprehensively, we consider three scenarios. Each scenario has a different sensor and base station deployment. Specifically, in Scenario 1 the BS is located in the middle of the area of interest; in Scenario 2 the BS is located on the border of the area of interest; in Scenario 3 the BS is located in a corner of the area of interest. In every scenario, five data aggregation ratios (0, 0.1, 0.3, 0.5, and 1) are used to imitate the diversity of applications. For the online HS optimization, we set the HMS to 20 and both the HMCR and the PAR to 0.8. The number of iterations is 200. For the offline ABC optimization, we set the SN to 10 and the number of iterations to 20. The limit, which is used to escape local optima, is set to 3. For the majority of applications, the FND is the most critical consideration and the HND is the second. Hence, we set β1, β2, and β3 in (18) to 0.8, 0.2, and 0, respectively. We summarize all the parameters used in the simulation in Table 1.
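The settings listed above can be collected in a small configuration object, for example as follows. This is a sketch only: the class and function names are hypothetical, and the concrete BS coordinates for the border and corner scenarios are assumptions, because the text states only "middle", "border", and "corner".

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimConfig:
    area_side: float = 200.0   # side length of the square area in meters
    num_nodes: int = 100
    aggregation_ratios: tuple = (0.0, 0.1, 0.3, 0.5, 1.0)
    hms: int = 20              # harmony memory size
    hmcr: float = 0.8          # harmony memory considering rate
    par: float = 0.8           # pitch adjusting rate
    hs_iterations: int = 200
    sn: int = 10               # number of ABC food sources
    abc_iterations: int = 20
    limit: int = 3             # ABC abandonment limit
    betas: tuple = (0.8, 0.2, 0.0)

def bs_position(scenario, side):
    """Return an assumed BS coordinate for Scenario 1 (middle), 2 (border), 3 (corner)."""
    positions = {1: (side / 2, side / 2), 2: (side / 2, 0.0), 3: (0.0, 0.0)}
    return positions[scenario]

def deploy_nodes(cfg, seed=0):
    """Place the sensors uniformly at random inside the square area."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, cfg.area_side), rng.uniform(0.0, cfg.area_side))
            for _ in range(cfg.num_nodes)]
```

Fixing the seed keeps the random deployment reproducible, so that every protocol can be evaluated on the same node layout.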

B. SIMULATION RESULT

1) EFFECTIVE NETWORK LIFETIME
To give intuitive insight into the network lifetime under different clustering protocols, we simulated Leach, Leach-C, PSO-C, HSACP, and CPMA under different network scenarios and data aggregation ratios. The results are summarized in Table 2, Table 3, and Table 4. The corresponding tuned parameters are summarized in Table 5. We can easily observe that the offline parameter optimization strategy assigns different values to the parameters according to the specific network conditions. The network lifetime results show that the proposed CPMA achieves the best FND in all the simulation conditions. Especially in Scenarios 2 and 3 with relatively high data aggregation ratios, CPMA substantially delays the first sensor failure. For example, when the data aggregation ratio is 0.5, in Scenario 2, compared with Leach-C, PSO-C, and HSACP, CPMA improves the round of FND by 95%, 74%, and 185%, respectively. The improvement increases to 222%, 317%, and 343% in Scenario 3. For the HND criterion, CPMA outperforms the others in Scenario 1 but is weaker in Scenarios 2 and 3. Fig.6, Fig.7, and Fig.8 illustrate how the number of alive nodes changes over the working rounds in the different scenarios with perfect data aggregation. Compared with the other protocols, CPMA effectively delays the decline of the curves but increases the slope of the descent. We believe that this phenomenon is due to the optimization strategy, which gives the FND a much higher importance level than the HND in the corresponding fitness function (18). As a result, CPMA focuses more on improving the FND than the HND.
Based on the simulation results, we calculated the lifetime fitness value of the different protocols according to (18) and illustrated the values in Fig.9. We can easily observe that CPMA achieves the best performance in all conditions. For instance, when the data aggregation is perfect, CPMA improves the lifetime fitness value of (18) by 11% over Leach-C, by 8% over PSO-C, and by 6% over HSACP in Scenario 1; the improvement is 51%, 42%, and 43% in Scenario 2 and 72%, 37%, and 38% in Scenario 3.
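The lifetime metrics and the percentage improvements reported in this subsection can be derived from per-node death rounds, roughly as sketched below. The function names are hypothetical, and the convention that HND is the round at which half of the nodes (rounded up) have died is an assumption.

```python
import math

def lifetime_metrics(death_rounds):
    """Return (FND, HND, LND) from the round at which each node died."""
    ordered = sorted(death_rounds)
    n = len(ordered)
    fnd = ordered[0]                     # first node dies
    hnd = ordered[math.ceil(n / 2) - 1]  # half of the nodes have died
    lnd = ordered[-1]                    # last node dies
    return fnd, hnd, lnd

def relative_improvement(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline
```

For example, a protocol whose FND is 150 rounds improves on a baseline FND of 100 rounds by 50%.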

2) NUMBER OF RECEIVED PACKETS
To measure the total useful data packets through the network lifetime, current clustering protocols usually consider only the number of packets transmitted by the CHs. However, such a method is not suitable for the comparison between protocols in this paper, because the proposed protocol does not fix the cluster head ratio but uses the offline optimization to tune it. For the homogeneous network where data aggregation is available, the varying number of CHs causes the BS to receive a different number of aggregated data packets every round. However, the total useful information will be approximately or exactly equal owing to the aggregation. For the heterogeneous network, the packet transmitted by the CH to the BS is simply the combination of the data of the CH and its associated members. Hence, the BS will receive the same number of sensor data packets no matter what the cluster head ratio is.
In this paper, we compare the numbers of received packets between protocols only in the heterogeneous situation, where data aggregation is unavailable. We define the number of received packets as the number of packets from all the individual sensors. Fig.10 shows the total number of packets received by the BS until the round of FND and HND. From these results, we can observe that CPMA shows the best performance in all the conditions except for Scenario 2 with HND.
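Under this definition, the packet count up to a given round depends only on how long each sensor stays alive. The sketch below assumes that every alive sensor contributes exactly one packet per round and that a node with death round d delivers packets in rounds 1 to d − 1; both conventions and the function name are illustrative assumptions.

```python
def received_packets(death_rounds, until_round):
    """Count per-sensor packets received by the BS in rounds 1..until_round."""
    # A node with death round d is assumed to deliver one packet
    # in each of the rounds 1..d-1 and nothing afterwards.
    return sum(min(d - 1, until_round) for d in death_rounds)
```

For three sensors with death rounds 3, 5, and 5, the BS receives 8 packets until round 3: all three sensors report in rounds 1 and 2, and the two survivors report in round 3.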

3) ADAPTABILITY TO DIFFERENT LIFETIME OPTIMIZATION OBJECTIVES
Different applications judge the network lifetime by different criteria. To validate the adaptability of our proposed protocol, we consider three network lifetime optimization objectives, which aim to maximize the FND (Object1), the HND (Object2), and the LND (Object3), respectively. In fact, the optimization objective supports arbitrary combinations of the FND, HND, and LND within the constraints of (18). Fig.11, Fig.12, and Fig.13 illustrate the simulation results in the three network scenarios. Table 6 gives the values of the corresponding tuned parameters. None of the scenarios supports data aggregation. The results show that CPMA can effectively adapt to the different network lifetime optimization objectives by tuning the related protocol parameters in all three scenarios.
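One way to picture the three objectives is as different weight vectors applied to the same simulation outcomes. The sketch below assumes that Object1, Object2, and Object3 correspond to the weights (1, 0, 0), (0, 1, 0), and (0, 0, 1) in a weighted sum of FND, HND, and LND; the helper name and the candidate values used in the usage note are made up for illustration and are not results from the paper.

```python
# Assumed mapping from each objective to its (beta1, beta2, beta3) weights.
OBJECT_WEIGHTS = {
    "Object1": (1.0, 0.0, 0.0),  # maximize FND
    "Object2": (0.0, 1.0, 0.0),  # maximize HND
    "Object3": (0.0, 0.0, 1.0),  # maximize LND
}

def best_parameters(candidates, weights):
    """Pick the (P, alpha) pair whose (FND, HND, LND) outcome scores highest."""
    def score(item):
        fnd, hnd, lnd = item[1]
        return weights[0] * fnd + weights[1] * hnd + weights[2] * lnd
    return max(candidates.items(), key=score)[0]
```

Given a dictionary such as `{(0.05, 0.3): (900, 1100, 1500), (0.10, 0.5): (700, 1300, 1600)}`, calling `best_parameters` with the Object1 weights selects the first pair, while the Object2 weights select the second, which mirrors how different objectives lead to different tuned parameters.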

VI. CONCLUSION AND FUTURE WORK
In this paper, we proposed a clustering protocol based on the meta-heuristic approach (CPMA). CPMA aims to prolong the effective lifetime of wireless sensor networks and can be divided into two parts. The first part is the online cluster head selection and communication coordination. In this part, CPMA utilizes the HS algorithm to choose the set of cluster heads every round and coordinates both the intra-cluster and inter-cluster communication of the network. The fitness function of HS takes the total network energy cost and the predicted energy distribution ratio into consideration. The core idea is to reduce the total network energy dissipation as well as smooth the network energy distribution. The second part is the offline parameter optimization. In this part, CPMA uses the ABC algorithm to tune the cluster head ratio (P) and the weight factor (α) in the fitness function of HS. The optimization is offline and is executed once before the network starts working. Through the second part, CPMA automatically adapts to the diversity of different networks and explores the best network performance.
To validate the superiority of CPMA, we made a detailed comparison of CPMA with several classical clustering protocols. The simulation considered different sensor and base station deployments and various data aggregation ratios. The results show that CPMA can prolong the network lifetime and increase the network throughput in almost all the network conditions. We also considered the cases where real applications may define the network lifetime differently. A simulation was performed to test the performance of CPMA under different lifetime optimization objectives, and the results are also positive. All the above conclusions demonstrate the advantages and robustness of the proposed CPMA protocol, which is suitable and efficient for a large number of wireless sensor network applications.
Despite its advantages, the CPMA protocol still has some drawbacks. First, the offline parameter optimization of CPMA has a relatively high time complexity, which increases the computational burden of the BS. This problem also occurs in [27], [28]. We believe that with the development of computer hardware and cloud computing technology, this challenge will be well solved. Second, CPMA does not consider network mobility. In future IoT application scenarios such as the smart city, both mobile vehicles and even unmanned aerial vehicles can become parts of the sensing network. For example, [37] and [38] use unmanned aerial vehicles and mobile vehicles, respectively, as mobile sinks to collect the data of interest. Meanwhile, [39] uses trustworthy vehicles as vehicular sensors to sense the corresponding data. Those applications require the clustering protocol to handle the movement of sensors and the BS. Finally, communication between the CH and the BS is single-hop in CPMA. The single-hop inter-cluster communication simplifies the protocol. When the network scale is moderate, it can achieve even better network performance than multi-hop inter-cluster communication according to the simulation result in [22]. However, when the network scale increases beyond a threshold (as in most IoT applications), the clustering protocol will be more energy efficient with multi-hop inter-cluster communication.
For future work, considering that IoT applications are usually large-scale and highly mobile, we will enhance CPMA with multi-hop inter-cluster communication and take the movement of nodes in WSNs into account. Then, we will consider energy harvesting wireless sensor networks, in which sensors can harvest energy from their surrounding environments. In that setting, CPMA will try to avoid the dormancy of sensors as well as improve the network throughput under the different energy harvesting constraints.