An Improved Energy-Efficient Clustering Protocol to Prolong the Lifetime of the WSN-Based IoT

A wireless sensor network (WSN) is an important part of the Internet of Things (IoT). However, the sensor nodes of a WSN-based IoT network are constrained in their energy resources. A clustering protocol offers an efficient way to save node energy and prolong the network lifetime: it organizes nodes into clusters to reduce the transmission distance between the sensor nodes and the base station (BS). However, existing clustering protocols suffer from issues in the clustering structure that adversely affect their performance. In this study, we propose an improved energy-efficient clustering protocol (IEECP) to prolong the lifetime of the WSN-based IoT. The proposed IEECP consists of three sequential parts. First, an optimal number of clusters is determined for the overlapping balanced clusters. Then, balanced static clusters are formed by a modified fuzzy C-means algorithm, which combines the original algorithm with a mechanism that reduces and balances the energy consumption of the sensor nodes. Lastly, cluster heads (CHs) are selected in optimal locations, and the CH function is rotated among the members of the cluster, by a new CH selection-rotation algorithm that integrates a back-off timer mechanism for CH selection with a rotation mechanism for CH rotation. In particular, the proposed protocol reduces and balances the energy consumption of nodes by improving the clustering structure, which makes IEECP suitable for networks that require a long lifetime. The evaluation results show that the IEECP performs better than existing protocols.


I. INTRODUCTION
The Internet of Things (IoT) is a significant source of technological solutions in several applications. The IoT is underpinned by the wireless sensor network (WSN), which decreases the cost of the new technology. The literature indicates that this technology integration will reduce costs and ensure convenience in daily life through smart sensor node networks whose nodes have access to the internet [1], [2].
WSN, an inexpensive legacy system, has been applied in several fields, such as industrial control, environmental monitoring, military surveillance, and intelligent transportation systems [1], providing large-scale physical data that can be further utilized. Thus, by integrating the IoT and WSN applications, no massive paradigm shift is needed [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Gongbo Zhou.
WSN-based IoT is advantageous for its convenient deployment and low cost. Furthermore, it can function independently in harsh or high-risk places where human presence is not possible. However, WSNs have defects that need to be addressed [1]. The network lifetime problem is the main challenge in WSN [3].
A sensor's lifetime depends solely on its battery, which is difficult or impossible to replace or recharge because of the rugged environments in which the sensors operate [4]. This problem undermines the integration of the WSN into the IoT and raises the cost of the new technology. Accordingly, prolonging the network lifetime is considered a major challenge in the WSN-based IoT. To prolong the network lifetime and reduce energy consumption, a clustering approach is used in the WSN. The clustering protocol, in which the sensor nodes are divided into small clusters, is an effective technique for reducing energy consumption and prolonging network lifetime by avoiding long-distance communication [5], [6]. Each cluster employs one node as a cluster head (CH), which has more duties than the member nodes (MNs). In practice, each MN in the cluster transmits its sensing data to its CH, and the CH then transmits these data to the base station (BS) in a single-hop or multi-hop manner.
Although the clustering protocol is considered an effective way to conserve node energy in WSNs, the clustering structure remains a major issue that adversely affects the network lifetime through the inefficient energy consumption of nodes [4], [7]-[10]. Furthermore, because the clustering structure prepares the network for operation, a poor structure frequently affects the subsequent procedures of the network, such as data aggregation and routing discovery [11]. Consequently, the efficiency of the clustering structure has a considerable effect on the WSN lifetime.
The first issue is the determination of a sub-optimal number of clusters (fewer or more than the optimal number), which increases the energy consumption of nodes [12]. Most clustering protocols that create balanced overlapping clusters determine the optimal number of clusters inaccurately when using current mathematical models because the distance to the CH is not estimated correctly. The second issue is related to cluster formation, which can drastically affect the lifetime of the WSN [13]. On certain occasions, the fuzzy C-means (FCM) algorithm, which is widely used in the WSN domain for cluster formation, produces unbalanced clusters (large and small) because of the random deployment of nodes in the area, resulting in unbalanced energy consumption among nodes. In large clusters, the selected CHs are burdened with more data than the CHs of the other clusters and thus consume more energy for data transmission [10], [14]. The third issue is improper CH selection: most distributed methods do not take routing information into account as a parameter in the CH selection. Consequently, the CHs are distributed irregularly, and the transmission distances among the CHs in the network are uneven. Hence, some CHs are obliged to increase their signal strength in order to transmit data to the next hop, leading to unbalanced energy consumption among the CHs in the network [15], [16]. The final issue of the clustering structure is the rotation of the CH function among the members of the cluster. When a fixed energy value is used as the threshold for CH rotation, the nodes exhibit an imbalance between the CH and MN functions in the cluster. This imbalance leads to unbalanced energy consumption among the nodes that are successively selected as CHs in the cluster, which accelerates the first node death (FND).
As a result, two problems related to the CH generate unbalanced energy consumption: the unbalanced transmission distances among the CHs in the network, and the use of a static threshold value to rotate the CH function among the members of the cluster. This research therefore addresses the main research question: how can the network lifetime of the WSN-based IoT be prolonged? Several sub-questions are identified as follows: 1) How can the optimal number of clusters be determined when overlapping balanced clusters are formed? 2) How can balanced clusters with a low intra-cluster distance cost be formed under a random node distribution? 3) How can balanced energy consumption be achieved among the CHs of the clusters? 4) How can balanced energy consumption be achieved among the successive CHs in a cluster?
To address these issues in the clustering structure and to answer the questions posed above, this study proposes an improved energy-efficient clustering protocol (IEECP) to prolong the lifetime of the WSN-based IoT. The protocol consists of three parts. Firstly, a modified mathematical model, based on the analysis of the energy consumption model for multi-hop communications and overlapping clusters, is proposed to determine the optimal number of clusters. Secondly, a modified fuzzy C-means algorithm (M-FCM) is proposed to produce balanced clusters. Thirdly, a new CH selection and rotation algorithm (CHSRA) is proposed that integrates the back-off timer mechanism for CH selection with a new rotation mechanism for CH rotation among the members of the cluster.
The main contribution of the proposed protocol is to prolong the lifetime of the WSN-based IoT, which depends on the nodes' batteries, and thereby to widen the range of applications of the WSN-based IoT. This contribution is achieved through the following tasks: 1) selecting the optimal number of clusters based on the modified mathematical model, which considers the overlap among clusters and multi-hop communications; 2) forming balanced clusters that reduce the intra-cluster distance cost based on the modified fuzzy C-means algorithm (M-FCM), which results from combining the FCM algorithm with a centralized mechanism; 3) reducing the energy overhead that results from the CH selection process in each round through a new integration of the back-off timer mechanism for CH selection with a rotation mechanism in one algorithm, the CH selection and rotation algorithm (CHSRA); 4) balancing the communication distance among the CHs in the network based on a new objective function for the back-off mechanism; and 5) balancing the lifetime of the successively selected CHs in the cluster based on a new dynamic threshold. For ease of reading, most of the abbreviations used in this study are listed in Table 1. The rest of this paper is organized as follows. Section 2 provides a brief survey of the clustering algorithms in the literature and their advantages and disadvantages. Section 3 introduces the radio energy consumption model. Section 4 details the proposed protocol. Section 5 presents the results and discussion. Finally, Section 6 concludes the study.

II. RELATED WORK
Among the principal goals of a cluster-based protocol is an effective clustering structure for the network, which decreases the energy consumed and balances energy consumption [5]. The first proposed clustering protocol is the LEACH protocol [17]. Its primary idea is to select the CHs in a distributed manner at each round and let the nodes join the nearest CH to form dynamic clusters. The network topology is thus built around the selected CHs, which is not efficient because the selection disregards the residual energy of nodes [18]. If the CHs are not optimally selected over successive rounds, the network suffers from a poor clustering setup even though it has the ability to adapt, which weakens the protocol performance. Therefore, the major challenge for dynamic clustering lies in the CH selection [16]. Furthermore, giving priority to CH selection leads to the formation of dynamic clusters at each round, which increases the energy overhead because clusters are re-formed after each CH re-selection [19], [20]. Another version of the protocol is the LEACH-centralized protocol (LEACH-C) [21], in which the optimal number of clusters K is determined based on a mathematical model. In contrast to LEACH, the BS is responsible for CH selection and cluster formation, using the simulated annealing optimization method; at every round, the nodes that have more than the average energy transmit their information to the BS, and nodes with less than the average energy cannot be candidates for the CH function. However, the energy overhead of transmitting the information to the BS remains, and the round trip makes the CH selection process time-consuming [1], [22].
To address the energy consumption and delay issues, the energy delay index for the trade-off (EDIT) protocol is proposed [23]. This protocol uses a back-off timer mechanism to select the CHs using the objective function that depends on the residual energy and distance to the BS with the number of neighbor nodes. Then, the nodes will join the nearest CH to form the clusters. For data transmission, this protocol uses a multi-hop method to send the sensing data to the BS based on the energy-delay function. This protocol enables the reduction of the overhead in CH selection by using the back-off timer mechanism. However, it does not guarantee that the selected CHs are evenly distributed in the monitoring area [20], leading to unbalanced energy consumption and longer time-consumption due to the re-selection of all CHs simultaneously [1]. Furthermore, it suffers from the formation of unbalanced clusters by letting nodes join the nearest CH. In addition, dynamic clustering overhead is also formed.
Another proposed protocol used to minimize the overhead of the CH selection and delay is delay-constrained energy multi-hop (DCEM) [24]. In this protocol, the CHs are selected based on the back-off timer mechanism according to the residual energy and distance to BS for the nodes. Then, it lets the nodes join the nearest CH. This proposed protocol is a fully distributed approach, which is similar to the EDIT protocol. Furthermore, the multi-hop communication method is used by transmitting data to BS based on the cost-function of energy and end-to-end delay. However, this protocol has the same problems as the EDIT protocol.
Ray and De [25] proposed an energy-efficient clustering protocol based on K-means (EECPK-means). At the initial stage, this protocol determines the optimal number of clusters based on a mathematical model. Next, it overcomes the problems caused by the formation of dynamic clusters by generating static and balanced clusters through an improved K-means (KM) algorithm that is executed at the BS. Subsequently, the CHs are selected in a distributed manner by means of an ID number given to each node based on its distance from the centroid of the cluster. To reduce the overhead on nodes, the CH re-selection process is not executed in every round but through a rotation mechanism that uses a fixed energy value as the rotation threshold: if the residual energy of the CH node is less than the threshold value, then the CH changes. For data transmission to the BS, this protocol uses an energy-cost function for the next-hop selection to save energy for the CHs and hence prolong the network lifetime.
Jain [14] proposed a Traffic-Aware Channel Access Algorithm for Cluster-Based Wireless Sensor Networks to form balanced and static clusters. In the later sections, this protocol is called TACAA for simplicity. This protocol presents a modified FCM that re-arranges the degrees of belonging of the nodes to produce balanced clusters and thereby overcome the random node deployment issue in the sensing area. In terms of CH selection, after the current CH has worked for a set number of rounds, which serves as the threshold for CH rotation, it selects the next CH in the cluster according to the residual energy and degrees of belonging of the nodes. However, the reliance on the degrees of belonging is inefficient because of a normalization condition in the membership function [26], under which the actual distance from the centroids is not indicated. Consequently, this condition increases the energy consumption of nodes by increasing the intra-cluster distance of each cluster. In addition, the reliance on a set number of rounds as the threshold value is inefficient [27]; if the selected CH does not have enough energy to perform its job over this number of rounds, then the cluster becomes an island that is isolated from the network. OCM-FCM [10] also improves the structure of the clusters by presenting a new mathematical model to determine the optimal number of clusters based on an analysis of the energy consumption model for nodes. Furthermore, static clusters are formed based on an improved FCM algorithm. This protocol uses a distributed approach for CH selection, in which the current CH selects the next CH of the cluster based on the residual energy of the nodes; this avoids the overhead on nodes and the round-trip time of transmitting their information to the BS. However, the CH re-selection process is conducted at every round, which increases energy consumption through the exchange of control messages among the members of the cluster.
Moreover, this process suffers from the same problem in terms of the CH distribution in the monitoring area because it relies only on the residual energy of the node in the CH selection.
Based on the literature, a clustering protocol should consider four aspects: 1) the optimal number of clusters, 2) the formation of balanced and static clusters, 3) the even distribution of the selected CHs in the monitoring area with low overhead in the selection process, and finally, 4) a CH rotation process that relies on a threshold value. However, these factors have not been addressed in depth by the existing studies, which affects the performance of clustering protocols.

III. RADIO ENERGY CONSUMPTION MODEL
The energy consumption of nodes is measured using the radio energy consumption model, which depends on the distance between the transmitter and receiver [28]. Depending on this distance, either the free-space or the multi-path model is used. Therefore, when a message of $L$ bits is transmitted over a distance $d$, the energy consumption of the transmitter node can be formulated as

$$E_{TX}(L, d) = \begin{cases} L E_{elec} + L \varepsilon_{fs} d^{2}, & d < d_{0} \\ L E_{elec} + L \varepsilon_{amp} d^{4}, & d \ge d_{0} \end{cases} \qquad (1)$$

where $d_{0} = \sqrt{\varepsilon_{fs}/\varepsilon_{amp}}$ is the distance threshold between the transmitter and receiver, which equals 78.7; $E_{elec}$ represents the energy consumption of the electronic system for sending or receiving one bit; $E_{AD}$ represents the energy consumption for data aggregation; and $\varepsilon_{fs}$ and $\varepsilon_{amp}$ are the energy consumption of the free-space propagation and the power consumption of the multi-path propagation, respectively. The energy consumption of the receiver node is

$$E_{RX}(L) = L E_{elec} \qquad (2)$$

where $E_{RX}(L)$ is the energy consumed at the node when a message of $L$ bits is received.
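The radio model described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the class and function names are hypothetical, and the numeric defaults are commonly used illustrative values that are assumptions here, so the threshold distance they imply need not equal the value quoted in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RadioParams:
    """First-order radio model constants (illustrative assumptions)."""
    e_elec: float = 50e-9         # J/bit, electronics energy (assumed value)
    eps_fs: float = 10e-12        # J/bit/m^2, free-space amplifier (assumed value)
    eps_amp: float = 0.0013e-12   # J/bit/m^4, multi-path amplifier (assumed value)

    @property
    def d0(self) -> float:
        # Crossover distance at which both amplifier terms are equal.
        return math.sqrt(self.eps_fs / self.eps_amp)


def tx_energy(bits: int, d: float, p: RadioParams = RadioParams()) -> float:
    """Energy in joules to transmit `bits` over a distance of `d` metres."""
    if d < p.d0:
        return bits * (p.e_elec + p.eps_fs * d ** 2)   # free-space branch
    return bits * (p.e_elec + p.eps_amp * d ** 4)      # multi-path branch


def rx_energy(bits: int, p: RadioParams = RadioParams()) -> float:
    """Energy in joules to receive `bits`."""
    return bits * p.e_elec
```

Defining `d0` as a derived property keeps the two branches continuous at the threshold, which is why the threshold is computed from the amplifier constants rather than passed in separately.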

IV. IEECP
This section clarifies the proposed protocol which consists of three parts: determination of the optimal number of clusters based on a modified mathematical model, formation of balanced clusters based on a modified fuzzy C-means (M-FCM) and selection and rotation of the CH for clusters based on the CH selection-rotation algorithm (CHSRA).

A. DETERMINATION OF THE OPTIMAL NUMBER FOR CLUSTERS
A mathematical model is widely used in this domain to determine the number of clusters. This method takes little time because the number of clusters is defined before the nodes are deployed. Accordingly, it is suitable for all types of applications, especially real-time applications, which has led many studies to use it to determine the optimal number of clusters. The method is often executed by the BS. The mathematical model relies on a disk model to represent the distance to the CH. The disk model [7] is often used for studying WSN communication; it takes into account the coverage area of the transmission and consists of a disk of the plane with radius R, as shown in Figure 1-a. This value represents the distance to the CH in the mathematical model. The estimated value of the radius R has a significant effect on the resulting number of clusters, as demonstrated later in this paper. Although other studies assume that the distance to the CH is the same whether the clusters are overlapping or isolated, in reality the radius is greater for overlapping clusters, as shown in Figure 1. We assume that N nodes are deployed randomly in a square sensing area ($M^{2}$). If K clusters exist, then the mean number of nodes in each cluster is N/K (one CH and N/K − 1 MNs). Every CH consumes energy when receiving data from the MNs, aggregating these data, and transmitting the aggregated data to the BS. As the BS is located outside the sensing area, multi-hop communication is used to transmit the sensing data to the BS. Therefore, the energy consumption follows the free-space model ($d < d_{0}$) shown in Eq. 1.
Here, $d_{BS}$ refers to the distance from the CH to the next hop in the multi-hop communication, and perfect aggregation of the data is assumed.
An MN of the cluster merely needs to send its data to the CH. As the distance between the MNs and their CH inside the cluster is small, the energy consumption of each MN also follows the free-space model and depends on $d_{CH}^{2}$, where $d_{CH}$ is the distance from the MN to the CH. The area occupied by each cluster is approximately $M^{2}/K$ [29]. In general, this is an arbitrarily shaped area with a node distribution $\rho(x, y)$.
The expected distance from the MNs to their CH, which is assumed to be at the center of the cluster, is computed over this area. In the separated clusters, the area is a circle with radius $R = M/\sqrt{\pi K}$, and $\rho(r, \theta)$ is constant for $r$ and $\theta$ [10]. Nonetheless, the radius $R_{over}$ of the overlapping clusters is greater than the radius $R_{sprt}$ of the separated clusters for the same distributed area. Therefore, the area of an overlapping cluster is a circle of radius $R_{over}$, which is expressed through the coefficient $C_{over}$. To estimate the appropriate value of $C_{over}$, we executed intensive simulations at various values of $C_{over}$. Based on these simulation results, the appropriate range of $C_{over}$ is from 0.04 to 0.09 of $R$. The value of $C_{over}$ was determined [30] as the sum of the weighted values of the range, giving $C_{over} \cong 0.06$, as illustrated in the Results and Discussion section.
Eq. 6 is then updated accordingly. Under the assumption that the density of nodes is uniform throughout the cluster area, the energy consumption of one cluster and the total energy consumption of the K clusters are derived. The optimum number of clusters is obtained by setting the derivative of the total network energy with respect to K to zero.
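For orientation, the classical derivation for separated clusters, which is the baseline that the modified model adjusts, can be sketched as follows. This is the standard LEACH-style argument and not the modified model of this study: the energy expressions are the usual forms implied by the radio model with free-space transmission, the approximation $N/K - 1 \approx N/K$ is used for the number of MNs, and treating the next-hop distance $d_{BS}$ as independent of $K$ is a simplifying assumption.

```latex
\begin{align*}
E_{CH} &= L E_{elec}\Bigl(\tfrac{N}{K}-1\Bigr) + L E_{AD}\tfrac{N}{K}
          + L E_{elec} + L\,\varepsilon_{fs}\, d_{BS}^{2} \\
E_{MN} &= L E_{elec} + L\,\varepsilon_{fs}\, d_{CH}^{2} \\
E\bigl[d_{CH}^{2}\bigr] &= \rho \int_{0}^{2\pi}\!\!\int_{0}^{M/\sqrt{\pi K}} r^{3}\, dr\, d\theta
          = \frac{\rho\, M^{4}}{2\pi K^{2}}
          = \frac{M^{2}}{2\pi K}
          \qquad \Bigl(\rho = \tfrac{K}{M^{2}}\Bigr) \\
E_{total} &\approx K\Bigl(E_{CH} + \tfrac{N}{K} E_{MN}\Bigr)
          = L\Bigl(2N E_{elec} + N E_{AD} + K\,\varepsilon_{fs}\, d_{BS}^{2}
            + \varepsilon_{fs}\,\frac{N M^{2}}{2\pi K}\Bigr) \\
\frac{\partial E_{total}}{\partial K} = 0
  &\;\Longrightarrow\;
  \varepsilon_{fs}\, d_{BS}^{2} = \varepsilon_{fs}\,\frac{N M^{2}}{2\pi K^{2}}
  \;\Longrightarrow\;
  K_{opt} = \sqrt{\frac{N}{2\pi}}\;\frac{M}{d_{BS}}
\end{align*}
```

Because the intra-cluster term grows with the square of the cluster radius, replacing the separated-cluster radius with a larger overlapping radius raises that term and therefore shifts the optimum toward a larger number of clusters.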

B. FORMATION OF BALANCED CLUSTERS
For the formation of balanced clusters, a modified fuzzy C-means algorithm (M-FCM) is proposed in this study by combining the FCM with a centralized mechanism. Before discussing the proposed algorithm to form balanced clusters, the conventional FCM algorithm is illustrated in the next section to provide a general idea.

1) FCM ALGORITHM OVERVIEW
The FCM algorithm has been widely used for cluster formation in WSNs. This algorithm was originally presented by Dunn [31]. The goal of FCM is to form better clusters by minimizing, through an objective function, the sum of distances between the objects (N) and the cluster centers (C). In a WSN, the objects are the nodes that are already distributed in the sensing area. The FCM objective function for organizing nodes into clusters in the WSN can be formulated as follows:

$$J_{m} = \sum_{i=1}^{N} \sum_{j=1}^{K} \mu_{ij}^{m}\, d_{ij}^{2}$$

where K refers to the number of clusters; N refers to the number of nodes; $\mu_{ij}$ refers to the membership of node i in cluster j; $c_{j}$ refers to the centroid of cluster j; $d_{ij}$ refers to the distance between node i and centroid $c_{j}$, commonly the Euclidean distance; and m is the fuzzifier, chosen as a real number greater than 1 ($m \in (1, \infty)$). As m approaches 1, the clustering tends to become crisp (as in the K-means algorithm), whereas as m tends to infinity, the clustering becomes completely fuzzy (unreliable) [32]. Therefore, the fuzzifier is usually set to 2 in most applications [33], [34]. The algorithm terminates when $|U_{ij}(t) - U_{ij}(t-1)| < \varepsilon$, where t is the current iteration and $\varepsilon$ is a very small number close to zero (e.g., 0.001).
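The conventional FCM loop summarized above can be sketched as follows. This is a minimal illustration assuming 2-D node coordinates and the standard FCM centroid and membership updates; the function name `fcm`, its parameters, and the random initialization are assumptions for the example and are not taken from the paper.

```python
import math
import random


def fcm(points, k, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Plain fuzzy C-means on 2-D points; returns (centroids, memberships)."""
    rng = random.Random(seed)
    n = len(points)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        s = sum(row)
        u.append([v / s for v in row])

    centroids = []
    for _ in range(max_iter):
        # Centroid update: mean of the points weighted by membership**m.
        centroids = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            cx = sum(w[i] * points[i][0] for i in range(n)) / total
            cy = sum(w[i] * points[i][1] for i in range(n)) / total
            centroids.append((cx, cy))

        # Membership update from the distances to every centroid.
        new_u = []
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            if min(d) == 0.0:
                # The point sits on a centroid: give it fully to that cluster.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            else:
                row = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            s = sum(row)
            new_u.append([v / s for v in row])

        # Stop when no membership value changed by eps or more.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(k))
        u = new_u
        if change < eps:
            break
    return centroids, u
```

A hard assignment, when one is needed, can be read off by taking the cluster with the largest membership for each node. Nothing in this loop constrains cluster sizes, which is the source of the unbalanced clusters discussed next.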
On certain occasions, FCM produces unbalanced clusters because of the nature of the random deployment of sensor nodes in the monitoring area [35], as shown in Figure 2. This situation leads to unbalanced energy consumption for nodes, which adversely affects the network lifetime [10], [14], [25]. Some of the studies sought to overcome this problem by rearranging the degrees of belonging for nodes to produce balanced clusters, as shown in [14]. However, relying on the degrees of belonging is inefficient because of a normalization condition in the membership function, leading to an increase in the intra-distance for the clusters. Consequently, this condition increases the energy consumption of nodes [26]. To address this issue, a modified clustering algorithm has been proposed in this study to form balanced clusters with minimal intra-cluster distance by relying on the actual distance from centroids rather than the degrees of belonging for nodes.

2) MODIFIED FCM (M-FCM)
The proposed clustering algorithm is executed at the BS and consists of two phases: 1) initial cluster formation, which is based on the FCM, and 2) balanced cluster formation, which is based on the centralized mechanism (CM). In the initial cluster formation, the FCM is applied to form the clusters as shown in the algorithm, and then the process shifts to the second phase. The balanced cluster formation phase consists of two subphases. The first subphase consists of the following steps: 1) The cluster threshold ($Th_{cluster}$) is determined based on Eq. 24. 2) The clusters are sorted by size, and the minimum cluster size is compared with $Th_{cluster}$. If the minimum size is greater than $Th_{cluster}$, then the clusters created by the FCM are already balanced. Otherwise, the process shifts to the second subphase.
In Eq. 24, Pe is the permittivity value, which equals 0.85 [25], and K is the number of clusters.
In the second subphase, the CM takes the final centroids of the clusters produced by the previous phase (FCM phase) as initial points to form balanced clusters. The steps of the CM are as follows: 1) The distance between the initial points and the nodes is determined. 2) The nodes are arranged based on their distance from the initial points. 3) Each initial point selects its nearest nodes, up to the cluster threshold value, to join it. 4) The remaining unassigned nodes join the nearest initial point to construct the final clusters. This procedure ensures that the minimum cluster size is equal to or greater than the cluster threshold, with a lower intra-cluster distance. In the optimal situation, we assume that each cluster contains 20 nodes, which is the mean cluster size. As shown in Figure 3-a, under the conventional FCM, the cluster with the sky nodes has only 14 nodes, whereas the cluster with the green nodes has 28 nodes. The sizes of these clusters differ considerably from the mean cluster size. Under M-FCM, the cluster with the sky nodes has 20 nodes and the cluster with the green nodes has 19 nodes, as shown in Figure 3. Accordingly, the sizes of these clusters match the mean cluster size of 20 nodes.
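The four steps of the balancing subphase can be sketched as below. This is one possible reading rather than the authors' implementation: the function name is hypothetical, the threshold is passed in by the caller instead of being computed from Eq. 24, and processing all node-centroid pairs in one globally sorted order is an assumption made to resolve nodes that are near several centroids.

```python
import math


def balance_clusters(nodes, centroids, threshold):
    """Assign nodes so every centroid first receives `threshold` nearby nodes.

    Returns (label, size): the cluster index per node and the cluster sizes.
    """
    k = len(centroids)
    # Steps 1-2: compute all node-centroid distances and sort them ascending.
    pairs = sorted(
        (math.dist(node, centroids[j]), i, j)
        for i, node in enumerate(nodes)
        for j in range(k)
    )
    label = [None] * len(nodes)
    size = [0] * k
    # Step 3: fill each cluster up to the threshold with its nearest free nodes.
    for _, i, j in pairs:
        if label[i] is None and size[j] < threshold:
            label[i] = j
            size[j] += 1
    # Step 4: the remaining nodes join their nearest centroid.
    for i, node in enumerate(nodes):
        if label[i] is None:
            j = min(range(k), key=lambda c: math.dist(node, centroids[c]))
            label[i] = j
            size[j] += 1
    return label, size
```

For example, with six nodes near one centroid and two near another, a plain nearest-centroid assignment yields sizes 6 and 2, whereas the call below with a threshold of 3 moves the closest boundary node into the small cluster. The minimum-size guarantee only holds when the number of nodes is at least the threshold times the number of clusters.

```python
nodes = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (9, 0), (10, 0)]
label, size = balance_clusters(nodes, [(0, 0), (10, 0)], threshold=3)
```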

C. CHSRA
CH selection and rotation have attracted great interest among researchers. In this study, a new algorithm called CHSRA is proposed that integrates the back-off timer mechanism for CH selection with a rotation mechanism. In this algorithm, the CH is selected accurately by using a new objective function. Furthermore, the CH function is rotated among the members of the cluster based on a new rotation mechanism that is executed without any involvement of the BS.
The goal of CHSRA is to reduce the overhead by selecting the CH within members of the cluster only. Furthermore, it balances the distance among CHs in adjacent clusters by adopting the routing information in the CH selection process that leads to balanced energy consumption for CHs. Besides, the CHSRA ensures the balance in energy consumption for the successive CHs of the cluster. The CHSRA comprises two phases: 1) CH selection phase implemented by the back-off timer mechanism, and 2) CH rotation phase implemented by the dynamic threshold mechanism.

1) CH SELECTION PHASE
The back-off timer mechanism, which is a distributed mechanism, is used to select the CH. This mechanism is widely used in the literature because it reduces the overhead for nodes and has the least delay in the selection process [7], [36]. In this mechanism, each node in the cluster sets its timer. The node becomes either a CH or an MN according to its timer ($T_{b}$) and whether an advertisement (ADV) message is received before the timer expires. If the node receives an ADV message from another node in the cluster, it cancels its timer and becomes an MN. However, if the timer expires and the node has not received any message, it broadcasts an ADV message and becomes a CH [37]. The timer value is set based on an objective function (F) of the node, such that the timer value is inversely related to the objective function. This is presumably the first time that the back-off timer mechanism is applied to select the CH among the members of a cluster only. In existing studies, this mechanism is applied to CH selection among all network nodes, which increases time and energy consumption. Another significant contribution concerning CH selection is a new objective function for this mechanism that distributes the selected CHs efficiently in the network by selecting them in optimal locations. In this new objective function, the distances from a given node to the forward CH (FCH) and the backward CH (BCH) are adopted, along with the adjustment coefficient for distances (ACD), which reflects the balance between the distances to the FCH and BCH, and the residual energy of the node, as the parameters for the CH selection process. This procedure ensures that the selected CH is in an optimal location relative to the adjacent CHs of the other clusters.
The proposed objective function relies on the aforementioned parameters rather than on the residual energy of the node only [3], [36] or on the residual energy of the node and the distance to the BS [24], as these do not guarantee an efficient distribution of the CHs in the network. Figure 4 shows the effect of distances on the CH selection. Consequently, each node in a given cluster computes the following parameters to define the objective function F:
1) Residual energy $E_{r} = E_{ini} - E_{con}$, to prevent selecting a CH with low energy, where $E_{ini}$ refers to the initial energy and $E_{con}$ refers to the energy consumed by the node.
2) Euclidean distance from the nearest forward CH ($d_{FCH}$), to reduce the energy consumption of the candidate CH (j).
3) Euclidean distance from the nearest backward CH ($d_{BCH}$), to reduce the energy consumption of the backward CH (j-1).
4) ACD of the node; this coefficient reflects the balance between the distances to the FCH and BCH.
According to these parameters, the objective function F for CH selection is defined. A CH selected by the proposed algorithm overcomes the following two issues: 1) The energy overhead (additional energy cost) of the CH selection process is minimized by using the back-off timer mechanism among the members of the cluster rather than among all nodes in the network, as in the current studies.
2) The CH is selected optimally because the required criteria for a balanced energy consumption in the selection are considered. VOLUME 8, 2020
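The back-off contest inside one cluster can be sketched as a small simulation. This is a hedged illustration only: the function name is hypothetical, the objective values F are supplied by the caller because the objective function itself is not reproduced here, and both the timer rule `1 / F` and the tie-break by node identifier are assumptions made for the example.

```python
def elect_cluster_head(scores):
    """Simulate one back-off contest among the members of a single cluster.

    scores: dict mapping node id -> objective value F (must be positive).
    Returns (ch_id, roles, timers).
    """
    if not scores:
        raise ValueError("cluster has no nodes")
    if any(f <= 0 for f in scores.values()):
        raise ValueError("objective values must be positive")
    # Assumed timer rule: a larger objective value gives a shorter timer.
    timers = {node: 1.0 / f for node, f in scores.items()}
    # The node whose timer expires first broadcasts ADV and becomes the CH;
    # ties are broken by node id (an assumption for determinism).
    ch = min(timers, key=lambda node: (timers[node], node))
    # Every other node hears the ADV, cancels its timer, and stays a member.
    roles = {node: ("CH" if node == ch else "MN") for node in scores}
    return ch, roles, timers
```

With hypothetical objective values `{1: 0.4, 2: 0.9, 3: 0.7}`, node 2 has the shortest timer and becomes the CH, while nodes 1 and 3 remain member nodes. Only a single ADV broadcast is needed per contest, which is where the overhead saving of the mechanism comes from.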

2) CH ROTATION PHASE
To solve the problem of unbalanced energy consumption among the successive CHs in the cluster, we set a dynamic threshold value for the CH rotation mechanism rather than a fixed value as in other studies; this value gradually increases with each CH re-selection. In the proposed mechanism, the energy consumed and a ratio of the initial energy (T) are used to estimate the threshold value. The first action taken by the selected CH directly after its selection is to calculate its threshold for rotation ($E_{TH}$) based on Eq. 31.
In Eq. 31, $E_{con}$ is the energy consumed by the node, $E_{ini}$ is the initial energy of the node, $E_{r}$ is the residual energy of the node, and T is a constant ratio of the initial energy that may differ from one cluster to another depending on the number of members in the cluster. The T value is estimated only once for the cluster throughout the network lifetime. Its calculation involves the following quantities: $R_{CHs}$ refers to the rounds of all CHs in the cluster at $E_{TH}$, with $E_{rth}$ as the residual energy of the node at the $E_{TH}$ value; $R_{n}$ represents the rounds of the member nodes in the cluster at $E_{TH}$; $E_{CH-per-rnd}$ is the energy consumption per round of the CH; $E_{n-rnd}$ is the energy consumption per round of the nodes; and $E_{TH}$ is the threshold value, within the range from 0.1 to 0.9 of the initial energy of the node.
The appropriate value of T represents the intersection point of the curve of all CH rounds with the curve of members' rounds.
At the end of each round, the CH in the cluster compares its energy with the threshold value that is computed based on Eq. 32. If the residual energy of the current CH is equal to or less than the threshold value, then the current CH changes. Otherwise, the CH continues its function.
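The end-of-round check described above can be sketched as follows. This is a minimal illustration only: the threshold is passed in as a plain number because the threshold equation is not reproduced here, and the function names and the energy values are hypothetical.

```python
def should_rotate(residual_energy: float, threshold: float) -> bool:
    """Return True when the current CH must give up its role.

    Mirrors the rule in the text: rotate when the residual energy is
    equal to or less than the threshold value.
    """
    return residual_energy <= threshold


def rounds_until_rotation(initial_energy: float,
                          energy_per_round: float,
                          threshold: float):
    """Count the rounds a CH serves before the rotation rule fires.

    All three arguments are illustrative inputs, not values from the paper.
    """
    if energy_per_round <= 0:
        raise ValueError("energy_per_round must be positive")
    residual = initial_energy
    rounds = 0
    # The CH checks its energy at the end of each round.
    while not should_rotate(residual, threshold):
        residual -= energy_per_round
        rounds += 1
    return rounds, residual


# Hypothetical numbers: 2.0 J initial energy, 0.25 J per round, 1.0 J threshold.
served_rounds, energy_left = rounds_until_rotation(2.0, 0.25, 1.0)
```

With a dynamic threshold, a newly selected CH would call the same check with its own, freshly computed threshold value.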
Finally, the preceding algorithms are combined in the IEECP framework that represents the main objective of this study, as shown in Figure 5.
For data transmission, the members of a cluster transmit the sensed data directly to their selected CH. The CH then forwards these data to the BS in a multi-hop manner, which is an advantageous option for reducing energy consumption over relatively long transmission distances [28]. We adopt the same mechanism as in [25]: the chosen CH checks whether its transmission distance to the BS is less than $d_0$; it transmits the
Although this mechanism leads to some delay in data arrival at the BS, it avoids long transmission distances for the selected CH and thus saves energy for the CHs.

D. COMPLEXITY ANALYSIS FOR IEECP
As mentioned earlier, the execution of the IEECP protocol processes occurs in two different places. The first place is the BS, where the number of clusters is computed initially based on the modified mathematical model, and then the balanced clusters are formed based on the M-FCM. The second place is the node, where the CH selection and rotation are processed based on the CHSRA algorithm.
The determination of the number of clusters does not contribute to the time complexity; hence, it is deemed suitable for real-time applications. For the M-FCM, the time complexity is $O(NK^2 I_{FCM} + NK)$. The time complexity of FCM is $O(NK^2 I_{FCM})$, as reported in [20], where N is the number of sensor nodes in the network, K is the required number of clusters, and $I_{FCM}$ is the number of FCM iterations. M-FCM therefore adds a term of NK to the time complexity of FCM. However, because this procedure is executed only once, at the BS and prior to the network operation, M-FCM contributes no time complexity during the network operation. Moreover, since the BS does not have the memory constraints of the sensors, the space complexity of the M-FCM algorithm poses no obstacle to the formation of clusters. Therefore, the parts of the IEECP protocol executed at the BS add no time or space complexity to the network operation.
Since CHSRA is a distributed algorithm applied within the cluster (not across the whole network), each member of the cluster updates its information at each round. The CH re-selection phase occurs when the energy consumption of the current CH exceeds the threshold; the members of the cluster (n) then update their information to select the next CH of the cluster by relying on the CHs of the other clusters (K-1). Thus, the analysis of the time complexity is based on the equations given in (30). Therefore, the time complexity is $O(n \times I_{round} + K - 1)$, where $I_{round}$ is the number of rounds until the node dies. Consequently, the time complexity of CHSRA is linear, which is a small contribution in terms of the CH selection and rotation processes [38]. Furthermore, the space complexity of CHSRA is $O(K^2 + 50)$, which is an acceptable contribution for the CH selection and rotation processes.
As for the overhead complexity, the nodes do not suffer from overhead during the formation of clusters because this process occurs only once in the network and is executed by the BS, which benefits from forming static clusters through a centralized approach. Likewise, for CHSRA, the overhead of the CH selection is reduced to the maximum possible extent: during the selection process, only the node that will become the CH broadcasts an ADV message to the rest of the cluster members so that they join it. Similarly, for the re-selection process, only the CH broadcasts the message to the rest of the cluster members for the re-election. Therefore, the overhead complexity of CHSRA depends on the number of CHs in the network, which is equal to the number of clusters K. The overhead complexity of CHSRA is constant, i.e., O(1), because K is a predefined fixed value. In other words, the overhead complexity is independent of the network size.

V. RESULTS AND DISCUSSION
This section presents the main results of the proposed protocol, implemented in MATLAB. The evaluation comprised two phases to show the significance and reliability of this study: the first phase evaluated the proposed algorithms separately, and the second phase evaluated the performance of IEECP by comparing it with related works.
For the network scenario, the proposed protocol was run on a WSN whose nodes were randomly deployed over a two-dimensional square area (M × M), with a single base station situated outside the network. The other network assumptions are as follows:
• Nodes, as well as the base station, are stationary after deployment.
• Nodes have equal initial energy and are not physically accessible; thus, they cannot be recharged.
• The base station is not limited in terms of energy, memory, and computational power.
• The base station knows the identifiers of all nodes.
• Nodes are subject to the radio energy consumption model.
• Nodes know their geographical position.

A. EVALUATION OF THE PROPOSED ALGORITHMS
This section reports the ability of each algorithm to overcome the problems that motivate it. The proposed algorithms are evaluated separately according to the following evaluation parameters:

1) DISPERSION MEASUREMENT OF ENERGY CONSUMPTION AMONG CHS
The proposed objective function for CH selection with the back-off timer mechanism was compared with another back-off timer objective function, the one in DCEM [24], which uses the energy and the distance to the BS for CH selection. In this evaluation, we used the standard deviation (STD) to measure the dispersion of the energy consumption among the CHs under the DCEM objective function and the proposed objective function, respectively. Figure 6 illustrates the STD of the energy consumption of the CHs over the network lifetime; the STD under the proposed objective function is lower than that under the DCEM objective function. Consequently, the energy consumption of the CHs selected by the proposed objective function was much more balanced than that of the CHs selected by the DCEM objective function. The proposed objective function uses the distance to the forward CH ($d_{FCH}$) and to the backward CH ($d_{BCH}$) along with the adjustment coefficient for distances (ACD), whereas the DCEM objective function uses the distance to the BS.
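As a rough illustration of the dispersion measure used here, the sketch below computes the STD of per-CH energy consumption for one round. It assumes the population standard deviation (the text does not state which variant was used), and the energy values are invented purely for demonstration.

```python
from statistics import pstdev


def ch_energy_dispersion(consumed_by_ch):
    """Population standard deviation of the energy consumed by each CH.

    A lower value means the CHs spend energy more evenly.
    """
    return pstdev(consumed_by_ch)


# Hypothetical per-CH energy consumption (J) in one round for five clusters.
balanced_selection = [0.50, 0.50, 0.50, 0.50, 0.50]
unbalanced_selection = [0.10, 0.90, 0.30, 0.70, 0.50]

std_balanced = ch_energy_dispersion(balanced_selection)
std_unbalanced = ch_energy_dispersion(unbalanced_selection)
```

Tracking this value round by round would give a curve of the kind the text attributes to Figure 6.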

2) ENERGY CONSUMPTION EVALUATION FOR THE SUCCESSIVE CHS OF THE CLUSTER
The proposed rotation mechanism was evaluated through a comparison with the rotation mechanism of the EECPK-means protocol [25], based on the energy consumption of the selected CH nodes. The threshold energy of the EECPK-means protocol was equal to the sum of the energy consumed by the CH when receiving, aggregating, and transmitting data to the BS for the mean number of nodes [25]; thus, it was a static value for all CHs of the cluster. As shown in Figure 7, in the EECPK-means algorithm, the first node that became a CH of the cluster consumed more than 98% of its energy while serving as a CH until the threshold value was reached, and it then reverted to an ordinary node. Its remaining energy was less than 2%, which made it impossible for the node to survive much longer in the network. On the other hand, the last node that became a CH of the cluster consumed most of its energy as an ordinary node before becoming a CH, which enabled it to remain in the network for a long time; as a CH, it consumed 24.8% of energy until the threshold value was reached. This problem is common to all protocols that use a static threshold value. In contrast, because the proposed protocol relies on a dynamic energy threshold for CH rotation that gradually increases according to the energy consumption of the node with each CH re-selection, the energy consumption was almost balanced across all successive CHs of the cluster, as shown in Figure 7.

3) EVALUATION OF BALANCED CLUSTER FORMATION
The formation of balanced clusters in the proposed protocol was evaluated through a comparison with the clustering algorithms of the EECPK-means [25], TACAA [14], and OCM-FCM [10] protocols, based on two measurement parameters used in the literature, namely balanced cluster size and cost of intra-cluster distance. These parameters were chosen because clusters become more balanced at the cost of the intra-cluster distance [39]. Seven observations similar to those presented in [25] were compared, as shown in Table 2.

a: VARIATION IN SIZE OF CLUSTERS (VSC)
This parameter measured the dissimilarities of the sizes among the clusters, where the smaller the factor, the better. This condition signified that a balance existed in the cluster size.
where $S_j$ refers to the size of cluster j and $\bar{x}$ refers to the mean cluster size. As the network has 100 sensor nodes in five clusters, $\bar{x} = 20$. The results are illustrated in Table 3. As shown in Figure 8, the variation in the size of the clusters formed by EECPK-means was less than that for OCM-FCM (which is desirable in cluster formation) and was occasionally less than that for the TACAA protocol and the proposed M-FCM algorithm. Likewise, the variation in cluster size for the proposed protocol was less than that for OCM-FCM and TACAA, and occasionally less than that for the EECPK-means protocol. However, M-FCM was deemed superior to the other protocols on the basis of the stability of its variation results, signifying that the M-FCM algorithm performed consistently in forming balanced clusters across all observations.

b: COST OF INTRA-CLUSTER DISTANCE ($D_T$)
This evaluation parameter was crucial because it showed the effect of producing balanced clusters on the total energy consumption in the network. When balanced clusters are formed, the increase in the intra-distance should therefore stay within an acceptable range and not be significant. The total intra-distance $D_T$ could be computed based on Eq. 38.
where $d(x_i, c_j)$ is the distance from a node $x_i$ to the cluster centroid $c_j$, n is the number of cluster members, and K is the number of clusters. The results are illustrated in Table 3. Based on Table 2, the OCM-FCM protocol had the lowest intra-distance of clusters because it improved only the initial selection of the cluster centroids. On the other hand, the intra-distance of clusters for IEECP was slightly more than that for OCM-FCM and less than that of the other protocols; the intra-distance cost of IEECP reached its highest level (24.793 m) in the fifth observation. This cost was equal to 1.5% of the total intra-distance of OCM-FCM for the same observation. This cost was acceptable when compared with the cost of the other protocols, indicating that IEECP was better than the others in reducing the cost of producing balanced clusters. Overall, the M-FCM of our proposed protocol was significantly superior to the clustering algorithms of the existing protocols in terms of the formation of balanced clusters.
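To make the intra-distance cost concrete, the sketch below reads the definitions above as a sum, over all K clusters, of the Euclidean distances from each member node $x_i$ to its cluster centroid $c_j$. That reading is an assumption drawn from the symbol definitions, and the function name, coordinates, and cluster labels are hypothetical.

```python
import math


def total_intra_cluster_distance(nodes, centroids, labels):
    """Sum of Euclidean distances from each node to its own cluster centroid.

    nodes     : list of (x, y) node positions
    centroids : list of (x, y) centroid positions, one per cluster
    labels    : labels[i] is the index of the cluster that nodes[i] belongs to
    """
    if len(nodes) != len(labels):
        raise ValueError("nodes and labels must have the same length")
    total = 0.0
    for (x, y), j in zip(nodes, labels):
        cx, cy = centroids[j]
        total += math.hypot(x - cx, y - cy)
    return total


# Hypothetical layout: four nodes in two clusters (coordinates in meters).
nodes = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (13.0, 14.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]

d_total = total_intra_cluster_distance(nodes, centroids, labels)
```

Comparing this total for two clusterings of the same node set shows how much extra distance a more balanced clustering costs.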

B. EVALUATION OF THE PROPOSED PROTOCOL PERFORMANCE
The effectiveness of IEECP relative to the existing protocols was evaluated through a comparison with studies selected on the basis of the clustering structure factors, namely EECPK-means [25], OCM-FCM [10], and TACAA [14]. In this phase of the evaluation, two different scenarios were used to indicate the scalability of the proposed protocol, as follows:
• 100 nodes were deployed randomly in a 100 × 100 m² WSN sensing area, as this scenario is widely used in the literature.
• 1000 nodes were deployed randomly in a 1000 × 1000 m² WSN sensing area, as current WSNs include a huge number of nodes deployed over large-scale areas.
Other details are illustrated in Table 4. Some of the measurement parameters used in this evaluation were as follows:

1) NETWORK LIFETIME
Two measuring parameters, the first node dies (FND) and the last node dies (LND), were used for the network lifetime. Numerous protocols rely on the FND to measure the network lifetime [18], [14], where the period up to the FND is occasionally called the stable period. Furthermore, certain protocols depend on the LND to measure the network lifetime, as shown in [40].
The network lifetime comprises a stable period and an unstable period, the latter defined from the FND to the LND. Therefore, we used the FND and LND in this evaluation, in addition to a new factor that shows the relationship between the stable and unstable periods, namely the weighted first node dies (WFND). The higher the WFND, the better the stable period of the network. Consequently, for improved network performance, the highest values of FND and WFND should be achieved along with a high value of LND. The WFND was computed based on Eq. 39.
For the first scenario, the number of alive nodes over time for the compared protocols is illustrated in Figure 9 and Table 5.
The nodes of the EECPK-means protocol had the lowest lifetime according to FND and the highest lifetime according to LND among the protocols, owing to the small CH rotation threshold of this protocol. In the same context, the TACAA and OCM-FCM protocols demonstrated better FND and WFND than the EECPK-means protocol, but a lower LND than the IEECP and EECPK-means protocols. The IEECP demonstrated great stability and kept all nodes alive for as long as possible compared with the other protocols. Furthermore, the proposed protocol had the highest WFND among the protocols, as shown in Table 5; all of its nodes died within a narrow period of time, thereby reducing the unstable period.
For the second scenario, the number of alive nodes over time for the compared protocols is illustrated in Figure 10 and Table 6. The results of this scenario are similar to those of the first in terms of the ranking of the protocols by FND and LND, and they demonstrate the effectiveness of IEECP over the other protocols according to FND and WFND. Additionally, the EECPK-means protocol achieved a better LND.
The results of the second scenario indicated that the FND was reduced for all protocols despite the increase in the number of nodes in the network. This reduction is due to the larger sensing area compared with the first scenario: the number of relay points increased, which increased the overhead of the CHs and accelerated their deaths. Moreover, the average cluster size increased compared with the first scenario, so the size of the message transmitted by the CH increased, which increased the energy consumption of the CHs. In contrast, the LND increased for all protocols owing to the increase in the total number of nodes, which extended the network lifetime only in terms of the LND metric.
Based on this experiment, there are significant differences in the lifetime among the baseline protocols according to the FND in both scenarios, as illustrated in Tables 5 and 6. The IEECP has the highest lifetime duration according to the FND and WFND, which extends the stability period of the network more than other protocols, and enhances the network performance. In contrast, there are no considerable differences in the lifetime span among the baseline protocols according to the LND.

2) STATISTICAL INFERENCE FOR NETWORK LIFETIME
In this section, the statistical inference of the network lifetime for the baseline protocols in the above scenarios is examined by using the paired t-test, as shown in [41].
The network lifetime was taken to follow a bivariate normal distribution for the pairwise comparison of IEECP with EECPK-Means, TACAA, and OCM-FCM, where the lifetime of IEECP was coupled with the lifetime of each existing protocol as a pair ($x_i$, $y_i$), in which $x_i$ was the lifetime of IEECP and $y_i$ was the lifetime of the baseline protocol.
Under the Null Hypothesis (H0), the lifetimes in each pairwise comparison of IEECP with EECPK-Means, TACAA, and OCM-FCM were supposed to be equal. In contrast, the Alternative Hypothesis (H1) stated that the lifetime of IEECP was greater than that of the existing protocol.
The t-statistic was defined with n-1 degrees of freedom as follows:
$$t = \frac{\bar{x}}{\mathrm{STD} / \sqrt{n - 1}} \qquad (40)$$
where $\bar{x}$ and STD refer to the mean and standard deviation of the differences in lifetime between the two correlated samples of the compared protocols.
In the t-test, p refers to the probability associated with the calculated value at n-1 degrees of freedom. If this value is less than 0.05, the Null Hypothesis is rejected at the 5% significance level, and the Alternative Hypothesis is accepted at the 95% confidence level. Table 7 presents the results of the t-test of IEECP against EECPK-Means, TACAA, and OCM-FCM, respectively. In each case, the p value was less than 0.05 and the t value was less than t-critical, so the Null Hypothesis was rejected at the 5% significance level and the Alternative Hypothesis was accepted at the 95% confidence level, indicating that the lifetime of IEECP was greater than that of the existing protocols.
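The paired t-statistic can be sketched in a few lines. The sketch reads Eq. 40 as the mean of the paired differences divided by STD/√(n − 1), with STD taken as the population standard deviation of the differences; that reading is an assumption, although it is numerically the same as the usual form with the sample standard deviation over √n. The lifetime values are invented and are not results from the paper.

```python
import math
from statistics import mean, pstdev


def paired_t(x, y):
    """Paired t-statistic for two equally sized, paired samples.

    Computes mean(d) / (pstdev(d) / sqrt(n - 1)), where d = x - y, pairwise.
    The result is compared against a t distribution with n - 1 degrees of freedom.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    spread = pstdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (spread / math.sqrt(n - 1))


# Hypothetical lifetimes (arbitrary units) for a proposed and a baseline protocol.
proposed_lifetime = [10, 12, 14, 16]
baseline_lifetime = [9, 10, 11, 12]

t_value = paired_t(proposed_lifetime, baseline_lifetime)
```

The p value would then be obtained from the t distribution with n − 1 degrees of freedom, for example from a statistics table or library.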

3) ENERGY DISSIPATION IN INEFFICIENT NETWORK
We used the energy dissipation of the network to demonstrate the superiority of the proposed protocol over the related works in managing energy consumption. After half of the nodes die (HND), a major change in the network topology occurs [42]; the topology becomes ineffective with respect to energy consumption and greatly weakens the network performance in the subsequent rounds. Accordingly, the energy consumed in this ineffective network after the death of half of the network nodes is considered energy dissipation. In other words, the total initial energy of the nodes is not fully used in an effective network; part of it is wasted in a network that is unable to sense the entire monitoring area. Therefore, the energy of the nodes needs to be managed properly to keep the network efficient for as long as possible.
Based on Tables 5 and 6, the HND of the EECPK-means protocol occurred in round 1840 in the first scenario and in round 1460 in the second scenario. Therefore, the energy consumed in the effective network (from the start of the network operation until the HND) was 65% of the initial energy of the nodes in the first scenario and 43% in the second scenario, as shown in Figure 11 and Table 8. Accordingly, the remaining energy of the nodes, 35% and 57% of the initial energy, was consumed in an inefficient network in the first and second scenarios, respectively, as shown in Figure 12. This remaining share was considered the energy dissipation of the EECPK-means protocol, which showed the highest energy dissipation among the protocols, i.e., it had the poorest energy management for nodes.
Similarly, the HND occurred in rounds 1740 and 1492 for the OCM-FCM protocol and in rounds 1865 and 1600 for the TACAA protocol, in the first and second scenarios, respectively, as illustrated in Tables 5 and 6. For OCM-FCM at these rounds, the energy consumption of the network was 75% and 59% of the initial energy of the nodes in the first and second scenarios, respectively. Therefore, the energy dissipation was 25% and 41% of the initial energy in the first and second scenarios, respectively, as shown in Figure 12. For the TACAA protocol, the energy consumption of the network was 83% and 63% of the initial energy of the nodes in the first and second scenarios, respectively. Therefore, the energy dissipation was 17% and 37% of the initial energy in the first and second scenarios, respectively, as shown in Figure 12.
For IEECP, the HND occurred in rounds 2113 and 1843 in the first and second scenarios, respectively. At these rounds, the energy consumption of the nodes was 89% of the initial energy in the first scenario and 72% in the second scenario, as shown in Figure 11 and Table 8. Accordingly, the remaining energy of the nodes, 11% and 28% of the initial energy, was consumed in an inefficient network in the first and second scenarios, respectively, as shown in Figure 12 and Table 8. Consequently, the proposed protocol consumed the energy of the nodes more efficiently than the other protocols before the network topology changed and the network became ineffective, and it had the lowest energy dissipation among the protocols. The reason is that the proposed protocol has the highest lifetime according to FND and WFND, so the HND occurs at later rounds than in the other protocols. As a result, the proposed protocol not only prolongs the network lifetime but also efficiently utilizes and manages the energy of the nodes, so that the network operates efficiently for as long as possible.
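The dissipation figures quoted above follow from a single subtraction: the share of the initial energy not yet consumed when the HND occurs. The short sketch below reproduces that arithmetic from the consumption percentages stated in the text; the dictionary layout and function name are illustrative choices.

```python
# Energy consumed up to the HND, as a percentage of the initial energy,
# for (first scenario, second scenario), as quoted in the text.
consumed_at_hnd = {
    "EECPK-means": (65, 43),
    "OCM-FCM": (75, 59),
    "TACAA": (83, 63),
    "IEECP": (89, 72),
}


def dissipation(consumed_percent):
    """Energy left after the HND, treated in the text as dissipated energy."""
    return 100 - consumed_percent


dissipated = {
    protocol: tuple(dissipation(c) for c in per_scenario)
    for protocol, per_scenario in consumed_at_hnd.items()
}

# Protocol with the lowest dissipation in the first scenario.
best_first_scenario = min(dissipated, key=lambda name: dissipated[name][0])
```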

4) NUMBER OF MESSAGES ARRIVED IN BS
The main goal of a WSN is to sense the environment and then transmit the sensed data to the BS. Therefore, the more sensing data the BS receives, the better the performance of the network. Accordingly, the number of messages received by the BS is used as a parameter in this evaluation; like the previous evaluation parameters, it is important in clustering protocols [1].
As shown in Figure 13, the number of messages that arrived at the BS under the proposed protocol was similar to that under EECPK-means, TACAA, and OCM-FCM until the 600th round in the first scenario and the 350th round in the second scenario.
After the 600th and 350th rounds in the first and second scenarios, respectively, the numbers of messages that arrived at the BS diverged among the protocols.
The highest number of messages arriving at the BS was achieved by the proposed protocol, whereas the lowest number was recorded for the EECPK-means protocol in both scenarios, although the latter had the highest lifetime according to LND. This is due to the prolonged stable period of the network under IEECP, which gives it an advantage over the other protocols.
This evaluation shows that the EECPK-Means protocol performs better with regard to the LND. However, it has the poorest energy management for nodes and the lowest number of messages received by the BS in both scenarios. Therefore, the LND parameter alone is not a sufficient indicator of how well a protocol prolongs the network lifetime.
Accordingly, the proposed protocol shows the best performance among all the other protocols in all the evaluation parameters. This outcome is expected because the proposed protocol seeks to overcome the problems caused by inefficient and unbalanced energy consumption of nodes in the clustering protocol.

VI. CONCLUSION
In this work, we propose an improved energy-efficient clustering protocol (IEECP) to prolong the lifetime of WSN-based IoT networks by overcoming the problems of the clustering structure that adversely affect protocol performance. The proposed protocol reduces and balances the energy consumption of nodes by improving the clustering structure. Hence, IEECP is deemed suitable for networks that require a longer lifetime. In general, the results show that IEECP performs better than the existing protocols. The proposed protocol can benefit the many application areas that rely on WSNs in the IoT. The energy consumption of the network is analyzed to compute the optimal number of clusters based on the distance to the CH in the case of overlapping clusters. Then, the modified FCM algorithm (M-FCM) is proposed by combining FCM with a centralized mechanism to form static and balanced clusters. Finally, a new CH selection-rotation algorithm (CHSRA) is presented by integrating the back-off timer mechanism for CH selection with the rotation mechanism for CH rotation. CHSRA relies on a new objective function for selecting CHs in optimal locations to balance the energy consumption among the CHs of the clusters. Furthermore, it relies on a new dynamic threshold for CH rotation among the members of a cluster to balance the energy consumption of the successive CHs in the cluster. In future work, we aim to enhance the protocol by improving the FCM algorithm with respect to its random initial selection. Moreover, we believe that improving the objective function of CH selection by relying on a weighted energy-based distance for adjacent CHs is also important. We anticipate that the future clustering protocol will perform better once these limitations are addressed.