A QoS-Aware Data Collection Protocol for LLNs in Fog-Enabled Internet of Things

Improving the quality of service (QoS) of low power and lossy networks (LLNs) in the Internet of Things (IoT) is a major challenge. Cluster-based routing is an effective approach to achieving this goal. This paper proposes a QoS-aware clustering-based routing (QACR) mechanism for LLNs in Fog-enabled IoT, which provides a clustering technique, a cluster head (CH) election technique, and a routing path selection technique. The clustering adopts a community detection algorithm that partitions the network into clusters using the available connectivity information of the nodes. The CH election and relay node selection are both weighted by the rank of the nodes, which takes a node's energy, received signal strength, link quality, and number of cluster members into consideration as the ranking metrics. The number of CHs in a cluster is adaptive and varies according to the cluster state to balance the energy consumption of nodes. In addition, the protocol uses a CH role handover technique during CH election that reduces the control messages needed for periodic election and cluster formation. QACR has been evaluated through simulations of various scenarios. The results show that QACR improves the QoS in terms of packet delivery ratio, latency, and network lifetime compared to existing protocols.

A QoS-Aware Data Collection Protocol for LLNs in Fog-Enabled Internet of Things applications [1]. A LLN is a collection of resource-constrained sensor devices deployed in an area of interest for sensing and gathering data. The collected data are sent to one or more control stations called base stations (BSs) or gateways (GWs). In a Fog-enabled IoT, the collected data are stored in a Fog server for pre-processing before being sent to the cloud, so that computation, storage, and networking services can be performed locally [2], [3]. The data of LLNs are distributed to several services in IoT applications. A Fog-enabled IoT has a three-layer architecture: the LLNs of sensor nodes deployed at the network edge, the Fog-enabled BSs/GWs that aggregate data from the LLNs and perform various operations, and the IoT middleware (MW) that acts as a back-end cloud to provide services. The resources of the sensor devices used in a LLN/wireless sensor network (WSN) are constrained with regard to battery power, processing, storage capacity, and communication bandwidth. To address this issue, different sensor node deployment optimization [4], [5] and data collection protocols [6] have been proposed. Some of the protocols are application specific [7]-[11], while others are for general application [12]. Clustering-based routing protocols are considered energy-efficient in terms of aggregating data and sending them to the BS [13]. In clustering, the deployed nodes are partitioned into groups called clusters, and cluster heads (CHs) are elected in the clusters. A CH aggregates data from its cluster members (CMs) and eliminates correlated data to reduce the amount of data sent to the BS.
The clustering techniques can be classified into three main categories: static, dynamic, and hybrid, under a centralized or distributed process [18]-[40]. Static clustering is predefined, and it divides the network into levels, grids, regions, sectors, and so on [30], [37]-[40]. Dynamic clustering is usually self-configured (can be AI-assisted) and random according to the network state [18]-[29], [31]-[33]. On the other hand, hybrid clustering adopts either static or dynamic clustering according to what is needed [34]-[36]. Sensor nodes are deployed uniformly or non-uniformly in LLNs/WSNs. In most clustering techniques, a node needs to detect which cluster it belongs to. For this, either received signal strength (RSS) or the global positioning system (GPS) is usually used. RSS may not serve the purpose when the environment is very noisy. On the other hand, each node needs to be equipped with GPS in order for its geographical location to be obtained, which might not be applicable in some environments such as inside a building or cave. Clustering without knowing the nodes' connectivity might not be efficient in LLNs due to the lack of knowledge of the location and link quality of the nodes. These problems may become severe when the environment is uneven and noisy. For example, if nodes are non-uniformly distributed over a sensor field, the number of CMs in the clusters varies significantly. This results in an imbalance in the energy consumption of the nodes. Meanwhile, the lack of knowledge about the link quality in CH election may increase the packet drop probability during communication. This may cause unreliable communication and cause nodes to dissipate energy in the data retransmission process. These greatly impact the QoS of the network, particularly with regard to the packet delivery ratio (PDR) and the network lifetime (NL).
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
To address the aforementioned problems, a QoS-aware clustering-based routing (QACR) protocol for LLNs in Fog-enabled IoT is proposed. The protocol adopts the community detection technique in clustering, by which the available connectivity information of the nodes is utilized. The CH election and relay node selection are based on the rank. The rank of the nodes is weighted by the residual energy, RSS, link quality, and the number of CMs. Data routing is hierarchical from a CM to the BS based on the constructed routing path. This reduces packet loss and latency and ensures a balanced energy consumption of the nodes. These major issues are taken into consideration in clustering and routing to improve the QoS of the network.
The remainder of the paper is organized as follows. Section II discusses the data collection protocols using clustering. Section III presents the system model. Section IV describes the details of the proposed data collection technique QACR. The performance of QACR is evaluated in Section V. The concluding remarks of this work are given in Section VI.

II. CLUSTER-BASED ROUTING PROTOCOLS
The clustering of sensor nodes is usually adopted in large-scale networks. Cluster-based networks provide more reliability, better coverage, greater fault tolerance, and better task allocation and energy-efficiency [13]- [17]. Several cluster-based routing protocols for LLNs/WSNs have been well-studied and proposed in the last decade in attempts to resolve the "energy-hole" problem [12]. The low-energy adaptive clustering hierarchy (LEACH) [18] is in this category. The idea of LEACH is the clustering of nodes on independently elected CHs with a probability and rotating of the CH role to balance the energy consumption. However, it requires a large control-message overhead resulting in additional energy dissipation of the nodes. Inspired by LEACH, many protocols have been proposed in which different weight functions are considered in the CHs and route-selection processes to improve the performance over the protocol. The centralized LEACH (LEACH-C) [19], multilevel clustering EEMLC [20], EADEEG [21], RDBC [22], and EDDEEC [23] are examples of routing protocols in the LEACH family [24]. In EADEEG, the CH election is based on the ratio of the average residual energy of neighbors to the residual energy of a node itself. This balances the energy consumption of nodes and prolongs the NL over the LEACH. The drawback of this protocol is that some of the nodes may be isolated from the rest of the nodes.
The authors in [25] proposed an energy-aware distributed unequal clustering (EADUC) protocol. It addresses the node-isolation and NL issues. The adopted CH election with a probability at time T is based on the ratio of the residual energy to the average residual energy of the neighbors of a node. A CH is elected in each competition range R_c and broadcasts a head message within a range R_r = 2R_c. The R_c is varied according to the distance from the BS. As an exception, a node that has not received the head message from a tentative CH is elected as a CH independently. A cluster is formed with the nearby nodes defined by the distance from a CH. A relay CH is selected based on the shortest distance from the BS. Compared with the previously discussed protocols, EADUC achieves an enhanced NL. The authors in [26] studied a k-connected overlapping technique in cluster formation to solve the coverage and NL issues. The CH election is based on the available energy status of the nodes. It achieves a prolonged NL compared to the other protocols. A NL maximization protocol has been proposed in [27]. When electing a CH, this protocol considers the available energy of nodes and the required energy consumption of the route towards the BS. It distributes the load of data routing among the nodes. The authors in [28] proposed a link-aware clustering protocol for event-driven WSNs. In CH election, this protocol employs the predicted transmission count as a clustering metric. It ensures a reliable and energy-efficient route in data communication.
An energy-efficient routing protocol for non-uniform node distribution has been proposed in [29]. The network is partitioned into equal clusters defined by a competition range R_c. The CH election and clustering policy of this protocol are similar to those of EADUC. The criterion for relay CH selection is the higher relay value defined by the residual energy and the number of CMs in each cluster. It achieved a more prolonged NL than either the LEACH or EADUC protocol. A decentralized cluster-based routing protocol called DHCRA has been proposed in [30]. The main approach of this scheme is that CHs are elected at different levels along with the construction of the routing trees. The CH election is weighted by the residual energy and distance from the BS. The policy is intended to reduce the control-message overhead with regard to the route construction. However, the CH election process can be disrupted due to the election policy, which may result in unreliable communication in certain rounds over time. The authors in [34] proposed an adaptive and distributed clustering method called DARC. The main idea of DARC is to distribute the routing tasks among relay CHs by adjusting the routing mode of CHs. Although the periodic CH election is able to distribute the data collection and routing tasks among nodes, the energy consumption of nodes is imbalanced due to the random distribution of nodes. To address the problem, this method adopts a relay CH selection on the basis of two relay modes: CH with low energy and CH with high energy. During the data communication, a CH with low energy selects one of the CHs with high energy as a relay node towards the BS. A CH with high energy transmits data to the BS directly. This process distributes the routing tasks among the CHs with high energy and balances the energy consumption of the nodes. It results in a prolonged NL compared to other protocols.
In contrast to the conventional CH election techniques, a hybrid unequal clustering (HUCL) protocol has been proposed to reduce the control-message overhead so that the NL is improved [35]. It suggests a CH handover and piggybacking technique. Once a node is elected as a CH, it hands over the CH role to another node in the next round. The weight values of the CMs are piggybacked along with the local data, so that the number of control messages broadcast for the CH elections is reduced. This extends the steady state as well as the NL. The authors in [36] proposed a modification of EADUC called IEADUC. It uses the CH role handover approach of HUCL and the CH election method of EADUC. In addition, it defines a modified version of the relay function used in EADUC. The adopted relay CH selection is based on the relay value, which combines the residual energy of a CH, the number of CMs, and the energy consumed in forwarding data to the next hop towards the BS. The IEADUC protocol achieves a longer NL than EADUC and HUCL.
A clustering hierarchy protocol (CHP) based on the particle swarm optimization (PSO) algorithm has been proposed in [37] to improve the NL. Three different types of nodes are defined in the network: CHs, relay nodes (RNs), and common nodes (CNs). The CHs and RNs are selected by the BS using the PSO algorithm with fitness functions that include the nodes' residual energy and distance from the BS. A node with higher residual energy and nearer to the BS is preferred as a CH. The clusters are formed based on the RSS of the advertisement messages received from the selected CHs. A CH selects an RN towards the BS. The protocol achieves a longer NL than other protocols. An energy centric cluster-based routing (ECCR) protocol has been proposed in [38]. It divides the network into a number of static grids called clusters. It adopts the handover technique inspired by the HUCL protocol, but the weight functions of CH and relay node selection differ. In CH election, the weight function includes the residual energy and the average distance among CMs. The adopted route selection includes the factors that have the major influence on the energy consumption of the relay nodes. The ECCR protocol improved the NL more than the HUCL and IEADUC protocols. However, the existing protocols focus only on prolonging the NL; the PDR and latency aspects of QoS have not been taken into consideration.

III. SYSTEM MODEL
A. Fog-Enabled IoT Architecture
The architecture of the IoT system assumed here has three layers consisting of LLN sensor nodes, IoT-GW, and IoT-MW, as shown in Fig. 1. The LLN is where the IoT devices are deployed and collect data locally. The IoT-GW contains the BSs/GWs that play the role of a Fog layer. This layer serves the nearby LLN by processing the data aggregated by the network. The IoT-MW plays the role of the back-end cloud, with the duties of virtualization, data storage, and service provision. The components of the architecture are described below.
1) LLN: It comprises resource-constrained sensor nodes that are connected to each other wirelessly. The specification of the devices is standardized by IEEE 802.15.4. The sensor nodes deployed in a LLN can be analog or digital and are able to sense data such as temperature, humidity, light, and noise. The data are transmitted to the BS/GW within range for further processing.
2) IoT-GW: The IoT-GW is positioned as a bridge between the LLN and the MW. It handles various functionalities from sensor node deployment to the data forwarding, including (de-)registration of sensor nodes, network maintenance, data collection protocols and data communication to the MW. Necessarily, it acts as a communication hub of the two other layers. Depending on the applications and the IEEE standard offered, it enables connectivity to the LLN. On the other hand, connectivity to the MW is provided by the Ethernet or ISM band (i.e., Wi-Fi, Bluetooth, ZigBee, etc.) interface of a running operating system.
A Fog-enabled GW is used to (re-)register sensor nodes and to collect data from the LLN. Sensor nodes are registered to the GW by sending a registration-request message providing necessary information such as an ID in IPv6. The GW stores the information in its database and sends the message to the MW. Furthermore, for a service request from the MW delivered by an appropriate communication protocol (e.g., HTTP or FTP), it manages the (de-)registration and (de-)activation of a node in the network. For time-critical applications, it processes data (e.g., optimization, averaging, pattern recognition) locally to provide QoS. Additionally, it includes various security measures and provides reliable connectivity between the MW and the GW. This ensures that local logging and data transfer to the MW proceed without interruption.
3) IoT-MW: The IoT-MW is a back-end cloud where services are requested and provided. Along with the security and privacy features, the services include data processing and availability, and virtualization of data. The implementation of the MW depends on the application-independent software, its openness features, and the design selection criteria defined by the IoT architecture. The functional components are as follows: 1) Handling the service requests from the applications through validation of the virtual entities that are of interest to the applications. 2) Creating and managing the digital representation of the sensor nodes. 3) Creating and managing communities of sensor nodes, which offer advanced services to the users. 4) Managing and processing data (e.g., optimization, averaging, filtering, pattern recognition) based on application requirements and service requests.

B. LLN Model and Assumptions
A LLN comprises N sensor nodes S = {s_1, ..., s_N} and a BS. The nodes are deployed in an area of interest M to monitor the environment. Once the nodes and the BS are deployed, they are static. Each node has an ID number, and nodes are location unaware. With respect to initial energy, the nodes can be homogeneous or heterogeneous. The nodes and BS are capable of adjusting their transmission power in accordance with the distance to the desired recipient [41], [42].

C. Energy Consumption of LLN Model
The radio energy dissipation model of the LLN used in this paper follows [37], [43]. A node consumes energy E_TX in transmitting and E_RX in receiving l bits of data over a radio range d according to Eqs. (1) and (2), respectively. Depending on the distance d, the energy consumption of a node in terms of E_TX and E_RX is defined by the free space (ε_fs) and multipath fading (ε_mp) models, in which the d^2 and d^4 power losses are used, respectively. A CH consumes E_da (nJ/bit/signal) in data aggregation and E_com (nJ/bit/signal) in processing the values (i.e., rank and residual energy) sent by a CM.
where E_elec is the power consumption of the transceiver circuits. The signal is amplified that depends on the d and the
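Eqs. (1) and (2) are not reproduced in this text. As a hedged illustration only, the following Python sketch assumes the first-order radio model that is commonly paired with the symbols E_elec, ε_fs, and ε_mp: the transmitter pays the electronics cost plus a d^2 amplifier term at short range or a d^4 term at long range, and the receiver pays the electronics cost only. The crossover distance `D0` and every numeric constant are illustrative assumptions, not values taken from this paper.

```python
import math

# Illustrative constants (assumptions, not values from the paper).
E_ELEC = 50e-9        # J/bit, transceiver electronics (E_elec)
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier (eps_fs)
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier (eps_mp)
D0 = math.sqrt(EPS_FS / EPS_MP)  # assumed crossover distance in metres


def e_tx(l_bits, d):
    """Energy (J) to transmit l_bits over distance d under the assumed model."""
    if d < D0:
        # Short range: free-space loss, proportional to d^2.
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    # Long range: multipath fading loss, proportional to d^4.
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4


def e_rx(l_bits):
    """Energy (J) to receive l_bits; independent of distance."""
    return l_bits * E_ELEC
```

With these assumed constants the crossover distance is a little under 88 m, so a 1000-bit packet sent over 10 m is dominated by the electronics term, whereas the same packet sent over 100 m is dominated by the d^4 amplifier term.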

D. Data Aggregation Model
Data aggregation compresses the received data into a single fixed-size packet, regardless of the number of packets received. The data compressibility model used in this work is similar to that in [14]. In this model, the data of CMs are aggregated at their respective CH prior to transmission to the BS. During network operation, each CM's radio is turned off until the transmission time assigned to the node. The CH's radio, in contrast, needs to stay on until all data have been received from the CMs.

IV. QACR PROTOCOL DETAILS
The QACR operation starts with the network initialization phase. In this phase, information about the deployed nodes, such as the RSS from the BS, the RSS from neighbor nodes, and the link quality with neighbor nodes, is collected. Then the nodes are partitioned into clusters. The clustering is carried out by the BS. The number of clusters and the CMs of each cluster are static. The remaining operations of the protocol are conducted in regular rounds. Every round comprises the CH election and data communication phases. CMs collect local data and transmit them to their respective CH. The CH aggregates data from its CMs and transmits the data to the BS using a single hop or multiple hops, depending on the distance to the BS. The data communication phase should be longer than the CH election phase to minimize the control-message overhead and enhance the NL. The types of control messages used during the protocol operation are shown in Table I.

A. Network Initialization Phase
1) Information Collection:
The network initialization phase starts with the information collection. At the beginning of this phase, the BS broadcasts a Hello_Msg 1 multiple times at a certain power level over the sensor field. The power level is adjusted in accordance with the network area. Nodes receive the messages, measure the approximate RSS(BS, s_i), and compute the approximate distance d(s_i, BS) based on the RSS [44]. Then each node broadcasts a Hello_Msg 2 using a radio range R_n, where R_n is less than or equal to the maximum transmission range R_max of a node. The nodes residing within the range receive the message, measure the RSS(s_j, s_i), compute the approximate distance d(s_i, s_j), and count the number of retransmissions of the message. A node lists its neighbors' information in its neighbor table (NT). Two nodes (s_i, s_j) are said to be neighbors if node s_i receives the message from node s_j directly.
Once the Hello_Msg 2 broadcasts are completed, each node calculates the expected transmission count (ETX). The channel condition of wireless links varies among nodes according to the environment. Data delivery is likely to fail over an unreliable link, which leads to packet retransmissions. ETX is the expected number of transmissions required for a packet to be delivered successfully [45]. It is a metric of link reliability and is usually used to evaluate the level of link quality. The link quality increases as the value of ETX decreases. The expected bidirectional transmission count of a link between node s_i and node s_j can be defined as in Eq. (3), which has been proposed in [24].
where p_f(s_i, s_j) and p_r(s_i, s_j) represent the forward and reverse delivery ratios from node s_i to node s_j, respectively. The forward delivery ratio is the measured probability that a data packet is received successfully by the recipient. The reverse delivery ratio is the measured probability that the acknowledgement packet is successfully received. The number of retransmissions of a dropped packet is restricted by a threshold TH_RC. Unlike the previous work, the proposed QACR defines the link quality between two nodes as 'good' or 'bad', denoted by LQ(s_i, s_j) = 1 and LQ(s_i, s_j) ∈ [0, 1), respectively. Each node calculates the LQ with its neighbor nodes and updates its NT. The LQ is defined by the ETX as in Eq. (4).
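Eqs. (3) and (4) are not reproduced in this text. The following Python sketch uses the widely cited definition of ETX as the reciprocal of the product of the forward and reverse delivery ratios, which matches the description above. The mapping from ETX to LQ is a purely hypothetical stand-in for Eq. (4): it returns 1 for links at or below an assumed `etx_good` bound and 1/ETX otherwise, which only illustrates the 'good' (LQ = 1) versus 'bad' (LQ in [0, 1)) split and may differ from the authors' definition.

```python
import math


def etx(p_f, p_r):
    """Expected transmission count of a bidirectional link.

    p_f: forward delivery ratio (data packet received).
    p_r: reverse delivery ratio (acknowledgement received).
    """
    if p_f <= 0.0 or p_r <= 0.0:
        return math.inf  # the link never delivers
    return 1.0 / (p_f * p_r)


def link_quality(etx_value, etx_good=1.0):
    """Hypothetical LQ mapping (assumption, not the paper's Eq. (4)).

    Returns 1.0 for a 'good' link, otherwise a value in [0, 1).
    """
    if etx_value <= etx_good:
        return 1.0
    return 1.0 / etx_value  # 1 / inf evaluates to 0.0
```

For example, a link with p_f = 0.8 and p_r = 0.5 needs 2.5 transmissions on average, and the assumed mapping rates it as a 'bad' link with LQ = 0.4.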
When the NTs of the nodes are completed, each node shortlists its neighbors into a subset called the neighbors in an adaptive range table (NAT), where the adaptive range R_adp is less than or equal to R_n. Each node sends its NAT along with a Hello_Msg 3 to the BS. If the BS is out of R_max from a node s_i, the node selects the node in its NT with the highest RSS from the BS as a relay node towards the BS. The packet forwarding continues until the packet reaches the BS. The BS constructs an adjacency matrix A(S, E) of size N × N based on the information received from the nodes. Fig. 2 illustrates an example scenario of the message sending from a node to the BS. As seen in the figure, the RSS values of the nodes from the BS differ in accordance with their locations. Node s_4 is out of R_max from the BS, and it selects the relay node s_3, which has the highest RSS (dBm) from the BS among the neighbor nodes s_1, s_2, and s_3.
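As a minimal sketch of these two steps, the following Python code builds the N × N adjacency matrix from the reported NATs and picks a relay by highest RSS from the BS. The function names, the 0-based node indices, the treatment of links as undirected, and the RSS values in the usage note are all assumptions made for illustration; the values shown in Fig. 2 are not available in this text.

```python
def build_adjacency(nats, n):
    """Build an n x n adjacency matrix from neighbor tables.

    nats: dict mapping a node index to an iterable of neighbor indices.
    Assumption: a link reported by either endpoint is treated as undirected.
    """
    a = [[0] * n for _ in range(n)]
    for i, neighbors in nats.items():
        for j in neighbors:
            if i != j:  # ignore self-links
                a[i][j] = 1
                a[j][i] = 1
    return a


def pick_relay(neighbor_rss_from_bs):
    """Return the neighbor whose RSS from the BS (dBm) is highest.

    neighbor_rss_from_bs: dict mapping a neighbor id to its RSS in dBm.
    """
    return max(neighbor_rss_from_bs, key=neighbor_rss_from_bs.get)
```

With hypothetical readings such as `{"s1": -80, "s2": -75, "s3": -60}`, `pick_relay` returns `"s3"`, mirroring the choice made by node s_4 in the example scenario.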
2) Clustering and Membership: The clustering is based on the community detection algorithm [46]. A community detection algorithm is often used to detect partitions of nodes in a network (e.g., computer network, social network, biological network) that are more densely connected internally than with the rest of the network. The BS detects the communities, called K-clusters, using the modularity maximization algorithm. Modularity maximization is an optimization method of clustering that defines the number of clusters based on a quality function Q. The clustering process recursively partitions the graph G(S, E) into two subgraphs, repeatedly applying the same procedure to the subgraphs. The value of K is defined by the quality function Q(G, C). When the value of Q(G, C) is held constant (i.e., at a value in the interval [0.5, 1]), K is a reciprocal function of R_adp. R_adp can be defined by the node density factor γ (the higher the density, the lower the value of γ). The details of the clustering are as follows. Here, graph is used as a synonym of network, cluster as a synonym of community, and clustering as a synonym of community detection.

5) In particular, we write E^i_k = E_kk and E^g_k = ∪_{l≠k} E_kl. In other words, the set E^i_k contains the internal edges of c_k, with both their ends belonging to the same cluster, while the set E^g_k contains the external edges of c_k, which have one end in c_k and the other end in S − c_k, the set of nodes which do not belong to c_k.
6) A community can be defined in terms of a quality function. A quality function is a function Q(G, C) (i.e., it depends on both the graph G and the partition C), the value of which characterizes how good C is as a partition of G. Hence the best decomposition of G into communities is the partition C* = {c*_1, ..., c*_K} which maximizes Q, that is, C* = argmax_C Q(G, C). Good communities are then the elements of a good C, that is, one which achieves a high Q(G, C) score. This definition of communities depends on the particular quality function Q used. A large number of quality functions have been proposed in [47]. The QACR protocol uses the Girvan-Newman modularity maximization [46], the most popular quality function. The Q(G, C) is defined as in Eq. (5).
where m_k = Σ_{s_i ∈ c_k} Σ_{s_j ∈ S} A_{s_i, s_j} is the number of edges of a cluster c_k and m is the number of edges of G(S, E).
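Eq. (5) is not reproduced in this text. As a hedged illustration, the following Python sketch evaluates the standard Newman-Girvan modularity of a candidate partition from the adjacency matrix: for each cluster, the fraction of edges that are internal minus the squared fraction of edge ends attached to the cluster. Here m_k is computed as the double sum over A given above. The function name and the normalization are assumptions and may differ in detail from the authors' Eq. (5).

```python
def modularity(adj, clusters):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    clusters: list of lists of node indices; together they cover all nodes.
    """
    n = len(adj)
    two_m = sum(sum(row) for row in adj)  # every edge is counted twice
    if two_m == 0:
        return 0.0
    q = 0.0
    for cluster in clusters:
        members = set(cluster)
        # Twice the number of internal edges of the cluster.
        internal = sum(adj[i][j] for i in members for j in members)
        # m_k: sum of A[s_i][s_j] over s_i in c_k and s_j in S.
        m_k = sum(adj[i][j] for i in members for j in range(n))
        q += internal / two_m - (m_k / two_m) ** 2
    return q
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, while keeping all six nodes in one cluster gives Q = 0, so a modularity-maximizing bisection would prefer the split.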
Once the clustering is completed, the BS calculates the rank of each cluster. The rank of a cluster is defined by the total RSS of N nodes from the BS and the total RSS of CMs in the cluster from the BS as in Eq. (6). A higher rank of a cluster indicates that the probability of the CMs nearer to the BS with good link quality (to some extent) is higher. The BS multicasts the individual clusters' information along with a Hello_Msg 4 to the CMs. A CM receives the message and lists the information in its cluster member table (CMT).
where n is the number of CMs of cluster c_k, k = 1, 2, 3, ..., K, RSS(BS, s_i) denotes the RSS of node s_i from the BS, and RSS(BS, s_j) denotes the RSS of node s_j ∈ c_k from the BS.

B. Cluster Head (CH) Election Phase
The number of CHs in a cluster is adaptive and depends on the different cases of a cluster state. The community detection algorithm does not consider the distance between nodes. Instead, it uses only the link information of the nodes. Therefore, some of the CMs can be out of R n from other members in a large cluster. However, the CH election in each cluster depends on the following cases in two ways. The election is based on the rank of the CMs in all cases.
In the first round (r = 1), each CM calculates its rank and broadcasts the values of its rank and residual energy in a Node_Msg within its R_min, where R_min is the distance of the farthest alive CM from the node in a cluster. A CM updates its CMT on receiving the message. The rank of a CM is defined by its residual energy, average RSS from CMs, and average link quality with CMs as in Eq. (7). The equation states that a CM having higher residual energy, higher average RSS, and higher average link quality has the higher rank. The higher the rank, the better the option to achieve a better QoS during intra-cluster communication.
where α is the weight factor with a value in (0, 1], E_res(s_i) is the residual energy of node s_i, E_max is the maximum initial energy of nodes S, n is the number of alive CMs belonging to the cluster of node s_i, RSS(s_j, s_i) denotes the RSS of node s_i from node s_j, and LQ(s_i, s_j) denotes the link quality between node s_i and node s_j. The factor α can be defined based on the QoS requirements of the intra-cluster communication.
Unlike in the first round, from the second round onward (r > 1), the CH of the previous round (r − 1) hands over the CH role to the next CH in the current round r. If a CM has not sent its local data or the other information (i.e., rank and residual energy) to the associated CH during the communication, the node is considered a missing or depleted node and is eliminated from the remaining operations. As an exception, if, for example, a Schedule_Msg has not been broadcast by a tentative CH during the assigned time, the CH election in the cluster proceeds by competition as described for the first round. Otherwise, the CH of a cluster is elected by the former CH, recursively over the rounds.
Case 1: If all CMs in a cluster are within R_n of every other CM and the nodes have flag = 'FC' (e.g., see Clusters 1 and 2 in Fig. 3), the highest ranked CM is elected as the CH. If multiple CMs in a cluster have the same rank, the residual energy of the nodes serves as the tie-breaking metric, and the node with the higher residual energy is selected as the CH. An elected CH broadcasts the time division multiple access (TDMA) slots for the CMs in a Schedule_Msg. The number of slots is assigned according to the packet drop probability of a CH.
Case 2: If some CMs in a cluster are not within R_n of other CMs and have flag = 'NFC' (e.g., see Cluster 3 in Fig. 3), the number of CHs elected in the cluster varies according to the following three conditions.
Condition 1: A CM with flag = 'FC' that has the highest rank and belongs to an intersection region in a cluster is elected as the CH (see Table (a) in Fig. 4). The rest of the process is the same as in Case 1.
Condition 2: If multiple independent sets exist in a cluster, several CHs can be elected in the cluster. An example scenario can be seen in Cluster 3 in Fig. 3. The sets of nodes {s_1, s_2, s_3} and {s_6, s_7, s_8} are two independent sets after subtracting the subset {s_4, s_5} from the set {s_1, ..., s_8}. Here, the nodes with the higher ranks can belong to different independent sets (see Table (b) in Fig. 4); therefore, they are elected as the CHs. Each elected CH broadcasts a Schedule_Msg within its R_min. The CMs in the intersection region receive the message(s) and join the CH with the higher rank. A CM in the intersection region keeps the information of all elected CHs in the cluster.
Condition 3: A node may obtain the highest rank in a subset (e.g., s_4 in {s_4, s_5} ⊂ {s_1, ..., s_5}). However, the node may not be elected as a CH because another node may have a higher rank in another subset (e.g., {s_4, ..., s_8}, see Table (c) in Fig. 4). Here, the higher ranked node is elected as the CH and broadcasts a Schedule_Msg for the CMs in the cluster. The CMs within R_n receive the message and update their CMT. Some CMs may be out of R_n from the elected CH in a cluster; these nodes do not receive the message and are therefore unaware of the elected CH. In this condition, an out-of-range node selects the CM with the highest rank and flag = 'FC' from its CMT and sends a Join_Msg to that node, asking it to become a relay node towards the CH. On receiving the message, the relay node sends back a Schedule_Msg to the sender according to the time slot that has already been assigned by the CH. The relay node receives the local data from the node and forwards the packet during the allocated time slot.
Case 3: If there is only one node alive in a cluster, the node is elected as a CH without any competition. The data aggregation does not take place at the node.
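The election rules above can be sketched in Python as follows. The sketch takes the rank and residual energy of each alive CM as given inputs (Eq. (7) is not reproduced in this text), applies the highest-rank rule with residual energy as the tie breaker, and, for Condition 2, finds the independent sets that remain once the intersection-region nodes are removed. The function names and the neighbor relation used in the usage note are hypothetical.

```python
def elect_ch(cmt):
    """Elect a CH from a cluster member table.

    cmt: dict mapping node id -> (rank, residual_energy) for alive CMs.
    The highest rank wins and residual energy breaks ties (Case 1);
    a single alive node is returned without competition (Case 3).
    """
    if not cmt:
        return None
    return max(cmt, key=lambda s: (cmt[s][0], cmt[s][1]))


def independent_sets(neighbors, intersection):
    """Groups of CMs that stay mutually reachable after the
    intersection-region nodes are removed (Condition 2).

    neighbors: dict mapping node id -> set of CMs within R_n.
    intersection: iterable of node ids in the intersection region.
    """
    remaining = set(neighbors) - set(intersection)
    seen, groups = set(), []
    for start in sorted(remaining):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first search restricted to `remaining`
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(v for v in neighbors[node]
                         if v in remaining and v not in group)
        seen |= group
        groups.append(group)
    return groups
```

Under an assumed neighbor relation for Cluster 3 in which s_4 and s_5 are the only links between the two sides, `independent_sets` yields {s_1, s_2, s_3} and {s_6, s_7, s_8}, and `elect_ch` can then be applied to the CMT entries of each set separately.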

C. Data Communication Phase
The data communication phase is divided into two subphases as follows.
1) Intra-Cluster Communication: During the allocated time for each node, a CM sends its local data to the respective CH using R min , where R min is the distance between a CM and the CH. A CH aggregates the received data into a packet called aggregated data. It is assumed that the sensory data of the nodes is highly correlated. If there is no exception (i.e., multiple CHs can be elected in a cluster, as in Condition 2 in Case 2), the values of rank and residual energy of a CM are piggybacked along with the local data sent to the CH. Otherwise, it also sends the values without data to other CHs during the assigned time slots by the CHs, accordingly.
2) Inter-Cluster Communication: Once the data aggregation is completed, inter-cluster communication can begin. During this time, the routing paths are constructed and the aggregated data are forwarded towards the BS. In routing, a relay CH is selected based on the rank of the CHs. The rank of a CH is defined by its residual energy, its RSS from the BS, and its number of CMs. Each CH calculates its rank by using Eq. (8) and broadcasts it along with a Route_Msg within R max . Each CH residing within this range receives the message and updates its routing table (RT). The ranking function ensures that a CH having higher residual energy, higher RSS from the BS, and fewer CMs has the higher rank. The higher the rank, the better the option to achieve a better QoS during inter-cluster communication. If the BS is out of R max from a CH, the CH selects from its RT, as a relay node, one of the higher ranked CHs that belongs to a higher ranked cluster. When multiple CHs have the same rank, the residual energy serves as the tie-breaking metric: the CH with the higher residual energy is selected as the relay node. The data received by a relay node from other CHs are not aggregated before being sent to the BS.
where β is a weight factor in the range (0, 1], E res (CH i ) is the current residual energy of CH i , n is the number of alive CMs belonging to the cluster of CH i , E max is the maximum initial energy of the nodes S, and RSS (BS , CH i ) denotes the RSS of CH i from the BS. The factor β can be set according to the QoS requirements of the inter-cluster communication.
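The relay CH selection with its tie-breaking rule can be illustrated with a short sketch. The `RouteEntry` structure and its field names are hypothetical, the rank values are taken as already computed by the ranking function, and the additional condition on the rank of the cluster is omitted for brevity.

```python
from typing import List, NamedTuple, Optional


class RouteEntry(NamedTuple):
    """Hypothetical RT row built from a received Route_Msg."""
    ch_id: str
    rank: float             # rank as broadcast by the neighbouring CH
    residual_energy: float  # used only to break ties in rank


def select_relay_ch(rt: List[RouteEntry]) -> Optional[RouteEntry]:
    """Pick the highest-ranked CH; equal ranks are resolved by residual energy."""
    if not rt:
        return None
    return max(rt, key=lambda entry: (entry.rank, entry.residual_energy))


# Usage: CH1 and CH2 share the top rank, so CH2 wins on residual energy.
rt = [RouteEntry("CH1", 0.7, 1.2), RouteEntry("CH2", 0.7, 1.8), RouteEntry("CH3", 0.5, 2.0)]
relay = select_relay_ch(rt)
```

Comparing the tuple (rank, residual energy) applies the tie-breaking metric only when the ranks are equal.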
If a data packet is dropped at a receiver, an automatic repeat request message (ARR_Msg) is sent to the sender node. On receiving the message, the sender node retransmits the packet to the receiver. Each node stores a packet until it is successfully received by the receiver or until the limit defined by the threshold TH RD is reached. A packet may need to be retransmitted a large number of times because of the bad quality of a link between nodes. Many solutions to this issue can be considered [48]. We consider a threshold TH RD that can be defined by the link conditions and the required QoS. If the number of retransmissions exceeds this threshold, the packet is considered to be a lost packet.
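The retransmission rule can be sketched as a bounded retry loop. This is a minimal sketch under the assumption that TH RD bounds the number of retransmissions, so a packet is attempted at most TH RD + 1 times; the function name and the `send_once` callback are hypothetical.

```python
from typing import Callable


def send_with_retransmission(send_once: Callable[[], bool], th_rd: int) -> bool:
    """Send a packet, retransmitting at most th_rd times.

    send_once() returns True when the receiver got the packet and False when
    an ARR_Msg came back. Returns False when the packet is declared lost.
    """
    retransmissions = 0
    while True:
        if send_once():
            return True
        if retransmissions >= th_rd:
            return False  # threshold reached: treat the packet as lost
        retransmissions += 1


# Usage: the third attempt succeeds, which is within a threshold of 3.
outcomes = iter([False, False, True])
delivered = send_with_retransmission(lambda: next(outcomes), th_rd=3)
```

A smaller threshold saves energy on bad links at the cost of more lost packets, which is why the text ties the value to the link conditions and the required QoS.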

A. Simulation Setup
The simulations were conducted in MATLAB R2018a to evaluate the performance of the protocols. Two network topologies were considered: Scenario 1, random with uniform node distribution, and Scenario 2, random with non-uniform node distribution. The nodes were heterogeneous in terms of initial energy. The weight factors α and β were set to 0.5 for both scenarios in QACR. The values of the factors for the other protocols were set accordingly. R n was adjusted according to the node density, and R adp was adjusted to avoid the frequent occurrence of the situation mentioned in Case 2 in Section IV (nodes that are not fully connected in a cluster). Table II shows the common parameters used in the simulations. Simulation results were averaged over 100 runs.

B. Simulation Results
The following performance metrics are evaluated and compared with those of the existing protocols for various network parameters. The parameters used in the simulations for various numbers of nodes N, packet sizes, and BS locations are shown in Tables III, VII, and IX, respectively.

1) Packet Delivery Ratio (PDR):
The PDR is the ratio of the data successfully received at the BS to the data sent. The PDR of ECCR, CHP, EADC, and EADUC is random because the CH election and route selection policies of these protocols do not consider the RSS and link quality, so a higher PDR is not guaranteed compared to QACR. If random noise is distributed over the network, the link conditions vary among nodes. In this case, if N is constant, the packet drop and loss probability increases with the number of elected CHs and the average number of hops towards the BS during communication. On the other hand, when the number of CHs does not vary significantly with N, the PDR also decreases as the node density increases, because the probability that nodes with bad link quality become CHs increases. Note that the average numbers of CHs in every round for QACR, ECCR, CHP, EADC, and EADUC were around 7, 14, 16, 5, and 10, respectively. The number of CHs of QACR was higher than that of EADC and lower than those of ECCR, CHP, and EADUC. The CH election of the five protocols was periodic. The cluster formation of QACR and ECCR was static, whereas it was periodic in the other protocols.
Considering the RSS and link quality along with the residual energy of the nodes in the CH election of QACR plays an important role in reducing the number of dropped and lost packets. Intuitively, when the distance between nodes is equal and the signal-to-noise ratio is higher, the RSS among them is higher. For nodes with higher RSS, the probability of packet drops is lower. Furthermore, if the noise is low and/or constant, the average RSS between nodes increases as the average distance between them decreases, which ensures a minimum cost during communication. Fig. 5(a-b) provides the simulation results for the PDR of the protocols. The results show that the PDR of the five protocols decreases as N increases: a larger N increases the number of packets in the network, so the probability of packet collisions also increases. The comparison of the protocols with respect to the PDR is given in Table IV. The results show that QACR achieves a higher PDR than the other protocols under the scenarios.
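The PDR definition, and the reason why more hops tend to lower it, can be illustrated numerically. The sketch below is not the simulation model of the paper: the function names are hypothetical, and the second function assumes that every hop delivers a packet independently with the same probability, which is a simplification.

```python
def packet_delivery_ratio(received_at_bs: int, sent: int) -> float:
    """PDR = data successfully received at the BS / data sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received_at_bs / sent


def end_to_end_delivery(per_hop_success: float, hops: int) -> float:
    """Delivery probability over `hops` independent hops (simplifying assumption)."""
    if not 0.0 <= per_hop_success <= 1.0 or hops < 1:
        raise ValueError("need 0 <= per_hop_success <= 1 and hops >= 1")
    return per_hop_success ** hops


# Usage: with the same per-hop success, a 4-hop route delivers less than a 2-hop route.
pdr = packet_delivery_ratio(90, 100)
two_hops = end_to_end_delivery(0.9, 2)
four_hops = end_to_end_delivery(0.9, 4)
```

Under this assumption, the delivery probability shrinks geometrically with the hop count, which is consistent with the observation that the loss probability grows with the average number of hops towards the BS.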
2) Latency: Latency measures the delay of the data in reaching their destination across the network. We consider the delay to be proportional to the average number of hops in routing the data packets from the CHs to the BS. Each protocol selects its routing paths according to its own routing policy. The ECCR constructs a route among the relay nodes that are within R max of each other, where R max is defined by the grid (cluster) size. According to the protocol, the value of R max decreases as N increases. The lower the value of R max , the higher the number of hops required to construct a route. Unlike the ECCR, if the BS is within the communication range of all the nodes, the CHP always constructs a route from a CH to the BS using two hops. Each CH selects a dedicated relay node to send data to the BS. The EADC constructs a route with the goal of load balancing during relay node selection, which leads to routes with a higher number of hops. On the other hand, the EADUC constructs a route based on the shortest distance, without taking the number of hops into consideration. Although the constructed routes are shortest paths, some of them have a higher number of hops. In contrast, the routing policy of QACR utilizes the maximum transmission range of a node to collect the relay nodes' information. The relay node selection criteria include the ranking metrics (the RSS in particular) and the rank of a cluster, which ensures a reliable route with a minimum number of hops. Fig. 6(a-b) provides the simulation results for the average number of hops of the protocols. The results show that when the node density is low (i.e., N = 100), the ECCR constructs routes with fewer hops than QACR. The number of hops of ECCR increases with N because the number of clusters of ECCR grows compared to QACR (according to the clustering policies of the protocols).
Table V compares the protocols in terms of the metric. These comparative results demonstrate that QACR is able to construct routes using a minimum number of hops under the scenarios.
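Since the delay is taken to be proportional to the average number of hops, the latency metric reduces to a simple average over the constructed routes. The sketch below assumes a hypothetical representation in which each route is the ordered list of nodes from a source CH to the BS.

```python
from statistics import mean
from typing import Dict, List


def average_hops(routes: Dict[str, List[str]]) -> float:
    """Average hop count, where a path with k nodes has k - 1 hops."""
    if not routes:
        raise ValueError("at least one route is required")
    return float(mean(len(path) - 1 for path in routes.values()))


# Usage: routes with 1, 2, and 3 hops give an average of 2 hops.
routes = {
    "CH1": ["CH1", "BS"],
    "CH2": ["CH2", "CH1", "BS"],
    "CH3": ["CH3", "CH2", "CH1", "BS"],
}
avg = average_hops(routes)
```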

3) Network Lifetime (NL):
The steady state and the NL are defined by the time at which the first node dies (FND) and by the number of alive nodes (here, 60%), respectively. Unlike the clustering of QACR (community detection) and ECCR (static grid), the other protocols form clusters periodically based on the Voronoi cell, which is defined by the transmission range of the advertisement message. Although the range of the cluster formation is equal, the clusters and the number of CMs are not distributed properly because of the random location of the elected CHs. Thus, the energy consumption of the nodes is imbalanced. Meanwhile, some of the nodes are repeatedly elected as CHs and participate as relay nodes towards the BS over the rounds. These nodes consume energy more frequently than the other nodes and die early. This problem becomes severe when the nodes are distributed non-uniformly. In addition, the periodic cluster formation technique of CHP, EADC, and EADUC increases the control-message overhead throughout their NL, and the nodes dissipate a significant amount of energy. In contrast, the clustering of QACR distributes the clusters and the number of CMs properly. As a result, the energy consumption is balanced among the nodes, and the nodes save energy owing to the lower control-message overhead. Moreover, because the CH election of the other protocols does not consider the RSS and link condition, their nodes dissipate energy on a higher number of retransmissions of dropped packets, which affects the metrics of these protocols. Unlike the other protocols, QACR saves energy through fewer retransmissions by selecting CHs associated with higher RSS and good link quality.
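The two lifetime metrics can be computed from the rounds in which the nodes die. The sketch below follows one possible reading of the definition, namely that the NL ends in the first round in which fewer than 60% of the nodes remain alive; this reading, the function names, and the per-node list of death rounds are assumptions.

```python
from typing import List


def first_node_dies(death_rounds: List[int]) -> int:
    """Steady state: the round in which the first node dies (FND)."""
    if not death_rounds:
        raise ValueError("death_rounds must not be empty")
    return min(death_rounds)


def network_lifetime(death_rounds: List[int], alive_fraction: float = 0.6) -> int:
    """First round in which fewer than alive_fraction of the nodes are alive."""
    if not death_rounds or not 0.0 < alive_fraction <= 1.0:
        raise ValueError("need a non-empty list and 0 < alive_fraction <= 1")
    ordered = sorted(death_rounds)
    n = len(ordered)
    for dead, round_no in enumerate(ordered, start=1):
        if n - dead < alive_fraction * n:
            return round_no
    raise AssertionError("unreachable for alive_fraction > 0")


# Usage with 10 nodes: the fifth death leaves 5 nodes, which is below 60% of 10.
deaths = [120, 150, 150, 200, 230, 260, 300, 310, 400, 450]
fnd = first_node_dies(deaths)
nl = network_lifetime(deaths)
```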
Figs. 7(a-b) and 8(a-b) provide the simulation results for the steady state and NL of the protocols. As seen in the figures, although the metrics of QACR are higher than those of the other protocols, the steady state of QACR tends to decrease after a certain N. The reason is that if a higher number of nodes are out of R max from the BS, some of the nodes act as relay nodes to relay the nodes' information towards the BS during the network initialization and consume a significant amount of energy. As a result, these nodes die early, which causes a shorter steady state of the network. The comparison between QACR and the other protocols in terms of the steady state and NL is given in Table VI. The results show that QACR improves the metrics over the other protocols under the scenarios.
Figs. 9(a-b) and 10(a-b) provide the simulation results for the steady state and NL of the protocols for different packet sizes. During the simulation, the average numbers of CHs in every round for QACR, ECCR, CHP, EADC, and EADUC were around 5, 6, 7, 4, and 9, respectively. If N, the network area, and the location of the BS are constant, the packet size does not have a great impact on the performance of the protocols with respect to the PDR and latency, but the NL of the protocols varies significantly. The results show that the NL of the five protocols decreases as the packet size increases, because the transmission and processing costs increase. The comparison between QACR and the other protocols regarding the steady state and NL is given in Table VIII. The results show that QACR outperforms the other protocols in terms of the metrics under the scenarios. Fig. 11(a-b) provides a randomly selected simulation result for the steady state and NL of the protocols with the BS located at the center (the BS is within the maximum range of all the nodes). During the simulation, the average numbers of elected CHs in every round for QACR, ECCR, CHP, EADC, and EADUC were around 4, 7, 3, 3, and 6, respectively. If N, the network area, and the packet size are constant, the NL of the five protocols increases as the average distance of the nodes from the BS decreases. The comparison between QACR and other protocols in terms of the steady state and NL has

C. Discussion
Before drawing conclusions, we first highlight the key points of our proposal. We then discuss some design limitations and possible extensions of QACR for future improvement. Based on the performance analysis, QACR shows overall superior performance in comparison with the other protocols under various scenarios. This advantage comes from several factors, shown in Table XI and summarized as follows: 1) The metrics used in clustering, CH election, and route selection in QACR utilize the network resources effectively and thereby achieve a better QoS of the network. The residual energy metric selects a node having high energy, the RSS measures the approximate distance and (to some extent) the link condition, the link quality provides reliability, and the number of CMs selects a relay CH with the fewest CMs. 2) The adopted clustering technique distributes the nodes properly, with available nodes' connectivity in the clusters, which reduces the number of lost packets and balances the energy consumption of the nodes. 3) The static clustering and CH role handover technique, similar to ECCR, reduces the control-message overhead and energy dissipation of the nodes for the periodic CH election and cluster formation. The criteria of CH election and relay node selection include the most influential QoS factors related to data communication, which yields a higher PDR, lower latency, and prolonged NL.
While QACR proved its efficiency in the simulation tests, some aspects still need further analysis. In our proposed CH election and relay node selection, the values of α and β were constant throughout the network lifetime. These factors could be optimized and adapted dynamically to the network state using recent techniques such as machine learning. However, any such technique should remain feasible for the resource-constrained devices in LLNs. In addition, QACR was evaluated with the link quality and the packet drop probability of the nodes defined by static Gaussian random noise. In practice, the link condition can be dynamic and vary over time, which might have an impact on the performance of the protocols. In view of the significance and dynamic usability of this protocol for LLNs, we leave this issue for future improvement.

VI. CONCLUSION
This paper presents a QoS-aware data collection protocol for LLNs in Fog-enabled IoT. It applies effective techniques for clustering, CH election, CH re-election, and route selection in the network. The clustering adopts the community detection algorithm, which is considered more resilient than other clustering techniques. The CH election and route selection are defined by a set of metrics that reduce the packet loss and the number of hops in data routing and balance the energy consumption of the nodes. As a result, the network can sustain a prolonged lifetime with a higher packet delivery ratio and lower latency. Two different network topologies of an LLN, with uniform and non-uniform node distribution, were considered and tested over the described experimental setup. The results validate that the QACR protocol improves the QoS of the LLN as compared to the existing protocols under various scenarios.