Reliability-Driven End–End–Edge Collaboration for Energy Minimization in Large-Scale Cyber-Physical Systems

In recent years, cyber-physical systems (CPS) have been widely deployed in industrial manufacturing and in everyday life. End–end–edge collaboration, which couples mobile edge computing and device-to-device communication, is a promising computation paradigm for meeting the stringent real-time demands of large-scale CPS applications. However, energy and reliability concerns must be carefully addressed in end–end–edge collaboration-empowered large-scale CPS because of the limited energy supply and inherent openness of end devices. In this article, we explore the reliability-driven energy optimization of end–end–edge collaborated large-scale CPS applications and develop a reliability-driven end–end–edge collaboration approach to the energy minimization problem. Our approach first designs a clustering method that quantifies differentiated energy demands by analyzing the energy dissipation composition of heterogeneous applications. It then leverages incremental control and swarm intelligence-based techniques to obtain energy-efficient, reliability-guaranteed task offloading solutions for the differentiated application clusters. Experimental results reveal that our approach achieves 51.48% energy savings compared with peer algorithms.

A variety of large-scale CPS applications, e.g., smart grid, industrial control systems, intelligent transportation, and personalized healthcare, have been widely deployed across diverse domains. These CPS applications often have stringent real-time requirements, since delayed outputs may incur unacceptable timing faults [2], [3].
Recently, an appealing paradigm of end-end-edge collaboration [4], which couples mobile edge computing (MEC) and device-to-device (D2D) communication techniques, has attracted widespread attention. In the common MEC paradigm, real-time tasks on an individual end device either complete their execution locally or offload their computation to MEC servers. However, this paradigm does not handle the heterogeneity of end devices well during task offloading, resulting in unbalanced resource utilization among end devices in a local network [5], [6]. In contrast, the D2D communication technique permits end devices with high resource utilization to request nearby end devices with underutilized resources to assist in task execution. MEC and D2D communication techniques are therefore complementary, which inspires us to develop an end-end-edge collaborated task offloading method. In this integrated paradigm, real-time tasks on an end device can offload their computation to either MEC servers or adjacent end devices. To fulfill task real-time demands, it is natural to incorporate end-end-edge collaboration into the design of large-scale CPS.
Nevertheless, energy management must be carefully conducted in large-scale CPS because of economical deployment and maintenance concerns. To this end, considerable works [7], [8], [9], [10] have been devoted to developing energy-aware end-end-edge collaboration solutions. For instance, Cao et al. [7] developed an energy-efficient framework based on convex optimization techniques to minimize the energy consumption in both binary and partial CPS computation offloading scenarios. Kai et al. [8] devised a two-stage method to optimize the number of executed tasks (i.e., system throughput) under energy constraints. Yang et al. [9] proposed a game theoretic offloading approach to jointly optimize the response latency and energy dissipation of independent tasks. Leveraging the alternating direction method of multipliers, Sun et al. [10] aimed to minimize the task response latency under energy budget constraints. However, none of the aforementioned works [7], [8], [9], [10] takes intertask dependence into account in energy optimization.

In addition to energy management, reliability augmentation is also a hot topic in CPS environments, because real-time tasks are vulnerable to both bit errors and soft errors arising from the inherent openness of end devices. From the perspective of reliability optimization, Naithani et al. [11] presented an online task scheduler that optimizes the overall system reliability by analyzing the reliability features of running applications. Ansari et al. [12] put forward a two-stage scheme to determine the optimal replicas of individual real-time tasks. Li et al. [13] demonstrated a feedback control method to improve the reliability of EtherCAT networks. Savino et al. [14] designed a heuristic algorithm based on extremal optimization theory to augment system resiliency to soft errors. Leveraging the processor-merging technique, Hu et al. [15] presented an energy-efficient task scheduling approach that reduces system energy dissipation under timing and reliability constraints. Cao et al. [16] investigated the joint optimization of response latency and processor wearout under reliability and energy constraints for large-scale CPS.
For a side-by-side view, Table I compares the related works [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. As shown in the table, existing works fail to jointly bring the end-end-edge collaboration paradigm and reliability concerns into the energy optimization of large-scale CPS. In this article, we investigate the reliability-driven energy optimization of end-end-edge collaboration-empowered large-scale CPS. In particular, we consider both the intertask dependence depicted by directed acyclic graphs (DAGs) and the concurrent task offloading of multiple DAG applications. In summary, we make the following main contributions.
1) We design a clustering approach to quantify differentiated DAG energy demands by analyzing the energy dissipation composition of heterogeneous DAG applications.
2) We devise an incremental control-based task offloading scheme for computation-intensive DAG clusters by analyzing the computation energy optimality.
3) We develop a swarm intelligence-based task offloading scheme for communication-intensive DAG clusters by integrating a novel DAG type transformation technique.
4) We validate the performance of our approach on both synthetic and real-life DAG applications. Experimental results show that our approach achieves 51.48% energy savings compared with peer algorithms.

The rest of this article is organized as follows. Section II introduces the system architecture and models. Section III presents the problem formulation and solution overview. Section IV presents a DAG clustering method. Sections V and VI devise task offloading schemes for computation-intensive and communication-intensive DAG clusters, respectively. We evaluate our solution in Sections VII and VIII. Finally, Section IX concludes this article.

A. System Architecture
We consider an end-end-edge collaborated large-scale CPS architecture consisting of $M$ end devices, a base station $B$, and an edge server $S$. For each end device $D_m$ ($1 \le m \le M$), its computing capacity is depicted by a processor $\Theta_m$ with supply voltage $v_m$ and operating frequency $f_m$. The edge server $S$ is empowered by container-based virtualization techniques that support the virtual packaging and isolation of different applications [17]. Let $\mathcal{C} = \{C_1, C_2, \ldots, C_H\}$ be the collection of the $H$ available containers created by edge server $S$. Every container $C_h$ ($1 \le h \le H$) is depicted by a tuple $C_h: \{b_h, f_h\}$, where $b_h$ is its communication bandwidth and $f_h$ is its operating frequency. Note that base station $B$ is generally deployed near edge server $S$, and it acts as a global governor that holds system-wide information, e.g., routing selection, workload, and end device states.

B. Application Model
Suppose that every end device $D_m$ is associated with an application described by a DAG $G_m$, in which the binary variable $e_{m,i,p}$ marks a precedence edge and $\eta_{m,i,p}$ denotes the communication data volume from task $\tau_{m,i}$ to task $\tau_{m,p}$. If task $\tau_{m,p}$ is a direct successor of $\tau_{m,i}$, $e_{m,i,p}$ is set to 1, i.e., task $\tau_{m,p}$ cannot start its execution until it receives a total amount of $\eta_{m,i,p}$ communication data from task $\tau_{m,i}$. Every task $\tau_{m,i}$ is characterized by a tuple $\tau_{m,i}: \{\mu_{m,i}, W_{m,i}, \Upsilon_{m,i}, T_m, R_m\}$, where $\mu_{m,i} \in [0, 1]$ is the task activity factor [18], $W_{m,i}$ is the number of CPU instruction cycles, $\Upsilon_{m,i}$ is the data volume of its CPU instructions, $T_m$ is the common deadline, and $R_m$ is the reliability goal. In end-end-edge collaborated systems, task $\tau_{m,i}$ supports local, D2D, and remote execution modes. In the local execution mode, task $\tau_{m,i}$ completes its execution entirely on end device $D_m$ without the assistance of edge server $S$ or other end devices. In the D2D execution mode, task $\tau_{m,i}$ offloads its computation via a D2D link to another end device $D_\pi$ ($1 \le \pi \le M$, $\pi \ne m$), which then accomplishes the task execution. Similar to the D2D mode, the remote execution mode allows task $\tau_{m,i}$ to transmit its computation to edge server $S$, which then performs the task execution. Since base station $B$ is the global governor, it decides the task execution modes according to the system information.
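The application model can be captured in a few data structures. The following is a minimal Python sketch, not the authors' implementation; the names `Task`, `DAGApp`, `Mode`, and `candidate_destinations` are hypothetical, and indices are 0-based rather than 1-based. It stores the edge weights $\eta_{m,i,p}$ keyed by `(i, p)`, derives a precedence-respecting order with the standard-library `graphlib`, and enumerates the destinations $\rho \in \{m, \pi, h\}$ available to a task owned by $D_m$.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter


class Mode(Enum):
    LOCAL = "local"    # execute on the owning end device D_m
    D2D = "d2d"        # offload to another end device D_pi (pi != m)
    REMOTE = "remote"  # offload to a container C_h on edge server S


@dataclass
class Task:
    activity: float   # mu_{m,i}, task activity factor in [0, 1]
    cycles: float     # W_{m,i}, number of CPU instruction cycles
    data_bits: float  # Upsilon_{m,i}, data volume sent when offloaded


@dataclass
class DAGApp:
    deadline: float          # T_m, common deadline of all tasks
    reliability_goal: float  # R_m
    tasks: dict[int, Task] = field(default_factory=dict)
    # edges[(i, p)] = eta_{m,i,p}; a key exists exactly when e_{m,i,p} = 1
    edges: dict[tuple[int, int], float] = field(default_factory=dict)

    def parents(self, p: int) -> list[int]:
        return [i for (i, q) in self.edges if q == p]

    def topological_order(self) -> list[int]:
        # graphlib expects a mapping node -> predecessors
        graph = {p: self.parents(p) for p in self.tasks}
        return list(TopologicalSorter(graph).static_order())


def candidate_destinations(m: int, num_devices: int, num_containers: int):
    """All (mode, index) pairs base station B may choose for a task of D_m."""
    dests = [(Mode.LOCAL, m)]
    dests += [(Mode.D2D, k) for k in range(num_devices) if k != m]
    dests += [(Mode.REMOTE, h) for h in range(num_containers)]
    return dests
```

For a diamond-shaped DAG (task 0 feeding tasks 1 and 2, both feeding task 3), `topological_order()` starts with 0 and ends with 3, which is the order in which the precedence constraint lets tasks start.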

C. Reliability Model
We consider the occurrence of both bit errors and soft errors. Bit errors primarily occur on communication links due to ambient interference or bit synchronization errors [13]. Let $R^{\mathrm{biterror}}_{m,i,\rho}$ be the capability of task $\tau_{m,i}$ to tolerate bit errors, where $\rho \in \{m, \pi, h\}$ indicates the execution mode of task $\tau_{m,i}$. When task $\tau_{m,i}$ is in the local execution mode (i.e., $\rho = m$), the communication reliability $R^{\mathrm{biterror}}_{m,i,\rho}$ is deemed to be 1. When task $\tau_{m,i}$ is offloaded to end device $D_\pi$ or container $C_h$ at time instance $t$ (i.e., $\rho = \pi$ or $\rho = h$), $R^{\mathrm{biterror}}_{m,i,\rho}$ is given by (1), where $\lambda^{\mathrm{biterror}}_{m,\rho}$ is the constant bit error rate of the communication link. Unlike bit errors, soft errors may appear during task execution on processors. Let $R^{\mathrm{softerror}}_{m,i,\rho}$ denote the capability of task $\tau_{m,i}$ to tolerate soft errors at time instance $t$; the execution reliability is then expressed by (2), where $\lambda^{\mathrm{softerror}}_{\rho}$ is the constant soft error rate and $R^{\mathrm{parent}}_{m,i,\rho}$ is the probability that the correct communication data of all direct parents is successfully delivered to task $\tau_{m,i}$. Combining (1) and (2) yields the reliability of task $\tau_{m,i}$.
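Because the reliability equations did not survive extraction, the sketch below assumes the exponential fault model common in this line of work (faults arriving as a Poisson process with a constant rate): the bit-error term is $e^{-\lambda^{\mathrm{biterror}} \cdot \Upsilon/\vartheta}$ over the transmission time, the soft-error term is $e^{-\lambda^{\mathrm{softerror}} \cdot W/f}$ over the execution time, and the task reliability is the product of the bit-error, soft-error, and parent-delivery terms. These forms are illustrative assumptions, not the article's exact equations; rates are taken per second.

```python
import math


def bit_error_reliability(mode, lam_bit, data_bits, rate_bps):
    """Assumed form: 1 for local execution, otherwise
    exp(-lambda_biterror * transmission time), time = data / rate."""
    if mode == "local":
        return 1.0
    return math.exp(-lam_bit * data_bits / rate_bps)


def soft_error_reliability(lam_soft, cycles, freq_hz):
    """Assumed form without the parent term:
    exp(-lambda_softerror * execution time), time = W / f."""
    return math.exp(-lam_soft * cycles / freq_hz)


def task_reliability(r_bit, r_soft, r_parent):
    # Assumed combination: all three events must succeed independently
    return r_bit * r_soft * r_parent
```

For example, offloading 2 Mbit over a 1 Mbit/s link with a bit error rate of $10^{-5}$ per second gives a bit-error reliability of $e^{-2 \times 10^{-5}}$, while local execution always contributes a factor of 1.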

D. Energy Model
The energy dissipation of end devices can be decomposed into static and dynamic components [18], [20]. Let $P^{\mathrm{static}}_m$ be the static power of end device $D_m$; the static energy consumption during one scheduling horizon $T$ is then $P^{\mathrm{static}}_m \times T$. The dynamic energy consumption of end device $D_m$ depends on the execution mode of task $\tau_{m,i}$. In the local execution mode, the dynamic energy consumed by processor $\Theta_m$ when running task $\tau_{m,i}$ follows the model in [18], [20], where $\psi_m$ is the effective switch capacitance of processor $\Theta_m$. Meanwhile, end device $D_m$ must deliver the output results of task $\tau_{m,i}$ to its direct successors. Let $I^{\mathrm{child}}_{m,i}$ denote the set of direct successors of task $\tau_{m,i}$, where each element $\tau_{m,p} \in I^{\mathrm{child}}_{m,i}$ has to receive a total amount of $\eta_{m,i,p}$ communication data from task $\tau_{m,i}$. The communication energy $E^{\mathrm{localcom}}_{m,i,p}$ of end device $D_m$ when delivering the output results of task $\tau_{m,i}$ to task $\tau_{m,p}$, assumed to be executed on end device $D_k$, is then determined by $\eta_{m,i,p}$, the D2D transmission power $P^{\mathrm{d2d}}_m$ of end device $D_m$, and the D2D communication rate $\vartheta_{m,k}$ between end devices $D_m$ and $D_k$. The rate is derived as $\vartheta_{m,k} = b_{m,k} \times \log_2(1 + P^{\mathrm{d2d}}_m \times g_{m,k}/\omega)$ [21], [22], where $b_{m,k}$ is the communication bandwidth, $g_{m,k}$ is the channel gain, and $\omega$ stands for the background interference. This data transmission procedure is triggered once the destination (i.e., end device $D_k$) of task $\tau_{m,p}$ is determined. Apart from the local execution mode, task $\tau_{m,i}$ can also select the D2D execution mode and offload its computation to end device $D_\pi$. The resulting energy dissipation $E^{\mathrm{d2dcom}}_{m,i,\pi}$ of delivering task $\tau_{m,i}$ is computed with $\vartheta_{m,\pi} = b_{m,\pi} \times \log_2(1 + P^{\mathrm{d2d}}_m \times g_{m,\pi}/\omega)$ [21], [22]. The energy dissipation of end device $D_\pi$ when handling offloaded task $\tau_{m,i}$ depends on $T_{m,i,\pi}$, the D2D communication time between end devices $D_m$ and $D_\pi$ for task $\tau_{m,i}$, given by $T_{m,i,\pi} = \Upsilon_{m,i}/\vartheta_{m,\pi}$, and on $P^{\mathrm{receive}}_\pi$, the receiving power of the target end device $D_\pi$. Similarly, the energy dissipation $E^{\mathrm{remcom}}_{m,i,h}$ of end device $D_m$ when offloading task $\tau_{m,i}$ to container $C_h$ is defined analogously, where $P^{\mathrm{remote}}_m$ is the transmission power of end device $D_m$ in the remote execution mode [21], [22].
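The energy terms can be evaluated numerically as sketched below. Only the rate expression $b \log_2(1 + P g/\omega)$ and $T_{m,i,\pi} = \Upsilon_{m,i}/\vartheta_{m,\pi}$ are stated in the text; the static term (power times horizon), the CMOS dynamic term $\mu_{m,i}\psi_m v_m^2 W_{m,i}$, and the "power times airtime" form used for every transmission and reception energy are assumptions about the elided equations. SI units (W, Hz, bit/s, J) are assumed throughout.

```python
import math


def d2d_rate(bandwidth_hz, tx_power_w, channel_gain, interference_w):
    """vartheta = b * log2(1 + P * g / omega), as given in the text."""
    return bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / interference_w)


def static_energy(p_static_w, horizon_s):
    # Assumed: static power drawn over the whole scheduling horizon T
    return p_static_w * horizon_s


def local_dynamic_energy(activity, capacitance_f, voltage_v, cycles):
    # Assumed CMOS model: mu * psi * v^2 * W
    return activity * capacitance_f * voltage_v ** 2 * cycles


def airtime_energy(power_w, data_bits, rate_bps):
    """Energy = power x airtime, airtime = data / rate.
    Used (by assumption) for E^localcom (data eta), E^d2dcom (data Upsilon),
    and the receiver side P^receive * T_{m,i,pi}."""
    return power_w * data_bits / rate_bps
```

For instance, a 1 MHz link with $P g/\omega = 3$ yields $\vartheta = 10^6 \log_2 4 = 2$ Mbit/s, so transmitting 4 Mbit at 0.5 W costs $0.5 \times 4/2 = 1$ J.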

A. Problem Formulation
We aim to minimize the total energy dissipation of end devices in end-end-edge collaborated large-scale CPS. We formulate the problem as an integer linear program (ILP) by introducing binary variables $\alpha_{m,i}$, $\beta_{m,i,\pi}$, $\gamma_{m,i,h}$, and $\delta_{i,j}$, where $\alpha_{m,i}$, $\beta_{m,i,\pi}$, and $\gamma_{m,i,h}$ indicate whether task $\tau_{m,i}$ is executed locally, offloaded to end device $D_\pi$, or offloaded to container $C_h$, respectively, and $\delta_{i,j}$ relates task $\tau_{m,i}$ to another task $\tau_{n,j}$. The ILP objective is to minimize $\sum_{m=1}^{M} E^{\mathrm{energy}}_m$, where $E^{\mathrm{energy}}_m$ denotes the energy dissipation of end device $D_m$, composed of the static and dynamic components in Section II-D. Meanwhile, the following linear constraints must not be violated in order to generate a feasible task offloading solution.
1) Every task must satisfy its deadline constraint. Let $T^{\mathrm{start}}_{m,i}$ be the start time of task $\tau_{m,i}$; task $\tau_{m,i}$ must complete by the common deadline $T_m$.
2) Every task is performed in exactly one execution mode, i.e., $\alpha_{m,i} + \sum_{\pi \ne m} \beta_{m,i,\pi} + \sum_{h=1}^{H} \gamma_{m,i,h} = 1$.
3) Every task must meet its reliability constraint.
4) The energy dissipation of every end device is upper bounded. Let $E^{\mathrm{upper}}_m$ be the energy consumption threshold of end device $D_m$; then $E^{\mathrm{energy}}_m \le E^{\mathrm{upper}}_m$.
5) The intertask precedence constraints must be met. Let $T^{\mathrm{finish}}_{m,i}$ be the finish time of task $\tau_{m,i}$; a task cannot start before all of its direct predecessors have finished and delivered their communication data.
6) All tasks are executed within their durations with no overlap. Given two distinct tasks $\tau_{m,i} \ne \tau_{n,j}$ and a sufficiently large constant $Z$ (e.g., 100 000 in our experiments), a pair of big-$Z$ inequalities prevents the execution intervals of the two tasks from overlapping.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
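For intuition about the structure of the formulation (not as a replacement for an ILP solver), the sketch below exhaustively enumerates mode choices for a toy instance. Picking exactly one option per task plays the role of the one-hot condition in constraint 2, and the reliability goal and single energy bound loosely mirror constraints 3 and 4; deadlines, precedence, and non-overlap (constraints 1, 5, and 6) are deliberately omitted, and the per-option energy and reliability numbers are made up. The search space grows exponentially with the number of tasks, which illustrates why exact approaches become expensive on large instances.

```python
import math
from itertools import product


def brute_force_offloading(options, reliability_goal, energy_upper):
    """options[i]: list of (mode_label, energy, reliability) for task i.
    Choosing exactly one entry per task mirrors the one-hot constraint 2."""
    best, best_energy = None, math.inf
    for choice in product(*options):
        # Constraint 3 (simplified): every task meets the reliability goal
        if any(rel < reliability_goal for _, _, rel in choice):
            continue
        energy = sum(e for _, e, _ in choice)
        # Constraint 4 (simplified): one device's energy upper bound
        if energy > energy_upper:
            continue
        if energy < best_energy:
            best = [label for label, _, _ in choice]
            best_energy = energy
    return best, best_energy
```

With two tasks whose cheapest option (remote, reliability 0.90) violates a 0.95 goal, the search correctly falls back to the D2D option for the first task; if no combination fits the energy bound, it returns `(None, inf)`.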

B. Proposed Approach
As presented previously, our studied problem can be expressed as an ILP problem. Although existing ILP solvers (e.g., the open-source ILP solver in [23]) are capable of deriving an optimal solution, they may incur unaffordable time overheads on large-scale problem instances. Given this dilemma, we develop a reliability-driven heuristic task offloading approach to minimize energy consumption. As shown in Fig. 1, our approach first exploits the communication-to-computation ratio (CCR) metric to quantify differentiated DAG energy demands. The iterative self-organizing data analysis technique algorithm (ISODATA) is then utilized to group DAG applications based on their CCR values, so that each DAG cluster can be classified as either computation-intensive or communication-intensive. For computation-intensive DAG clusters, our approach leverages the incremental control technique to determine energy-efficient task execution modes based on an analysis of computation energy optimality. For communication-intensive DAG clusters, our approach designs a DAG type conversion technique to reduce intertask communication overheads. This conversion turns communication-intensive DAG clusters into computation-intensive ones, so their task offloading solutions can be derived by invoking the incremental control-based method.

IV. CCR-GUIDED DAG CLUSTERING
In this section, we first present an observation on DAG heterogeneity in energy demands, and then present our DAG clustering method followed by the DAG clustering algorithm.

A. Observation on DAG Heterogeneity in Energy Demands
As explored in [24], the heterogeneity of DAG applications can be depicted from multiple perspectives, such as the CCR, DAG size, and parallelism factor. Among these indicators, CCR is widely adopted to characterize the heterogeneity of DAG applications in computation and communication overheads. Specifically, the CCR value of a given DAG application is calculated as its average communication overhead divided by its average computation overhead when executed on a specific hardware platform. Essentially, this metric indicates whether communication energy or computation energy accounts for the larger share of the total energy dissipation of a DAG application. As an example, Fig. 2 demonstrates the energy dissipation composition of several representative real-life DAG applications, including fast Fourier transform (FFT), Gaussian elimination (GE), CyberShake, Montage, and LIGO inspiral analysis. As observed, the CCR values of different DAG applications vary significantly. CyberShake and Montage have higher CCR values of 3.5 and 2.7, respectively, whereas FFT, GE, and LIGO inspiral analysis have lower CCR values of 0.6, 0.8, and 1.4, respectively. This observation inspires us to leverage DAG CCR values to quantify differentiated energy demands in the communication and computation of DAG applications.
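The CCR definition above reduces to a ratio of averages. The sketch below computes it from per-task computation costs and per-edge communication costs; the optional weights `a1` and `a2` are a hypothetical stand-in for the energy dissipation coefficients that Section IV-C applies to computation and communication, whose exact role in the article's CCR formula is not recoverable here.

```python
def ccr(comp_costs, comm_costs, a1=1.0, a2=1.0):
    """Average communication overhead divided by average computation overhead.
    a1 (computation) and a2 (communication) are assumed optional weights."""
    avg_comm = sum(comm_costs) / len(comm_costs)
    avg_comp = sum(comp_costs) / len(comp_costs)
    return (a2 * avg_comm) / (a1 * avg_comp)
```

With the default weights, `ccr([2, 4], [3, 9])` is 6 / 3 = 2.0, i.e., a communication-dominated DAG; doubling the computation weight halves it to 1.0.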

B. DAG Clustering Method
As observed above, DAG applications differ greatly in their computation and communication energy demands. It is therefore natural to dedicate separate task offloading methods to computation-intensive and communication-intensive DAG applications. We select the popular ISODATA technique to cluster DAGs into multiple groups. Unlike traditional clustering methods with a fixed number of clusters, ISODATA adapts the number of clusters through merging and partitioning mechanisms [25]. During every round of grouping, the merging mechanism monitors both the size of each cluster and the distance between any two clusters. If a cluster is small enough, it is merged into an adjacent cluster; likewise, if the distance between two clusters is below a particular threshold, they are combined into a new cluster. The partitioning mechanism, in turn, tracks the number of elements and the average dispersion degree of all elements in each cluster. If either is large enough, the cluster is immediately split into two smaller clusters. Note that a valid distance criterion must be given before running ISODATA to measure the distance between elements. For our DAG clustering, we adopt the CCR value as the distance criterion, based on the observation on DAG heterogeneity in energy demands, as detailed in the following subsection.

C. Algorithm of CCR-Guided DAG Clustering
We develop a CCR-guided DAG clustering scheme exploiting the ISODATA technique, as shown in Algorithm 1. Line 1 derives the CCR values of individual DAG applications. For application $G_m$, the CCR value $CCR_m$ is computed from its communication and computation energy, where $a_1$ and $a_2$ are the energy dissipation coefficients with respect to computation and communication of the target system, respectively. Line 2 initializes the total number $K$ of clusters. Line 3 selects $K$ clustering centers $\{w_1, w_2, \ldots, w_K\}$ at random. Lines 4-6 allocate each application $G_m$ to the cluster $\Phi_{w_k}$ with the smallest difference between $CCR_m$ and $CCR_{w_k}$. After these initialization operations, the algorithm enters a loop in lines 7-26 to find an optimal clustering solution. Specifically, line 9 checks whether the number of elements in cluster $\Phi_{w_k}$ is below a predefined lower bound. If so, all elements of $\Phi_{w_k}$ are assigned to an adjacent cluster and the number $K$ of clusters is decreased (lines 10-12). Conversely, if the number of elements in $\Phi_{w_k}$ exceeds a predefined upper bound, $\Phi_{w_k}$ is partitioned into two smaller clusters and $K$ is increased accordingly (lines 13-16). Lines 17-20 ascertain whether any two clusters are close enough. If so, lines 21-22 merge the two similar clusters into a new cluster and then update the cluster number $K$. Meanwhile, if the CCR variance of all elements in cluster $\Phi_{w_k}$ exceeds an allowable threshold, cluster $\Phi_{w_k}$ will be split into two smaller clusters.
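The following is a simplified one-dimensional ISODATA loop over CCR values, written to mirror the steps of Algorithm 1: random initial centers, nearest-center assignment by CCR difference, dissolving undersized clusters, splitting dispersed or oversized clusters, and merging close centers. It is a sketch, not the authors' Algorithm 1; the thresholds `min_size`, `max_std`, `max_size`, and `merge_dist`, the mean ± standard deviation split rule, and the convergence test are assumptions.

```python
import random
from statistics import fmean, pstdev


def assign(values, centers):
    """Nearest-center assignment by absolute CCR difference."""
    groups = [[] for _ in centers]
    for v in values:
        k = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        groups[k].append(v)
    return [g for g in groups if g]  # drop empty clusters


def isodata_1d(values, k_init, min_size, max_std, merge_dist,
               max_size=None, max_iter=20, seed=0):
    max_size = len(values) if max_size is None else max_size
    centers = sorted(random.Random(seed).sample(values, k_init))
    for _ in range(max_iter):
        groups = assign(values, centers)
        # Merge (a): undersized clusters are dissolved; their members move
        # to the nearest surviving center on reassignment.
        kept = [g for g in groups if len(g) >= min_size] or groups
        groups = assign(values, [fmean(g) for g in kept])
        # Split: dispersed or oversized clusters become two centers.
        proposal = []
        for g in groups:
            mean, std = fmean(g), pstdev(g)
            if len(g) >= 2 * min_size and (std > max_std or len(g) > max_size):
                proposal += [mean - std, mean + std]
            else:
                proposal.append(mean)
        # Merge (b): fuse neighbouring centers closer than merge_dist.
        proposal.sort()
        merged = [proposal[0]]
        for c in proposal[1:]:
            if c - merged[-1] < merge_dist:
                merged[-1] = (merged[-1] + c) / 2
            else:
                merged.append(c)
        if merged == centers:  # converged
            break
        centers = merged
    return assign(values, centers)
```

On the toy CCR values `[0.5, 0.6, 0.7, 3.0, 3.1, 3.2]` with `k_init=2, min_size=2, max_std=0.5, merge_dist=1.0`, the loop settles on the groups `[0.5, 0.6, 0.7]` and `[3.0, 3.1, 3.2]` regardless of which two initial centers are drawn, because a bad start either dissolves a singleton or triggers a split of the one dispersed cluster.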

V. INCREMENTAL CONTROL FOR TASK OFFLOADING OF COMPUTATION-INTENSIVE DAG CLUSTERS
In this section, we design an incremental control scheme for the task offloading of computation-intensive DAG clusters.

A. Observation on Computation Energy Optimality
Algorithm 1 produces a collection of DAG groups by leveraging the ISODATA clustering technique. We then classify each DAG cluster as computation-intensive or communication-intensive based on the average CCR value of its members. For a computation-intensive cluster with task set $\Omega_{w_k}$, we analyze the optimality of its computation energy $E(\Omega_{w_k})$, as stated in Theorem 1.
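The classification step can be as simple as thresholding each cluster's mean CCR. In the sketch below, the threshold of 1.0 (communication overhead equal to computation overhead) is an assumption, since the article's threshold is not recoverable from this text; the sample CCR values are those of the five real-life applications in Fig. 2.

```python
from statistics import fmean


def classify_clusters(clusters, ccr_threshold=1.0):
    """Label each cluster by its average CCR (threshold is an assumption)."""
    return [
        "communication-intensive" if fmean(c) > ccr_threshold
        else "computation-intensive"
        for c in clusters
    ]
```

Averaging matters here: LIGO alone (CCR 1.4) would exceed the assumed threshold, but a cluster it shares with FFT (0.6) and GE (0.8) averages about 0.93 and is labeled computation-intensive.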

B. Incremental Control Algorithm
We learn from Theorem 1 that the optimal computation energy consumption is achieved when processors with smaller power coefficients execute tasks with larger power coefficients. Inspired by this observation, we put forward an incremental control scheme in Algorithm 2 for the task offloading of computation-intensive DAG clusters. Line 1 sorts all tasks in set $\Omega_{w_k}$ in descending order of task power coefficient while maintaining the topological order. Line 2 sorts all processors in set $\Theta$ in ascending order of processor power coefficient. Lines 3-18 iteratively search for an optimal task offloading solution with the help of proportional-integral-derivative (PID) control. In this procedure, line 3 checks whether the termination condition is met. If not, line 4 sets the PID controller for the communication energy restriction, i.e., none of the communication energies $E^{\mathrm{localcom}}_{m,i,p}$, $E^{\mathrm{d2dcom}}_{m,i,\pi}$, and $E^{\mathrm{remcom}}_{m,i,h}$ can exceed the threshold $E^{\mathrm{limit}}_{\mathrm{comm}}$. In this step, $E^{\mathrm{limit}}_{\mathrm{comm}}$ is updated following [13], where $Q_1$, $Q_2$, and $Q_3$ denote the proportional, integral, and derivative coefficients of the PID controller, respectively. $R_{\mathrm{ratio}}(u)$ is the difference between the desired reliability satisfaction ratio (i.e., 100%) and the achieved ratio, namely, the number of tasks meeting their reliability constraints divided by the total number of tasks, at the $u$th iteration. $U_1$ denotes the number of iterations over which the integral errors are accumulated, and $U_2$ is the number of iterations over which the derivative errors are measured.
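Two ingredients of Algorithm 2 lend themselves to a short sketch. The pairing rule from Theorem 1 is an instance of the rearrangement inequality: if energy is modeled as a sum of products of a task power coefficient and a processor power coefficient (an assumption about the form of $E(\Omega_{w_k})$), pairing the largest task coefficients with the smallest processor coefficients minimizes that sum. The PID signal follows the textual description of $Q_1$, $Q_2$, $Q_3$, $U_1$, $U_2$, and $R_{\mathrm{ratio}}(u)$; how the signal rescales $E^{\mathrm{limit}}_{\mathrm{comm}}$ is not recoverable here, so the function only returns the control signal. The one-to-one pairing and the omission of topological-order handling are simplifications, and all function names are hypothetical.

```python
import itertools


def pair_by_power(task_coeffs, proc_coeffs):
    """Theorem 1 rule: largest task coefficient -> smallest processor coefficient."""
    tasks = sorted(range(len(task_coeffs)), key=lambda i: -task_coeffs[i])
    procs = sorted(range(len(proc_coeffs)), key=lambda j: proc_coeffs[j])
    return dict(zip(tasks, procs))


def pairing_energy(task_coeffs, proc_coeffs, pairing):
    # Assumed energy form: sum of task coefficient x processor coefficient
    return sum(task_coeffs[i] * proc_coeffs[j] for i, j in pairing.items())


def exhaustive_min_energy(task_coeffs, proc_coeffs):
    """Brute-force check of the rearrangement argument (small inputs only)."""
    return min(
        sum(t * p for t, p in zip(task_coeffs, perm))
        for perm in itertools.permutations(proc_coeffs)
    )


def pid_signal(errors, q1, q2, q3, u1, u2):
    """errors[u] = R_ratio(u) = 1 - (tasks meeting reliability) / (all tasks)."""
    e = errors[-1]
    integral = sum(errors[-u1:])  # accumulated over the last U1 iterations
    past = errors[-1 - u2] if len(errors) > u2 else errors[0]
    derivative = (e - past) / u2  # change measured over U2 iterations
    return q1 * e + q2 * integral + q3 * derivative
```

With the gains from Section VII-A ($Q_1 = 0.5$, $Q_2 = 0.005$, $Q_3 = 0.1$), $U_1 = 2$, $U_2 = 1$, and satisfaction-ratio errors of 0.4 then 0.2, the signal is $0.5 \cdot 0.2 + 0.005 \cdot 0.6 + 0.1 \cdot (-0.2) = 0.083$.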

VI. SWARM INTELLIGENT TASK OFFLOADING FOR COMMUNICATION-INTENSIVE DAG CLUSTERS
In this section, we design a swarm intelligent task offloading scheme for communication-intensive DAG clusters.

A. DAG Type Transformation
We devise a novel DAG type transformation method, named PSOSR, to convert communication-intensive DAG clusters into computation-intensive DAG clusters. This method is inspired by the idea of hybrid algorithm design explored in [26] and [27], and it combines the advantages of a state-of-the-art particle swarm optimization (PSO) variant [28] and sequential rounding (SR) [29] in effectively solving complex problems. Let $\Gamma_{w_d} = \{\tau_1, \tau_2, \ldots, \tau_L\}$ denote the set of all tasks in a communication-intensive DAG cluster $\Phi_{w_d} \in \Phi$, where the index of the end device owning task $\tau_l$ ($1 \le l \le L$) is omitted for ease of presentation. Further, let $X_{w_d} = \{X_1, X_2, \ldots, X_L\}$ be a set of binary merging points, where variable $X_l$ ($1 \le l \le L$) is set to 1 only if task $\tau_l$ is selected as a merging point. To reduce intertask communication overheads, we require that task $\tau_l$ and all of its direct successors not selected as merging points execute at the same destination.
Specifically, the PSOSR method relaxes $l$ merging points to take arbitrary real values in $[0, 1]$, i.e., $0 \le X_1, X_2, \ldots, X_l \le 1$. The problem then contains both continuous variables $\{X_1, X_2, \ldots, X_l\}$ and discrete variables $\{X_{l+1}, X_{l+2}, \ldots, X_L\}$. The up-to-date PSO variant detailed in [28] is adopted to address this mixed-variable task merging problem. Unlike other PSO variants, it is characterized by three stages: mixed-variable encoding, hybrid offspring reproduction, and adaptive parameter tuning. In the first stage, the particle position vector (i.e., a transformation solution) is divided into two segments that encode continuous and discrete variables, respectively, so that different evolutionary operators can evolve the two kinds of variables separately. In the second stage, two reproduction schemes create, in parallel, the parts of the offspring position vector associated with continuous and discrete variables; a complete offspring particle is then constructed by concatenating the two segments. To determine the merging points exactly, we incorporate the SR technique [29], which rounds each continuous variable to either 0 or 1 by comparing it with a predefined threshold. The third stage then tunes critical evolutionary parameters for the next iteration. Fig. 3 depicts an example of our DAG type transformation method, assuming that the current DAG cluster contains only one element.
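The mixed-variable encoding and rounding steps can be sketched as follows. This is a simplified stand-in, not the PSO variant of [28]: the continuous operator (a random step toward a guide particle plus Gaussian noise), the bit-flip operator for discrete variables, the 0.5 rounding threshold, and all function names are assumptions, chosen only to show how relaxation, segment-wise reproduction, and threshold-based rounding fit together.

```python
import random


def split_particle(particle, l, rng):
    """Relax l randomly chosen binary merging points to reals in [0, 1]."""
    relaxed = set(rng.sample(range(len(particle)), l))
    cont = {i: float(particle[i]) for i in relaxed}
    disc = {i: particle[i] for i in range(len(particle)) if i not in relaxed}
    return cont, disc


def offspring_continuous(cont, guide, rng, step=0.5, noise=0.1):
    # Move toward the guide particle, perturb, and clip back into [0, 1]
    return {
        i: min(1.0, max(0.0, x + step * rng.random() * (guide[i] - x)
                        + rng.gauss(0.0, noise)))
        for i, x in cont.items()
    }


def offspring_discrete(disc, rng, flip_prob=0.1):
    # Bit-flip mutation on the discrete segment
    return {i: 1 - b if rng.random() < flip_prob else b for i, b in disc.items()}


def sr_round(cont, threshold=0.5):
    """Round each relaxed variable to 0 or 1 by comparing with a threshold."""
    return {i: int(x >= threshold) for i, x in cont.items()}


def make_child(particle, guide, l, rng):
    cont, disc = split_particle(particle, l, rng)
    child = {**sr_round(offspring_continuous(cont, guide, rng)),
             **offspring_discrete(disc, rng)}
    return [child[i] for i in range(len(particle))]
```

Because rounding happens only after reproduction, the continuous segment can move gradually between iterations while every emitted child is still a valid binary merging-point vector.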

B. Swarm Intelligent Task Offloading Algorithm
Algorithm 3 presents our swarm intelligent task offloading scheme for communication-intensive DAG clusters. Line 1 produces the initial feasible particles $X = \{S_1, S_2, \ldots, S_J\}$. Line 2 evaluates the fitness of every initial particle, calculated as the difference between the CCR value of the particle and the predefined CCR threshold. Lines 3-18 iteratively search for task offloading solutions. Specifically, if the termination condition is not met, line 4 sorts all particles in descending order. For every particle $S_j$, line 6 randomly selects $l_j$ binary variables and relaxes their value ranges, dividing $S_j$ into two segments $S_{j,1}$ and $S_{j,2}$ that store continuous and discrete variables, respectively. Line 7 then builds an offspring $S^{\mathrm{child}}_{j,1}$ for the continuous variables in $S_{j,1}$ by invoking function OffspringContinuous($S_{j,1}$). Likewise, line 8 produces an offspring $S^{\mathrm{child}}_{j,2}$ for the discrete variables in $S_{j,2}$ by calling function OffspringDiscrete($S_{j,2}$). Line 9 generates a complete offspring $S^{\mathrm{child}}_j$ by combining $S^{\mathrm{child}}_{j,1}$ and $S^{\mathrm{child}}_{j,2}$. Line 10 rounds the continuous variables in segment $S_{j,1}$ by using SR function SR($S^{\mathrm{child}}_j$). Line 11 checks whether the CCR value of task set $\Gamma$ falls below the predefined CCR threshold when task merging solution $S^{\mathrm{child}}_j$ is applied to $\Gamma$; if so, flag $\Delta_j$ is set to true. Line 13 compares offspring $S^{\mathrm{child}}_j$ with parent $S_j$ through comparison function Compare($S^{\mathrm{child}}_j$, $S_j$). If the offspring is superior, lines 14-15 replace parent $S_j$ with offspring $S^{\mathrm{child}}_j$ and adjust the evolutionary parameters of particle $S_j$, respectively. After all particles are examined, line 16 selects the particle $S_{\mathrm{opt}}$ with the best fitness from swarm $X$ by using selection function Select($X$). Accordingly, line 17 derives the task power coefficients by using function PowerFactor($\Gamma$, $S_{\mathrm{opt}}$). In this step, a task and all its direct successors not selected as merging points are combined into a big task $\tau_l$, and their communication energy overheads become zero [see (6)].
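The effect of a merging solution on CCR, which drives the fitness in Algorithm 3, can be sketched as below. Assumptions: communication costs are given per edge, an edge whose endpoints fall into the same big task costs zero (consistent with the zero communication energy stated above), the CCR average is taken over all original edges, and a non-merging successor shared by two merging points joins the first one encountered. The function names are hypothetical.

```python
def merge_groups(succ, x):
    """succ[l]: direct successors of task l; x[l] = 1 marks a merging point.
    Returns a map task -> representative of the big task it belongs to."""
    group = {t: t for t in succ}
    for l in succ:
        if x[l]:
            for p in succ[l]:
                if not x[p] and group[p] == p:  # absorb unclaimed successors
                    group[p] = l
    return group


def ccr_after_merge(comp, comm, group):
    """comm[(i, p)]: edge cost; edges inside one big task cost zero."""
    costs = [0.0 if group[i] == group[p] else c for (i, p), c in comm.items()]
    return (sum(costs) / len(costs)) / (sum(comp.values()) / len(comp))


def fitness(comp, comm, succ, x, ccr_threshold):
    # Difference between the post-merge CCR and the CCR threshold
    return ccr_after_merge(comp, comm, merge_groups(succ, x)) - ccr_threshold
```

On a diamond DAG with unit computation costs and a communication cost of 4 on every edge, marking only the root as a merging point absorbs both of its successors, zeroes two of the four edges, and halves the CCR from 4.0 to 2.0.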

VII. SIMULATION
In this section, we conduct simulation experiments to validate the effectiveness of our solution for synthetic applications.

A. Simulation Settings
In our simulation settings, the operating frequency $f_m$ of processor $\Theta_m$ is randomly selected from [500, 1500] MHz [9]. A Dell PowerEdge R930 server [30] equipped with an Intel Xeon E7-8894 24-core processor serves as the edge server.
We suppose that a total of 15 containers are installed on our edge server and that the operating frequency of each container is randomly selected from [2000, 4500] MHz. The average fault arrival rates of individual processors and containers are both assumed to lie in $[4 \times 10^{-6}, 7 \times 10^{-5}]$ according to their computation capacities [31], [32]. The bit error rate of each communication link is chosen from $[2 \times 10^{-6}, 5 \times 10^{-5}]$ [13]. The D2D transmission and receiving powers of a single end device fall into [200, 1000] and [100, 800] mW, respectively [5]. The D2D communication bandwidth between any two end devices varies from 20 to 100 MHz [5]. The transmission power of an end device in the remote execution mode is selected from [600, 1500] mW, and its communication bandwidth to the destination container is picked from [100, 400] MHz. We leverage the TGFF tool [33] to generate diverse synthetic DAG applications with the number of tasks varied in [20, 300], which readily yields a set of heterogeneous DAG applications with varied CCR values. The reliability goal of each DAG application is randomly chosen from [0.70, 0.9999]. The proportional, integral, and derivative coefficients of our PID controller are set to 0.5, 0.005, and 0.1, respectively [13].
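To reproduce a setting of this kind, the parameter ranges above can be collected in one place. The sketch below draws one random configuration; uniform sampling, a single draw per parameter (rather than per device or per link), and the dictionary keys are assumptions, while the numeric ranges, the 15 containers, and the PID gains are taken from the text.

```python
import random

# Ranges from the simulation settings (units encoded in the key names)
SIM_RANGES = {
    "device_freq_mhz": (500, 1500),
    "container_freq_mhz": (2000, 4500),
    "fault_rate": (4e-6, 7e-5),
    "bit_error_rate": (2e-6, 5e-5),
    "d2d_tx_power_mw": (200, 1000),
    "d2d_rx_power_mw": (100, 800),
    "d2d_bandwidth_mhz": (20, 100),
    "remote_tx_power_mw": (600, 1500),
    "remote_bandwidth_mhz": (100, 400),
    "reliability_goal": (0.70, 0.9999),
}
NUM_CONTAINERS = 15
PID_GAINS = (0.5, 0.005, 0.1)  # Q1, Q2, Q3
TASK_COUNT_RANGE = (20, 300)


def sample_setting(rng):
    """Draw one configuration; uniform sampling is an assumption."""
    setting = {k: rng.uniform(lo, hi) for k, (lo, hi) in SIM_RANGES.items()
               if k != "container_freq_mhz"}
    # One frequency per container, drawn from the container range
    lo, hi = SIM_RANGES["container_freq_mhz"]
    setting["container_freqs_mhz"] = [rng.uniform(lo, hi)
                                      for _ in range(NUM_CONTAINERS)]
    setting["task_count"] = rng.randint(*TASK_COUNT_RANGE)
    setting["pid_gains"] = PID_GAINS
    return setting
```

Seeding `random.Random` makes each draw repeatable, which helps when averaging over the 100 experiments reported below.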
As summarized in Table I, we conduct energy optimization for large-scale CPS with joint consideration of task latency, task reliability, task dependence, and end-end-edge collaboration. We select the following representative benchmarking strategies, whose concerns are most similar to our problem.
1) ELYO [5] exploits the Lyapunov optimization technique to achieve time-average energy dissipation minimization for independent tasks. It takes both the local and D2D execution modes into account but ignores the remote execution mode provided by edge servers.
2) ELDM [7] leverages the Lagrange duality method to select one of the three execution modes for every task, thereby minimizing the overall system energy consumption. Neither the intertask dependencies nor the task reliability demands are considered.
3) ELGT [9] is a game theoretic task offloading algorithm that jointly minimizes the energy consumption and processing latency of independent tasks with the help of end-end-edge collaboration. However, it considers neither the tolerance of soft errors during task execution nor the occurrence of bit errors during intertask communication.
4) ERPS [15] aims at minimizing the energy consumption of dependent real-time tasks by using processor-merging and slack time reclamation methods. It tolerates soft errors during task execution but neglects the occurrence of bit errors during intertask communication. Moreover, it does not use the end-end-edge collaboration computation paradigm during energy optimization.
5) EILP utilizes an open-source ILP solver [23] to tackle the ILP formulation of energy minimization in Section III-A. As mentioned earlier, this scheme yields a globally optimal task offloading solution but is highly likely to incur huge runtime overheads.

B. Simulation Results
In the comparative study, we conduct a total of 100 experiments and report averaged evaluation data. Table II lists the energy dissipation and the corresponding energy savings achieved by our approach when running synthetic DAG applications. On the one hand, our approach significantly reduces the total energy consumption of end devices, especially when more end devices are involved in the task offloading process. On the other hand, our approach is inferior to benchmarking algorithm EILP, with an average degradation of 9.86% in energy consumption.
Table III lists the runtime overheads of the task offloading algorithms and the resultant speedup attained by our approach when running synthetic DAG applications. The results confirm that our approach effectively shortens the runtime spent on deriving desirable task offloading solutions. In addition, the runtime overheads of our approach grow slowly rather than rapidly as the number of end devices increases. This is mainly because incorporating the ISODATA technique into our approach enables a parallel search for the task offloading solutions of different DAG clusters.
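The energy savings in Table II and the speedups in Table III are ratios of per-algorithm measurements. The minimal Python sketch below computes both, assuming that the energy saving of our approach over a peer is (E_peer − E_ours)/E_peer × 100 and that the speedup is T_peer/T_ours, averaged over experiments. These definitions, the function names, and the sample numbers are illustrative assumptions rather than the article's exact evaluation code.

```python
from statistics import mean

def energy_savings(e_ours: float, e_peer: float) -> float:
    """Percentage of the peer's energy saved by our approach (assumed relative definition)."""
    if e_peer <= 0:
        raise ValueError("peer energy must be positive")
    return (e_peer - e_ours) / e_peer * 100.0

def speedup(t_ours: float, t_peer: float) -> float:
    """How many times faster our approach runs than a peer (ratio of runtimes)."""
    if t_ours <= 0:
        raise ValueError("our runtime must be positive")
    return t_peer / t_ours

if __name__ == "__main__":
    # Hypothetical (energy in J, runtime in s) pairs for two experiments; not data from the paper.
    ours = [(50.0, 2.0), (40.0, 1.0)]
    peer = [(100.0, 20.0), (80.0, 30.0)]
    print(mean(energy_savings(o[0], p[0]) for o, p in zip(ours, peer)))
    print(mean(speedup(o[1], p[1]) for o, p in zip(ours, peer)))
```

Under this assumed definition, a negative saving corresponds to an energy degradation such as the one observed against EILP. Averaging per-experiment ratios (as above) and taking the ratio of averaged measurements can give different numbers, so the choice should match whichever convention the tables use.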
We further explore the schedulability of the six task offloading algorithms. The schedulability of an algorithm is the ratio of the number of DAG application instances satisfying the specified constraints to the total number of DAG application instances under test (i.e., 100 in our experiments). Table IV shows the results when only the task dependence or deadline constraints of synthetic DAG applications are considered. As observed, when only the intertask dependence constraints are considered, the schedulability of benchmarking algorithms ELYO, ELGT, and ELDM is merely 43%, 59%, and 42% on average, respectively. When only the task deadline constraints are considered, the schedulability of benchmarking algorithms ELYO, ELGT, ELDM, and ERPS is 38%, 53%, 36%, and 65% on average, respectively. This is because these four benchmarking algorithms are customized for the latency-aware task offloading of a single application while neglecting the concurrent task offloading of multiple applications with distinct timing requirements. Conversely, our approach and benchmarking algorithm EILP always achieve 100% schedulability when either the intertask dependence or the task deadline constraints are imposed.
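The schedulability metric defined above is a simple pass ratio over the tested instances. The following Python sketch computes it for any subset of the three constraint types; the `InstanceResult` record and its field names are hypothetical and stand in for whatever per-instance checks an evaluation harness produces.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

@dataclass(frozen=True)
class InstanceResult:
    """Constraint-check outcome of one DAG application instance (hypothetical record)."""
    dependence_ok: bool
    deadline_ok: bool
    reliability_ok: bool

def schedulability(results: Sequence[InstanceResult], constraints: Iterable[str]) -> float:
    """Fraction of instances that satisfy every constraint named in `constraints`."""
    names = list(constraints)  # e.g. ["dependence"], or all three names for Table V
    if not results:
        raise ValueError("no instances under test")
    passed = sum(all(getattr(r, f"{n}_ok") for n in names) for r in results)
    return passed / len(results)

if __name__ == "__main__":
    # 100 hypothetical instances with made-up outcomes, only to exercise the function.
    results = [InstanceResult(True, i % 2 == 0, i % 4 == 0) for i in range(100)]
    print(schedulability(results, ["dependence"]))
    print(schedulability(results, ["dependence", "deadline", "reliability"]))
```

Requiring all selected constraints to hold per instance is what makes the "all constraints" column in Table V no higher than any single-constraint column.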
Similarly, Table V presents the schedulability of the six task offloading algorithms when considering the task reliability constraints and all constraints of synthetic DAG applications. As observed, when benchmarking algorithms ELYO, ELGT, and ELDM are subject to the task reliability constraints, they inevitably suffer from reliability violations (up to 44%) because they tolerate neither bit errors nor soft errors. Although benchmarking algorithm ERPS can handle soft errors during task execution, it ignores bit errors during intertask communication, thereby resulting in lower overall task reliability. Further, when all design constraints (i.e., intertask dependence, task deadline, and task reliability constraints) are considered, benchmarking algorithms ELYO, ELGT, ELDM, and ERPS cannot achieve 100% schedulability. In contrast, our approach and benchmarking algorithm EILP always maintain 100% schedulability under varied numbers of end devices. Moreover, compared with EILP, our approach reduces the algorithm runtime overheads by a factor of 139.12 on average while still achieving substantial energy savings for end devices, as shown in Tables II and III.

VIII. FURTHER INVESTIGATION
We further investigate the effectiveness of our approach when running real-life DAG applications in CPS environments.

A. Real-Life Application Description
A collection of real-life DAG applications drawn from [24], [34], [35], [36], [37], and [38] is tested in this section, including FFT, GE, mpegplay, madplay, tmndec, toast, Montage, CyberShake, Epigenomics, LIGO inspiral analysis, molecular dynamics code, Sipht, in-tree, out-tree, mean value analysis, Laplace equation solver, fork-join, LU-decomposition, face recognition, AIRSN, Chimera, navigator, SignalGuru, and TwitterSentiment. These benchmarks cover a wide spectrum of CPS applications and hence facilitate a comprehensive investigation of our approach and the benchmarking algorithms. Fig. 4 illustrates the structures of some of these DAG applications, while the structures of the others can be found in their original studies. The number of tasks in each topological layer of a DAG application can vary to accommodate actual requirements. Hence, for each type of DAG application, we randomly select the overall task number from [50, 500], thereby constructing a large number of DAG application variants with varied CCR values. As with synthetic DAG applications, the reliability goal of each real-life DAG application is randomly chosen from [0.70, 0.9999].
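The variant construction described above (a random overall task count in [50, 500] and a random reliability goal in [0.70, 0.9999]) can be sketched in Python as follows. This is only an illustration under stated assumptions: it swaps in a generic layered random DAG instead of the benchmark topologies of Fig. 4, assumes CCR is the mean communication cost divided by the mean computation cost (a common convention in DAG scheduling), and introduces a hypothetical `comm_scale` knob to vary CCR; cost ranges and layer counts are made up.

```python
import random
from dataclasses import dataclass, field

@dataclass
class DagVariant:
    """One generated variant of a layered DAG application (hypothetical container)."""
    num_tasks: int
    reliability_goal: float
    comp: list                                  # computation cost of each task
    edges: dict = field(default_factory=dict)   # (parent, child) -> communication cost

    def ccr(self) -> float:
        """Assumed CCR: mean communication cost divided by mean computation cost."""
        if not self.edges:
            return 0.0
        mean_comm = sum(self.edges.values()) / len(self.edges)
        mean_comp = sum(self.comp) / len(self.comp)
        return mean_comm / mean_comp

def make_variant(rng: random.Random, num_layers: int = 5, comm_scale: float = 1.0) -> DagVariant:
    n = rng.randint(50, 500)              # overall task number drawn from [50, 500]
    goal = rng.uniform(0.70, 0.9999)      # reliability goal drawn from [0.70, 0.9999]
    # Randomly split the n tasks into layers; every layer receives at least one task.
    cuts = sorted(rng.sample(range(1, n), num_layers - 1))
    bounds = [0] + cuts + [n]
    layers = [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]
    comp = [rng.uniform(1.0, 10.0) for _ in range(n)]
    edges = {}
    for upper, lower in zip(layers, layers[1:]):
        for child in lower:
            parent = rng.choice(upper)    # each task below the top layer gets one parent
            edges[(parent, child)] = comm_scale * rng.uniform(1.0, 10.0)
    return DagVariant(n, goal, comp, edges)

if __name__ == "__main__":
    rng = random.Random(42)
    for scale in (0.5, 1.0, 2.0):
        v = make_variant(rng, comm_scale=scale)
        print(v.num_tasks, round(v.reliability_goal, 4), round(v.ccr(), 2))
```

Edges only point from a lower task index in an upper layer to a higher index in the next layer, so every generated graph is acyclic by construction.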
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

B. Investigation Results
We first investigate the energy consumption of the six task offloading methods when running real-life DAG applications. We observe from Table VI that our approach achieves 33.40%, 23.84%, 21.10%, and 51.48% energy savings on average compared with peer algorithms ELYO, ELGT, ELDM, and ERPS, respectively. More importantly, the energy savings of our approach increase as the number of end devices in the system grows. The main reason is that our approach fully exploits the heterogeneity of DAG applications in energy demands, whereas all peer algorithms largely ignore the differentiated energy requirements of DAG applications. In addition, the energy consumption of our approach is 8.56% higher on average than that of benchmarking algorithm EILP, because EILP derives globally optimal computation offloading solutions by exploiting the ILP technique.
We then investigate the runtime of the six task offloading algorithms when running real-life DAG applications. As demonstrated in Table VII, our approach achieves runtime speedups of 27.24, 33.22, 31.20, 36.02, and 123.47 times on average compared with peer algorithms ELYO, ELGT, ELDM, ERPS, and EILP, respectively. As mentioned earlier, this is because our approach, with the help of the ISODATA clustering technique, can search for energy-aware, reliability-ensured task offloading solutions for differentiated DAG groups in parallel.
We finally investigate the schedulability of the six task offloading algorithms when running real-life DAG applications and summarize the comparison results in Tables VIII and IX. The results are consistent with the observations for synthetic DAG applications. That is, except for our approach and benchmarking algorithm EILP, none of the other algorithms (i.e., ELYO, ELGT, ELDM, and ERPS) can guarantee 100% schedulability for real-life applications. Meanwhile, compared with benchmarking algorithm EILP, our approach yields a better tradeoff between the holistic energy savings of end devices and the runtime overheads.

IX. CONCLUSION
This article aimed to address the problem of energy minimization under DAG timing, precedence, and reliability constraints in end-end-edge collaborated large-scale CPS. To this end, our approach first utilized a clustering method to distinguish the differentiated energy demands of DAG applications. Then, our approach developed a PID-based task offloading scheme for computation-intensive DAG clusters and a PSOSR-based task offloading scheme for communication-intensive DAG clusters. In future work, we plan to extend the current study in the following three aspects.
1) Integrate the popular dynamic voltage and frequency scaling technique into end devices for energy optimization. 2) Consider modern end devices powered by renewable energy sources, e.g., solar energy. 3) Investigate the approximate computing requirements of DAG applications.

APPENDIX
For ease of presentation, let $\theta$ denote the vector of processor-dependent parameters sorted in ascending order, i.e., $\theta_1 \le \theta_2 \le \cdots \le \theta_{|\Phi_k^w|}$.
By comparing the computation energy consumption $E(\Omega_k^w)$ with $E_{\mathrm{swap}}(\Omega_k^w)$, the energy obtained after exchanging the positions of elements $Y_i$ and $Y_j$ ($i < j$) in vector $Y$, we obtain (35) due to the optimality assumption of energy consumption $E(\Omega_k^w)$, that is,
$$E_{\mathrm{swap}}(\Omega_k^w) - E(\Omega_k^w) = \theta_i Y_j + \theta_j Y_i - \theta_i Y_i - \theta_j Y_j = (\theta_j - \theta_i)(Y_i - Y_j) \ge 0. \tag{35}$$
Since $i < j$ implies $\theta_i \le \theta_j$, we deduce that $Y_i$ should be no less than $Y_j$, i.e., the inequality $Y_i \ge Y_j$ holds (if $\theta_i = \theta_j$, the swap leaves the energy unchanged). Based on the aforementioned analysis, we can iteratively exchange the positions of any two elements in vector $Y$. In every iteration of the element position swapping, the inequality $Y_i \ge Y_j$ holds for $i < j$. At the end of the whole swapping procedure, the inequality $Y_1 \ge Y_2 \ge \cdots \ge Y_{|\Phi_k^w|}$ holds.
… p} is a set utilized for capturing the task dependence and communication data.
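The swap argument above is an instance of the rearrangement inequality: with the power coefficients $\theta$ in ascending order, pairing them with the $Y$ values in descending order minimizes $\sum_m \theta_m Y_m$. The Python sketch below checks this numerically against brute force over all pairings. It is an illustration only: it treats every pairing as feasible and ignores the deadline, precedence, and reliability constraints, and the random values are arbitrary.

```python
import random
from itertools import permutations

def energy(theta, y):
    """E = sum_m theta_m * Y_m for one pairing of processors and task subsets."""
    return sum(t * v for t, v in zip(theta, y))

def brute_force_min(theta, y):
    """Lowest energy over every way of pairing the Y values with the processors."""
    return min(energy(theta, list(p)) for p in permutations(y))

def sorted_pairing(theta, y):
    """Result of the swap argument: with theta ascending, place the Y values in descending order."""
    assert all(a <= b for a, b in zip(theta, theta[1:])), "theta must be ascending"
    return sorted(y, reverse=True)

if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(200):
        k = rng.randint(1, 6)
        theta = sorted(rng.uniform(0.1, 5.0) for _ in range(k))
        y = [rng.uniform(0.0, 10.0) for _ in range(k)]
        assert abs(energy(theta, sorted_pairing(theta, y)) - brute_force_min(theta, y)) < 1e-9
    print("sorted pairing matched brute force in all trials")
```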

Algorithm 1: CCR-Guided DAG Clustering Scheme
… $G_m$, its CCR value $CCR_m$ is estimated by

Algorithm 2: Incremental Control Algorithm for Task Offloading of Computation-Intensive DAG Clusters
… split into two smaller clusters in lines 23-26. Finally, the algorithm outputs the clustering result Φ in line 27.
… Since the DAG type transformation has now been completed, line 17 calls Algorithm 2 to produce offloading solution $O_d^w$ for task set $\Gamma_d^w$. When the termination condition is satisfied, the algorithm exits in line 19 after outputting solution $O_d^w$. Similar to Algorithm 2, Algorithm 3 can also be performed in parallel for communication-intensive DAG clusters, thereby reducing the time overhead of producing an energy-efficient task offloading solution.
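Because each DAG cluster is handled independently, per-cluster offloading can be dispatched in parallel. The Python sketch below shows that dispatch pattern only. The CCR threshold used to route a cluster to the computation-intensive or communication-intensive solver is a hypothetical value, and the two solver functions are placeholders that stand in for Algorithms 2 and 3 without implementing them.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class DagCluster:
    """A DAG cluster produced by the clustering step (hypothetical container)."""
    name: str
    ccr: float

# Hypothetical split point: clusters whose CCR falls below it are treated as computation-intensive.
CCR_THRESHOLD = 1.0

def solve_computation_intensive(cluster: DagCluster) -> str:
    # Placeholder standing in for Algorithm 2; a real solver would return an offloading plan.
    return f"{cluster.name}: incremental control"

def solve_communication_intensive(cluster: DagCluster) -> str:
    # Placeholder standing in for Algorithm 3, which per the text calls Algorithm 2 after transforming the DAG type.
    return f"{cluster.name}: swarm-based search"

def solve_cluster(cluster: DagCluster) -> str:
    if cluster.ccr < CCR_THRESHOLD:
        return solve_computation_intensive(cluster)
    return solve_communication_intensive(cluster)

def offload_all(clusters, max_workers: int = 4) -> list:
    """Solve every cluster independently and in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve_cluster, clusters))

if __name__ == "__main__":
    print(offload_all([DagCluster("A", 0.4), DagCluster("B", 2.5)]))
```

A thread pool keeps the sketch simple; for CPU-bound solvers in CPython, `ProcessPoolExecutor` offers the same `map` interface and sidesteps the global interpreter lock.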

TABLE I
Comparison of Our Work With Related Works in Literature

… the DAG cluster $\Phi_k^w$ is naturally classified into the computation-intensive category; otherwise, it is classified into the communication-intensive category. Suppose that $\Phi_k^w$ is a computation-intensive DAG cluster and $\Omega_k^w = \{\tau_1, \tau_2, \ldots, \tau_\phi\}$ is the collection of all tasks in DAG cluster $\Phi_k^w$. We neglect the computation energy analysis of the tasks selecting the remote execution mode, since they only incur the communication energy consumption of end devices. Let $D(\Phi_k^w) = \{D_1, D_2, \ldots, D_{|\Phi_k^w|}\}$ be the group of end devices whose tasks are in DAG cluster $\Phi_k^w$. Then, let $\Omega_k^w = \{\tau_1, \tau_2, \ldots, \tau_\phi\}$ be the set of tasks selecting either the local or D2D execution mode. Let $\tau_\varepsilon$ ($1 \le \varepsilon \le \phi$) denote the $\varepsilon$th element in task set $\Omega_k^w$; the index of the end device owning task $\tau_\varepsilon$ is omitted for ease of presentation. According to (5), the computation energy of task set $\Omega_k^w$ is derived by
$$E(\Omega_k^w) = \sum_{m=1}^{|\Phi_k^w|} \sum_{\tau_\varepsilon \in \Omega_m} \psi_m v_m^2 \, \mu_\varepsilon W_\varepsilon,$$
where $\Omega_m \subseteq \Omega_k^w$ represents the group of real-time tasks that are executed locally on, or offloaded to, end device $D_m$. Furthermore, let $\theta = [\theta_1, \theta_2, \ldots, \theta_{|\Phi_k^w|}]$ represent a vector storing processor-dependent parameters, where $\theta_m = \psi_m v_m^2$ is referred to as the power coefficient of processor $\Theta_m$. Similarly, let $Y = [Y_1, Y_2, \ldots, Y_{|\Phi_k^w|}]$ represent a vector storing task-dependent parameters, where $Y_m = \sum_{\tau_\varepsilon \in \Omega_m} \mu_\varepsilon W_\varepsilon$ is referred to as the power coefficient of task subset $\Omega_m$. Obviously, $E(\Omega_k^w)$ can be rewritten as the product of the processor-dependent and task-dependent parameters, i.e., $E(\Omega_k^w) = \theta Y^{\mathsf{T}} = \sum_{m=1}^{|\Phi_k^w|} \theta_m Y_m$.
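The factorization of the cluster's computation energy into processor-dependent coefficients $\theta_m = \psi_m v_m^2$ and task-dependent coefficients $Y_m$ (a sum of $\mu_\varepsilon W_\varepsilon$ over the tasks in $\Omega_m$) can be written directly in Python. The sketch below assumes each task subset is given as a list of $(\mu_\varepsilon, W_\varepsilon)$ pairs; the function names and numbers are hypothetical, and the physical meaning of $\psi$, $v$, $\mu$, and $W$ follows the article's earlier definitions rather than anything encoded here.

```python
def power_coefficients(psi, v):
    """theta_m = psi_m * v_m**2 for each processor Theta_m."""
    return [p * vm ** 2 for p, vm in zip(psi, v)]

def task_coefficients(subsets):
    """Y_m = sum of mu_eps * W_eps over the tasks in Omega_m; each subset is a list of (mu, W)."""
    return [sum(mu * w for mu, w in subset) for subset in subsets]

def cluster_energy(psi, v, subsets):
    """Computation energy of a computation-intensive cluster as the product sum_m theta_m * Y_m."""
    theta = power_coefficients(psi, v)
    y = task_coefficients(subsets)
    return sum(t * ym for t, ym in zip(theta, y))

if __name__ == "__main__":
    psi = [1.0, 2.0]                                 # hypothetical processor parameters
    v = [1.0, 0.5]
    subsets = [[(1.0, 2.0), (0.5, 4.0)], [(2.0, 3.0)]]  # Omega_1 has two tasks, Omega_2 has one
    print(cluster_energy(psi, v, subsets))
```

Separating $\theta$ from $Y$ is what allows the appendix to reason about the energy purely in terms of how the $Y$ values are ordered against the sorted $\theta$ values.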

TABLE II
Energy Dissipation Achieved by Task Offloading Algorithms When Running Synthetic Applications

TABLE III
Runtime Overheads of Task Offloading Algorithms When Running Synthetic Applications

TABLE IV
Schedulability When Only Considering Task Dependence/Deadline Constraints of Synthetic Applications

TABLE V
Schedulability When Considering Task Reliability and All Constraints of Synthetic Applications

Fig. 4. Structure of partial real-life DAG applications under test.

TABLE VI
Energy Dissipation Achieved by Task Offloading Algorithms When Running Real-Life Applications

TABLE VII
Runtime Overheads of Task Offloading Algorithms When Running Real-Life Applications

TABLE VIII
Schedulability When Only Considering Task Dependence/Deadline Constraints of Real-Life Applications

TABLE IX
Schedulability When Considering Task Reliability and All Constraints of Real-Life Applications