The engineering research field of cyber–physical systems (CPSs) has drawn a great deal of attention from academia, industry, and the government due to its potential benefits to society, economy, and the environment [1]. As a whole, CPSs refer to the next generation of engineered systems that require tight integration of computing, communication, and control technologies to achieve stability, performance, reliability, robustness, and efficiency in dealing with physical systems of many application domains [2].

Even though the specific context of problems and challenges of today's CPSs is different from those in the past, the basic goal of developing control systems through integration of technologies from computing and communication has roots that go back nearly a century. For example, at the time of World War II, the development of automatic antiaircraft guns was one of the most important and challenging problems that required tight integration of technologies from the mechanical, electrical, electronics, and communication fields [3], [4]. In a much broader sense, we may also interpret CPSs as physical systems controlled or manipulated in a principled manner through engineering technologies. With such an interpretation, the history of CPSs can easily be traced back to the Industrial Revolution sparked by the development of the steam engine governor in the 18th century. Hence, we can view and understand the emergence of today's CPSs as a continuation of technological evolution that started from the early uses of feedback control technologies.

Over the last several decades, the advancements in computing and communication technologies have been so significant that we now refer to them as having collectively given rise to an information technology (IT) revolution. In fact, every aspect of today's individual, social, industrial, and economic activities is highly dependent on such cyber–system technologies. In particular, the Internet has changed the way we interact and communicate with each other and also how we create, distribute, and consume information. Continuing this trend, ubiquitous embedded computing, sensing, and wireless networking technologies are becoming the key enabling technologies for how we interact with, control, and build physical engineered systems such as automobiles, aircraft, power grids, manufacturing plants, medical systems, and building systems, on which our modern society and economy are becoming highly dependent.

The potential benefits of the convergence of computing, communication, and control technologies for developing next-generation engineered systems that can be called CPSs are transformative and wide ranging. Through real-time embedded systems for distributed sensing, computation, and control over wired or wireless communication networks, multiobjective optimization, high-level decision-making algorithms, and formal verification technologies, engineered systems in many societally critical application domains such as transportation, energy, and medical systems can be designed and developed to be much smarter and more reliable, secure, efficient, and robust. Many challenges remain to be addressed, and the efforts to address them will have to span all the constituent fields.

The spectrum of research fields relevant to CPSs is very broad. This overview paper is not an exhaustive survey that covers every aspect of CPS research, and is necessarily limited by the knowledge of the authors. In Section II, we review the history of control, communication, and computing technologies leading to CPSs. Then, we review recent achievements in many research disciplines. In Section III, we review research advances in selected areas, networked control systems and hybrid systems, which constitute some of the theoretical foundations for design and analysis of the dynamical behavior of CPSs. In Section IV, we discuss theories and technologies vis-a-vis real-time computing and networking. In Section V, we review fundamental theoretical results and implementation platforms for wireless sensor networks. In Section VII, we discuss the design and development of CPSs from the software engineering point of view. In Section VIII, we conclude by envisioning opportunities and challenges in some domains.



Computers were originally invented to perform computation. The first computer ENIAC [5] was constructed in 1946 to perform ballistic calculations. However, computers subsequently began to be used to close control loops around physical systems. This motivated the development in 1973 of real-time computation [6], [7], which involved the problem of how to schedule computational tasks so that every job in every task was completed before its deadline. This constituted a significant shift in the usage of computers. If performing calculations correctly was the only purpose, then all one needed was to ensure the order of computations. There is no need to deal with physical time. However, if one is interfacing a computer with a physical plant, then the time by which computations are performed is important. So, already by this time, there was interest in CPSs, though the name itself was to be invented much later.

In the 1990s, there began to appear much greater interest in the interaction between computational and physical systems [8]. Specifically, with the physical plant modeled by differential equations, and the computational systems modeled by finite state machines or other discrete models of computation, the interest centered on how the interaction of the two evolved. This field was called hybrid systems, reflecting the composite nature of the overall system.

Around 2006, researchers, predominantly in real-time systems, hybrid systems, and control systems, coined the name “cyber–physical systems” to describe this increasingly important area at the interface of the cyber and physical worlds.

There are several other paths also leading to this area of interest. From its origins as ARPANET [9] in 1969, the Internet developed into a worldwide network connecting computers. The cellular telephony revolution began around 1973. Also, around 1971, the ALOHA network was developed to interconnect users across the Hawaiian islands with a mainframe computer in Oahu [10]. Its pioneering ideas, concerning how to resolve contention for the shared wireless medium, were used in Ethernet as well as packet radio networks. In 1977, the U.S. Defense Advanced Research Projects Agency (DARPA) tested the PRNET packet radio network [11]. In 1978, the U.S. Army deployed the Single Channel Ground and Airborne Radio System (SINCGARS) packet radio system [12]. Subsequently, in 1997, the IEEE 802.11 WiFi standard was developed and proliferated across offices and homes after the introduction of IEEE 802.11b [13]. All this, including the landline telephone network, has led to a communication revolution. The goal of interconnecting computers to form a communication network has played a central role in ALOHA, the Internet, and WiFi. Thus, we see here the convergence of communication and computation.

Around 1998, a new element was added to the mix—sensing, with the development by the Smart Dust project [14] of a mote, a tiny device capable of sensing, communication, and computation. These motes allowed the attachment of sensors to the nodes, bringing information about the physical environment into the interconnected wireless communication network of computational nodes.

When nodes in a communication network are connected to both sensors and actuators, one obtains a networked control system. Thus, again, we see an evolutionary path from communication and computing to, in this case, networked CPSs.

There is yet another path that one can trace to the present interest in CPSs. In the modern electronic era, the first generation of control systems, analog control, was based on the operational amplifier [15]. To use this technology, a theoretical framework was needed. The appropriate framework was the frequency domain approach, developed by Nyquist [16], Bode [17], Evans [18], and others. This also led to CPSs, though based on analog computation. One can regard Ziegler–Nichols tuning rules [19], for example, as methods to adjust the overall CPS to achieve desired behavior. By 1954, the second generation of control, digital control [20], was already beginning to emerge. This was spawned by the development of the digital computer. Now simple calculations or algorithms could be performed on the measured signals before closing the loops. This too required a theoretical framework; the appropriate one in this case was the state–space approach. This was developed by Bellman [21], Pontryagin [22], Kalman [23], [24], [25], [26], [27], and others under the leadership of Solomon Lefschetz at the Martin Company's Research Institute for Advanced Study in Baltimore, which was founded in 1955. This led to a very strong foundation of systems theory, with a thorough investigation of optimal control [28], stability [29], linear systems [30], nonlinear control [31], stochastic systems [32], adaptive control [33], robust control [34], infinite-dimensional systems [35], decentralized control of complex systems [36], discrete event systems [37], and even attempts at integrating automata theory and control [38].

Digital control is more than 50 years old, and in the intervening years there have been dramatic advancements in the power of computers as well as the proliferation of embedded computers. There has also been enormous growth in the complexity of software and in the programming abstractions that have been developed for building it. Finally, wireline and wireless data networks were nonexistent 50 years ago. Thus, the emergence of networked CPSs is leading to a third generation of control systems. The technology for implementing control systems on distributed systems has also evolved. In process control, the controller area network (CANBus) [39] has been used to provide the underlying communication network for distributed control systems. The Field Bus system [40] has also been developed for interconnection. There is also interest in the “Internet of Things,” where physical objects are assigned addresses and interconnected with each other, with interest therefore focused on the communication–physical system interface. All this, together, constitutes yet another platform revolution. At such a time of platform revolution, it is necessary to examine both mechanisms and policies. By “mechanisms,” we mean how to implement a system, while by “policies,” we mean what to implement, for example, which control law.

There is also a great impetus from the viewpoint of applications of societal interest to develop more complex control systems featuring sensing, actuation, and computation capabilities connected by a communication network. There is an increasing demand for more and better transportation systems, energy systems, healthcare systems, and water systems, across all segments of the planet. Due to these demands, as well as the increasing awareness of the resource limitations of the planet, the 21st century could well be the age of building large systems. Many if not most or all of these systems will be composed of complex CPSs.

All these trends (the convergence of several disciplines, the evolution of technology in various fields, and the increasing need to build large scale systems to meet burgeoning societal needs in an environment of resource frugality) have led to great research interest in the issues that the term CPSs seeks to capture [1].



The dynamics of CPSs is complex, involving the stochastic nature of communication systems, discrete dynamics of computing systems, and continuous dynamics of control systems. In this section, we review recent theoretical results on modeling and analysis of dynamical behavior of CPSs from different points of view.

A. Networked Control Systems (NCS)

One of the fundamental characteristics of today's CPSs is the existence of a communication network mediating between and among computing and physical entities as shown in Fig. 1. The interactions between controller and the physical system can therefore experience network-induced delay. Packets can even occasionally be lost. The network's links can be regarded as communication channels that are subject to data rate constraints. Hence, some of the fundamental questions that are of importance for networked control systems are as follows. 1) How do the network-induced delay, packet loss, and communication channel affect the stability of the system? 2) Under what conditions is an NCS stabilizable, and how does one stabilize it?

Fig. 1. Structure of networked control systems.

The first issue is when to sample a physical system. The traditional approach is to sample it periodically or at predetermined instants. An alternative is to sample it when specific events occur, e.g., when a signal crosses a level. These have been called Riemann and Lebesgue sampling [41]. The latter approach requires continuous monitoring of the system to detect when to sample it. An alternative is to decide a safe interval for which the system can be left unsampled and an appropriate time to sample it next. This is called self-triggering and can lead to more efficient monitoring as well as usage of resources, and even be used to guarantee stability based on some knowledge of the plant [42], [43], [44].
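The contrast between periodic and event-based sampling can be illustrated with a minimal sketch. The test signal, the period, and the level spacing below are illustrative assumptions rather than values from [41]. The event-based sampler is approximated here by a send-on-delta rule evaluated on a fine time grid (a new sample is taken once the signal has moved by one level spacing since the last sample), which stands in for truly continuous monitoring.

```python
import math

def riemann_samples(signal, t_end, period):
    """Sample `signal` at the predetermined instants 0, period, 2*period, ..."""
    n = int(round(t_end / period))
    return [(k * period, signal(k * period)) for k in range(n + 1)]

def lebesgue_samples(signal, t_end, level_step, dt=1e-3):
    """Sample `signal` whenever it has moved by at least `level_step`
    since the most recent sample (checked on a grid of spacing dt)."""
    samples = [(0.0, signal(0.0))]
    for i in range(1, int(round(t_end / dt)) + 1):
        t = i * dt
        y = signal(t)
        if abs(y - samples[-1][1]) >= level_step:
            samples.append((t, y))
    return samples

def decaying(t):
    # Hypothetical plant output: fast initial transient, then nearly constant.
    return math.exp(-t)

periodic = riemann_samples(decaying, t_end=10.0, period=0.5)
event_based = lebesgue_samples(decaying, t_end=10.0, level_step=0.1)
print(len(periodic), len(event_based))
```

For this decaying signal the event-based samples cluster in the initial transient and stop once the signal has settled, whereas the periodic sampler keeps sampling at the same rate throughout.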

To study the effect of network-induced delay, consider an NCS modeled as consisting of a linear continuous time plant and a controller exchanging data packets over a lossless communication network that is shared with other unrelated nodes [45], [46]. Define the network-induced error $e(t):=[\hat{y}(t)\,\hat{u}(t)]^{T}-[y(t)\,u(t)]^{T}$, where $y(t)$ is the output of the plant, $u(t)$ is the output of the controller, and $\hat{y}(t)$ and $\hat{u}(t)$ are the most recently received versions of $y(t)$ and $u(t)$, respectively. If there is no network-induced delay between plant and controller, then $\hat{y}(t)=y(t)$ and $\hat{u}(t)=u(t)$, and so $e(t)=0$ for all $t$. A network scheduling strategy called maximum-error-first with try-once-discard (MEF-TOD), which dynamically assigns the packet transmission order among the nodes sharing the network, is proposed in [45] and [46]. The notion of maximum allowable transfer interval (MATI) is introduced to bound the time between transmission events, and a sufficient condition for stability of the NCS is derived in terms of the MATI.
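The idea behind maximum-error-first scheduling can be sketched in a few lines. This is a simplified illustration, not the protocol of [45] and [46]: each node is assumed to carry a single scalar signal, the arbitration is assumed to be instantaneous, and the function names are hypothetical. At each transmission opportunity the node whose received value deviates most from its current value wins the network, and the values of the losing nodes are discarded rather than queued.

```python
def network_error(received, current):
    """Componentwise network-induced error e = received - current."""
    return [r - c for r, c in zip(received, current)]

def mef_tod_step(received, current):
    """Grant the network to the node with the largest absolute error.

    Returns (winner_index, updated_received). Only the winner's entry is
    refreshed; the other nodes' pending values are dropped, and they
    compete again with fresh data at the next transmission opportunity.
    """
    errors = network_error(received, current)
    winner = max(range(len(errors)), key=lambda i: abs(errors[i]))
    updated = list(received)
    updated[winner] = current[winner]
    return winner, updated

# Three hypothetical nodes: last received values vs. current true values.
received = [1.0, 0.0, 2.0]
current = [1.2, 0.9, 2.1]
winner, received_after = mef_tod_step(received, current)
print(winner, received_after)
```

After the step, the winner's component of the network-induced error is zero, while the other components are unchanged until those nodes win a later transmission.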

Another approach to the stability analysis of an NCS [47] uses hybrid systems analysis techniques [48]. As a model of an NCS, consider a plant $\dot{x}(t)=Ax(t)+Bu(t)$ for $t\in[kh+\tau,(k+1)h+\tau)$, and a state feedback controller
$$u(t^{+})=-Kx(t-\tau),\quad t\in\{kh+\tau:k=0,1,\ldots\}\tag{1}$$
where $h$ is the sampling period, $\tau$ is the fixed network-induced delay that is the sum of the delays from sensor to controller and controller to actuator, and $u(t^{+})$ is piecewise continuous, changing values only at $kh+\tau$. Stability is guaranteed if the following matrix has all its eigenvalues inside the unit disk:
$$H=\begin{bmatrix}e^{Ah}&-E(h)BK\\ e^{A(h-\tau)}&-e^{-A\tau}\left(E(h)-E(\tau)\right)BK\end{bmatrix}\tag{2}$$
where $E(a):=\int_{0}^{a}e^{A(a-s)}ds$. Instead of (1), one can consider a state feedback controller that uses an estimated plant state $\hat{x}(kh+\tau)$ to compute a control input at $kh+\tau$ [49].
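A scalar instance makes the eigenvalue test concrete. The sketch below does not rely on a particular closed form for $H$. It discretizes the loop directly over one sampling period, taking the pair $(x(kh+\tau),x(kh))$ as the discrete state (an assumption about the state ordering) and assuming $0\leq\tau<h$, and then checks whether the resulting $2\times 2$ matrix has spectral radius below one. The numerical values of the plant, gain, period, and delay are illustrative assumptions.

```python
import cmath
import math

def E(a, t):
    """E(t) = integral_0^t exp(a*(t-s)) ds for a scalar a."""
    if a == 0.0:
        return t
    return (math.exp(a * t) - 1.0) / a

def closed_loop_matrix(a, b, k, h, tau):
    """One-period map of z_k = (x(kh+tau), x(kh)) for the scalar loop
    x' = a*x + b*u, where u = -k*x(kh) is applied from time kh+tau on."""
    assert 0.0 <= tau < h
    return [
        [math.exp(a * h), -E(a, h) * b * k],
        [math.exp(a * (h - tau)), -E(a, h - tau) * b * k],
    ]

def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude, from the characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))

def is_stable(a, b, k, h, tau):
    return spectral_radius_2x2(closed_loop_matrix(a, b, k, h, tau)) < 1.0

# Unstable plant x' = x + u, sampled every 0.1 s with a 0.05 s loop delay.
print(is_stable(a=1.0, b=1.0, k=2.0, h=0.1, tau=0.05))
print(is_stable(a=1.0, b=1.0, k=0.0, h=0.1, tau=0.05))
```

With the feedback gain set to zero the matrix reduces to the open-loop map, whose eigenvalue $e^{ah}$ lies outside the unit disk for an unstable plant, so the test fails as expected.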

A more general framework for stability analysis of the NCS is to consider a nonlinear NCS with disturbance and also a general class of network scheduling protocols, called Lyapunov uniformly globally exponentially stable (UGES) protocols [50]. Both the round-robin (RR) static scheduling protocol and the MEF-TOD dynamic scheduling protocol considered in [45] turn out to be Lyapunov UGES protocols. Moreover, the input–output $\mathcal{L}_{p}$ stability of the NCS for Lyapunov UGES protocols is shown in [50] based on the small gain theorem.

A data packet that is transmitted, especially over wireless, can be dropped. One way to model packet loss [51] is as an asynchronous sample and hold switch which closes with a certain rate $r$. The NCS with packet loss can then be modeled as an asynchronous dynamical system (ADS) incorporating both discrete and continuous dynamics, and its stability analyzed through Lyapunov-based analysis. Lower bounds on the transmission rate $r$ needed for stability can be obtained [52].

The stabilization of an NCS over a channel that is prone to packet drops can be addressed through robust control analysis and synthesis techniques [53]. An NCS can be viewed as a feedback interconnection of a deterministic nominal system, denoted as $G$, and a zero-mean stochastic structured model uncertainty, denoted as $\Delta$. The stability problem can then be formulated as a linear matrix inequality (LMI) feasibility problem. Using the notion of mean square structured norm of $G$, denoted by $\mu_{\mathrm{MS}}(G,\Delta)$, the controller design problem for stabilizing an NCS can be posed as an optimization problem
$$\begin{aligned}\mu_{\mathrm{MS}}^{\ast}(G,\Delta)&=\inf_{K\text{-stab, LTI}}\mu_{\mathrm{MS}}(G,\Delta)\\&=\inf_{\theta>0,\ \mathrm{Diag.}}\ \inf_{K\text{-stab, LTI}}{\Vert\theta^{-1}G\theta\Vert}_{\mathrm{MS}}^{2}\end{aligned}\tag{3}$$
where the infimum is taken over all stabilizing LTI controllers $K$ for the given feedback interconnection of $G$ and $\Delta$. It turns out that the search for the controller $K^{\ast}$ with the largest stability margin $\mu_{\mathrm{MS}}^{\ast}(G,\Delta)$ is nonconvex with respect to the parameter $\theta$. Hence, the optimization problem (3) is intractable in general. However, it is shown in [53] that, for any fixed $\theta>0$, the optimization problem (3) can be converted into an equivalent LMI optimization problem and the optimal controller $K^{\ast}$ can be determined through it.

The problem of state estimation over a lossy communication link corresponds to a filtering problem with intermittent observations [54]. More explicitly, the plant can be modeled by a discrete time linear Gaussian system $x_{t+1}=Ax_{t}+w_{t}$, where packets $y_{t}=Cx_{t}+v_{t}$ arrive with probability $(1-\alpha)$ as a Bernoulli process, and $w_{t}$ and $v_{t}$ are independent identically distributed (i.i.d.) Gaussian random vectors. If the matrix $C$ is invertible, then for a stable Kalman filter, it is necessary that the packet drop probability satisfies $\alpha<1/(\max_{i}\vert\lambda_{i}(A)\vert)^{2}$, where $\lambda_{i}(A)$ are the eigenvalues of $A$.
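The bound on the drop probability is easy to evaluate, and a scalar example shows what is at stake. The sketch below is illustrative only: the function names are hypothetical, the scalar model takes $C=1$, and the noise variances `q` and `r` are assumed values. The second function propagates the prediction-error variance of the Kalman filter along a given arrival sequence, applying the measurement correction only when a packet is received.

```python
def critical_drop_probability(eigenvalues):
    """Evaluate 1 / (max_i |lambda_i(A)|)^2, capped at 1."""
    rho = max(abs(lam) for lam in eigenvalues)
    if rho <= 1.0:
        return 1.0  # no unstable mode: the bound does not restrict alpha
    return 1.0 / rho ** 2

def covariance_trace(a, q, r, arrivals, p0=1.0):
    """Prediction-error variance for x_{t+1} = a*x_t + w_t, y_t = x_t + v_t.

    q and r are the variances of w_t and v_t; arrivals[t] is True when
    the packet carrying y_t is received.
    """
    p, history = p0, [p0]
    for received in arrivals:
        correction = (a * p) ** 2 / (p + r) if received else 0.0
        p = a * a * p + q - correction
        history.append(p)
    return history

bound = critical_drop_probability([1.25, 0.5])
always = covariance_trace(1.25, 1.0, 1.0, [True] * 50)
never = covariance_trace(1.25, 1.0, 1.0, [False] * 50)
print(bound, always[-1], never[-1])
```

In the two extreme cases shown, the variance settles to a finite value when every packet arrives and grows geometrically when none does. The bound describes how many drops can be tolerated between these extremes.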

One can formulate the control of NCSs as an optimal control problem for LTI systems over a lossy communication link, with an uplink from sensor to controller, and a downlink from controller to an actuator that is collocated with the plant [55], [56]. A fundamental problem that arises when there are packet drops between the controller that computes a potential value of control to be applied, and an actuator that actually applies control inputs, is the resulting nonclassical information pattern, which makes it very difficult to compute the optimal control law under a linear quadratic control framework [57]. This difficulty disappears when the network protocol is a TCP-like protocol, i.e., a notification of successful reception is available. Then, there is indeed separation of estimation and control [58]. A sufficient condition for the stabilizability of an NCS is $\max\{\alpha,\beta\}<1/(\max_{i}\vert\lambda_{i}(A)\vert)^{2}$, where $\alpha$ and $\beta$ are critical values of drop probabilities for the uplink and downlink.

In the NCS, it is also important to determine where in the network to perform calculations required by the control law. Under some conditions, the optimal placement of a controller in the NCS is to collocate it with the actuator [59]. Also, the above condition is then a necessary and sufficient condition for the existence of a stabilizing controller in the presence of packet drops, even when the matrix $C$ is not invertible.

Another important issue is how the presence of the data-rate limited communication channels affects the stabilizability of the system. An early precursor [60] considers the problem of optimal control with respect to a long-term average quadratic cost criterion of a linear Gaussian system. The channel is modeled as one that appropriately delays finite length codewords. It is shown that when the encoder codes the innovations process of the state estimate rather than the state itself, then there is a separation theorem and the optimal control is linear in the state estimate. More recently, there has been increasing attention paid to the problem of stabilizing a linear system when some of the feedback loop has to be closed over a communication channel of limited data rate. In an early work [61], the plant is modeled as a linear deterministic continuous system. The communication channel's limited data rate is modeled by assigning a long time delay, proportional to the number of bits that are sought to be communicated. An instantaneous output measurement taken at a certain time is simply quantized by a symbol from a finite alphabet. The decoder can however choose an appropriate control, from a finite set, based on the past history of all encoded measurements received. An unstable system is not asymptotically stabilizable, and an appropriate notion of containability related to the ability to keep the system in an open sphere around the origin when it is started close enough to the origin is introduced. It is shown that an inequality relating the rate of change of the system and the data rate is sufficient for containability. Another early paper [62] considers a scalar plant with a channel capable of transmitting $R$ bits per second without noise, and shows that in order to keep the trajectory bounded when sampling uniformly it is necessary for the rate $R$ to exceed a multiple of the logarithm of the absolute value of the unstable eigenvalue.
In [63], the problem of quantization is studied where the sensitivity (i.e., fineness) of the quantizer is varied within a bounded neighborhood of the origin. In [64], it is shown that for quadratic stabilizability, the optimal sampling time depends on the sum of the logarithms of the magnitude of the unstable eigenvalues, and that the optimal quantization levels are logarithmic. In [65], a discrete linear system is considered, and the channel is modeled as being able to transmit $R$ bits perfectly in each second, i.e., as a bit pipe. It is shown that for asymptotic stabilizability it is necessary that the data rate exceeds the sum of the logarithms of the magnitudes of the unstable eigenvalues of the system matrix. When the encoder has access to past control inputs that were applied, some of the complications caused by information patterns, as in [57], do not arise, and it is shown that such a rate is also sufficient for asymptotic stabilizability. A companion paper [66] considers the case of noisy channels, and a similar necessary condition is shown on the rate, defined in a Shannon-theoretic sense, for almost sure asymptotic stabilizability. For certain channels with erasures, and when past control inputs are available to the encoder, this rate is also shown to be sufficient. In [67], a deterministic scalar autoregressive-moving-average (ARMA) system with a random initial condition is considered. The channel is modeled as a bit pipe, and a necessary and sufficient condition is obtained on the rate to ensure that the $m$th moment of the state is driven to zero. The minimum data rate needed for mean-square stabilizability of a system with both state and observation noises is treated in [68]. A dynamic quantizer is used to account for possible unbounded values of the state. There has also been attention to the case of bit pipes where the data rate varies randomly.
In fact, the case of dropped packets can be regarded as a special case where the rate can be zero. The case of i.i.d. rate variation is considered in [69] and [70]. The case of a channel which changes between 0 and a certain rate as a two-state Markov chain is considered in [71]. The case of a more general Markovian channel rate evolution is examined in [72]. The concept of anytime capacity is introduced in [73] to capture a noisy communication channel when it is used as part of a feedback loop to stabilize an unstable linear system. Again, for scalar systems, the required rate is larger than the logarithm of the unstable system's gain. The issue of coding for noisy communication channels when they are used to close control loops is examined in [74], [75], [76], [77]. The case when the channel noise is Gaussian is simpler because uncoded transmission can be used [78]; this problem is connected to the problem of communication in the presence of feedback.
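The data-rate condition for a discrete linear system over a bit pipe, as described for [65], can be evaluated directly from the eigenvalues of the system matrix. The sketch below assumes that logarithms are taken to base 2 so that the result is in bits per time step, and that "unstable" means magnitude strictly greater than one. The function name and the example eigenvalues are hypothetical.

```python
import math

def min_data_rate_bits(eigenvalues):
    """Sum of log2 |lambda_i| over the unstable eigenvalues (|lambda_i| > 1).

    The channel rate must exceed this value for asymptotic stabilizability.
    """
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1.0)

# Hypothetical system matrix with eigenvalues 2, 4, and 0.5:
# only the first two are unstable and contribute to the bound.
rate = min_data_rate_bits([2.0, 4.0, 0.5])
print(rate)
```

Stable modes contribute nothing to the bound, since their uncertainty contracts without any feedback information. A fully stable system therefore yields a rate of zero.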

Further results on the NCS can be found in [79], [80], [81] on optimal control over a communication channel, in [82] and [83] on NCS with sampling and delay, in [84] and [85] on stability and control analysis of the NCS through delayed differential equation framework, in [86] on wireless control network where the entire network itself acts as the controller, in [87] and [88] on decentralized control problems, and the references in [72] and in [89] and [90] for a survey of this field.

B. Hybrid Systems

CPSs are typically required to adapt to various changes in internal and external factors. One way of adaptation is through “switching” between different operation “modes” which results in a switched system. The class of systems with switching can be described by $\dot{x}=f_{\sigma}(x)$, where $\sigma:[0,\infty)\rightarrow\mathcal{P}$ is a piecewise constant function of time, called a switching signal, and $\mathcal{P}$ is some index set [91]. The stability of such systems has been studied, and recent results can be found in [91] and [92] and the references therein.
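A switched system of this form can be simulated with a few lines of code. The following sketch assumes a scalar state, two hypothetical modes named "decay" and "grow", a single switching time, and forward-Euler integration. None of these choices come from [91].

```python
from bisect import bisect_right

def make_switching_signal(switch_times, modes):
    """Piecewise-constant sigma(t): modes[i] is active on the i-th interval
    delimited by the sorted list `switch_times`."""
    assert len(modes) == len(switch_times) + 1
    return lambda t: modes[bisect_right(switch_times, t)]

def simulate_switched(fields, sigma, x0, t_end, dt=1e-3):
    """Forward-Euler integration of x' = f_sigma(t)(x) for a scalar state."""
    x = x0
    for i in range(int(round(t_end / dt))):
        x += dt * fields[sigma(i * dt)](x)
    return x

# Index set P = {"decay", "grow"}; the system decays on [0, 1), then grows.
fields = {"decay": lambda x: -x, "grow": lambda x: 0.5 * x}
sigma = make_switching_signal([1.0], ["decay", "grow"])
x_final = simulate_switched(fields, sigma, x0=1.0, t_end=2.0)
print(x_final)
```

For this example the exact solution is $x(2)=e^{-1}e^{0.5}=e^{-0.5}$, which the Euler approximation reproduces up to a discretization error of order `dt`.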

A more general modeling framework for CPSs is the hybrid automaton (HA), which can be used to model complex dynamics of CPSs through various mathematical formalisms [93], [94], [95], [96], [97], [98] that can capture both the transition between discrete states and also the evolution of continuous states over time. One useful HA model developed for algorithmic verification of CPSs has the following components [94].

  • A finite directed graph $\langle V,E\rangle$ where each $v\in V$ is called a control mode or a location, and each edge $e\in E$ is called a control switch.
  • A finite set of continuous real-valued variables $X=\{x_{1},\ldots,x_{n}\}$. The first derivative of $X$ is written as $\dot{X}=\{\dot{x}_{1},\ldots,\dot{x}_{n}\}$ and $X^{\prime}=\{x_{1}^{\prime},\ldots,x_{n}^{\prime}\}$ is used for the value of $X$ at the conclusion of a discrete change.
  • Two edge labeling functions guard and reset that assign to each $e\in E$ predicates of variables from $X\cup X^{\prime}$ to indicate a discrete transition condition and a reinitialization of continuous variables.
  • Three vertex labeling functions init, invariant, flow to indicate an initial, invariant, and flow condition for each $v\in V$. Both init(v) and invariant(v) are predicates with variables from $X$, while flow(v) is a predicate with variables from $X\cup\dot{X}$ to describe the dynamics of $X$ within a mode.

A simple example of an HA is shown in Fig. 2, in which $V=\{\mathrm{on},\mathrm{off}\}$ and $X=\{x\}$. For the mode on, the three vertex labeling functions are $x=2$ for init(on), $x\in[1,3]$ for invariant(on), and a differential equation $\dot{x}=-x+5$ for flow(on). The discrete transition, or control switch, from on to off occurs based on an edge labeling function guard $x>3$ for the mode on, and the variable $x$ is reset to a value by a reset map $x^{\prime}:=3$ during the transition.

Fig. 2. An example of a hybrid automaton [99].
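The vertex labels of the mode on can be captured in a small data structure and exercised numerically. The sketch below is a minimal illustration that covers only the mode on, since the labels of the mode off are not given in the text. The class and function names are hypothetical, and forward-Euler integration is an assumed simplification. It computes the time at which the flow $\dot{x}=-x+5$, started from init(on), reaches the boundary of invariant(on), which is when a discrete transition must occur.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Mode:
    name: str
    init: float                      # initial value of x in this mode
    invariant: Tuple[float, float]   # closed interval x must stay in
    flow: Callable[[float], float]   # right-hand side of x' = flow(x)

def time_to_leave_invariant(mode: Mode, dt: float = 1e-4,
                            t_max: float = 10.0) -> Optional[float]:
    """Integrate the flow from init and return the last time at which
    the state is still inside the invariant (None if it never leaves)."""
    x, t = mode.init, 0.0
    lo, hi = mode.invariant
    while t < t_max:
        x_next = x + dt * mode.flow(x)
        if not (lo <= x_next <= hi):
            return t
        x, t = x_next, t + dt
    return None

on = Mode("on", init=2.0, invariant=(1.0, 3.0), flow=lambda x: -x + 5.0)
t_exit = time_to_leave_invariant(on)
print(t_exit, math.log(1.5))
```

The closed-form solution from $x(0)=2$ is $x(t)=5-3e^{-t}$, which reaches $x=3$ at $t=\ln(3/2)\approx 0.405$, in agreement with the numerical value.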

Safety verification of a given hybrid automaton $\mathcal{A}$ can be addressed by determining whether the set $\overline{\mathrm{Reach}}(\mathcal{A},\mathcal{I})\cap\mathcal{U}$ is nonempty, where $\mathcal{I}$ denotes a given initial set of states, $\mathcal{U}$ denotes a set of unsafe states, $\mathrm{Reach}(\mathcal{A},\mathcal{I})$ represents the set of states reached by executions of $\mathcal{A}$ starting from $\mathcal{I}$, and $\overline{\mathrm{Reach}}(\mathcal{A},\mathcal{I})$ is an overapproximation of $\mathrm{Reach}(\mathcal{A},\mathcal{I})$.

$\mathrm{Reach}(\mathcal{A},\mathcal{I})$ can be computed through the iteration $\varphi_{k+1}=\mathrm{Post}(\varphi_{k})$, where $\varphi_{k}$ is the set of reached states at the $k$th step, and $\mathrm{Post}(\varphi_{k})$ is the set of states that is the union of $\varphi_{k}$ and the set of states reached from $\varphi_{k}$ through a discrete transition and continuous flow. If $\varphi_{k}$ and $\varphi_{k+1}$ coincide for some finite number $k$, then the algorithm terminates, returning $\mathrm{Reach}(\mathcal{A},\mathcal{I})$. However, it is well known that the exact computation of $\mathrm{Reach}(\mathcal{A},\mathcal{I})$ is undecidable in general [100], [101]. Hence, in such cases, computing $\overline{\mathrm{Reach}}(\mathcal{A},\mathcal{I})$ is also an important research issue as we will discuss later.
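The fixed-point iteration is easiest to see on a finite transition graph, where termination is guaranteed. The sketch below makes that simplifying assumption: states are plain labels and Post follows discrete edges only, so it illustrates the iteration scheme rather than reachability for a genuine hybrid automaton with continuous flow. The function names and the example graph are hypothetical.

```python
def post(states, transitions):
    """Union of `states` and all one-step successors of `states`."""
    successors = {dst for (src, dst) in transitions if src in states}
    return states | successors

def reach(initial, transitions):
    """Iterate phi_{k+1} = Post(phi_k) until phi_{k+1} == phi_k."""
    phi = set(initial)
    while True:
        nxt = post(phi, transitions)
        if nxt == phi:
            return phi
        phi = nxt

def is_safe(initial, transitions, unsafe):
    """Safe iff the reachable set does not intersect the unsafe set."""
    return reach(initial, transitions).isdisjoint(unsafe)

# Hypothetical graph: 0 -> 1 -> 2, and a separate component 3 -> 4.
edges = {(0, 1), (1, 2), (3, 4)}
print(reach({0}, edges))
print(is_safe({0}, edges, unsafe={4}), is_safe({0}, edges, unsafe={2}))
```

On a finite graph the sets $\varphi_{k}$ grow monotonically inside a finite universe, so the loop must stop. For general hybrid automata no such bound exists, which is where overapproximation becomes necessary.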

The first subclass of hybrid automata for which reachability was shown to be decidable is the class of timed automata [102]. Roughly, timed automata are those where 1) the vertex labeling function flow(v) is of the form of $\dot{x}_{i}=1$; 2) the edge labeling function reset(e) either does not change the value of $x_{i}$ or resets $x_{i}$ to zero during a discrete transition; and 3) the sets associated with init, inv, guard are all in rectangular form, i.e., a finite boolean combination of the form $x_{i}\oplus c$ for some $c\in\mathbb{Q}$ and $\oplus\in\{<,\leq,=,\geq,>\}$. It is important to note that even though the continuous dynamics of timed automata is very simple from a control perspective, introducing time in a model of computation was a significant conceptual advance in the area of algorithm verification, and a precursor to a lot of work on hybrid systems.

The notions of simulation and bisimulation relations were established in the area of formal methods and used successfully for complexity reduction in discrete systems [103], [104], [105]. It turns out that they are also very useful for complexity reduction of hybrid systems to address reachability. In [102], the reachability problem for timed automata is shown to be decidable since there exists a finite quotient transition system $\mathcal{R}(\mathcal{A})$ which is bisimilar to the original timed automaton. A quotient transition system is one that is constructed by the partition of the continuous state space. Transition systems $\mathcal{T}_{1}$ and $\mathcal{T}_{2}$ are said to be bisimilar if there exists a bisimulation relation $\mathcal{B}$ between $\mathcal{T}_{1}$ and $\mathcal{T}_{2}$. Definitions of transition system, simulation, and bisimulation relation are as follows [106].

Definition 1 (Transition Systems)

A (labeled) transition system with observations is a tuple Formula${\cal T}=(Q,\Sigma,\rightarrow,Q^{0},\Pi,\langle\langle\cdot\rangle\rangle)$ where 1) Formula$Q$ is a set of states; 2) Formula$Q^{0}\subseteq Q$ is a set of initial states; 3) Formula$\Sigma$ is a set of labels; 4) Formula$\Pi$ is a set of observations; 5) Formula$\langle\langle\cdot\rangle\rangle$ is an observation map from Formula$Q$ to Formula$\Pi$; and 6) Formula$\rightarrow$ is a transition relation such that Formula$\rightarrow\subseteq Q\times\Sigma\times Q$ and a transition from Formula$q$ to Formula$q^{\prime}$ with a label Formula$\sigma$ is denoted by Formula$q\buildrel{\sigma}\over{\rightarrow}q^{\prime}$.

Definition 2 (Simulation)

A relation Formula${\cal S}\subseteq Q_{1}\times Q_{2}$ is called a simulation of Formula${\cal T}_{1}$ by Formula${\cal T}_{2}$ if for all Formula$(q_{1},q_{2})\in{\cal S}$: 1) Formula${\langle\langle q_{1}\rangle\rangle}_{1}={\langle\langle q_{2} \rangle\rangle}_{2}$; and 2) Formula$\forall q_{1} \buildrel{\sigma}\over{\rightarrow}_{1}q_{1}^{\prime}$, Formula$\exists q_{2}\buildrel{\sigma}\over{\rightarrow}_{2}q_{2}^{\prime}$ such that Formula$(q_{1}^{\prime},q_{2}^{\prime})\in{\cal S}$.

Definition 3 (Bisimulation)

A relation Formula${\cal B}\subseteq Q_{1}\times Q_{2}$ is called a bisimulation between Formula${\cal T}_{1}$ and Formula${\cal T}_{2}$ if Formula${\cal B}$ is a simulation relation from Formula${\cal T}_{1}$ to Formula${\cal T}_{2}$ and Formula${\cal B}^{-1}$ is a simulation relation from Formula${\cal T}_{2}$ to Formula${\cal T}_{1}$.
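For finite transition systems, Definitions 2 and 3 can be checked directly by enumeration. The sketch below is an illustration only: the dictionary encoding (a state mapped to a set of (label, successor) pairs, and a state mapped to its observation) and the function names are assumptions made for this example.

```python
def is_simulation(rel, trans1, trans2, obs1, obs2):
    """Check whether `rel` is a simulation of T1 by T2 (Definition 2).

    trans1/trans2 map a state to a set of (label, next_state) pairs;
    obs1/obs2 map a state to its observation.
    """
    for q1, q2 in rel:
        if obs1[q1] != obs2[q2]:  # condition 1: equal observations
            return False
        for label, q1_next in trans1.get(q1, set()):
            # condition 2: T2 must match the move with the same label
            matched = any(
                label2 == label and (q1_next, q2_next) in rel
                for label2, q2_next in trans2.get(q2, set())
            )
            if not matched:
                return False
    return True


def is_bisimulation(rel, trans1, trans2, obs1, obs2):
    """Check Definition 3: `rel` and its inverse are both simulations."""
    inverse = {(q2, q1) for q1, q2 in rel}
    return is_simulation(rel, trans1, trans2, obs1, obs2) and is_simulation(
        inverse, trans2, trans1, obs2, obs1
    )


# T1: a two-state cycle; T2: one state with a self-loop; T3: one state, no moves.
trans1 = {"p0": {("a", "p1")}, "p1": {("a", "p0")}}
obs1 = {"p0": "x", "p1": "x"}
trans2 = {"s": {("a", "s")}}
obs2 = {"s": "x"}
trans3 = {}
obs3 = {"t": "x"}

quotient_rel = {("p0", "s"), ("p1", "s")}
```

Here `quotient_rel` collapses the two-state cycle onto a single state, which is the same idea as the finite quotient used for timed automata, although on a much smaller scale.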

A result on the decidability of the class of initialized rectangular hybrid automata (IRHA) is shown in [101]. Two important factors for decidability are: 1) rectangularity, that is, if we denote the set of all rectangular regions in Formula$\BBR^{n}$ by Formula${\cal R}^{n}$, then the three vertex labeling functions init, inv, flow are all mapping functions from Formula$V$ to Formula${\cal R}^{n}$, and the two edge labeling functions guard, reset are mapping functions from Formula$E$ to Formula${\cal R}^{n}$; and 2) initialization, that is, a continuous variable has to be reinitialized whenever its flow changes during a discrete transition. In [101], it is shown that slight generalizations from IRHA lead to undecidability.

The o-minimal hybrid systems are defined in [107] as initialized hybrid systems whose relevant sets such as guard, reset, etc., and flow are definable in an o-minimal (or order-minimal) theory [108], [109]. This class captures hybrid systems with relatively complex continuous dynamics including linear, polynomial, and exponential flow dynamics. In [107], it is shown that every o-minimal hybrid system admits a finite bisimulation, and furthermore, the computation of such finite bisimulation terminates. Hence, o-minimal hybrid systems comprise a decidable class of hybrid system.

An interesting class of hybrid systems, called linear hybrid automata (LHA) [99], [110], are those for which, for each Formula$v\in V$ and Formula$e\in E$: 1) the vertex labeling functions flow(v), inv(v), init(v), and edge labeling functions guard(e), reset(e) are finite conjunctions of linear inequalities; and 2) more importantly, the flow function flow(v) is a finite conjunction of linear inequalities over the variables in Formula$\mathdot{X}$ only. An important result is that if a given HA Formula${\cal A}$ is an LHA, then Formula${\rm Reach}({\cal A},{\cal I})$ can be computed exactly [110]. However, it is not guaranteed that the iterative reach set computation terminates.

One of the most common classes of hybrid systems of interest has vertex labeling functions flow(v) in the form of a differential equation Formula$\mathdot{x}=f(x)$. An example of an HA with linear differential equations is shown in Fig. 2. For such HAs and other classes of HAs more general than LHA, there is no known algorithm that can compute Formula${\rm Reach}({\cal A},{\cal I})$ exactly, even without a termination guarantee. Hence, the safety verification problem for this class of HAs can only be addressed through an overapproximation of Formula${\rm Reach}({\cal A},{\cal I})$.

In [99], an approximation technique, called linear phase portrait approximation, is proposed. The basic idea of this technique is to replace the dynamics Formula$f(x)$ for each Formula$v\in V$ by a corresponding rectangular region that bounds the function Formula$f(x)$ from above and below over the invariant set for the mode Formula$v$. As an example, the dynamics Formula$\mathdot{x}=-x+5$ for the mode on in Fig. 2 can be overapproximated by Formula$\mathdot{x}\in[2,4]$ over the range of Formula$x\in[1,3]$. Then, it is easy to see that the HA in Fig. 2 can be overapproximated by an LHA through this technique.
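For a scalar affine flow, the rectangular bound used in this approximation follows from evaluating the flow at the two endpoints of the invariant interval, since an affine function attains its extrema there. The sketch below assumes a flow of the form a·x + b on an interval invariant; the function name is hypothetical, and a general nonlinear flow would require a sound range-bounding method instead.

```python
def rectangular_flow_bounds(a, b, x_low, x_high):
    """Bound the affine flow a*x + b over the invariant interval [x_low, x_high].

    An affine function is monotone, so its minimum and maximum over an
    interval are attained at the endpoints.
    """
    if x_low > x_high:
        raise ValueError("invariant interval is empty")
    at_low = a * x_low + b
    at_high = a * x_high + b
    return (min(at_low, at_high), max(at_low, at_high))


# The example from the text: dx/dt = -x + 5 with x in [1, 3].
on_mode_bounds = rectangular_flow_bounds(-1, 5, 1, 3)
```

With these inputs the result is the interval [2, 4] quoted in the text.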

Another useful technique is to overapproximate the evolution of continuous variables using polyhedral representation [111], [112]. Given a dynamical system Formula$\mathdot{x}=f(x)$, let Formula${\cal R}_{[t_{k-1},t_{k}]}({\cal I})$, called a flow-pipe segment, be the set of states over the time interval Formula$[t_{k-1},t_{k}]$ reachable from the initial set Formula${\cal I}$ at time Formula$t_{0}$, and let Formula$(C,d)$ be a matrix–vector pair that defines a polyhedron to approximate the flow-pipe segment such that Formula$x\in\{x\vert Cx\leq d\}$ for any Formula$x\in{\cal R}_{[t_{k-1},t_{k}]}({\cal I})$. Then, for a given Formula$C$, the optimal Formula$d^{\ast}$ that minimizes the overapproximation error can be determined row by row as the solution to the optimization problem $$\begin{aligned}&\max_{x_{0},t}\quad c_{i}^{T}x(t,x_{0})\\&\text{s.t.}\qquad x_{0}\in{\cal I}\quad\text{and}\quad t\in[t_{k-1},t_{k}]\end{aligned}\tag{4}$$ where Formula$c_{i}^{T}$ is the Formula$i$th row vector of Formula$C$ that is the unit normal vector to the Formula$i$th face of the polytope Formula$C x\leq d$, and Formula$x(t,x_{0})$ is the solution of Formula$\mathdot{x}=f(x)$ at time Formula$t$ from the initial state Formula$x_{0}$. Then, from the optimal solution Formula$(t^{\ast},x_{0}^{\ast})$ of (4), the optimal value Formula$d_{i}^{\ast}$ for the given Formula$c_{i}$ is determined as Formula$d_{i}^{\ast}=c_{i}^{T}x(t^{\ast},x_{0}^{\ast})$. Now, the question is how to determine Formula$C$. A heuristic approach is also proposed in [111] based on a convex hull computation from a set of vertices. Assuming that Formula${\cal I}$ is a polyhedron, let Formula${\cal V}({\cal I})$ be the set of vertices of Formula${\cal I}$, and let Formula${\cal V}_{t}({\cal I})=\{x(t,v)\vert v\in{\cal V}({\cal I})\}$. Then, the matrix Formula$C$ can be determined by the set of outward pointing normal vectors of the convex hull that is obtained from the set of vertices Formula${\cal V}_{t_{k-1}}({\cal I})\cup{\cal V}_{t_{k}}({\cal I})$.

As noted earlier, the notion of a bisimulation relation between transition systems is crucial for the decidability result of several classes of HAs that have fairly simple continuous dynamics, such as timed automata. This notion can be extended to explicitly include the observation error in its definition so that a larger class of continuous dynamics Formula$\mathdot{x}=f(x)$ can be abstracted as a finite state transition system that is approximately bisimilar to the original continuous dynamics. If we let Formula${\cal T}_{M}(\Sigma,\Pi)$, called a metric transition system, be the set of transition systems associated with a set of labels Formula$\Sigma$ and a set of observations Formula$\Pi$ where Formula$(Q,d_{Q})$ and Formula$(\Pi,d_{\Pi})$ are metric spaces, then, for Formula${\cal T}_{1},{\cal T}_{2}\in{\cal T}_{M}(\Sigma,\Pi)$, an approximate bisimulation relation is defined as follows [106].

Definition 4 (Approximate Bisimulation)

A relation Formula${\cal B}_{\delta}\subseteq Q_{1}\times Q_{2}$ is a Formula$\delta$-approximate bisimulation relation between Formula${\cal T}_{1}$ and Formula${\cal T}_{2}$ if for all Formula$(q_{1},q_{2})\in{\cal B}_{\delta}$: 1) Formula$d_{\Pi}({\langle\langle q_{1}\rangle\rangle}_{1},{\langle\langle q_{2}\rangle\rangle}_{2})\leq \delta$; 2) Formula$\forall q_{1}\buildrel{\sigma}\over{\rightarrow}_{1}q_{1}^{\prime}$, Formula$\exists q_{2}\buildrel{\sigma}\over{\rightarrow}_{2}q_{2}^{\prime}$ such that Formula$(q_{1}^{\prime},q_{2}^{\prime})\in{\cal B}_{\delta}$; and 3) Formula$\forall q_{2}\buildrel{\sigma}\over{\rightarrow}_{2}q_{2}^{\prime},\exists q_{1}\buildrel{\sigma}\over{\rightarrow}_{1}q_{1}^{\prime}$ such that Formula$(q_{1}^{\prime},q_{2}^{\prime})\in{\cal B}_{\delta}$.

Concerning an approximate bisimulation relation, if a nonlinear control system is incrementally asymptotically stable [113], then it is Formula$\delta$-approximately bisimilar [114] to a symbolic model of the original continuous system that can be constructed by aggregating states and control inputs using several parameters such as Formula$\tau\in \BBR^{+}$ for time domain quantization, Formula$\eta\in \BBR^{+}$ for state space quantization, and Formula$\mu \in\BBR^{+}$ for input space quantization satisfying the following inequality: $$\beta(\delta,\tau)+\mu+\eta/2\leq\delta\tag{5}$$ where Formula$\beta$ is a Formula${\cal KL}$ function [31]. Once we have such a symbolic model of a continuous control system, a controller satisfying a given specification can be synthesized automatically using techniques developed in supervisory control of discrete event systems or algorithmic game theory [114], [115]. Other results relevant to automatic controller synthesis for hybrid systems can be found in [116] on algorithmic controller synthesis through finite bisimulation to satisfy LTL specifications of discrete-time linear systems and in [117] on the synthesis of control laws for piecewise-affine hybrid systems on simplices.

It is important to note that, based on these theoretical results, many software tools have been developed for formal verification and automatic controller synthesis of hybrid systems. Some examples are UPPAAL [118], a verification tool for real-time systems based on timed automata, HyTech [99] and PHAVer [119] for LHA, SpaceEx [120], which is based on the LeGuernic–Girard (LGG) algorithm [121] that can efficiently handle HAs with linear differential equations with a larger number of system states compared to other approximation techniques, and PESSOA [122], which is a tool for controller synthesis based on [114]. More details and results can be found in [123], [124], [125], [126], [127] for various approaches for reachability, in [128] for other classes of systems for which some verification/synthesis problems are decidable, and in [129], [130], [131], [132] for abstractions of hybrid systems.

C. Distributed Hybrid Systems

A major goal in the design of CPSs is to have formal proofs of correctness of the overall system design. This overall system can however be quite complex, involving not only differential-equation-based dynamics of the physical system, but also discrete models of the physical system, as well as interaction with real-time computation and communication. The system is the composition of several systems. Thus, proofs of correctness will have to be holistic and transcend domains. An example is a proof of correctness of an automobile traffic control system in [133] and [134]. It involves not only differential equation models of automobiles, but also a balls-and-bins model of the positions of all cars, which is necessary to prove properties such as deadlock avoidance [135]. Also involved is real-time scheduling of the computational tasks. Similarly, the design of CPSs can involve several choices such as the extent of centralization, and the extent of robustness, both of which may have to be made, keeping in mind the provable correctness of the design. An example is a design accompanied with a proof of correctness for an automated traffic intersection [136]. So far, verification for such systems has mostly involved pencil–paper proofs and interactive theorem prover-based verification [137], [138], [139], [140]. For future systems, it would be valuable to have compositional frameworks, and more systematic or automated methods for proving correctness.



Computing and networking are key driving forces and key components of new highly connected, distributed, and reliable CPSs. We review classical and recent results in these areas.

A. Real-Time Scheduling Theory

In real-time systems, the correctness of a system depends not only on the logical results of the computation but also on the time at which the results are produced [141]. One of the primary design objectives of a real-time computing system is to support temporally predictable execution of a set of computing tasks so that it is guaranteed that there will be timely interaction between computing tasks and the physical environment. More precisely, for a given set of computing tasks Formula$\Gamma=\{\tau_{1},\tau_{2},\ldots,\tau_{n}\}$ with timing constraints, a set of processors Formula$P= \{P_{1},P_{2},\ldots,P_{m}\}$, and a set of resources Formula$R=\{R_{1},R_{2},\ldots,R_{r}\}$, a real-time computing system has to make an appropriate scheduling decision on Formula$\Gamma$ to meet all the timing constraints. If some tasks cannot meet their timing constraints, then the system should be able to determine this in advance. However, it is known to be computationally intractable to solve such scheduling problems in general [142].

One of the most influential results in real-time scheduling theory is based on a task set model, which is simple enough to be computationally tractable and also practical enough to be useful in many applications [6]. Consider the scheduling problem for a set of preemptible and periodic tasks under both fixed (or static) and dynamic priority assignment based on the following assumptions: 1) all tasks Formula$\tau_{i}\in\Gamma$ are independent, i.e., there is no shared resource or precedence relationship between tasks; 2) all instances of a periodic task Formula$\tau_{i}$ have the same relative deadline Formula$D_{i}$ and it is equal to its period Formula$T_{i}$; 3) all instances of a periodic task Formula$\tau_{i}$ have the same worst case execution time Formula$C_{i}$; and 4) there is only one processor.

The rate monotonic (RM) policy is a static priority scheduling policy that assigns priorities to tasks based on the rate of arrivals of jobs in the task. The shorter the period of a task, the higher is the priority assigned to the task. It is optimal among all static priority scheduling policies in the sense that if a periodic task set can be scheduled by some static priority policy, then it can be scheduled by RM [6]. Moreover, there is a simple sufficient condition for schedulability: Formula$\sum_{i=1}^{n}U_{i}\leq n(2^{1/n}-1)$, where Formula$U_{i}=C_{i}/T_{i}$ is the processor utilization of Formula$\tau_{i}$. There is also a less conservative schedulability condition for RM, called a hyperbolic bound [143]. For a set of periodic tasks whose relative deadlines are less than their periods, the deadline monotonic (DM) scheduling policy is an extension of RM. For the exact schedulability analysis for a given periodic task set, there is an iterative algorithm, called response-time analysis [144].
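The two RM tests mentioned above can be sketched in a few lines. The utilization test implements the sufficient condition stated in the text. The response-time routine uses the standard fixed-point recurrence, in which the response time of a task equals its own execution time plus the interference from all higher priority tasks; the text names the method but does not spell out the recurrence, so it is stated here as background. Function names are hypothetical, tasks are (C, T) pairs in integer time units, and deadlines are assumed equal to periods.

```python
import math


def rm_utilization_test(tasks):
    """Sufficient RM test: sum(C_i / T_i) <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)


def rm_response_times(tasks):
    """Worst-case response times under RM, or None if some deadline is missed.

    Results are listed in priority order (shortest period first).
    """
    ordered = sorted(tasks, key=lambda task: task[1])  # shorter period = higher priority
    results = []
    for i, (c_i, t_i) in enumerate(ordered):
        r = c_i
        while True:
            # Interference from every higher priority task released within r.
            interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j in ordered[:i])
            r_next = c_i + interference
            if r_next > t_i:  # deadline (= period) exceeded
                return None
            if r_next == r:  # fixed point reached
                break
            r = r_next
        results.append(r)
    return results


example_tasks = [(1, 4), (2, 6), (3, 12)]
passes_bound = rm_utilization_test(example_tasks)
response_times = rm_response_times(example_tasks)
```

The example task set has a utilization of about 0.83, above the bound of about 0.78 for three tasks, so the sufficient test is inconclusive, yet the exact analysis shows that every task meets its deadline. This illustrates why the utilization bound is only sufficient.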

The earliest deadline first (EDF) policy assigns priorities to tasks according to the absolute deadline of their instances. Hence, EDF is a dynamic scheduling policy. It is optimal in that if there is any schedule that can meet all deadlines, then EDF will too. For task sets with deadlines less than periods, a necessary and sufficient condition for schedulability under EDF is derived in [145].

Many scheduling algorithms have been proposed to simultaneously handle both hard periodic tasks and soft aperiodic tasks. The primary objective is to minimize the response time for aperiodic tasks without compromising the schedulability of the periodic tasks. In fixed-priority scheduling, a basic idea is to create a periodic task Formula$\tau_{s}=(T_{s},C_{s})$ for serving aperiodic tasks where Formula$T_{s}$ is the period and Formula$C_{s}$ is the computation time for Formula$\tau_{s}$, called server capacity, in addition to the hard periodic task set. Some examples of scheduling algorithms based on this idea are polling server [146], deferrable server [147], and priority exchange server [148].

For dynamic scheduling, especially under EDF, one useful idea, called the total bandwidth server (TBS) [149], to handle aperiodic requests along with a set of periodic tasks, is to assign each aperiodic request a deadline such that the overall aperiodic load never exceeds a specified value of processor utilization Formula$U_{s}$ that is called the bandwidth of an aperiodic server. When the Formula$k$th aperiodic request which requires Formula$C_{k}$ amount of execution time arrives at time Formula$r_{k}$, then the deadline assigned to this aperiodic task is Formula$d_{k}=\max\{r_{k},d_{k-1}\}+(C_{k}/U_{s})$, where Formula$d_{k-1}$ is the deadline assigned previously for the Formula$(k-1)$th aperiodic request. However, TBS cannot be used when Formula$C_{k}$ is unknown. For such cases, a bandwidth reservation mechanism, called the constant bandwidth server (CBS), is proposed in [150]. The basic idea of CBS is that when a new aperiodic request arrives, it is assigned a suitable deadline that is determined by the currently available bandwidth resource for the request. If the request cannot be completed before its deadline, then its deadline is postponed. Notice that, under EDF this implies that its priority is decreased and thus the interference to the other tasks is reduced.
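The TBS deadline assignment rule can be written as a short loop. This is a sketch only: the function name is hypothetical, and the initial value of the previous deadline is taken to be zero, which is an assumption not stated in the text.

```python
def tbs_deadlines(requests, bandwidth):
    """Assign TBS deadlines d_k = max(r_k, d_{k-1}) + C_k / U_s.

    requests: list of (arrival_time, execution_time) in arrival order.
    bandwidth: server utilization U_s, with 0 < U_s <= 1.
    """
    if not 0 < bandwidth <= 1:
        raise ValueError("bandwidth must lie in (0, 1]")
    deadlines = []
    previous_deadline = 0.0  # assumed convention: d_0 = 0
    for arrival, execution in requests:
        deadline = max(arrival, previous_deadline) + execution / bandwidth
        deadlines.append(deadline)
        previous_deadline = deadline
    return deadlines


# Three aperiodic requests served with a bandwidth of 25%.
assigned = tbs_deadlines([(3, 1), (9, 2), (14, 1)], 0.25)
```

The third request arrives at time 14, before the previous deadline of 17, so its deadline is computed from 17 rather than from its arrival time. This is how the rule keeps the aperiodic load within the reserved bandwidth.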

For a task set that consists of aperiodic tasks with arbitrary arrival times, execution times, and deadlines, a utilization bound for schedulability is derived in [151]. In particular, the notion of synthetic utilization, denoted as Formula$U(t)$, is introduced, which is roughly defined as the sum of utilization values of all arrived aperiodic requests whose deadlines have not yet expired at time Formula$t$. A set of Formula$n$ aperiodic tasks is schedulable under the deadline monotonic scheduling policy if, for all Formula$t$, Formula$U(t)\leq UB(n)$ where Formula$UB(1)=1$, Formula$UB(2)=0.75$, and Formula$UB(n)=1/\bigl(1+\sqrt{(1/2)(1-1/(n-1))}\bigr)$ for Formula$n\geq 3$. Deadline monotonic scheduling policy is optimal among all time-independent scheduling policies, a generalized notion of fixed-priority scheduling that applies to aperiodic tasks, since no other time-independent scheduling policy can have a higher upper bound for Formula$U(t)$. For the case of dynamic aperiodic task scheduling, EDF is optimal and its utilization bound is 1.
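The bound is easy to tabulate. The sketch below uses a hypothetical function name and reads the case of three or more tasks as one divided by the quantity one plus a square root, where the square root is taken of one half times (1 - 1/(n - 1)).

```python
import math


def aperiodic_utilization_bound(n):
    """Synthetic utilization bound UB(n) for n aperiodic tasks."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 1.0
    if n == 2:
        return 0.75
    return 1.0 / (1.0 + math.sqrt(0.5 * (1.0 - 1.0 / (n - 1))))


bounds = [aperiodic_utilization_bound(n) for n in (1, 2, 3, 10, 100)]
```

The values decrease with the number of tasks and approach 1/(1 + sqrt(1/2)), roughly 0.586, as the number of tasks grows.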

In reality, tasks are not generally independent since they typically share resources such as memory, files, communication network, etc., for their execution in a mutually exclusive manner. In such cases, a higher priority task can be blocked by a lower priority task due to resource sharing. This is called priority inversion and its duration can be arbitrarily long. In [152], a simple solution is proposed that is called the priority inheritance protocol (PIP). The basic idea of PIP is to let a lower priority task that currently holds the shared resource temporarily inherit the highest priority among the blocked tasks, until it releases the resource. After releasing the resource, it recovers its original priority. However, it is known that priority inheritance does not prevent deadlocks. The priority ceiling protocol (PCP) is also proposed in [152] as an extension of the PIP to resolve this issue. Under EDF, the PIP and the PCP are not applicable since they are based on the fixed-priority scheduling system. For such cases, the stack resource policy (SRP) [153], extended from the PCP to support dynamic priority scheduling and to allow the sharing of runtime stack-based resources, is a useful mechanism that is applicable to both fixed-priority and dynamic scheduling.

It is of interest to determine guarantees for jobs that have to be processed by a sequence of processors, i.e., as they move through a network. The work on stability of reentrant lines provides bounds on end-to-end delays that are of potential interest because it establishes a pipeline property where the delay is related to the bottleneck node [154]. Other important results can be found in [155] and [156] on real-time queueing theory for stochastic analysis of soft real-time systems, in [157] and [158] on real-time scheduling analysis in a resource partitioned computing environment, in [159] on resource kernels as an approach for operating system resource management based on resource reservation for real-time applications, in [160], [161], [162] on a control theoretical approach to performance and throughput management of computing systems, in [163] on real-time scheduling algorithms for power management in embedded real-time systems, and in [7] for more on real-time system theories.

B. Real-Time Systems for CPSs

There are several important characteristics which make today's CPSs different from earlier generation control systems: 1) the scale of a CPS is much larger; 2) entities comprising a CPS typically run over heterogeneous environments; and 3) entities interact with each other in a very complex manner. It is also expected that CPSs should be highly extensible for new functionalities, and flexible for runtime adaptation. Due to such structural and behavioral complexities, it is more challenging to design and implement a CPS. To overcome such complex issues, it is becoming increasingly important to develop a software platform, called middleware, based on an appropriate abstraction of such complex systems, and a well-designed architecture for rapid implementation of reliable and evolvable CPS applications [134].

The Common Object Request Broker Architecture (CORBA) [164] is a well-known industry standard specification for middleware developed by the Object Management Group (OMG). It is designed primarily for interoperability between software objects running on different machines in a heterogeneous distributed environment. Thus, it is not designed for control system applications in which temporal predictability is essential. Later, Real-Time CORBA was developed as an extension of CORBA to support temporally predictable end-to-end interactions between client and server objects in a system. It defines a set of mechanisms and interfaces that enable applications to explicitly manage system resources, including a synchronization mechanism, a thread pool model, a scheduling service, and explicit binding.

The ACE ORB (TAO) [165] is an implementation of Real-Time CORBA. It has been used in application areas such as telecommunication, aerospace, medical, and financial services. It is used as a middleware framework for an application development platform, called open control platform (OCP) [166], developed for complex and reconfigurable control system applications under the U.S. DARPA Software Enabled Control research program.

Etherware [133] is a middleware developed for large-scale networked control systems. It is based on the concept of microkernel architecture and supports component-based application development. It also supports runtime reconfiguration, such as component upgrade, and even migration at runtime from one computing node to another. This is possible through an Etherware component model that is based on several software design patterns [167]. Etherware has been enhanced to support time-critical systems by incorporating quality of service (QoS) in component interaction and a real-time scheduling mechanism for interactions [168]. As an illustrative example of an Etherware-based CPS, Fig. 3 shows how a distributed traffic system can be developed over Etherware.

Fig. 3. An illustrative example of an Etherware-based distributed traffic control system.

A component-based middleware framework for networked embedded systems has been developed under the European Reconfigurable, Ubiquitous, Networked Embedded Systems (RUNES) project. As in Etherware, it is designed to support runtime reconfiguration provided by a middleware service, called the Logical Mobility Service. A number of components for network reconfiguration, localization, and collision avoidance have been developed based on the RUNES middleware component framework [169]. OSA+ [170] is another middleware based on the microkernel architecture for distributed real-time embedded systems. Other real-time middleware frameworks are RTZen [171], an implementation of Real-Time CORBA developed on a real-time Java platform, and ARMADA [172], a set of communication and middleware services for real-time embedded distributed applications.

Another approach to implementation of real-time CPSs is the development of programming languages. Giotto [173] provides platform-independent high-level abstractions that can be used for specifying time-triggered sensor readings, task invocations, actuator updates, and mode switches of control systems. Platform-specific issues such as schedulability analysis of a program on a specific platform are handled by the Giotto compiler. Giotto thereby decouples high-level real-time programming of real-time embedded systems from low-level real-time scheduling of computation and communication. Other programming languages for real-time systems that have been used successfully, especially in industry, are the synchronous languages such as Esterel [174] and Signal [175].

A discussion on the importance of time in computing abstractions of every layer of the computing system and possible approaches for the development of repeatable and predictable CPS can be found in [176].

C. Real-Time Wireless Networking

CPSs rely on an underlying communication network to transport data packets between sensors, computational units, and actuators. For actions to be taken on time, these packets need to be delivered within a time deadline. Nodes may require a certain minimum throughput of such packets. Thus, CPSs need a real-time communication network that can provide guarantees on both the throughputs and delays of flows. The current Internet does not provide such QoS guarantees, which poses a significant challenge.

As motivation, current automobiles have about 75 sensors and 100 switches connected by a wired network. The wiring harness is heavy, complex, hard to assemble, expensive, and subject to failures. There is significant motivation for replacing the wiring harness by a wireless access point. This can potentially lead to savings in fuel economy and reduced manufacturing cost, as well as making it possible to perform software upgrades, and add or remove devices. Packets will then have to be delivered within timing constraints.

Such a system can be modeled as an access point serving Formula$n$ clients [177]. Similar to Section IV-A [6], suppose that packets arrive, one for each client, at the beginning of each common period of Formula$\tau$ slots. Suppose that each packet takes one slot to transmit, and in each slot the access point can attempt a transmission for one of the clients. The deadline for each packet is the end of the period. The throughput of each client is the long-term average of the number of packets delivered per period. Each client Formula$c$ has a throughput requirement of Formula$q_{c}$ packets/period, called the timely throughput since it only considers packets delivered by their deadline. The distinction from the deterministic model for real-time computation is that the wireless channels are unreliable. When the access point transmits a packet for client Formula$c$, it only succeeds with probability Formula$p_{c}$. (This model of channel reliability can be generalized [178].)

There are two fundamental questions concerning the QoS requirements Formula$\{(q_{c},p_{c},\tau):c\in\{1,2,\ldots,n\}\}$: 1) are they feasible; and, if so, 2) what is an appropriate scheduling policy? The first question is the problem of admission control.

Let Formula$\gamma_{c}$ be a geometrically distributed random variable with mean Formula$1/p_{c}$ representing the number of transmissions needed to successfully deliver client Formula$c$'s packet, and let Formula$I(S):=E[{[\tau-\sum_{c\in S}\gamma_{c}]}^{+}]$ be the expected unavoidable idle time when the access point has to serve only the subset Formula$S\subseteq\{1,2,\ldots,n\}$ of clients. With Formula$z^{+}:=\max\{z,0\}$, the necessary and sufficient condition for feasibility [177] is $$\sum_{c\in S}{q_{c}\over p_{c}}+I(S)\leq\tau,\qquad\text{for every}\ S\subseteq\{1,2,\ldots,n\}.\tag{6}$$ The following weighted delivery debt policy fulfills the requirements of any feasible set of clients: give priority to clients according to the expected number of packets that ought to have been delivered minus the number of packets that have been actually delivered, weighted by some positive constant.
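For a small number of clients, condition (6) can be checked by brute force over all subsets. The sketch below is illustrative only, with hypothetical function names. It computes the expected idle time exactly by convolving the geometric distributions of the transmission counts, keeping only totals below the period because larger totals leave no idle time. Requirements are expressed in packets per period.

```python
from itertools import combinations


def expected_idle_time(success_probs, period):
    """E[(period - total transmissions needed)^+] for the given clients."""
    # dist[s]: probability that the clients handled so far need exactly s slots.
    dist = [1.0] + [0.0] * (period - 1)
    for p in success_probs:
        new_dist = [0.0] * period
        for s, prob in enumerate(dist):
            if prob == 0.0:
                continue
            for k in range(1, period - s):  # k attempts, keeping s + k < period
                new_dist[s + k] += prob * (1 - p) ** (k - 1) * p
        dist = new_dist
    return sum((period - s) * prob for s, prob in enumerate(dist))


def is_feasible(requirements, success_probs, period, tolerance=1e-12):
    """Check sum(q_c / p_c) + I(S) <= period for every nonempty subset S."""
    clients = range(len(requirements))
    for size in range(1, len(requirements) + 1):
        for subset in combinations(clients, size):
            load = sum(requirements[c] / success_probs[c] for c in subset)
            idle = expected_idle_time([success_probs[c] for c in subset], period)
            if load + idle > period + tolerance:
                return False
    return True


single_client_idle = expected_idle_time([0.5], 2)
```

As a sanity check, a single client with success probability 0.5 and a period of two slots can receive at most 1 - 0.5^2 = 0.75 packets per period, and the condition is tight at exactly that value. The enumeration is exponential in the number of clients, so this approach is only practical for small instances.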

In some situations, task frequencies can be optimally tuned to support control systems [179]. Suppose that the throughputs Formula$\{q_{c}\}$ are not prespecified, but there is a strictly concave and increasing utility function Formula$U_{c}(q_{c})$ for each client, and the goal is to maximize the sum of the utilities: Formula$\max_{(q_{1},q_{2},\ldots,q_{n})}\sum_{c=1}^{n}U_{c}(q_{c})$. This problem is difficult because the number of constraints (6) is exponential in Formula$n$. One can decompose the problem into two subproblems, as in [180], by decoupling clients from the access point by using a price per unit throughput Formula$\psi_{c}$ for each client, and the amount Formula$\rho_{c}$ paid by client Formula$c$. Then, client Formula$c$'s problem is Formula$\max_{\rho_{c}}[U_{c}(\rho_{c}/\psi_{c})-\rho_{c}]$, subject to Formula$0\leq\rho_{c}\leq\psi_{c}$. The access point's optimization problem is to determine how much timely throughput to provide to each client, given that the client is willing to pay Formula$\rho_{c}$: Formula$\max_{(q_{1},q_{2},\ldots,q_{n})}\sum_{c=1}^{n}\rho_{c}\log q_{c}$, subject to the constraints (6). Surprisingly, the access point's problem is solved [181] by simply giving higher priority to clients with a lower value of the ratio (number of slots provided to Formula$c$ so far)/Formula$\rho_{c}$. Also interesting is the consequence that neither the access point nor the clients need to know the channel reliabilities Formula$(p_{1},p_{2},\ldots,p_{n})$.

This formulation of the problem of real-time wireless communication can be extended to handle random arrivals [182], model fading and rate adaptation [178], and provide a minimum specified throughput to each client while maximizing the total utility even when the clients are strategic and noncooperative in revealing their true utilities [183]. There has also been work on simultaneous existence of flows with delay constraints as well as flows without delay constraints [184]. A major open problem is that of delay constraints in multihop networks.

For sensor networks, protocols have been developed to support real-time applications [185]. A protocol to support timeliness is presented in [186] that exploits cellular structure for the network architecture, the periodic nature of communication, and uses EDF to support real-time messages. The SPEED protocol [187] attempts to ensure that end-to-end delay is proportional to the distance traveled by the flow. RAP [188] is a communication architecture for supporting high-level query and event services. Nano Resource Kernel (Nano-RK) is a real-time operating system for sensor networks [189].



Wireless networks allow nodes to communicate with each other over the wireless medium, possibly by using other nodes as relays or cooperating in other more information theoretic ways. By attaching sensors to nodes and providing them with computational capability, one obtains wireless sensor networks. They can be deployed to monitor their environment, e.g., monitoring facilities for anomalies or monitoring wildlife [190], [191], [192], or to conduct physics-in-the-large by offering scientists the means to deploy large numbers of sensors in the field and wirelessly gather information from them, as at the Center for Embedded Networked Sensing [193]. By using active sensors they can estimate distances between nodes and thus their relative positions [194].

A. Connectivity of Sensor Networks

Two nodes not in range of each other may need to communicate over several hops. Therefore, a multihop wireless network will need to ensure that such a path exists between any two nodes, i.e., that it is connected. The range of a wireless transmission will depend on its power, for the same data rate. If nodes do not employ adequate power, there may not be enough links in the network to produce a connected graph, while using too much power is wasteful. It is of interest to determine an appropriate range that ensures that a wireless network is connected.

A simple model is when $n$ nodes are randomly scattered, say uniformly in a square of side 1, and employ a common range $r_{n}$ that depends on $n$. When the nodes are few and therefore sparse, they need to choose a large range to form a connected graph, while if the nodes are many and therefore dense, they can each choose a small range. The network is connected with a probability approaching one as the number of nodes $n\rightarrow\infty$, if and only if $r_{n}=\sqrt{(\log n+\gamma_{n})/(\pi n)}$, where $\lim_{n\rightarrow+\infty}\gamma_{n}=+\infty$ [195], [196]. When nodes are more regularly spread, one can reduce the range while still being connected.
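
The critical range can be explored numerically with a small sketch. It assumes nodes placed uniformly in the unit square and links between any two nodes within Euclidean distance $r_{n}$; the function names and the Monte Carlo estimate are illustrative choices and are not taken from [195], [196]. The result quoted above is asymptotic, so for small $n$ the empirical fraction may differ noticeably from one, partly because of boundary effects in the square.

```python
import math
import random


def critical_range(n, gamma):
    """Range r_n = sqrt((log n + gamma) / (pi * n)) for n nodes in unit area."""
    return math.sqrt((math.log(n) + gamma) / (math.pi * n))


def is_connected(points, r):
    """Check whether the graph linking points within distance r is connected."""
    n = len(points)
    if n == 0:
        return True
    seen = {0}
    stack = [0]
    r_squared = r * r
    # Depth-first search from node 0; the graph is connected if all are reached.
    while stack:
        i = stack.pop()
        xi, yi = points[i]
        for j in range(n):
            if j in seen:
                continue
            xj, yj = points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r_squared:
                seen.add(j)
                stack.append(j)
    return len(seen) == n


def connectivity_fraction(n, gamma, trials, seed=0):
    """Fraction of random placements in the unit square that are connected."""
    rng = random.Random(seed)
    r = critical_range(n, gamma)
    connected = 0
    for _ in range(trials):
        points = [(rng.random(), rng.random()) for _ in range(n)]
        if is_connected(points, r):
            connected += 1
    return connected / trials
```

Calling, for example, `connectivity_fraction(200, 3.0, 50)` and then repeating with a smaller `gamma` gives a rough feel for how the offset $\gamma_{n}$ controls the probability of connectivity.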

A related problem is that of coverage by sensors. If each node has a sensor that can detect events within a distance $r_{n}$, one is interested in how large $r_{n}$ has to be so that every point in the entire domain is covered by some sensor [197].

B. Energy-Efficient Networking

The overall sensor network may often be deployed untended over a long duration, with the nodes drawing energy from their batteries or from renewable energy sources, such as solar cells, and one is interested in ensuring that the networks can survive a long duration in the field before requiring attention, e.g., replacing batteries or other maintenance [198]. All protocols used will therefore have to be energy efficient.

Clearly, any collision of packets leads to packet loss and is wasteful. Nodes will need to coordinate their wireless transmissions to avoid interfering with each other. This requires a medium access protocol. It must efficiently use the transmission medium and avoid wasting the communication spectrum that is a common resource of the network, and also be energy efficient. In contrast to wireless local area networks, ensuring fairness to all nodes is not important since sensor networks are often deployed for a specific purpose. Thus, one can design a medium access control protocol specifically for sensor networks. Also, a node wastes energy if its radio is “on” listening to packets that are not intended for it, or just “on” when there is no nearby ongoing transmission. One of the most significant ways to save power is to turn off a node's radio and put it to sleep. The protocol Sensor-MAC (S-MAC) [199] takes advantage of such sleeping to save power. Sleeping can be initiated on the basis of time, i.e., by scheduled sleep, or by implicit signaling that occurs due to a neighbor's transmission. The former requires clock synchronization. Collisions can be avoided by using control packets, e.g., “request-to-send” (RTS) and “clear-to-send” (CTS) as in IEEE 802.11 [13]. Long packets can be fragmented into smaller packets, so that not all is lost when a long packet is corrupted. However, transmission of the short packets can be done in a single burst, after only a single RTS and CTS, thus amortizing their overhead. Through such strategies, MAC protocols can be made specifically energy efficient for sensor network deployment [200]. The protocol B-MAC is motivated by the goal of simple implementation, and aims at only providing link-layer functionality, relegating other functionalities like task synchronization and organization to higher layers, which can then employ the mechanisms exposed by B-MAC so as to adapt to changing network or channel conditions. It employs carrier sensing and adaptive preamble sampling to provide an efficient MAC for sensor network monitoring applications.

C. Routing in Sensor Networks

A routing protocol is needed for two nodes that are not neighbors to communicate. Peer-to-peer routing, multicast and all-cast may be needed by sensor network applications. Very commonly, the data gathered, or the information that is extracted, may need to be communicated to a designated “sink” or “collector” or “fusion” node, which may also possibly serve as a gateway for exfiltrating the desired information out of the sensor network. This is called “ConvergeCast” [201]. It may need to be done in an energy-efficient manner, or with low delay, depending on the application. In some applications, the identity of a node may not be important; only its data may be relevant. This can simplify ID or address management schemes. Nodes may be limited in their processing or storage capabilities. The data collected from nearby nodes may have considerable redundancy, which can also be exploited in designing an efficient protocol. The protocol may be query based, responding to particular information that is sought, or the dissemination may be content based. The protocol itself may be flat, hierarchical, or even location based [202].

D. Protocols and Operating Systems for Sensor Networks

TinyOS [203], [204], an open source operating system developed for sensor networks, has triggered much experimental and deployment activity in sensor networks. The challenges in the networking, operating system, and middleware layers are surveyed in [205]. The IEEE 802.15.4 standard [206] specifies the physical layer and medium-access control for wireless personal area networks. The ZigBee Alliance [207] builds upon IEEE 802.15.4 to specify high-level protocols for low data rate and low energy consumption applications. WirelessHART [208], [209] is an open communication standard for process control. So is ISA100.11a, which has been developed by the International Society of Automation (ISA) [210]. An Internet Engineering Task Force Working Group has developed 6LoWPAN [211], [212], [213] to use Internet Protocol version 6 over IEEE 802.15.4. It allows interoperability with Internet Protocol (IP) links, while still being energy efficient, reliable, adaptable to applications, and allowing management of a large number of nodes. Interoperability with IP also allows use of established security mechanisms, network management tools, transport protocols, and services for naming, addressing, discovery, etc.

E. Clock Synchronization in Sensor Networks

In many applications, it is important that sensor measurements be time stamped. In fact, this is an important aspect of CPSs because the physical world's evolution does depend on time. Different nodes in the network may however have different clocks, and so it is necessary to synchronize them. Another reason is that in order to save energy, it is important that nodes go to “sleep” most of the time, and “wake up” only when necessary to hear or send a transmission, or take a sensor measurement. When a node wakes up and transmits, it is necessary that the receiving node also be in an awake state. The more accurately their sleep–wake times are coordinated, the less energy is wasted in an awake but idle state.

When clocks are linear, they can be described by their skew (rate) and offset. Two neighboring nodes can exchange time-stamped packets. If there is a constant but unknown time delay in such packet exchanges that is symmetric, i.e., the same for transmissions in both directions, then the nodes can determine all three quantities—offset, skew, and time delay [214], [215].
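
A noise-free sketch shows why a symmetric delay makes all three quantities recoverable. It is not the estimator of [214], [215]. The model and names are assumptions made for illustration: node $j$'s clock reads $a t + b$ when node $i$'s clock reads $t$ (relative skew $a$, offset $b$), and the one-way delay $d$ is expressed in node $i$'s time units. A packet sent by $i$ at its local time $s_{i}$ is then stamped by $j$ at $r_{j}=a(s_{i}+d)+b$, and a packet received by $i$ at its local time $r_{i}$ was sent by $j$ at $s_{j}=a(r_{i}-d)+b$.

```python
def estimate_clock_parameters(forward_1, forward_2, reverse):
    """Recover (skew, offset, delay) from three time-stamped packets.

    forward_1, forward_2: (send time at i, receive time at j) for i -> j.
    reverse: (send time at j, receive time at i) for j -> i.
    Assumes linear clocks, no measurement noise, and a symmetric delay.
    """
    (s1, r1), (s2, r2) = forward_1, forward_2
    s_j, r_i = reverse
    if s2 == s1:
        raise ValueError("the two forward packets need distinct send times")

    # Both directions lie on lines with the same slope, the relative skew.
    skew = (r2 - r1) / (s2 - s1)

    # Forward intercept equals skew * delay + offset.
    forward_intercept = r1 - skew * s1
    # Reverse intercept equals offset - skew * delay.
    reverse_intercept = s_j - skew * r_i

    offset = (forward_intercept + reverse_intercept) / 2
    delay = (forward_intercept - reverse_intercept) / (2 * skew)
    return skew, offset, delay


# Time stamps generated from skew 1.25, offset 10, and delay 2.
skew, offset, delay = estimate_clock_parameters((0.0, 12.5), (8.0, 22.5), (32.5, 20.0))
```

With noisy time stamps one would instead fit the two lines to many packets, for instance by least squares, which is the role linear regression plays in the protocols discussed next.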

In a network, one can multiply the skews over successive links in a path to determine the skew between two remote nodes, and likewise one can also estimate offsets. The Flooding Time Synchronization Protocol [216] time stamps packets at the MAC layer and uses linear regression to smooth noisy time stamps and delays. When the synchronization error at each link is independent with a certain standard deviation, then summed over the links along the path, the error grows as $O(\sqrt{d})$, where $d$ is the diameter of the network. In a grid topology where $n$ nodes are located at, say, points with integer $x$ and $y$ coordinates in a square, the synchronization error grows like $O(n^{1/4})$. If nodes are uniformly and randomly located in a square of side 1, then the critical range at which the network gets connected is $O(\sqrt{(\log n)/n})$, as noted earlier. Then, the synchronization error grows like $O((n/\log n)^{1/4})$ [217]. All these errors grow polynomially with the number of nodes in the network. However, one can do much better by combining estimates over different paths [218], [219]. The error is then related to the resistance distance of the graph, i.e., the resistance between two nodes when each link is replaced by a 1-$\Omega$ resistance [218]. The resulting error in a critically connected random wireless network is then only $O(1)$, showing that error can indeed be kept bounded even in random wireless networks with a large number of nodes [217].
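
The single-path scaling laws in the preceding paragraph follow from a short chain of estimates, sketched here. The symbols $e_{k}$ (error on link $k$) and $\sigma$ (common per-link standard deviation) are introduced only for this sketch, and the hop diameter is taken to be of the order of the side length of the region divided by the range.

```latex
\begin{align*}
\text{path of } d \text{ links:}\quad
  & \operatorname{Var}\Big(\sum_{k=1}^{d} e_{k}\Big) = d\,\sigma^{2}
    \;\Rightarrow\; \text{error} = O(\sigma\sqrt{d}), \\
\text{grid with } n \text{ nodes:}\quad
  & d = \Theta(\sqrt{n})
    \;\Rightarrow\; \text{error} = O(n^{1/4}), \\
\text{random network, } r_{n} = \Theta\big(\sqrt{(\log n)/n}\big):\quad
  & d = \Theta(1/r_{n}) = \Theta\big(\sqrt{n/\log n}\big)
    \;\Rightarrow\; \text{error} = O\big((n/\log n)^{1/4}\big).
\end{align*}
```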

F. In-Network Information Processing in Sensor Networks

The raison d'etre for sensor networks is that they can provide information about the environment, which may be exfiltrated through a designated gateway to an external entity. To do this, the data gathered by the sensor nodes has to be processed to determine relevant information. One strategy is to send all data from all nodes to the sink or gateway node, where it is centrally processed. However, this may be very wasteful of energy and communication bandwidth due to the large amount of data. An alternative is for all nodes to conduct processing, and only send along to other nodes what is relevant. This strategy is feasible because individual nodes in sensor networks have computation capabilities. Nodes can thereby trade off computation for communication. This strategy is called “in-network computation,” and how it is to be best done is an important issue.
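
The trade-off between computation and communication can be illustrated with a small sketch that computes an average over a routing tree rooted at the sink. The tree representation, the function names, and the cost measure (one unit per value sent over one link) are assumptions chosen for illustration, not a scheme taken from the cited work.

```python
def aggregate_average(children, values, root):
    """Compute the network average by in-network aggregation.

    children maps a node to the list of its children in the routing tree.
    Each node sends a single (sum, count) pair to its parent, so exactly
    one message crosses each tree link. Returns (average, message count).
    """
    messages = 0

    def visit(node):
        nonlocal messages
        total, count = values[node], 1
        for child in children.get(node, []):
            child_total, child_count = visit(child)
            messages += 1  # the child's partial aggregate crosses one link
            total += child_total
            count += child_count
        return total, count

    total, count = visit(root)
    return total / count, messages


def raw_forwarding_cost(children, root):
    """Link transmissions needed if every raw reading is relayed to the root."""
    cost = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        cost += depth  # a reading at depth k is transmitted over k links
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return cost


# Example tree: node 0 is the sink; nodes 3 and 4 are two hops away.
tree = {0: [1, 2], 1: [3, 4]}
readings = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50}
average, aggregated_messages = aggregate_average(tree, readings, 0)
raw_messages = raw_forwarding_cost(tree, 0)
```

In this example the aggregated scheme uses 4 link transmissions, while relaying all raw readings to the sink uses 6; the gap widens as the tree gets deeper.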

An early precursor is the communication complexity problem in distributed computing [220]. The goal is to exchange the minimum number of bits between two nodes which each possess the value of one variable, so that they can determine the value of a function of the two variables. Similar to block communication, one can compute several instances of the function, giving rise to the direct sum problem [221]. In information theory, a similar problem is source coding with side information. One variant is to require zero error for finite block lengths [222], [223].

The problem of computing a function corresponds to a rate-distortion problem with a particular choice of distortion measure, and the required capacity is the conditional graph entropy [224]. The problem of computing some symmetric functions, i.e., invariant to permutations of their arguments, has been considered in the context of a wireless sensor network in [225]. For a random wireless network with $n$ randomly located nodes, the shared aspect of the wireless medium is modeled by each wireless transmission consuming a certain interference footprint. The rate at which the Average of nodal values can be computed is $\Theta(1/\log n)$, when each node chooses a communication range that leads to a connected graph. Interestingly, this problem does not benefit, up to order, from allowing block computation. In contrast, computing the Maximum does significantly benefit from allowing block computation; the computational rate is $\Theta(1/(\log\log n))$. Such symmetric functions are of interest because many statistical functions are symmetric, and because they embody the data-centric paradigm where only nodal values are relevant, and not nodal identities. The problem of computing divisible functions is addressed in [225], while the problem of computing divisible functions that are amenable to divide and conquer is considered in [225], [226], [227].

When data are random, one can consider optimal function computation to minimize the expected number of bits communicated. This has been considered for symmetric functions that are Boolean valued [228], [229], and some specific problems are solved optimally or near-optimally when nodes are collocated within one hop of each other.

One can also consider the problem of in-network computation from an information theoretic point of view; this has been done for two nodes in [230] and for collocated nodes in [231]. The problem of computing in noisy networks is considered in [232], [233], [234], [235], [236], [237], [238], [239]. Related information theoretic problems are studied in [240].

G. Self-Calibration in Sensor Networks

There are also interesting issues at the sensing end [241]. For example, sensing nodes may provide erroneous measurements about the environment, and it is important to detect that based on correlated sensor measurements from neighboring nodes. More generally, there is the problem of how nodes in a network can self-calibrate themselves [242].



Security is a critical aspect of any safety-critical system, i.e., one where physical harm can be caused. Much remains to be done for security of CPSs. The case of an attack on a Supervisory Control and Data Acquisition system is described in [243]. There have been attacks on natural gas pipeline systems [244], trams [245], power utilities [246], and water systems [247]. More recently, the Stuxnet worm attacked control systems [248], [249], [250]. There has been much work on security at the computational and communication layers, but CPSs have additional challenges since they involve not only the communication and computation layers, but also the control layer and the physical system itself. At the same time, one can also exploit the features of CPSs to develop approaches to security.

Several new challenges and a research roadmap are presented in [251]. Due to the feedback processes between the physical and cyber parts, there are new communication “channels” that need to be secured. Some large-scale systems, e.g., the power grid, are federated. The systems are real time, yet can be geographically distributed. There is a multiplicity of time scales, and the overall system is a system-of-systems.

The vulnerability of CPSs is increased because controllers are computers prone to bugs and attacks, the communication networks are open and of potentially large scale, the increasing use of commodity solutions makes systems susceptible to the flaws of their components, protocols for control are becoming more open and accessible, and the increasing functionality provided by CPSs opens new vulnerabilities [252]. There are challenges and security mechanisms for prevention, detection and recovery, resilience, and deterrence of attacks [253]. Computer attacks can be detected by incorporating knowledge of the physical system under control [254]. Other results for detecting attacks can be found in [255] and [256].

Standardization efforts underway include North American Electric Reliability Corporation [257], National Institute of Standards and Technology [258], and ISA-SP99 [259].



The importance of the computing system, especially software, in control system applications has been increasing. Since software was first introduced in automobiles around 30 years ago, its amount has increased. Computing systems including software can take up almost half of the production costs of today's automobiles [260]. The same is true for many other control systems such as airplanes and factory automation systems. This trend is anticipated to continue due to the significant benefits provided by software technologies in control applications, with respect to functionalities, performance, and flexibility. Simultaneously, it is becoming more challenging to develop such control systems since the overall complexity of the system also increases. In fact, the performance, reliability, and production costs of control systems are becoming more dependent on those of computing systems, especially software systems. Hence, an important research issue in software technology is how to manage complexity to make it easy to design and implement software systems for reliable CPSs.

From past experience, it has been observed that one of the most effective approaches in managing complexity, and accordingly increasing productivity in software development, is to raise the level of abstraction. In the early stages of computing, assembler technology allowed us to step up from machine code to assembler code. Later in the 1970s, compiler technology raised the level of abstraction a step further from assembly language to high level programming languages such as C and Fortran, which make it significantly easier to write and understand software programs. We now have object-oriented programming languages such as C++ and Java, which allow us to develop software at even higher levels of abstraction than procedural programming languages such as C and Fortran.

The next level of abstraction beyond today's component and object-based programming can be model-driven development (MDD), as emphasized in [261], [262], [263], [264]. One of the important visions of MDD is that software developers can develop software systems by designing models in the application domain instead of writing computer programs at the implementation level, and can then transform the application domain design models into a real implementation. MDD can thereby significantly improve the productivity of the software development process. Another important benefit is to improve productivity in the long term by helping developers build a software system that is less sensitive to changes in personnel, requirements, and implementation platforms [262].

Broadly, a model is a description of some aspect of a system for some purpose such as communication, analysis, or implementation. In principle, models relevant for software systems can be in any form depending on the purpose. For example, in the traditional software development process, the requirements and functionalities of a software system are typically described in text and picture format, resulting in documents for software developers to use. At the next stage, the system is designed based on the requirements, typically in the form of diagrams, e.g., class diagrams and activity diagrams of the Unified Modeling Language (UML). Finally, the design is implemented and tested by software developers in the form of computer programs. One of the issues in this process is that the models at various stages are only loosely connected, and information contained in a model might not be correctly captured during the transition from one form of model to another. As an example, whenever there is some change in requirements, lower level models have to be manually updated to maintain consistency between models, and vice versa. Another important concern is that errors made at the design stage might not be detected until the test stage of the implemented code. Thus, it requires much cost and effort to maintain consistency between models in the traditional software development process.

To fully exploit benefits of MDD such as automatic generation of complete programs from application domain models and automatic verification of a system at design time, models, especially those at the application layer, should possess properties that enable seamless usage throughout the development process [261]. Key to MDD is that a model should be 1) an appropriate abstraction of the system, hiding irrelevant details; 2) represented so that it is easily understandable for improving productivity in design and maintenance; and 3) executable, so that it can help to predict the modeled system's properties at an early stage of the development process. Building such models is itself a great challenge in MDD. Major challenges in realizing the vision of MDD are categorized into three different aspects [263]: 1) modeling language to support creating well-defined models; 2) separation of concerns to support modeling a system from multiple viewpoints; and 3) model manipulation and management, such as transformation between models, maintaining consistency between models, and model-level execution and debugging.

Model-driven architecture (MDA) [265] is a conceptual framework for software development defined by OMG, and is supported by standards for modeling and transformation between models such as UML, XML Metadata Interchange, and Meta-Object Facility. In particular, to improve flexibility for better support of evolving software systems, MDA describes a system using three different types of models: 1) a computation-independent model to capture system requirements; 2) a platform-independent model to represent a system with high-level designs that are independent of any form of implementation technology; and 3) a platform-specific model to represent a system in terms of some specific platform implementation technologies.

Model-integrated computing (MIC) [264] is another well-known software development framework which supports the development paradigm envisioned by MDD. As in MDA, models are the main artifacts for software development and are used in each stage of the development process, such as design, analysis, and test. However, while MDA adopts UML as one of its primary modeling languages, MIC emphasizes the framework for designing modeling languages, called domain-specific modeling languages (DSMLs) [266]. Examples of DSML tool suites developed based on the MIC concept are the Platform-Independent Component Modeling Language for component-based software system development and the Embedded Control Systems Language for distributed embedded automotive system development [267]. Another approach to MDD is Software Factories [268], which provides a software framework that can be used to create software development environments for rapid development of applications. The Architecture Analysis & Design Language (AADL) [269] is a Society of Automotive Engineers (SAE) standard model-based language that can be used for designing and analyzing the structure and runtime behavioral properties, such as performance, schedulability, and reliability, of complex real-time embedded systems.



As can be seen, the research spectrum related to CPS is indeed quite broad, ranging from theories in various areas for analysis and design, to technologies for implementation. The impact of CPS research can be significant enough to bring revolutionary changes in how to design and develop engineering systems to meet societal needs in several domains such as energy, environment, and healthcare. In this section, we attempt to anticipate benefits that CPS research can potentially provide in some representative application areas. We also outline some of the challenges that need to be overcome.

A. Energy Systems

Energy generation, transmission, and distribution for a clean and sustainable society are high-priority issues that need immediate research attention in many disciplines for the global public interest. Smart grid [270] is a next-generation infrastructure for electric power systems that can help to produce, distribute, and use electricity in a cleaner, more efficient, and more cost-effective manner through the integration of computing, communication, and control technologies. The production and distribution of electric energy can be made more responsive and reliable through real-time distributed sensing, measurement, and analysis. Furthermore, communication and information technology can contribute to improving efficiency of overall electric energy consumption by encouraging consumers to avoid consumption at peak times through dynamic pricing mechanisms and by providing useful real-time price information to consumers. Thanks to the infrastructure and mechanisms for bidirectional exchange of information and electricity, smart grid also allows traditional electric energy consumers to become providers. Electric energy that is stored or generated at residential and industrial facilities from renewable energy sources such as wind and solar can be sold to other consumers in the neighborhood or to electric power providers.

Computing, communication, and control technologies can play an important role in improving efficiency in home and office building energy consumption. Electric energy used in the buildings sector is approximately 70% of total electricity consumption in the United States [271]. Energy consumption for lighting, heating/cooling, and computing can be made more efficient through distributed sensing and intelligent management of energy consumption by dynamically reacting to circumstances such as human activities and weather conditions.

B. Transportation Systems

The development of vehicles, mass transit, and traffic systems to address sustainability, efficiency, congestion, and safety is an important research issue for the benefit of our environment, economy, and safety. Next-generation transportation systems can potentially integrate intelligent vehicles and intelligent infrastructures. Intelligent vehicles can be equipped with seamlessly integrated embedded computing systems and in-vehicle networking systems. Vehicles can exchange information through vehicle-to-vehicle and vehicle-to-infrastructure wireless communication. Through these capabilities, vehicles can assist drivers or even drive autonomously by monitoring and estimating traffic conditions, planning ahead their behavior, and implementing the plan through drive-by-wire functionalities such as stability control, speed control, braking, and steering. Intelligent traffic infrastructures can be operated to manage the throughput of entire traffic systems. Intelligent mass transit systems can be more adaptive to the needs of users.

C. Healthcare and Medical Systems

It is an important challenge to design and develop medical devices and systems with better efficiency, reliability, intelligence, and interoperability. Medical devices need to be highly reliable, and moreover should be operated in a patient-specific manner since patients have different physiological characteristics. Formal models of patient physiological dynamics, and the hardware and software systems of medical devices, and their interactions, can play an important role in designing and verifying safety properties of devices. The integration of wireless networking and distributed sensing and computing infrastructure for interconnectivity and interoperability with medical devices enables the development of medical systems by which patient physiological conditions can be diagnosed and treated in a more integrated and intelligent manner.

D. Research Challenges

The high level of complexity of CPSs in both structural and behavioral aspects poses many challenges for researchers in realizing the benefits envisioned in many application areas.

Fundamental theoretical frameworks that can address the dynamics of CPSs in an integrated manner need to be developed. Further development of theoretical foundations is needed to better understand and predict complex dynamical behaviors caused by tight interactions between cyber and physical domains. Significant further advancement is needed to develop theories which enable us to capture and analyze the dynamics of the communications, computation, control, and applications in a unified theoretical framework.

Much research remains to be done to address complexity and productivity issues in the design and development of CPSs. Languages to model various aspects of a system at different levels of abstraction for various application domains need a fuller development. Further advances are also required to support automatic transformation between models in different semantic domains, model-level execution and debugging capabilities, composition of models to build an application, and incorporation of verification and validation capabilities.

Software platforms with well-defined and appropriate levels of abstractions and architecture are essential for the development of reliable, scalable, and evolvable CPSs in various application domains. They should hide unnecessary complexities inherent to CPSs, such as heterogeneity and distribution, and support rapid implementation of application and runtime reconfiguration and resource management to meet functional and nonfunctional requirements of an application.

Control methodologies need to be extended to much broader contexts since next-generation CPSs will be operated in much larger scales and in open environments. Algorithms and theories for high-level decision making based on information collected from different sources at different spatial and temporal scales are necessary for system-wide reliability, efficiency, security, robustness, and autonomy of CPSs.

Much important work remains to be done.


The authors would like to thank M. Caccamo, M. Franceschetti, S. Mitra, and P. Tabuada for their careful reading of the paper and valuable comments.


This work was supported in part by the National Science Foundation (NSF) under Contracts CNS-1035378, CNS-0905397, CNS-1035340, and CCF-0939370, by the United States Army Research Office (USARO) under Contracts W911NF-08-1-0238 and W-911-NF-0710287, and by the U.S. Air Force Office of Scientific Research (AFOSR) under Contract FA9550-09-0121.

The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128 USA.




Kyoung-Dae Kim

Kyoung-Dae Kim received the B.S. and M.S. degrees in mechanical engineering from Hanyang University, Seoul, Korea, in 1995 and 1998, respectively, and the M.S. degree in computer science and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, in 2011.

Currently, he is a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering, Texas A&M University, College Station. His research interests include robotics, autonomous systems, distributed and cooperative control systems, real-time embedded systems, and hybrid systems.

P. R. Kumar

P. R. Kumar (Fellow, IEEE) received the B.Tech. degree in electrical engineering (electronics) from the Indian Institute of Technology (IIT), Madras, India, in 1973 and the M.S. and D.Sc. degrees in systems science and mathematics from Washington University at St. Louis, St. Louis, MO, in 1975 and 1977, respectively.

From 1977 to 1984, he was a faculty member in the Department of Mathematics, University of Maryland Baltimore County, and from 1985 to 2011, in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois. He is currently at Texas A&M University, College Station, where he holds the College of Engineering Chair in Computer Engineering. He has worked on problems in game theory, adaptive control, stochastic systems, simulated annealing, neural networks, machine learning, queueing networks, manufacturing systems, scheduling, wafer fabrication plants, and information theory. His research is currently focused on wireless networks, sensor networks, cyber–physical systems, and the convergence of control, communication, and computation.

Dr. Kumar is a member of the National Academy of Engineering of the USA, as well as the Academy of Sciences of the Developing World. He was awarded an honorary doctorate by the Swiss Federal Institute of Technology (Eidgenossische Technische Hochschule), Zurich, Switzerland. He received the IEEE Field Award for Control Systems, the Donald P. Eckman Award of the American Automatic Control Council, and the Fred W. Ellersick Prize of the IEEE Communications Society. He is a Guest Chair Professor and Leader of the Guest Chair Professor Group on Wireless Communication and Networking at Tsinghua University, Beijing, China. He is also an Honorary Professor at IIT Hyderabad. He was awarded the Daniel C. Drucker Eminent Faculty Award from the College of Engineering at the University of Illinois, the Alumni Achievement Award from Washington University in St. Louis, and the Distinguished Alumni Award from IIT Madras.
