Decentralized Stream Runtime Verification for Timed Asynchronous Networks

Problem: We study the problem of monitoring distributed systems, such as smart buildings, ambient living environments, wide area networks and other distributed systems that are monitored periodically at human time scales. In these systems computers communicate using message passing and share an almost synchronized clock. This is a realistic scenario for networks where monitoring is sufficiently slow (periods of seconds or tens of seconds) to permit efficient clock synchronization, so that clock deviations are small compared to the time precision and frequency required by the monitoring.
Solution: More concretely, we propose a solution to monitor decentralized systems where monitors are expressed as stream runtime verification specifications. We solve the problem for “timed asynchronous networks”, where the computational nodes on which the monitors run have synchronized clocks with a small bounded maximum drift. These nodes communicate over a network where messages can take arbitrarily long to arrive but cannot be duplicated or lost. This setting is common in many cyber-physical systems, such as smart buildings and ambient living, and it generalizes the synchronous monitoring case. Previous approaches to decentralized monitoring were limited to synchronous networks, which are not easily implemented in practice because of network failures. Even when network failures are unusual, repairing them can take several monitoring cycles.
Methodology: We formally describe the monitoring problem for timed asynchronous networks, present a decentralized algorithm and prove its correctness. We then formally analyze the complexity of our solution and provide two analysis techniques to approximate its memory requirements. Finally, we implement the algorithm and perform an empirical evaluation with real data extracted from four different datasets.
Contributions: We propose a solution to the timed asynchronous decentralized monitoring problem.
We study the specifications and conditions on the network behavior that allow the monitoring to take place with bounded resources, independently of the trace length. Finally, we report the results of an empirical evaluation of an implementation and verify the theoretical results in terms of effectiveness and efficiency.


Introduction
We study the problem of decentralized monitoring of stream runtime verification (SRV) specifications under the timed asynchronous model of computation. In decentralized monitoring, a specification is decomposed into a network of monitors that communicate by exchanging messages. These monitors cooperatively evaluate the specification against a trace of input observations performed at distributed locations. We present a solution to the decentralized monitoring problem under the timed asynchronous model of computation, in which processes share a sufficiently synchronized global clock but messages can take arbitrarily long to arrive.
arXiv:2302.00506v2 [cs.LO] 3 Feb 2023
Runtime verification (RV) is a dynamic technique for software quality assurance that consists of generating a monitor from a formal specification. This monitor then inspects a single trace of execution of the system under analysis. In contrast to static verification techniques, RV sacrifices completeness to provide a readily usable formal method that can, for example, be easily combined with testing and debugging. One of the problems that RV must address is how to generate monitors from a specification. Early approaches to RV specification languages were based on temporal logics [Havelund and Roşu, 2002, Eisner et al., 2003, Bauer et al., 2011], regular expressions [Sen and Roşu, 2003], timed regular expressions [Asarin et al., 2002], rules [Barringer et al., 2004], or rewriting [Roşu and Havelund, 2005]. Another approach to monitor specifications is Stream Runtime Verification (SRV), pioneered by Lola [D'Angelo et al., 2005], which defines monitors by declaring equations that describe the dependencies between output streams of results and input streams of observations. SRV is a richer formalism than most RV solutions: it goes beyond Boolean verdicts (as produced by logical techniques) by allowing specifications that compute richer verdicts as output. Examples include counting events and computing other statistics, computing robustness values, or generating explanations of errors. See [D'Angelo et al., 2005, Faymonville et al., 2016, Gorostiaga and Sánchez, 2018, Danielsson and Sánchez, 2019, Gorostiaga et al., 2020] for examples illustrating the expressivity of SRV languages.
Another important aspect of runtime verification is the operational execution of monitors: how to collect information and how to perform the monitoring task. In this paper we focus on online monitoring, where monitoring happens incrementally as the input trace is being observed. In [Bauer and Falcone, 2012, El-Hokayem and Falcone, 2017a, Danielsson and Sánchez, 2019] the authors consider a centralized specification that is deployed as a network of distributed monitors connected via a synchronous network, where the global synchronous clock is used both for communication and for periodic sampling. Monitors exchange messages and cooperate to perform the global monitoring task. This problem is called decentralized monitoring (see [Francalanza et al., 2018]). Here we study timed asynchronous communication together with periodic sampling of inputs, that is, a synchronous computation over an asynchronous network. Our solution subsumes the previously available SRV solution for synchronous computation over a synchronous reliable network studied in [Danielsson and Sánchez, 2019]. We call the more general problem studied in this paper the timed asynchronous decentralized monitoring problem. Our goal is to generate local monitors at each node that collaborate to monitor the specification, distributing the computational load while minimizing the network bandwidth and the latency of the computation of verdicts. Apart from more efficient evaluation, decentralized monitoring can provide fault tolerance, since the process can partially evaluate a specification using the information provided by the part of the network that does not fail. In the same spirit, if part of the network of cooperating monitors is clogged (in the sense that it is working slower for some reason), the other part can keep its normal throughput. Consider for example an if-then-else specification where obtaining the values of both the then and the else parts requires a slow computation. Consider a decentralized deployment
with three monitors connected as a tree: the leaf monitors compute the then and the else parts, while the root monitor computes the specification using a Boolean input stream for the if part. Assume that the condition is true 90% of the time, so most of the time the then value is used and the else value is discarded. Assume also that the network link between the root monitor and the leaf monitor that computes the else part is slow. The throughput of the root of the specification remains unaffected in those 90% of the instants, because the result can be produced without waiting for the long computation and the network delay of the link that affect the else part. We plan to leverage these advantages of decentralized systems to aggressively incorporate fault tolerance in future work.
Our Solution. In this paper we provide a solution to the decentralized monitoring problem for Lola [D'Angelo et al., 2005] specifications for arbitrary network topologies and placement of the local monitors.
We study timed asynchronous networks [Cristian and Fetzer, 1999], where nodes share a global clock (built by bounding the network synchronization delays and the hardware clock drifts) but monitoring messages can take arbitrarily long to arrive. Timed asynchronous networks [Cristian and Fetzer, 1999] "... allow practically needed services such as clock synchronization, membership, consensus, election, and atomic broadcast to be implemented". Synchronous networks are the special case where, additionally, messages take a known bounded time to arrive. We use the availability of a global clock to adopt a model of computation for monitoring that proceeds in rounds, where each round consists of reading inputs and processing incoming messages, followed by updating the internal state of the local monitors and finally producing output messages. This synchronous execution model is realistic in many scenarios, for example in smart buildings or smart cities, where clocks can be synchronized using a network time protocol that is sufficiently precise for round cycles of tens of seconds. We also assume in this paper a reliable system: nodes do not crash, and messages are neither lost nor duplicated. In our solution, different parts of the specification (modeled as streams), including input readings, are deployed as local monitors on different network nodes. Local monitors communicate with other monitors when necessary to resolve the streams assigned to them, trying to minimize the communication overhead. Intuitively, data is read by sensor monitors, and then each layer of intermediate monitors computes sub-expressions and communicates partial results to the remote monitors in charge of super-expressions, ultimately computing the stream of values of the root expression. A degenerate case of this setting is a centralized solution: nodes with mapped observations send their sensed values to a fixed central node that is responsible for computing the whole specification.
The SRV language that we consider is Lola [D'Angelo et al., 2005,Sánchez, 2018].We will identify those specifications and conditions on the network behavior that allow the monitoring to take place with bounded resources, independently of the trace length.

Motivating Example.
Example 1. We use as a running example a smart building with rooms equipped with sensors and a central node. The aim is to generate alarms when there is a risk of fire. The following specification captures this risk by detecting sharp rises in temperature and CO2 in a given room. We place the computations needed to decide whether the measured variables rise 'enough' at the nodes where the sensor readings take place. In this way, the central node only needs to compute which rooms present both the temperature and the CO2 alarm. We omit the CO2 computation for simplicity and readability (it is an exact mirror of the temperature computation). The CO2 values would be used to assess the risk of fire at the 'Building' monitor.

@Room1 {
  input num t_1
  eval
  # tini_1 is a constant
  # with meaningful bounds
  define num tlow = 1.6 * tini_1
  define num thi = 2.0 * tini_1
  define num t_spike_q_1 = if t_1 <= tlow then 0
                           else if t_1 > thi then 1
                           else (t_1 - tlow) / (thi - tlow)
}
@Room2 {
  input num t_2
  eval
  # tini_2 is a constant
  define num t_low = 1.6 * tini_2
  define num t_hi = 2.0 * tini_2
  define num t_spike_q_2 = if t_2 <= t_low then 0
                           else if t_2 > t_hi then 1
                           else (t_2 - t_low) / (t_hi - t_low)
}
@Building {
  define bool fire_risk_q_1 = t_spike_q_1 > 0.5
  define bool fire_risk_q_2 = t_spike_q_2 > 0.5
}

Related work. The survey [Francalanza et al., 2018] uses the term decentralized monitoring to distinguish it from distributed monitoring, where processes do not share a global clock. Distributed monitoring assumes a completely asynchronous network, while decentralized monitoring typically assumes a completely synchronous network where all sampling and communication occur in lockstep. In this paper we explore the middle ground: network nodes share a sufficiently synchronized global clock (as in synchronous distributed systems) but communication can take arbitrarily long (as in asynchronous distributed systems). [Francalanza et al., 2018] also presents other concepts, such as policy checking, that are called decentralized monitoring but do not correspond to the monitoring studied in this paper, because they are concerned only with global safety properties and can be used for asynchronous networks with asynchronous computations.
In [Ganguly et al., 2021] the authors also study timed asynchronous networks of cooperating monitors, but use an SMT solver to simplify LTL formulas.
Distributed stream processing has been extensively studied. In [Quoc et al., 2017] the authors use the concept of streams in Complex Event Processing, where events may be structured datatypes and where computation may be complex in the sense that several operations are needed for each event, for example sliding-window operations that compute aggregates over the arriving events. The aim of [Quoc et al., 2017] is to combine privacy and approximation techniques to obtain zero-knowledge privacy together with low-latency, efficient analytics. [Carbone et al., 2015] introduces Apache Flink, which uses stream dataflow processing to handle both continuous streams and batch processing. Distributed and decentralized monitoring has also been studied in the context of runtime verification. Sen et al. [Sen et al., 2004] introduce PT-DTL, a variant of LTL for monitoring distributed systems, but they consider a completely asynchronous distributed system and are limited to Boolean verdicts. The work in [Francalanza et al., 2018] uses slices to support node crashes and message errors when monitoring distributed message-passing systems with a global clock. Bauer et al.
[Bauer et al., 2013] introduce a first-order temporal logic and a trace-length independent spawning automaton, and [Bauer and Falcone, 2012] shows a decentralized solution to monitor LTL3 in synchronous systems using formula rewriting. LTL3 is a three-valued variant of LTL with a middle value in the lattice that captures that the value of an expression is unknown so far and more of the input trace must be processed to determine its truth value. This is improved in [El-Hokayem and Falcone, 2017a, El-Hokayem and Falcone, 2017b] using an Execution History Encoding (EHE). EHE is a data structure that stores the expressions partially evaluated by different monitors together with their partial information, which allows decentralized monitors to infer the state of the monitoring automaton. In [El-Hokayem and Falcone, 2020] the authors extend the EHE with distributed and multi-threaded support, guaranteeing the determinism of the data structure by construction. They then analyze the compatibility and monitorability of decentralized specifications using EHE. However, the verdicts and data are still Boolean and the network is assumed to be synchronous. In [Jaber et al., 2020] global choreographies (a kind of master-based protocol) are synthesized into distributed systems (including control flow, synchronization, notification, acknowledgment and embedded computations). The authors also provide a transformation to Promela, which allows verifying the implementation against LTL specifications. Some of the schemes they showcase are variants of producer-consumer and two-phase commit, which they apply to building micro-services such as a buying system. This work focuses on synthesizing the flow of monitors, but again the observations and verdicts are Boolean. In [Kazemlou and Bonakdarpour, 2018] a synchronous network of LTL monitors cooperates to reach a verdict on the system under test while monitors may suffer crashes. In this scenario, an SMT-based algorithm for synthesizing the automata for the
LTL monitors is presented that achieves fault tolerance, providing soundness even though crashed monitors never recover. Although this work considers failures (which are out of the scope of our paper), it assumes synchronous communication. All these approaches consider only Boolean verdicts. In comparison, SRV can generate verdicts from arbitrary data domains.
All previous SRV efforts, from Lola [D'Angelo et al., 2005], Lola2.0 [Faymonville et al., 2016] and Copilot [Pike et al., 2010, Pike et al., 2013, Perez et al., 2020] to extensions to timed event streams such as TeSSLa [Convent et al., 2018], RTLola [Faymonville et al., 2019] or Striver [Gorostiaga and Sánchez, 2018], assume a centralized monitoring setting. In [Gorostiaga et al., 2020] the relationship between time-based (soft real-time) and event-based models of computation and their effects on SRV are explored, but again in the centralized setting. The work in [Basin et al., 2015] shows how to monitor Metric Temporal Logic specifications of distributed systems (including failures and message reordering), where the nodes communicate in a tree fashion and the root emits the final verdict. The work in [Danielsson and Sánchez, 2019] proposes a solution to the decentralized monitoring of SRV specifications assuming a synchronous network. We extend [Danielsson and Sánchez, 2019] to timed asynchronous networks.
Contributions and structure. The main contribution of this paper is a solution, described in Section 3, to the timed asynchronous decentralized stream runtime verification problem. We provide a proof of correctness of the algorithms and show that our solution subsumes the synchronous decentralized problem without overhead. A second contribution, included in Section 3.6, is the description of those specifications and conditions on the network behavior that allow the monitoring to take place with bounded resources, independently of the trace length. Bounding resources is of the utmost importance in cyber-physical systems, where memory, bandwidth and even computing time are limited and the system must react properly and timely to a changing environment. If the monitoring of a cyber-physical system is trace-length independent, it can run indefinitely even if its resources are physically constrained. A third contribution, detailed in Section 4, is a prototype implementation and an empirical evaluation. A fourth contribution, in Section 5, is a modified algorithm that allows nodes to save bandwidth by only communicating stream values when requested. Section 2 contains the preliminaries and Section 6 concludes.

Preliminaries
Stream Runtime Verification
We briefly recall SRV. For a more detailed description see [D'Angelo et al., 2005] and the tutorial [Sánchez, 2018]. The fundamental idea of SRV, pioneered by Lola [D'Angelo et al., 2005], is to describe monitors declaratively via a set of equations that describe the dependencies between output streams of values and input streams of values. We focus here on online monitoring. A monitor generated from a specification computes at runtime a sequence of values for the output streams, as soon as possible after observing each value from the input streams. Input values are typically extracted from sensors or read from a log file.
A Lola specification declares output streams in relation to the input streams, including both future and past temporal dependencies. The Lola language cleanly separates the temporal dependencies from the individual operations to be performed at each step, which allows monitoring algorithms for logics to be generalized to the computation of richer values such as numbers, strings or richer data types.

Lola Syntax.
A Lola specification consists of declaring the relation between output streams and input streams of events. Stream expressions are terms built using a collection of (interpreted) constructor symbols. Symbols are interpreted in the sense that each constructor is not only used to build terms but is also associated with an evaluation function that, given values of the arguments, produces a value of the return type. Given a set $Z$ of typed stream variables, the set of stream expressions consists of (1) variables from $Z$, (2) offsets $v[k, d]$, where $v$ is a stream variable of type $D$, $k$ is an integer and $d$ is a value from $D$, and (3) terms $f(t_1, \ldots, t_n)$ obtained by applying constructor symbols $f$ from the theories to previously defined terms. Stream variables represent sequences of values (streams) in the specification. The intended meaning of an offset expression $v[-1, false]$ is the value of stream $v$ at the previous position of the trace (or $false$ if there is no such previous position, that is, at the beginning). We use $Term_D(Z)$ for the set of stream expressions of type $D$ constructed from variables in $Z$ (and drop $Z$ when clear from the context). Given a term $t$, $sub(t)$ represents the set of sub-terms of $t$.
Definition 1 (Specification). A Lola specification $\varphi(I, O)$ consists of a set $I = \{r_1, \ldots, r_m\}$ of input stream variables, a set $O = \{s_1, \ldots, s_n\}$ of output stream variables, and a set of defining equations $s_i = e_i(r_1, \ldots, r_m, s_1, \ldots, s_n)$, one per output variable $s_i \in O$. The term $e_i$ belongs to $Term_D(I \cup O)$, where $D$ is the type of $s_i$.
We use $r, r_i, \ldots$ to refer to input stream variables, $s, s_i, \ldots$ to refer to output stream variables, and $u, v$ for an arbitrary input or output stream variable. Given $\varphi(I, O)$, we use $appears(u)$ for the set of output streams that use $u$, that is, $appears(u) = \{ s_i \mid u[-k, d] \in sub(e_i) \text{ or } u \in sub(e_i) \}$. Also, $ground(t)$ indicates whether the expression $t$ is ground (contains no variables or offsets) and can therefore be evaluated into a value using the interpretations of constants and function symbols.
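As an illustration of $sub$, $appears$ and $ground$, the following Python sketch encodes stream expressions as immutable dataclasses and applies the three functions to the specification of Example 2 below. The classes Const, Var, Offset and App are hypothetical names chosen for this sketch; they are not part of any Lola implementation.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


# Hypothetical term representation (illustration only).
@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Offset:
    name: str        # stream variable v
    k: int           # offset: negative = past, positive = future
    default: object  # value d used when v[k] falls outside the trace


@dataclass(frozen=True)
class App:
    fn: str                   # constructor symbol f
    args: Tuple["Term", ...]  # argument terms t_1, ..., t_n


Term = Union[Const, Var, Offset, App]


def sub(t: Term) -> Set[Term]:
    """All sub-terms of t, including t itself."""
    result = {t}
    if isinstance(t, App):
        for a in t.args:
            result |= sub(a)
    return result


def ground(t: Term) -> bool:
    """True when t contains no stream variables and no offsets."""
    return not any(isinstance(x, (Var, Offset)) for x in sub(t))


def appears(u: str, spec: Dict[str, Term]) -> Set[str]:
    """Output streams whose equation mentions u, directly or under an offset."""
    return {s for s, e in spec.items()
            if any(isinstance(x, (Var, Offset)) and x.name == u for x in sub(e))}


# Example 2: acc = y + root[-1|0]; root = if reset then 0 else acc
example2 = {
    "acc": App("+", (Var("y"), Offset("root", -1, 0))),
    "root": App("ite", (Var("reset"), Const(0), Var("acc"))),
}
```

Here `appears("root", example2)` is `{"acc"}`, because the past reference `root[-1|0]` occurs in the equation of `acc`.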
Example 2. The property "sum the previous values in input stream y, but if the reset stream is true, reset the count", can be expressed as follows, where stream variable root uses the accumulator acc and the input reset to compute the desired sum.The stream acc is defined with the keyword define to emphasize that it is an intermediate stream.
input bool reset
input num y
define int acc = y + root[-1|0]
output int root = if reset then 0 else acc

Lola Semantics.
We now introduce the formal semantics of Lola, which guarantees that there is a unique correct output stream for each input stream. This semantics allows us to prove that an algorithm is correct by showing that the algorithm produces the desired output. At runtime, input stream variables are incrementally associated with input streams of values.
Given input streams $\sigma_I$ (one sequence per input stream variable) and an output candidate $\sigma_O$ (one sequence per output stream variable), the formal semantics captures whether the pair $(\sigma_I, \sigma_O)$ matches the specification, which we write $(\sigma_I, \sigma_O) \models \varphi$. We use $\sigma_r$ for the stream in $\sigma_I$ corresponding to the input variable $r$, and $\sigma_r(k)$ for the value of stream $\sigma_r$ at position $k$. For $(\sigma_I, \sigma_O) \models \varphi$ to hold, all streams must be sequences of the same length.
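A valuation of a specification $\varphi$ is a pair $\sigma = (\sigma_I, \sigma_O)$ that contains one stream of values of the appropriate type for each input and output stream variable in $\varphi$, all of the same length $N$. Given a term $t$, the evaluation $\llbracket t \rrbracket_\sigma$ is a sequence of values of the type of $t$, defined as follows: constants evaluate to themselves, stream variables evaluate to their streams, function applications are evaluated pointwise, and an offset $u[k, c]$ evaluates at position $j$ to the value of $u$ at position $j+k$ if that position is within the trace, and to the default value $c$ otherwise. Formally, for $0 \le j < N$:

$$\llbracket c \rrbracket_\sigma(j) = c \qquad \llbracket u \rrbracket_\sigma(j) = \sigma_u(j) \qquad \llbracket f(t_1, \ldots, t_n) \rrbracket_\sigma(j) = f\big(\llbracket t_1 \rrbracket_\sigma(j), \ldots, \llbracket t_n \rrbracket_\sigma(j)\big)$$

$$\llbracket u[k, c] \rrbracket_\sigma(j) = \begin{cases} \sigma_u(j+k) & \text{if } 0 \le j+k < N \\ c & \text{otherwise.} \end{cases}$$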
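A valuation $(\sigma_I, \sigma_O)$ satisfies a Lola specification $\varphi$ whenever, for every output variable $s_i$, $\llbracket s_i \rrbracket_{(\sigma_I, \sigma_O)} = \llbracket e_i \rrbracket_{(\sigma_I, \sigma_O)}$. In this case we say that $\sigma$ is an evaluation model of $\varphi$ and write $(\sigma_I, \sigma_O) \models \varphi$.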
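The denotational semantics can be prototyped directly. The Python sketch below, a minimal illustration rather than a Lola implementation, evaluates terms over a finite valuation (an offset that falls outside the trace yields its default value) and checks the satisfaction relation for Example 2. The tuple encoding of terms and the function table `FUNCS` are assumptions of this sketch.

```python
from typing import Dict, List

# Hypothetical encoding: ("const", c) | ("var", u) | ("off", u, k, d) | ("app", f, t1, ..., tn)
FUNCS = {
    "+": lambda a, b: a + b,
    "ite": lambda c, a, b: a if c else b,
}


def evaluate(t, sigma: Dict[str, List], j: int, n: int):
    """Value of term t at position j of a valuation sigma whose streams have length n."""
    tag = t[0]
    if tag == "const":
        return t[1]
    if tag == "var":
        return sigma[t[1]][j]
    if tag == "off":
        _, u, k, d = t
        # Inside the trace: shifted value; outside: the default value d.
        return sigma[u][j + k] if 0 <= j + k < n else d
    _, f, *args = t
    return FUNCS[f](*(evaluate(a, sigma, j, n) for a in args))


def satisfies(spec, sigma) -> bool:
    """(sigma_I, sigma_O) |= spec: every output stream equals its defining term pointwise."""
    lengths = {len(v) for v in sigma.values()}
    if len(lengths) != 1:
        return False  # all streams must have the same length
    n = lengths.pop()
    return all(sigma[s][j] == evaluate(e, sigma, j, n)
               for s, e in spec.items() for j in range(n))


example2 = {
    "acc": ("app", "+", ("var", "y"), ("off", "root", -1, 0)),
    "root": ("app", "ite", ("var", "reset"), ("const", 0), ("var", "acc")),
}
```

For instance, with `y = [1, 2, 3, 4]` and `reset = [False, False, True, False]`, the only model has `acc = [1, 3, 6, 4]` and `root = [1, 3, 0, 4]`.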
The intention of a specification $\varphi$ is to describe a unique output for a given input, which is guaranteed if $\varphi$ has no cycles in the following sense. The dependency graph $D_\varphi$ of a specification $\varphi(I, O)$ is a weighted multi-graph $(V, E)$ whose vertices are the stream variables $V = I \cup O$, and where $E$ contains a directed weighted edge $u \xrightarrow{w} v$ whenever $v[w, d]$ is a sub-term in the defining equation of $u$ (a plain occurrence of $v$ counts as an offset of weight 0). A specification $\varphi$ is well-formed if $D_\varphi$ contains no zero-weight cycles, which guarantees that no stream depends on itself at the current position.
Consider Example 2. Its dependency graph has the edges $acc \xrightarrow{0} y$, $acc \xrightarrow{-1} root$, $root \xrightarrow{0} reset$ and $root \xrightarrow{0} acc$; its only cycle, between $acc$ and $root$, has weight $-1$, so the specification is well-formed. Given a stream variable $u$ and a position $i \ge 0$, an instant stream variable (or simply instant variable) is the pair $u_i$, which is a fresh variable of the same type as $u$. Note that there is a different instant variable $u_i$ for each instant $i$. In Example 2, $acc_4$ points to $root_3$ in all evaluation graphs with $M \ge 4$. We denote by $e_s^k$ the term (whose leaves are instant variables) that results from $e_s$ at $k$ by replacing the offset terms with the corresponding instant variables, corrected with the appropriate shift. Consider again Example 2. The instant stream expression $e_{acc}^4$ for $acc$ at instant 4 is $acc_4 = y_4 + root_3$.
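The construction of $e_s^k$ can be sketched as a recursive substitution. In this hypothetical Python encoding, past offsets that fall before position 0 are replaced by their default value, while future offsets are kept as instant variables to be resolved later; this treatment of the trace boundaries is an assumption consistent with the offset semantics, not code from the paper.

```python
# Hypothetical encoding: ("const", c) | ("var", u) | ("off", u, w, d) | ("app", f, t1, ..., tn)
# Instantiated terms use ("inst", u, i) for the instant variable u_i.


def instantiate(t, k: int):
    """Build e_s^k from e_s: streams become instant variables shifted by their offset."""
    tag = t[0]
    if tag == "const":
        return t
    if tag == "var":
        return ("inst", t[1], k)
    if tag == "off":
        _, u, w, d = t
        # Before the start of the trace the default is known immediately;
        # future positions stay as instant variables (resolved later or at the end).
        return ("const", d) if k + w < 0 else ("inst", u, k + w)
    _, f, *args = t
    return ("app", f, *(instantiate(a, k) for a in args))


def show(t) -> str:
    """Render an instantiated term, e.g. +(y_4, root_3)."""
    tag = t[0]
    if tag == "const":
        return str(t[1])
    if tag == "inst":
        return f"{t[1]}_{t[2]}"
    _, f, *args = t
    return f"{f}({', '.join(show(a) for a in args)})"


# acc = y + root[-1|0] from Example 2
acc = ("app", "+", ("var", "y"), ("off", "root", -1, 0))
```

With this sketch, `show(instantiate(acc, 4))` yields `+(y_4, root_3)`, matching $acc_4 = y_4 + root_3$, while at instant 0 the past reference collapses to its default: `+(y_0, 0)`.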
The nodes of the dependency graph form a DAG of Maximal Strongly Connected Components (MSCCs). Note also that specifications whose dependency graph has no positive cycles are called efficiently monitorable specifications [D'Angelo et al., 2005]. The evaluation graph of an efficiently monitorable specification has no cycles, which enables us to reason by induction on evaluation graphs, as we will do later. Note that these specifications can have positive edges (corresponding to future dependencies) as long as they do not form a positive cycle. As shown in [Sánchez, 2018], these specifications can be evaluated online (incrementally) with finite memory by a central monitor.
Example 3. The following code snippets show a specification that is not efficiently monitorable, one that is efficiently monitorable and one that is very efficiently monitorable. The first snippet is not efficiently monitorable because the stream b depends on itself in the future: in the Evaluation Graph (EG) every instant variable depends on the next instant, unboundedly into the future. As a result, no instant variable $b_k$ can ever be resolved on an infinite trace.
The next specification is efficiently monitorable because it only contains bounded references to the future: each instant variable $b_k$ only depends on a value two positions ahead, so $b_k$ can be resolved at instant $k+2$. The last specification is very efficiently monitorable because there are no references to the future: all offsets are either negative or zero.
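The three classes can be checked mechanically on the dependency graph. The following Python sketch assumes edges are given as `(u, v, w)` triples meaning that $v[w, d]$ occurs in the equation of $u$; it detects positive cycles with Bellman-Ford over negated weights (a positive cycle becomes a negative one) and then checks whether any future (positive) edge exists. It does not check well-formedness, and the small graphs in the usage note are hypothetical stand-ins for the snippets of Example 3.

```python
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (u, v, w): v[w, d] occurs in the equation of u


def has_positive_cycle(edges: List[Edge]) -> bool:
    """Bellman-Ford on negated weights: a positive cycle shows up as a negative one."""
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dist = {x: 0 for x in nodes}  # as if a virtual source reached every node with cost 0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] - w < dist[v]:
                dist[v] = dist[u] - w
                changed = True
        if not changed:
            return False  # distances are final: no negative (i.e. positive) cycle
    # Any further improvement witnesses a cycle of positive original weight.
    return any(dist[u] - w < dist[v] for u, v, w in edges)


def classify(edges: List[Edge]) -> str:
    if has_positive_cycle(edges):
        return "not efficiently monitorable"
    if any(w > 0 for _, _, w in edges):
        return "efficiently monitorable"  # bounded future references only
    return "very efficiently monitorable"  # all offsets are negative or zero
```

For example, `classify([("b", "b", 1)])` (a self-reference `b[1|0]`) reports a positive cycle, `classify([("b", "r", 2)])` reports bounded future references, and the dependency graph of Example 2 is very efficiently monitorable.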

Decentralized Synchronous Online Monitoring
An online decentralized algorithm to monitor Lola specifications in synchronous networks is presented in [Danielsson and Sánchez, 2019]. The main idea is to use a network of cooperating nodes to monitor a Lola specification that is sliced according to its syntax tree, with each subexpression, including inputs, mapped to a node. This requires monitors to share their partial results (of the subexpressions) via messages. At each time instant the algorithm reads inputs, updates internal expressions and communicates results to the appropriate nodes, so that the specification is ultimately computed from those partial results. Therefore, given a well-formed Lola specification, the decentralized online algorithm presented in [Danielsson and Sánchez, 2019] incrementally computes the value of each output instant variable, assuming a synchronous network where messages are not lost or duplicated. The algorithms presented here extend [Danielsson and Sánchez, 2019] to the more general setting of timed asynchronous networks.

Decentralized Stream Runtime Verification for Timed Asynchronous Networks
In this section we describe our solution to the decentralized SRV problem for timed asynchronous networks. The algorithm that we present below computes the unique values of the output instant variables based on the values of the input readings. We prove the termination of the algorithm in Theorem 1 and its correctness in Theorem 2, showing that the operational semantics is equivalent to the denotational semantics. We require a well-formed Lola specification and a mapping between streams and the network nodes where they are computed.
Each network node hosts a local monitor that is responsible for computing some of the streams of the specification. We write $\mu(s)$ for the network node whose local monitor is responsible for resolving the values of stream variable $s$. Local monitors exchange messages containing partial results whenever needed to perform the global monitoring task. However, our decentralized algorithm may compute some output values at different time instants than a centralized version, due to the different locations of the inputs and the delays caused by communication. We study this effect both theoretically, in Section 3.6, and empirically, in Section 4. A centralized monitor corresponds to the operational semantics in [D'Angelo et al., 2005, Sánchez, 2018], which is equivalent to a network mapping that assigns all input and output streams to a single node and therefore avoids communication.

Problem Description
Network. We assume a network with a set of nodes $N$ such that every node can communicate with every other network node by sending messages. We assume reliable unicast communication (no message loss or duplication) over a timed asynchronous network, so a given message can take an arbitrary amount of time to arrive. Since network nodes share a global clock, the computation proceeds in cycles. In every cycle, all nodes in the network execute, in parallel and to completion, the following actions: (1) read input messages, (2) perform a terminating local computation, and (3) generate output messages. We use messages of the form $(s_k, c, n_s, n_d)$, where $s_k$ is an instant variable, $c$ is a value of the type of $s$, $n_s$ is the source node and $n_d$ is the destination node. We use the abbreviations $msg.src = n_s$, $msg.dst = n_d$, $msg.stream = s_k$ and $msg.val = c$. These messages communicate the values that are read or computed.
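A minimal Python sketch of the message format and of one cycle at a node follows. The names `Msg`, `Node` and `cycle` are hypothetical, and the local computation is a placeholder that only records values; the actual per-node procedure is the monitor algorithm described later in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Msg:
    """A message (s_k, c, n_s, n_d)."""
    stream: Tuple[str, int]  # instant variable s_k encoded as (s, k)
    val: Any                 # value c of the type of s
    src: str                 # source node n_s
    dst: str                 # destination node n_d


@dataclass
class Node:
    name: str
    inbox: List[Msg] = field(default_factory=list)
    known: Dict[Tuple[str, int], Any] = field(default_factory=dict)

    def cycle(self, k: int, local_values: Dict[str, Any], neighbours: List[str]) -> List[Msg]:
        # (1) Read input messages.
        for m in self.inbox:
            self.known[m.stream] = m.val
        self.inbox.clear()
        # (2) Terminating local computation: placeholder that only records local values at k.
        for s, v in local_values.items():
            self.known[(s, k)] = v
        # (3) Generate output messages for the neighbours.
        return [Msg((s, k), v, self.name, n) for s, v in local_values.items() for n in neighbours]
```

For example, `Node("n1").cycle(0, {"y": 5}, ["n2"])` returns `[Msg(("y", 0), 5, "n1", "n2")]`, which the network later places in the inbox of `n2`.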

Stream Assignment and Communication Strategy
Given a specification $\varphi(I, O)$ and a network with nodes $N$, a stream assignment is a map $\mu : I \cup O \to N$ that assigns a network node to each stream variable. The node $\mu(r)$ for an input stream variable $r$ is the location in the network where $r$ is sensed at every clock tick. At runtime, at every instant $k$ a new input value for $r_k$ is read. On the other hand, the node $\mu(s)$ for an output stream variable $s$ is the location whose local monitor is responsible for resolving the values of $s$.
An instant value $v_k$ is automatically communicated to all potentially interested nodes whenever the value of $v_k$ is resolved. Let $v$ and $u$ be two stream variables such that $v$ appears in the equation of $u$, and let $n_v = \mu(v)$ and $n_u = \mu(u)$. Then $n_v$ informs $n_u$ of every value $v_k = c$ that $n_v$ resolves by sending a message $(v_k, c, n_v, n_u)$. We are now ready to define the decentralized SRV problem.
Definition 2. A decentralized SRV problem $\langle \varphi, N, \mu \rangle$ is characterized by a specification $\varphi$, a network with nodes $N$ and a stream assignment $\mu$ for every stream variable.
We use DSRV for decentralized SRV problem.Solving a DSRV instance consists of computing the values of instant variables corresponding to the output streams based on the values of the instant variables of the input streams, by means of a network of interconnected nodes that host local monitors.
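The communication strategy can be computed statically from the specification and the stream assignment. The following Python sketch derives, for each stream, the set of remote nodes that must be informed of its values, using the placement of the temperature part of Example 1. It assumes that no message is needed when producer and consumer share a node; the function name `destinations` and the dictionary encoding are illustrative choices.

```python
from typing import Dict, Set


def destinations(uses: Dict[str, Set[str]], mu: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each stream v, the remote nodes mu(u) of the streams u whose equation uses v.

    uses[u] is the set of streams occurring (possibly under an offset) in the equation of u.
    Assumption: co-located streams are read locally, so they need no message.
    """
    dest: Dict[str, Set[str]] = {v: set() for v in mu}
    for u, used in uses.items():
        for v in used:
            if mu[v] != mu[u]:
                dest[v].add(mu[u])
    return dest


# Stream assignment mu for the temperature part of Example 1.
mu = {
    "t_1": "Room1", "tlow": "Room1", "thi": "Room1", "t_spike_q_1": "Room1",
    "t_2": "Room2", "t_low": "Room2", "t_hi": "Room2", "t_spike_q_2": "Room2",
    "fire_risk_q_1": "Building", "fire_risk_q_2": "Building",
}
# Streams used by each equation (tlow, thi, t_low and t_hi only use constants).
uses = {
    "t_spike_q_1": {"t_1", "tlow", "thi"},
    "t_spike_q_2": {"t_2", "t_low", "t_hi"},
    "fire_risk_q_1": {"t_spike_q_1"},
    "fire_risk_q_2": {"t_spike_q_2"},
}
dest = destinations(uses, mu)
```

In this placement only `t_spike_q_1` and `t_spike_q_2` travel over the network (to `Building`); raw sensor readings stay in their rooms, which is the bandwidth saving Example 1 aims for.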

Model of Communication
We now describe in detail the timed asynchronous model of computation that we assume. Every message inserted in the network arrives at its destination according to the following conditions:
- Always later: every message $m$ inserted at $t$ arrives at some $t'$ with $t' > t$;
- Arbitrary delay: there is no a-priori bound on the amount of time that any message takes to arrive;
- FIFO between each pair of nodes: let $m_1$ and $m_2$ be two messages with the same origin and destination, $m_1.src = m_2.src$ and $m_1.dst = m_2.dst$. Let $m_1$ be inserted at $t_1$ and arrive at $t_1'$, and let $m_2$ be inserted at $t_2$ and arrive at $t_2'$. If $t_1 < t_2$ then $t_1' \le t_2'$.
The synchronous model is the particular case of the timed asynchronous model in which every message inserted in the network always takes the same amount of time between each pair of network nodes, so the delay is a constant. Formally, to analyze the behavior of our algorithms we model the message delays as a family of functions $arr_{u \to v}$ (one for each pair of nodes $(u, v)$), which provides, for every instant $t$, the instant $t'$ at which a message sent at $t$ from $u$ arrives at $v$.
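To experiment with these conditions, the following Python sketch generates arrival times for a sequence of sends ordered by send time, playing the role of the functions $arr_{u \to v}$. Delays are drawn at random; `max_extra` only limits this particular simulation and is not a bound of the model. The helper names are hypothetical, and FIFO is enforced by never letting a message arrive before the previous message on the same link.

```python
import random
from typing import Dict, List, Tuple

Send = Tuple[str, str, int]  # (source node, destination node, send time t)


def make_arrivals(sends: List[Send], max_extra: int, seed: int = 0) -> List[int]:
    """Arrival time t' for each send, in order; sends must be sorted by send time."""
    rng = random.Random(seed)
    last: Dict[Tuple[str, str], int] = {}
    arrivals = []
    for src, dst, t in sends:
        arrival = t + 1 + rng.randint(0, max_extra)  # always later: t' > t
        # FIFO per (src, dst): never overtake the previous message on this link.
        arrival = max(arrival, last.get((src, dst), arrival))
        last[(src, dst)] = arrival
        arrivals.append(arrival)
    return arrivals


def synchronous_arrivals(sends: List[Send], delay: int) -> List[int]:
    """Synchronous special case: the same constant delay for every message."""
    return [t + delay for _, _, t in sends]
```

Because `max` can only postpone an arrival, the always-later condition is preserved when the FIFO correction is applied.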

DSRV for Timed Asynchronous Networks: monitor and algorithm
Our solution consists of a collection of local monitors, one for each network node $n$. The local monitor $\langle Q_n, U_n, R_n \rangle$ for $n$ maintains an input queue $Q_n$ and two storages:
- Resolved storage $R_n$, where $n$ stores resolved instant variables $(v_k, c)$.
- Unresolved storage $U_n$, where $n$ stores unresolved equations $v_k = e$, where $e$ is not a value but an expression that contains other instant variables.
When n receives a message from a remote node, the information is added to R_n, so that future local requests for the information can be resolved locally and immediately. At the beginning of the computation cycle for instant k, node n reads the values of the input streams assigned to it from local sensors, and instantiates for k all output stream variables that n is responsible for. After that, the resulting equations are simplified using the knowledge acquired so far by n, which is stored in R_n. Finally, new messages are generated and inserted in the queues of the corresponding neighbors. More concretely, every node n executes the procedure Monitor shown in Algorithm 1, which invokes Step at every clock tick. The procedure Finalize resolves the values still pending at the end of the trace to their defaults. Note that this procedure is never invoked if the monitored trace never terminates (the monitor keeps observing and producing outputs). The procedure Step executes the following steps:
1. Process Messages: Line 7 invokes the procedure ProcessMessages (lines 23-25), which processes the incoming responses by adding them to R_n.
2. Read Inputs and Instantiate Outputs: Line 8 reads the new inputs for the current instant k, and line 9 instantiates the equation of every output stream that n is responsible for.
3. Evaluate: Line 10 invokes the procedure Evaluate (lines 14-22), which evaluates the unresolved equations.
4. Send Responses: Line 12 invokes SendResponses (lines 26-28), which sends messages for all newly resolved variables.
5. Prune: Lines 29-31 prune from R_n the information that is no longer needed (see Section 3.6).
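The steps above can be mirrored by a simplified local monitor. This sketch relies on several assumptions that are ours, not the paper's:
-equations only use offsets w ≤ 0, so a value that falls off the trace takes a single default per equation;
-links have a caller-supplied delay;
-Finalize and Prune are omitted;
-all names (`LocalMonitor`, `run`, and the streams `acc` and `root`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Var = Tuple[str, int]  # instant variable s_k as (stream, k)
# Hypothetical equation format: ([(stream, offset <= 0)], function of the values, default)
Equation = Tuple[List[Tuple[str, int]], Callable[..., object], object]


class LocalMonitor:
    """Simplified local monitor <Q_n, U_n, R_n>; Finalize and Prune are omitted."""

    def __init__(self, name: str, spec: Dict[str, Equation], mu: Dict[str, str],
                 inputs: Dict[str, list], length: int):
        self.name, self.spec, self.mu = name, spec, mu
        self.inputs = inputs                          # local input traces (sensors)
        self.length = length                          # trace length M
        self.R: Dict[Var, object] = {}                # resolved storage
        self.U: List[Var] = []                        # unresolved instant variables
        self.Q: List[Tuple[int, Var, object]] = []    # (arrival instant, variable, value)

    def step(self, k: int, send: Callable[[str, str, Var, object, int], None]) -> None:
        # 1. Process messages: store every value that has arrived by instant k.
        for _, var, val in [m for m in self.Q if m[0] <= k]:
            self.R[var] = val
        self.Q = [m for m in self.Q if m[0] > k]
        new: List[Var] = []
        # 2. Read inputs and instantiate owned outputs (only while the trace lasts).
        if k < self.length:
            for s, trace in self.inputs.items():
                self.R[(s, k)] = trace[k]
                new.append((s, k))
            self.U += [(s, k) for s in self.spec if self.mu[s] == self.name]
        # 3. Evaluate: resolve every equation whose arguments are all known.
        progress = True
        while progress:
            progress = False
            for var in list(self.U):
                deps, fn, default = self.spec[var[0]]
                args = []
                for r, w in deps:
                    if var[1] + w < 0:
                        args.append(default)          # offset falls off the trace
                    elif (r, var[1] + w) in self.R:
                        args.append(self.R[(r, var[1] + w)])
                    else:
                        break                         # still waiting for a value
                else:
                    self.R[var] = fn(*args)
                    self.U.remove(var)
                    new.append(var)
                    progress = True
        # 4. Send responses for newly resolved variables to interested remote nodes.
        for var in new:
            dsts = {self.mu[u] for u, eq in self.spec.items()
                    if any(r == var[0] for r, _ in eq[0])} - {self.name}
            for dst in sorted(dsts):
                send(self.name, dst, var, self.R[var], k)


def run(spec, mu, inputs_by_node, length, delay, drain=10):
    """Step every node once per tick; the extra `drain` ticks let in-flight messages land."""
    nodes = {n: LocalMonitor(n, spec, mu, inputs_by_node.get(n, {}), length)
             for n in set(mu.values())}

    def send(src, dst, var, val, k):
        nodes[dst].Q.append((k + delay(src, dst, k), var, val))

    for k in range(length + drain):
        for n in sorted(nodes):
            nodes[n].step(k, send)
    return nodes


spec = {
    "acc": ([("acc", -1), ("i", 0)], lambda prev, x: prev + x, 0),   # running sum, at n1
    "root": ([("acc", 0)], lambda a: a > 5, False),                  # threshold, at n2
}
mu = {"i": "n1", "acc": "n1", "root": "n2"}
nodes = run(spec, mu, {"n1": {"i": [1, 2, 3, 4]}}, length=4, delay=lambda s, d, k: 2)
print([nodes["n2"].R[("root", k)] for k in range(4)])   # [False, False, True, True]
```

In this run, node n2 resolves root_k two instants after n1 resolves acc_k, because that is the delay assumed for the link n1 → n2.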

Formal Correctness
We now show that our solution is correct by proving that the output computed is the same as in the denotational semantics, and that every output is eventually computed.
Theorem 1. All of the following hold for every instant variable u_k: (1) The value of u_k is eventually resolved.
(2) The value of u_k is c if and only if (u_k, c) ∈ R at some instant.
(3) A response message for u_k is eventually sent to all interested network nodes (all nodes responsible for streams v such that u ∈ appears(v)).
Proof. The proof proceeds by induction on the evaluation graph, proving (1)-(3) simultaneously in the inductive step, since they depend on each other at the previous inductive steps. Let M be the length of a computation (which can be ω) and σ_I an input of length M. Note that (1) to (3) above are all statements about instant variables u_k, which are the nodes of the evaluation graph G_{ϕ,M}. We proceed by induction on G_{ϕ,M}, which is acyclic because D_ϕ is well-formed by assumption.
-Base case: The base cases are the vertices of the evaluation graph that have no outgoing edges, which are either • instant variables that correspond to inputs read from local sensors, or • defined variables whose instant equation does not contain other instant variables. The latter happens when either the equation is a constant or the time instant is such that every offset falls off the trace, so the default value is used. Statement (1) follows immediately for inputs because at instant k, u_k is read at node µ(u). For output equations that have no variables, or whose variables have offsets that, once instantiated, fall off the trace (negative, or beyond the last position M − 1), the value is determined either immediately or at M, when the offset is calculated. At this point the computed value is inserted in R, so (2) also holds at µ(u). Note that (2) also holds for other nodes because the response message contains u_k = c if and only if (u_k, c) ∈ R_n, where µ(u) = n. Finally, the response message is inserted exactly when u_k is resolved, so (1) implies (3).
-Inductive case: Consider an arbitrary u_k in the evaluation graph G_{ϕ,M} and let u^1_{k_1}, ..., u^l_{k_l} be the instant variables that u_k depends on. These are nodes of G_{ϕ,M} lower than u_k, so the inductive hypothesis applies and (1)-(3) hold for them. Let n = µ(u). At instant k, u_k is instantiated and inserted in U_n. The values of u^1_{k_1}, ..., u^l_{k_l} are eventually resolved and, when remote, sent to n (by (1) and (3)). At the latest of their arrival times, the equation for u_k has no more variables and is evaluated to a value, so (1) holds, and (2) holds at n. At this point the response message is sent, so (3) holds for u_k, and (2) also holds at the receiving nodes.
This finishes the proof.
Theorem 1 implies that every value of every defined stream at every point is eventually resolved by our network of cooperating monitors. Therefore, given input streams σ_I, the algorithm computes (by (2)) the unique output streams σ_i, one for each s_i. The element σ_i(k) is the value resolved for s^i_k by the local monitor for µ(s_i). The following theorem captures that Algorithm 1 computes the right values (according to the denotational semantics of Lola), while Theorem 1 captures that all values are eventually computed.
We use out(σ I ) as the function from input streams to output streams that the cooperating monitors compute.We use [s] for the stream of values corresponding to stream variable s in out(σ I ).We now show that the sequence of values computed corresponds to the semantics of the specification.
Theorem 2. Let ϕ be a specification, S = ⟨ϕ, N, µ⟩ a decentralized SRV problem, and σ_I an input stream of values. Then (σ_I, out(σ_I)) ⊨ ϕ.
Proof. Let σ_O be the unique evaluation model such that (σ_I, σ_O) ⊨ ϕ (we use σ_O(s) for the output stream of stream variable s and σ_O(s)(k) for its value at the k-th position). We need to show that for every s and k, [s](k) = σ_O(s)(k). We again proceed by induction on the evaluation graph G_{ϕ,M}.
-Base case: For inputs the value follows immediately. The other base case corresponds to output variables s at instants at which they do not depend on other variables (because all occurrences of offsets, if any, fall off the trace). The value is computed by network node µ(s), and it satisfies the equation e_s of s without depending on any value of any other stream. Therefore [s](k) = e_s[k] = σ_O(s)(k), as desired.
-Inductive case: Let s be an arbitrary stream variable and k an arbitrary instant between 0 and M − 1, and assume that all instant variables u_{k'} that s_k can reach in the evaluation graph satisfy the inductive hypothesis.
Let n be the node in charge of computing s. By Theorem 1, all these values are eventually received by n and stored in R_n, and by the inductive hypothesis they coincide with the denotational semantics, that is, [u](k') = σ_O(u)(k'). The evaluation of s_k corresponds to computing e_s, which uses the semantics of expressions (see Section 2). A simple structural induction on the expression e_s shows that the result of the evaluation, that is, the value assigned to s_k, is ⟦e_s⟧_σ(k) = σ_O(s)(k), as desired.
This finishes the proof.

Simplifiers
The evaluation of expressions in Algorithm 1 assumes that all instant variables in an expression e are known (i.e., e is ground), so that the interpreted functions of the data theory can evaluate e. Sometimes, expressions can be partially evaluated (or their value can even be fully determined) knowing only some of the instant variables involved. A simplifier is a function f : Term_D → Term_D such that (1) the variables in f(t) are a subset of the variables in t, and (2) every substitution of values for the variables of t produces the same value for t and for f(t). For example, the following are typical simplifiers:
In practice, simplifiers can dramatically affect performance, both in terms of the instant at which an instant variable is resolved and, in decentralized monitoring, in terms of the delays and the number of messages exchanged. Essentially, a simplifier is a function from terms to terms that, for every possible valuation of the variables in the original term, does not change the final value obtained. It is easy to see that for every term t obtained by instantiating a defining equation and for every simplifier f, ⟦t⟧ = ⟦f(t)⟧, because the variables in t and in f(t) are filled with the same values (taken from σ_I and σ_O).
Consider arbitrary simplifiers simp used in line 19 of Algorithm 1 to simplify expressions. Let U_n be the unresolved storage of node n and let u_k be an instant variable with µ(u) = n. By Algorithm 1, the sequence of terms (u_k, t_0), (u_k, t_1), ..., (u_k, t_m) that U_n stores is such that the simplifier has been applied to each t_i. It follows that the value computed using simplifiers is the same as without simplifiers. It is also easy to show that the algorithm using simplifiers obtains the value of every instant variable no later than the algorithm without simplifiers. This is because, even in the worst case, every instant variable is resolved as soon as all the instant variables it depends on are known, and all response messages are sent at the moment the values are resolved.
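To make the two conditions on simplifiers concrete, the following sketch implements a simplifier for a Boolean fragment of the data theory. It uses the standard short-circuit identities for `and` and `or`, which are the kind of OR simplification exploited in the redundancy experiments. Terms are encoded as nested tuples; this encoding and the function names `subst`, `simp`, `variables` and `evaluate` are illustrative assumptions. The last line shows why simplifiers matter in the decentralized setting: once one disjunct is known to be true, the other value no longer needs to be awaited.

```python
from itertools import product
from typing import Dict, Set, Union

# A term is a Boolean constant, a variable name, or a tuple (op, t1, t2) with op in {"and", "or"}.
Term = Union[bool, str, tuple]


def subst(t: Term, known: Dict[str, bool]) -> Term:
    """Replace the variables whose value is already known (e.g. stored in R_n)."""
    if isinstance(t, bool):
        return t
    if isinstance(t, str):
        return known.get(t, t)
    return (t[0], subst(t[1], known), subst(t[2], known))


def simp(t: Term) -> Term:
    """Short-circuit simplifier: never adds variables and never changes the value."""
    if not isinstance(t, tuple):
        return t
    op, a, b = t[0], simp(t[1]), simp(t[2])
    absorbing = op == "or"            # True absorbs "or"; False absorbs "and"
    if a is absorbing or b is absorbing:
        return absorbing
    if a is (not absorbing):          # neutral element: False or b = b, True and b = b
        return b
    if b is (not absorbing):
        return a
    return (op, a, b)


def variables(t: Term) -> Set[str]:
    if isinstance(t, bool):
        return set()
    if isinstance(t, str):
        return {t}
    return variables(t[1]) | variables(t[2])


def evaluate(t: Term, env: Dict[str, bool]) -> bool:
    """Plain evaluation, which needs every variable of t to be known."""
    if isinstance(t, bool):
        return t
    if isinstance(t, str):
        return env[t]
    a, b = evaluate(t[1], env), evaluate(t[2], env)
    return (a or b) if t[0] == "or" else (a and b)


# Properties (1) and (2) of a simplifier, checked exhaustively on one term.
t = ("and", ("or", "x", "y"), ("or", "z", False))
assert variables(simp(t)) <= variables(t)
assert all(evaluate(t, dict(zip("xyz", v))) == evaluate(simp(t), dict(zip("xyz", v)))
           for v in product([False, True], repeat=3))
print(simp(t))                                                       # ('and', ('or', 'x', 'y'), 'z')
print(simp(subst(("or", "risk", "risk_red"), {"risk_red": True})))  # True
```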

Theoretical Resource Utilization
The aim of this section is to define conditions under which local monitors only need bounded memory to compute every output value.The first thing to consider is that the specification must be decentralized efficiently monitorable [Danielsson and Sánchez, 2019], which essentially states that every strongly connected component in G ϕ must be mapped to the same network node.That is, if u appears, transitively, in the declaration of v and v appears in the declaration of u (with some offsets), then µ(u) = µ(v).
In order to guarantee that a given storage in the local monitor of node n is bounded, we must provide an upper bound on how long it takes to resolve an instant variable of a stream assigned to n. We use Time to Resolve (TTR) to refer to the amount of time that a given instant variable u_k takes to be resolved. This is the number of time instants between the instantiation of the variable at time k and the instant at which it gets resolved, leaving U_n and being stored in R_n. This happens in line 21 of Algorithm 1.

General Equations for the Time to Resolve
We now introduce general recursive equations that capture when an instant variable s_k is resolved. In order to bound the memory used by the monitor at network node n, we need to bound the storages U_n and R_n:
-Bound on R_n: Resolved values that are needed remotely are sent immediately to the remote nodes, so R_n only contains resolved values that will be needed in the future locally at n. Since efficiently monitorable specifications only contain bounded (future) paths, there is a maximum future reference b used in the specification. This bound limits how long a resolved value v_k can remain in R_n, because after at most b steps all instant variables that need the value of v_k stored in R_n will have been instantiated. That is, v_k is not needed after t = max(k + b, k + TTR(v_k)), and at t the value of v_k can be removed from R_n. This guarantees that the size of R_n is always upper-bounded by a constant in every node n.
-Bound on U_n: The size of the memory required for storage U_n at the node n responsible for resolving s (that is, n = µ(s)) is proportional to the number of instantiated but unresolved instant variables. Therefore, to bound U_n we need to bound the time it takes to resolve the instant variables of the streams assigned to n.
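The pruning rule for R_n can be illustrated numerically. The sketch below assumes that a value v_k is stored from its resolution at k + TTR(v_k) until its pruning instant max(k + b, k + TTR(v_k)), and counts how many values of one stream coexist in R_n. The bound b and the TTR sequences are hypothetical. With bounded TTRs the peak stays constant regardless of the trace length.

```python
from typing import List


def prune_instant(k: int, b: int, ttr: int) -> int:
    """Instant after which v_k is no longer needed locally: max(k + b, k + TTR(v_k))."""
    return max(k + b, k + ttr)


def peak_entries(ttrs: List[int], b: int) -> int:
    """Largest number of values of one stream held in R_n at the same instant.

    ttrs[k] is TTR(v_k); v_k is stored in R_n from its resolution at k + ttrs[k]
    until prune_instant(k, b, ttrs[k]), both inclusive (a modelling assumption).
    """
    spans = [(k + ttr, prune_instant(k, b, ttr)) for k, ttr in enumerate(ttrs)]
    horizon = max(end for _, end in spans) + 1
    return max(sum(start <= t <= end for start, end in spans) for t in range(horizon))


b = 3  # hypothetical maximum future reference in the specification
print(peak_entries([0] * 300, b))                          # 4, i.e. b + 1
print(peak_entries([k % 6 for k in range(300)], b) <= 6)   # True: at most max(b, max TTR) + 1
```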
The general equations that we present below depend on the delay of messages in the network. We will later instantiate these general equations for the following cases of network behavior: a synchronous network; a timed-asynchronous network with an upper bound on message delays for the whole trace (we call this the aeternal case); and a timed-asynchronous network with an upper bound on message delays within a given time horizon (we call this the temporary case).
Note that the correctness of the algorithm (Theorem 2) establishes that the output streams σ O only depend on the input streams σ I but does not state bounds on the time at which each element of σ O is resolved or on the delays of messages.
In this section we study how the delay of messages affects the time at which instant variables are resolved, which in turn affects memory usage at the computation nodes. We use d(t, a, b) for the time it takes for a message sent from a to b at time t to arrive; in other words, arr_{a→b}(t) = t + d(t, a, b). Recall that we assume that messages are causal and queues are FIFO, as described in Section 3.2. Causality means that messages arrive after they are sent (that is, for every a, b and t, arr_{a→b}(t) > t), and FIFO means that for every a and b, if t < t' then arr_{a→b}(t) ≤ arr_{a→b}(t').
We now capture the Moment to Resolve of an instant variable s_t, written MTR(s_t), which is the instant at which s_t is guaranteed to be resolved by the monitor at the network node µ(s) responsible for computing s. Our definition considers two components: the delay in resolving all local instant variables that s_t may depend on, and the resolution of remote instant variables, which also involves message delays. We use the remote moment to resolve, denoted MTR_rem(s_t), for the instant at which all remote values that s_t directly requires have arrived (which is t if all of them arrive before t). Note that this is well-defined for every well-formed specification, because the equation for s_t only depends on variables lower in the evaluation graph, which is acyclic.
Example 4. Consider Example 2, with streams i and acc at network node 1 and streams reset and root computed at network node 2. Then we can substitute in the equations to obtain MTR(root_1).

MTR(root
The instant variable root_1 is guaranteed to be resolved when the response for the instant variable acc_1 arrives, which is the max(1, arr_{acc→root}(...)) part. This response can only be produced once the response for acc_0 has arrived, which is the innermost part: ... max(1, arr_{acc→root}(0)). Note that we do not need to account for MTR_rem(root_{−1}), since it is resolved instantaneously to its default value. Likewise, inputs are resolved instantaneously and do not add any delay to the MTR.
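The recursion behind MTR and MTR_rem can be sketched as a memoized function. This follows our reading of the definitions in the text. MTR_rem(s_t) is the maximum of t and of the arrival instants arr_{µ(r)→µ(s)}(MTR(r_{t+w})) of the remote values that s_t directly requires. MTR(s_t) additionally takes the maximum with the MTR of its local dependencies, and dependencies with t + w < 0 are ignored. The two-stream placement and the outage on the link are hypothetical; this is not Example 2.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Tuple

Deps = Dict[str, List[Tuple[str, int]]]   # defined stream -> [(stream, offset)]


def make_mtr(deps: Deps, mu: Dict[str, str],
             arr: Callable[[str, str, int], int]) -> Callable[[str, int], int]:
    """MTR(s_t) under our reading of the MTR / MTR_rem equations.

    arr(a, b, t) is the arrival instant at node b of a message sent from a at t.
    Streams absent from deps are inputs, resolved at their own instant t.
    """
    @lru_cache(maxsize=None)
    def mtr(s: str, t: int) -> int:
        rem = loc = t
        for r, w in deps.get(s, []):
            if t + w < 0:
                continue                                    # default value, known at once
            m = mtr(r, t + w)
            if mu[r] == mu[s]:
                loc = max(loc, m)                           # local dependency
            else:
                rem = max(rem, arr(mu[r], mu[s], m))        # MTR_rem: sent at m, arrives later
        return max(rem, loc)
    return mtr


# Hypothetical placement: acc (running sum) at n1, root (reads acc) at n2.
deps = {"acc": [("acc", -1), ("i", 0)], "root": [("acc", 0)]}
mu = {"i": "n1", "acc": "n1", "root": "n2"}


def arr(a: str, b: str, t: int) -> int:
    # 2 instants of delay, but the link is down during [10, 20):
    # messages sent meanwhile are delivered at 22, preserving FIFO order.
    return 22 if 10 <= t < 20 else t + 2


mtr = make_mtr(deps, mu, arr)
print([mtr("root", t) - t for t in (5, 12, 19, 30)])   # TTR: [2, 10, 3, 2]
```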
Note that MTR_rem(s_t) only considers those remote instant variables r_{t+w} for which t + w ≥ 0, because otherwise the default value is used when s_t is instantiated. In the equation for MTR(s_t) we assume the base case MTR(s_t) = 0 when t < 0 because, again, the default value of the offset expression is used instead, and it is known immediately. It is easy to see that the first equation is equivalent to:
We are now ready to prove that these definitions indeed capture the time at which s_t is resolved.
Theorem 3. Let ϕ be a specification and µ a network placement, let σ_I be the input trace and arr a network behavior. Then every s_t is resolved at MTR(s_t) or before.
Proof.The proof proceeds by induction on the evaluation graph G ϕ,M induced by ϕ and the length of σ I .
-Base case: inputs and instant variables s_t that do not depend on any other instant variable. These are the nodes of the evaluation graph that have no outgoing edges. Since s_t is instantiated at t, its value is resolved exactly at t, either by reading a sensor or by instantiating a default value. Also, MTR(s_t) = MTR_rem(s_t) = t.
-General case: Let s_t be an arbitrary instant variable and assume, by inductive hypothesis, that the theorem holds for all instant variables lower than s_t in the evaluation graph. At time MTR_rem(s_t), all instant variables r_{t+w} from remote nodes that s_t depends on have arrived. By the inductive hypothesis, r_{t+w} is resolved at MTR(r_{t+w}) or before, and its response is sent at that moment. By FIFO, this response therefore arrives no later than arr_{µ(r)→µ(s)}(MTR(r_{t+w})). Similarly, all local instant variables that s_t depends on are also lower in the evaluation graph, so the inductive hypothesis applies to them as well. Therefore, at time MTR(s_t) or before, all elements that s_t depends on are known and s_t is resolved.
This finishes the proof.
The following corollary follows from the fact that nothing that happens after an instant variable has been resolved (either further values in σ_I or the network behavior) can affect the value computed. Therefore, the value of s_t and the time at which it is computed do not depend on the future after MTR(s_t).
Corollary 1. For all s_t there is a t' such that s_t only depends on σ_I and arr up to t'.
The MTR of an instant variable depends on the delays arr of the links between the network nodes that cooperate to compute that instant variable. Therefore we cannot guarantee a bound on the MTR if those delays can be arbitrarily long, and thus we cannot bound the memory usage. Consequently, monitoring is not trace-length independent in a general timed asynchronous network.
Next, we study how different conditions on the network behavior (concerning the delays of the links) affect the MTR, establishing memory bounds and regaining trace-length independent monitoring under those conditions.

Instantiation to Synchronous Time
We first assume the synchronous model of computation, which is the particular case of the timed-asynchronous model where every message between two given monitors takes exactly the same amount of time throughout the trace. We use dist_{r,s} for the delay that every message takes from µ(r) to µ(s), independently of the instant at which the message is sent. Therefore arr_{r→s}(t) = t + dist_{r,s}. This delay allows us to simplify MTR_rem for synchronous networks as follows:
Recall that the time to resolve is the time interval between the moment at which a variable is instantiated and the instant at which it is resolved, that is, TTR(s_t) = MTR(s_t) − t. In the synchronous case we obtain:
Note that the value that determines the result is TTR_sync of the slowest remote dependency, which includes both its resolution time and the time the message needs to traverse the network. Additionally, we can easily show by induction on the dependency graph that for every stream variable s there is a constant k such that TTR_sync(s_t) ≤ k, that is, s always takes at most k instants to be resolved. It follows that all decentralized efficiently monitorable specifications can be monitored in constant space in every local monitor. In other words, synchronous decentralized monitoring of decentralized efficiently monitorable specifications is trace-length independent.
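Under constant delays the TTR stops depending on t, which is what makes synchronous monitoring trace-length independent. The sketch below checks this on a hypothetical three-node placement. The streams, the placement `mu` and the delays `dist` are made up for illustration, and the MTR recursion is our reading of the text with arr_{r→s}(t) = t + dist_{r,s}.

```python
from functools import lru_cache

# Hypothetical streams: a copies input i; b accumulates a; root combines b and a past a.
deps = {"a": [("i", 0)], "b": [("a", 0), ("b", -1)], "root": [("b", 0), ("a", -1)]}
mu = {"i": "n1", "a": "n1", "b": "n2", "root": "n3"}
dist = {("n1", "n2"): 3, ("n2", "n3"): 1, ("n1", "n3"): 5}   # constant link delays


@lru_cache(maxsize=None)
def mtr_sync(s: str, t: int) -> int:
    """MTR with arr_{r->s}(t) = t + dist_{r,s}; inputs resolve at their own instant."""
    best = t
    for r, w in deps.get(s, []):
        if t + w < 0:
            continue                       # the default value is used
        m = mtr_sync(r, t + w)
        best = max(best, m if mu[r] == mu[s] else m + dist[(mu[r], mu[s])])
    return best


def ttr_sync(s: str, t: int) -> int:
    """TTR(s_t) = MTR(s_t) - t."""
    return mtr_sync(s, t) - t


print({ttr_sync("root", t) for t in range(200)})   # {4}: the same for every t
```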
Timed Asynchronous with AETERNALLY Bounded delays
We now assume that there is a global upper bound on the delay of every message, which we call aeternally bounded delays. Formally, this assumption states that there is a d such that for every pair of streams r, s and every time t, arr_{r→s}(t) ≤ t + d. Substituting the upper bound d in the equations for MTR, we obtain an upper bound MTR_g(s_t) on MTR(s_t) that exceeds t by at most a constant. Note that s_t can sometimes be resolved before MTR_g(s_t), because d is only an upper bound. In this case we can also bound the memory that every node needs to perform the monitoring process, although most of the time less memory is needed. Figure 3 shows an example of an aeternal bound.
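The sketch below illustrates the aeternal bound on a hypothetical two-node placement. The real link delivers messages in batches at multiples of 7 instants, so every delay lies between 1 and 7 and FIFO holds. MTR_g replaces every delay by d = 7. The MTR recursion is our reading of the text, not a verbatim transcription of its equations.

```python
from functools import lru_cache
from typing import Callable

# Hypothetical placement: acc (running sum) at n1; root at n2 reads acc and its own past.
deps = {"acc": [("acc", -1), ("i", 0)], "root": [("acc", 0), ("root", -1)]}
mu = {"i": "n1", "acc": "n1", "root": "n2"}


def mtr_with(arr: Callable[[int], int]) -> Callable[[str, int], int]:
    """MTR for this placement; arr gives arrival instants on the only remote link n1 -> n2."""
    @lru_cache(maxsize=None)
    def mtr(s: str, t: int) -> int:
        best = t
        for r, w in deps.get(s, []):
            if t + w >= 0:                       # offsets off the trace use defaults
                m = mtr(r, t + w)
                best = max(best, m if mu[r] == mu[s] else arr(m))
        return best
    return mtr


d = 7                                            # aeternal bound on every delay
actual = mtr_with(lambda t: 7 * (t // 7 + 1))    # batched link: delays between 1 and 7, FIFO
bound = mtr_with(lambda t: t + d)                # MTR_g: every delay replaced by d

ts = range(100)
assert all(actual("root", t) <= bound("root", t) for t in ts)
print(sorted({bound("root", t) - t for t in ts}))    # [7]: constant TTR bound
print(sorted({actual("root", t) - t for t in ts}))   # [1, 2, 3, 4, 5, 6, 7]
```

The gap between the two printed sets is the slack mentioned in the text: most instant variables are resolved well before MTR_g.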
Timed Asynchronous with TEMPORARILY Bounded delays
We now take a closer look at the equations to obtain a better bound on the time to resolve a given instant variable s_t. Instead of assuming an upper bound on all messages in the history of the computation, we only bound the messages that can influence s_t. The main idea to bound MTR(s_t) is to consider the time interval in which the messages that are relevant to compute s_t are sent. We first define an auxiliary notion. We say that a stream variable r is a direct remote influence on s with delay w, and we write s →^{w}_{drem} r, if there is a path s = s_0 →^{w_1} s_1 ⋯ s_k →^{w_{k+1}} r in the dependency graph in which no two nodes s_i and s_j are repeated (if i ≠ j then s_i ≠ s_j), and w = w_1 + ⋯ + w_k + w_{k+1}. Note that s →^{w}_{drem} r means that s_t may be influenced by the remote variable r_{t+w}. We define the window of interest of s_t as win(s_t) = [min S, max S], where S is the set of instants at which the remote instant variables that influence s_t are sent.
Example 5. Consider the specification in Example 2 and the evaluation graph in Figure 2. We observe that the window of interest of any instant variable includes those of its dependencies in the evaluation graph. Therefore, its window of interest includes the minimum time for the earliest dependency to be resolved and the maximum time for the last dependency to be resolved. In this example, the window for root_1 includes the windows for acc_1, root_0 and acc_0, and the time required for the response messages to travel from source to destination. Note that inputs do not affect the MTR.
Therefore win(s_t) contains those instants at which the remote information relevant to s_t is sent. This window always ends no later than MTR(s_t). We then define the worst message delay for the computation of s_t, d_worst(s_t), as the largest delay of any message sent within win(s_t) along the network links involved in computing s_t. Note that d_worst is still an over-approximation, since it considers all those messages, but it only looks at a bounded interval of time. Since all the values that influence s_t are sent within win(s_t), we can bound MTR(s_t) by substituting d_worst(s_t) for the message delays in the equations for MTR. We have finally arrived at the desired outcome: a finite window of time that contains the sending and receiving of the messages relevant for the computation of the instant variable. This implies that only a finite number of network delays affect the resolution of any instant variable s_t. As we can always find the maximum delay in the window, we can upper-bound the time that any instant variable takes to be resolved, and we know how long these instant variables are stored in U_n and R_n. In turn, this allows us to determine when instant variables are no longer needed and can be pruned, releasing the memory they use.
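The following is a one-hop illustration of the temporary bound, under simplifying assumptions of ours:
-root_t, at node n2, reads acc_t and acc_{t−3} from node n1;
-each acc_k is sent at k;
-the delay trace, with a failure at t = 50, is hypothetical.
Since every relevant value is sent no later than t, MTR(root_t) ≤ t + d_worst(root_t) holds in this case. Each printed row shows t, the actual TTR, the temporary TTR bound and the aeternal bound.

```python
from typing import List


def delay(t: int) -> int:
    """Hypothetical delay on link n1 -> n2 for a message sent at t: 2 normally,
    then a failure at t = 50 after which the link recovers one instant per tick."""
    return max(2, 30 - (t - 50)) if t >= 50 else 2


OFFSETS = (0, -3)   # hypothetical: root_t at n2 reads acc_t and acc_{t-3}, computed at n1


def sends(t: int) -> List[int]:
    """S: instants at which the remote values relevant to root_t are sent (acc_k at k)."""
    return [t + w for w in OFFSETS if t + w >= 0]


def mtr_root(t: int) -> int:
    return max([t] + [k + delay(k) for k in sends(t)])


def temporary_bound(t: int) -> int:
    lo, hi = min(sends(t)), max(sends(t))               # window of interest win(root_t)
    d_worst = max(delay(k) for k in range(lo, hi + 1))  # worst delay inside the window
    return t + d_worst


T = 200
d_aeternal = max(delay(k) for k in range(T))            # one bound for the whole trace
for t in (10, 52, 70, 120):
    print(t, mtr_root(t) - t, temporary_bound(t) - t, d_aeternal)
# 10 2 2 30
# 52 28 30 30
# 70 10 13 30
# 120 2 2 30
```

Away from the failure, the temporary bound matches the actual TTR, while the aeternal bound stays at 30 for the whole run.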
Figure 3 shows the peak network behavior and how the TTR adapts accordingly. We can observe the difference between the temporary and aeternal bounds: the aeternal bound is high and constant throughout the execution, while the temporary bound drops when the network has small delays.
Pruning the Resolved Storage R_n. We are finally ready to prune R_n, because we now know when every instant variable will be resolved.
Corollary 2. Every unresolved instant variable s_t in U_n is resolved at MTR(s_t) at the latest.
As soon as MTR(s_t) is reached (or before), the value of s_t is known in the local monitor of µ(s), and it is sent to the remote monitors where it is needed. After this moment s_t can be pruned from U_n. With this mechanism, we ensure that every instant variable stays in memory (U_n or R_n) for a bounded amount of time. Corollary 2 implies that decentralized efficiently monitorable specifications in timed asynchronous networks can be monitored with bounded resources whenever the network behavior is bounded, be it synchronous, aeternally bounded or temporarily bounded. This memory bound depends only linearly on the size of the specification and on the delays between the nodes of the network. This result can also be read from the opposite perspective: given a fixed amount of available memory, we could calculate the maximum network delays that still allow the monitoring to be performed correctly.
We have implemented our solution in a prototype tool, tadLola, written in the Go programming language (available at http://github.com/imdea-software/dLola). We now describe:
-(1) an empirical study of the capabilities of tadLola in different scenarios, with real data extracted from four realistic public datasets;
-(2) the effect of the network behavior, in terms of delays, on memory and on the time to resolve outputs.
Our experimental setup intends to empirically determine the behavior of the asynchronous network and how failures affect the time to resolve of the streams.

Datasets and Network Failures.
We have used four different datasets for this empirical evaluation, namely: Smart-Politech [Pajuelo-Holguera et al., 2020], Tomsk Heating [Zorin and Stukach, 2020], Orange4Home [Cumin et al., 2017] and Context [Kaupp et al., 2021]. All datasets are related to smart buildings, except for Context, which is about Industry 4.0. The first two are concerned with building climate control and use sensors in different rooms or buildings, respectively. The Orange4Home dataset focuses on activity recognition: a tenant can move freely in an apartment, and the goal is to infer the activity performed. Lastly, Context is a dataset from a smart factory where a new class of failures arises, namely contextual failures. In these failures no specific sensor or collected data signals the error directly, so the presence of the error and its underlying cause need to be inferred from contextual knowledge. For each dataset we created a synthetic specification that showcases the functionality of our tool. We also injected synthetic delays to model network congestion and failures.
-constant: the behavior is modeled as a global constant delay between each pair of monitors, so every message takes exactly a fixed amount of time to reach the destination network node. This corresponds to the network behavior observable in synchronous monitoring.
-constantPeak: a constant delay with a single high delay peak that models a network failure and recovery. All messages get delayed until the problem is solved, and then the network recovers gradually until normal operation is reached again.
-normal: delays follow a normal distribution around a given average delay.
-normalPeak: similar to constantPeak, but with the normal behavior as baseline.
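The four behaviors can be generated as per-instant delay sequences. The sketch below is an illustration built on assumptions, not the generator used in the experiments. The base delay, the peak height, the linear recovery of one instant per tick, the normal parameters and the seed are all hypothetical. `to_arrivals` enforces FIFO by taking a running maximum of the arrival instants, since independently sampled normal delays could otherwise reorder messages.

```python
import random
from itertools import accumulate
from typing import List


def constant(n: int, base: int) -> List[int]:
    """constant: every message takes exactly `base` instants."""
    return [base] * n


def normal(n: int, mean: float, sd: float, seed: int = 0) -> List[int]:
    """normal: normally distributed delays, clipped to at least 1 instant."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]


def with_peak(delays: List[int], at: int, height: int) -> List[int]:
    """Add a failure at instant `at`: delays jump by `height`, then recover gradually."""
    return [d + max(0, height - (t - at)) if t >= at else d for t, d in enumerate(delays)]


def to_arrivals(delays: List[int]) -> List[int]:
    """Arrival instants t + delay(t); the running max keeps FIFO order (an assumption)."""
    return list(accumulate((t + d for t, d in enumerate(delays)), max))


n = 1000
behaviors = {
    "constant": constant(n, 2),
    "constantPeak": with_peak(constant(n, 2), at=400, height=50),
    "normal": normal(n, mean=4, sd=1),
    "normalPeak": with_peak(normal(n, mean=4, sd=1), at=400, height=50),
}
for name, ds in behaviors.items():
    arrivals = to_arrivals(ds)
    # name, largest sampled delay, largest delay after enforcing FIFO order
    print(name, max(ds), max(a - t for t, a in enumerate(arrivals)))
```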
Note that all these behaviors are both aeternally and temporarily bounded, since for all of them we can find an upper bound for the whole trace as well as a bound within the window of interest of each instant variable.
Figure 4 shows the minimum, median and maximum TTR of the streams under these network behaviors, and Figure 5 shows an example of the delays observed under these behaviors. The system under observation is sampled periodically, which yields the input traces for each of the measured variables. Thus, from the length of the trace and the sampling period we can obtain the system time monitored throughout the experiment. For example, a trace of length 200k with a sampling period of 30 seconds corresponds to monitoring a system for ≈ 2.31 months. For some of the experiments, the traces of real data available in the datasets were not long enough, so we extended them by repeating the samples as many times as needed to reach the desired trace length. Also, some traces were based on events rather than on periodic sensing of a variable, so they required interpolation in order to use a common clock tick for all events; we performed this interpolation whenever needed.
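The conversion from trace length to monitored time is simple arithmetic. The sketch below reproduces the figure of ≈ 2.31 months, assuming 30-day months. The `extend` helper shows one way of lengthening a short trace by repeating its samples cyclically; this is our reading of the text, not necessarily the exact procedure used.

```python
SECONDS_PER_DAY = 86_400


def monitored_days(trace_length: int, period_s: float) -> float:
    """System time covered by a trace sampled every period_s seconds, in days."""
    return trace_length * period_s / SECONDS_PER_DAY


def extend(trace: list, length: int) -> list:
    """Lengthen a short trace by repeating its samples cyclically (our reading of
    'repeating the samples as much as needed')."""
    reps = -(-length // len(trace))          # ceiling division
    return (trace * reps)[:length]


days = monitored_days(200_000, 30)
print(round(days, 2), round(days / 30, 2))   # 69.44 2.31  (months of 30 days)
print(extend([1, 2, 3], 7))                  # [1, 2, 3, 1, 2, 3, 1]
```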

Hypothesis
For the empirical evaluation of this paper we intend to evaluate the following hypotheses:
-(H1) Our timed asynchronous algorithm behaves no worse than the synchronous algorithm from [Danielsson and Sánchez, 2019] when the network presents a synchronous behavior.
-(H2) Synchronous SRV can simulate the monitoring of a timed asynchronous network with a software layer that provides the illusion of synchrony, but at a very high cost in delays and memory usage.
-(H3) The TTRs observed empirically are below the bounds calculated a priori with the equations of Section 3.6.
-(H4) Decentralized efficiently monitorable specifications can be monitored in a trace-length independent fashion: memory usage remains bounded as the trace grows.
-(H5) Memory usage depends on network usage rather than on the number of monitors. We expect that memory will increase linearly with "network usage" but will remain constant when increasing the number of local monitors. Here, by local monitor we mean a non-empty set of streams that are computed at the same network node.
-(H6) We can benefit from using redundant specifications and redundant topologies (exploiting simplifiers) to reduce TTRs by avoiding the delays of slow or faulty links.

Empirical Results
In order to validate hypothesis (H1) we built the following experiments:
-SmartPolitechDistr: we detect fire hazards by analysing the levels of temperature, CO2 and humidity in the air of different rooms in university buildings. We use a quantitative robust specification.
-tomskHeating: we check that the heating system behaves as expected (the expected behavior is extracted from the data). Again, this is a quantitative robust specification.
-orange4Home: we detect fire hazards by analysing the activities performed by the tenant in the apartment.
-contextAct: we detect fire hazards by analysing the levels of temperature, CO2 and humidity in the air of different rooms in a smart apartment. This is also a quantitative specification.
Figure 4 shows metrics of the delay of the root of the specification for the different datasets under different network behaviors. This shows empirically that TADSRV subsumes DSRV with no additional loss of performance, as expected from our theoretical results. Therefore (H1) holds. These different network behaviors also show that TADSRV is more general than DSRV, as expected.
For the validation of (H2) we built an experiment with a specification that computes both the maximum and the sum of the inputs, deployed in the topology shown in Figure 6. We took the maximum delay present in our normalPeak traces and used it as the global delay between each pair of monitors in the synchronous scenario. We measured both settings: simulated synchrony and the execution of the timed asynchronous algorithm. The results are shown in Figure 7. The figure shows that we can emulate TADSRV with DSRV, but at a high cost. Memory usage increases by +200% with respect to the worst instant, and delays grow to the worst delay × the depth of the topology, which in this case is 558 instants. This corresponds to an increase of around 30 times the delay of the timed asynchronous algorithm. Therefore, (H2) holds as well. These results make it clear that it is not feasible in practice to use DSRV in a timed asynchronous scenario, even with the layer that simulates synchrony, whereas the contribution of this work applies naturally and with much better performance.
Also, we can see that the TTRs obtained empirically are below or equal to our estimated bounds calculated a-priori with the equations described in Section 3.6.Hence, (H3) holds.
For the validation of (H4), which studies scalability in terms of trace length, we used the smartPolitechDistr dataset and ran it with a trace of 200k instants under the normalPeak behavior. In the extract shown below we compute both a Boolean and a quantitative stream to look for temperature rises. Figure 8 shows that the memory used in the root monitor of this experiment remains bounded. The peaks in memory correspond to higher delays in the network links among nodes. These delays force monitors to keep records in memory until the messages they need arrive, after which the monitors can resolve streams and prune their memories. This result suggests that the algorithm with a decentralized efficiently monitorable specification behaves in a trace-length independent fashion, validating hypothesis (H4).
Figure 9 shows that the memory usage of a single monitor does not depend on the number of other monitors in the network. Instead, it depends on the maximum depth of the part of its specification that travels through the network. In this experiment the depth of the specification deployed in the network was kept constant (5), while we changed the number of monitors in a binary tree topology (preserving the depth of one branch). The intuition is that memory usage is not affected by how many monitors there are, but by the number of network nodes and links that the computation of a value traverses. This is because the more links there are, the higher the probability that a failure in the network (modelled as a delay) affects the run. These results support hypothesis (H5).
Redundancy and Delays. In this subsection we take a closer look at hypothesis (H6): we build the topology and the specification so as to minimize the TTRs of the instant variables. We seek to use simplifiers to minimize the effect of network delays on the messages required to compute the instant variables. Thus, we intend to exploit the messages that go through the fastest path in the network from the nodes that read the inputs to the nodes that compute the root of the specification. Intermediate results are generated faster in the least congested deployment, and messages travel through the least-weight path (in terms of accumulated delays) between the inputs and the root of the specification, yielding a minimum TTR for the instant variables. This improvement is possible because, thanks to simplifiers, intermediate results from slower monitors are not needed, so the engine does not have to wait for them to obtain the final result of the root monitor. We build the following fragment of the specification for the smartPolitech data, where the streams C3_fire_risk and C3_fire_risk_red, as well as C3_fire_risk_q and C3_fire_risk_q_red, are redundant copies of each other, and we deploy them on different monitors so that they are affected by different delays. We use a normal delay for the whole network but introduce a failure in the form of a peak in the delays between the monitors connected to monitor 3. This makes the path through monitor 2 faster. Figure 10 shows that the delay in obtaining the value of the root of the specification is the best delay possible. Since we use an OR to take advantage of the simplifiers, there is a gain in the best-case verdict (outcome true), but in the worst-case verdict (false) the redundant solution gains no speed, because the engine needs to wait for all the values to compute the OR.
@0 {
  define bool C3_alarm eval = ( C3_fire_risk or C3_fire_risk_red ) and ( C3_fire_risk_q > 0.5 or C3_fire_risk_q_red > 0.5 )
}
@2 {
  define bool C3_fire_risk_red eval = AND( C3_temp_spike, C3_co2_spike, C3_humid_down )
  define num C3_fire_risk_q_red eval = AVG( C3_temp_spike_q, C3_co2_spike_q, C3_humid_down_q )
}
@3 {
  define bool C3_fire_risk eval = AND( C3_temp_spike, C3_co2_spike, C3_humid_down )
  define num C3_fire_risk_q eval = AVG( C3_temp_spike_q, C3_co2_spike_q, C3_humid_down_q )
}
Figure 10 shows the difference between using the redundant specification with a redundant topology and not using any redundancy. Even though a general study of exploiting redundant paths in the network is out of the scope of this paper, this case study illustrates how redundant deployments can improve decentralized monitoring.
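The asymmetry between true and false verdicts comes from how a simplifier resolves an OR over redundant operands. The following is a minimal sketch, assuming each redundant copy (for example C3_fire_risk at monitor 3 and C3_fire_risk_red at monitor 2) reaches the root at a known time; the function name `or_resolution_time` is hypothetical and not part of tadLola.

```python
from typing import List, Tuple


def or_resolution_time(arrivals: List[Tuple[float, bool]]) -> Tuple[float, bool]:
    """Return (time, value) at which `x1 or x2 or ...` becomes known.

    `arrivals` holds one (arrival_time, value) pair per redundant operand.
    Assumed simplifier rule: a single True operand resolves the OR at once,
    whereas False is only known after every operand has arrived.
    """
    if not arrivals:
        raise ValueError("an OR needs at least one operand")
    ordered = sorted(arrivals)  # earliest arrival first
    for time, value in ordered:
        if value:
            return time, True  # short-circuit: true or _ = true
    return ordered[-1][0], False  # false needs every operand
```

With a fast path through monitor 2 (arrival at 1.0) and a delayed path through monitor 3 (arrival at 5.0), two true operands give `(1.0, True)`, so the root benefits from the fast path, while two false operands give `(5.0, False)`, so the redundancy gains nothing.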

Lazy Algorithm
We now introduce a variant of Algorithm 2 in which some streams are not sent unless their values are requested. This is beneficial when their values are rarely needed. We call these lazy streams.
To present the modified algorithm we need a new type of message: the request message. We also use the term response message for messages that contain the value of an instant variable.
- Response messages: $(\mathit{resp}, s_k, c, n_s, n_d)$, where $s_k$ is an instant variable, $c$ is a constant of the same datatype as $s_k$, $n_s$ is the source node and $n_d$ is the destination node of the message.
- Request messages: $(\mathit{req}, s_k, n_s, n_d)$, where $s_k$ is an instant variable, $n_s$ is the source node and $n_d$ is the destination node of the message.
Again, if $msg = (\mathit{req}, s_k, n_s, n_d)$, then $msg.src = n_s$, $msg.dst = n_d$, $msg.type = \mathit{req}$ and $msg.stream = s_k$. The same holds for a response message, which additionally has $msg.val = c$. Each stream variable $v$ can be assigned one of the following two communication strategies, which determine whether an instant value $v_k$ is automatically communicated to all potentially interested nodes or is provided only upon request. Let $v$ and $u$ be two stream variables such that $v$ appears in the equation of $u$, and let $n_v = \mu(v)$ and $n_u = \mu(u)$.
- Eager communication: node $n_v$ informs $n_u$ of every value $v_k = c$ that it resolves by sending a message $(\mathit{resp}, v_k, c, n_v, n_u)$. This is the strategy used so far in the paper.
- Lazy communication: node $n_u$ requests from $n_v$ the value of $v_k$ (in case $n_u$ needs it to resolve $u_{k'}$ for some $k'$) by sending a message $(\mathit{req}, v_k, n_u, n_v)$. When $n_v$ receives this message and resolves $v_k$ to a value $c$, $n_v$ responds with $(\mathit{resp}, v_k, c, n_v, n_u)$.
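The two message types and the way each strategy produces them can be sketched as follows. This is an illustrative model only: the class `Msg` and the helpers `eager_send`, `lazy_request` and `lazy_respond` are hypothetical names, and instant variables are assumed to be encoded as `(stream, instant)` pairs.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

InstantVar = Tuple[str, int]  # (stream name, instant), e.g. ("v", 3) for v_3


@dataclass(frozen=True)
class Msg:
    type: str                  # msg.type: "req" or "resp"
    stream: InstantVar         # msg.stream
    src: str                   # msg.src
    dst: str                   # msg.dst
    val: Optional[Any] = None  # msg.val, only meaningful for responses


def eager_send(v_k: InstantVar, c: Any, n_v: str, n_u: str) -> Msg:
    """Eager: n_v pushes every value it resolves to n_u without being asked."""
    return Msg("resp", v_k, n_v, n_u, c)


def lazy_request(v_k: InstantVar, n_u: str, n_v: str) -> Msg:
    """Lazy: n_u asks n_v for v_k only when it needs the value."""
    return Msg("req", v_k, n_u, n_v)


def lazy_respond(req: Msg, c: Any) -> Msg:
    """The receiver of the request (req.dst = n_v) answers the requester (req.src = n_u)."""
    assert req.type == "req"
    return Msg("resp", req.stream, req.dst, req.src, c)
```

Both strategies end with the same response message $(\mathit{resp}, v_k, c, n_v, n_u)$; the lazy one simply adds the preceding request from $n_u$ to $n_v$.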
Each stream variable can be independently declared as eager or lazy. We use two predicates, $\mathit{eager}(u)$ and $\mathit{lazy}(u)$ (defined as $\neg\mathit{eager}(u)$), to indicate the communication strategy of stream variable $u$. Note that the lazy strategy involves two messages per value and the eager strategy only one, but eager sends every resolved instant variable, while lazy sends only those that are requested. If the values are almost always needed, eager is preferable; if they are required less frequently, lazy is preferable. We now add the communication strategy to the definition of the decentralized SRV problem. A decentralized SRV problem $\langle\varphi, T, \mu, \mathit{eager}\rangle$ is now characterized by a specification $\varphi$, a topology $T$, a stream assignment $\mu$ and a communication strategy for every stream variable.
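The trade-off between the two strategies can be made concrete by counting messages. Under the simplifying assumption that only the number of messages matters (ignoring the extra round-trip latency of lazy requests), eager sends $N$ messages for $N$ resolved values, while lazy sends $2pN$ when a fraction $p$ of them is requested, so lazy sends fewer messages exactly when $p < 1/2$. The helper names below are hypothetical.

```python
def eager_messages(resolved: int) -> int:
    """Eager: one response message per resolved instant variable."""
    return resolved


def lazy_messages(requested: int) -> int:
    """Lazy: one request plus one response per requested instant variable."""
    return 2 * requested


def prefer_lazy(resolved: int, requested: int) -> bool:
    """True when lazy sends strictly fewer messages than eager."""
    return lazy_messages(requested) < eager_messages(resolved)
```

For 100 resolved values, lazy wins if 10 are requested (20 vs. 100 messages) but not if 60 are (120 vs. 100); at exactly half the two strategies tie.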

Lazy DSRV Algorithm for Timed Asynchronous Networks
We extend our local monitor to $\langle Q_n, U_n, R_n, P_n, W_n\rangle$ by adding the following two storages:
- Pending requests $P_n$, where $n$ records instant variables that other monitors have requested from $n$ but that $n$ has not resolved yet.
- Waiting for responses $W_n$, where $n$ records instant variables that $n$ has requested from other nodes but for which it has not yet received a response.
The storage $W_n$ prevents $n$ from requesting the same value twice while waiting for the response to the first request. An entry in $W_n$ is removed when the value is received, since the value is subsequently fetched directly from $R_n$ rather than requested through the network. The storage $P_n$ records that a value that $n$ is responsible for has been requested, but $n$ does not know it yet. When $n$ computes the value, $n$ sends the corresponding response message and removes the entry from $P_n$. Finally, request messages are generated for unresolved lazy instant variables and inserted in the queues of the corresponding neighbors.
More concretely, every node $n$ executes the procedure Monitor shown in Algorithm 2, which invokes Step at every clock tick until the input terminates, or ad infinitum. Procedure Finalize resolves the pending values to their defaults if the trace ends. Procedure Step now executes some modified procedures and the following additional steps:
1. Process Messages: Line 26 annotates requests in $P_n$, which will later be resolved and answered. Lines 27-28 handle response arrivals, adding them to $R_n$ and removing them from $W_n$.
2. Send Responses: Lines 33-36 deal with pending lazy variables. If a pending instant variable is now resolved, the response message is sent and the entry is removed from $P_n$.
3. Send New Requests: Lines 37-41 send new request messages for all lazy instant variables that are now needed.
4. Prune: Lines 42-44 prune $R_n$ of information that is no longer needed. See Section 5.4.
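The interplay of $P_n$ and $W_n$ in steps 1-3 can be illustrated with a toy, single-node model. This is a sketch under strong assumptions, not Algorithm 2 itself: `LazyNode`, the `owner` map standing in for $\mu$, and the tuple message format are all hypothetical, equations are plain Python functions over their dependencies, eager sending and pruning are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Var = Tuple[str, int]                      # instant variable, e.g. ("a", 0)
Msg = Tuple[str, Var, Optional[Any], str]  # (kind, var, value, peer)


@dataclass
class LazyNode:
    name: str
    owner: Dict[Var, str]  # hypothetical stand-in for mu: who computes each instant variable
    lazy: Set[Var]         # instant variables of lazy streams
    R: Dict[Var, Any] = field(default_factory=dict)  # resolved values
    U: Dict[Var, Tuple[List[Var], Callable[..., Any]]] = field(default_factory=dict)  # open equations
    P: Set[Tuple[Var, str]] = field(default_factory=set)  # requests received, not yet answered
    W: Set[Var] = field(default_factory=set)              # requests sent, not yet answered

    def step(self, inbox: List[Msg]) -> List[Msg]:
        """One clock tick: inbox peers are senders, outbox peers are receivers."""
        out: List[Msg] = []
        # 1. Process messages: record requests in P, store responses in R and clear W.
        for kind, var, val, sender in inbox:
            if kind == "req":
                self.P.add((var, sender))
            else:
                self.R[var] = val
                self.W.discard(var)
        # Evaluate every equation whose dependencies are all resolved, up to a fixpoint.
        changed = True
        while changed:
            changed = False
            for var, (deps, fn) in list(self.U.items()):
                if all(d in self.R for d in deps):
                    self.R[var] = fn(*(self.R[d] for d in deps))
                    del self.U[var]
                    changed = True
        # 2. Send responses for pending requests that can now be answered.
        for var, requester in list(self.P):
            if var in self.R:
                out.append(("resp", var, self.R[var], requester))
                self.P.discard((var, requester))
        # 3. Send new requests for remote lazy dependencies not requested before.
        for deps, _ in self.U.values():
            for d in deps:
                remote = self.owner[d] != self.name
                if d in self.lazy and remote and d not in self.R and d not in self.W:
                    out.append(("req", d, None, self.owner[d]))
                    self.W.add(d)
        return out
```

For example, if node n0 computes $a_0 = b_0 + 1$ and $b_0$ is lazy at n1, the first tick emits one request for $b_0$, the second emits nothing (the entry in $W$ suppresses a duplicate), and once the response arrives together with a request for $a_0$ from n2, the node answers n2 and both $P$ and $W$ end up empty.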

Formal Correctness
We now show that our solution is again correct by proving that the computed output coincides with the denotational semantics, and that every output is eventually computed.
Theorem 4. All of the following hold for every instant variable $u_k$:
(1) The value of $u_k$ is eventually resolved.
(2) The value of $u_k$ is $c$ if and only if $(u_k, c) \in R$ at some instant.
(3) If $\mathit{eager}(u)$, then a response message for $u_k$ is eventually sent.
(4) If $\mathit{lazy}(u)$, then all request messages for $u_k$ are eventually answered.
Proof. The proof proceeds by induction on the evaluation graph, proving (1)-(4) simultaneously in the inductive step, since they depend on each other (in previous inductive steps). Let $M$ be the length of a computation and $\sigma_I$ an input of length $M$. Note that (1)-(4) are all statements about instant variables $u_k$, which are the nodes of the evaluation graph $G_{\varphi,M}$. We proceed by induction on $G_{\varphi,M}$, which is acyclic because $D_\varphi$ is well-formed.
- Base case: The base cases are the vertices of the evaluation graph that have no outgoing edges, which are instant variables that correspond either to inputs or to defined variables whose instant equation does not contain other instant variables. Statement (1) follows immediately for inputs because at instant $k$, $s_k$ is read at node $\mu(s)$. For output equations that have no variables, or whose variables have offsets that, once instantiated, become negative or greater than $M$, the value is determined either immediately or at $M$, when the offset is calculated. At this point the computed value is inserted in $R$, so (2) also holds at $\mu(u)$. Note that (2) also holds for other nodes, because the response message contains $u_k = c$ if and only if $(u_k, c) \in R_n$, where $\mu(u) = n$. The response message is inserted exactly when $u_k$ is resolved, so (1) implies (3). Finally, (4) also holds at the time of receiving the request message or of resolving $u_k$, whichever happens later.
- Inductive case: Consider an arbitrary $u_k$ in the evaluation graph $G_{\varphi,M}$ and let $u^1_{k_1}, \ldots, u^l_{k_l}$ be the instant variables that $u_k$ depends on. These are nodes of $G_{\varphi,M}$ below $u_k$, so the inductive hypothesis applies and (1)-(4) hold for them. Let $n = \mu(u)$. At instant $k$, $u_k$ is instantiated and inserted in $U_n$. At the end of cycle $k$, the lazy variables among $u^1_{k_1}, \ldots, u^l_{k_l}$ are requested. By the inductive hypothesis, (1) and (4), all these requests are eventually answered. Similarly, the values of all eager variables are eventually calculated and sent (by (1) and (3), which hold by the inductive hypothesis).
At the latest of these arrival times, the equation for $u_k$ has no more variables and is evaluated to a value, so (1) holds, and (2) holds for $u_k$ at $n$. At this point, if $\mathit{eager}(u)$, the response message is sent, so (3) holds for $u_k$; and if $\mathit{lazy}(u)$, all requests (those previously recorded in $P_n$ and future ones) are answered, so (4) also holds.
This finishes the proof.

Resources for Lazy
Analyzing the lazy case requires some modifications. In timed asynchronous networks we need a new kind of message, a confirmation, whose only purpose is to inform the receiving node that some instant variables are no longer needed and can therefore be pruned. This new message has the following form:
- Confirmation messages: $(\mathit{confirm}, s_k, n_s, n_d)$, where $s_k$ is an instant variable, $n_s$ is the source node and $n_d$ is the destination node of the message.
This message is interpreted as stating that the source node $n_s$ has resolved the instant variables of $s$ up to $k$. This information allows the destination node to conclude that the instant variables required at the remote node to compute already resolved instant variables are no longer necessary. We change $\mathit{MTR}_{rem}$ to reflect that the response is emitted when the request arrives or when the remote instant variable is resolved, whichever happens later. In this modified term, $\mathit{arr}_{s\to r}(t)$ is the arrival time of the request, which is sent at instant $t$, when the instant variable $s_t$ is instantiated and stored in $U$; $\mathit{MTR}^{lazy}(r_{t+w})$ is the time at which the remote instant variable is resolved; and $\mathit{arr}_{r\to s}(t')$ is the time at which the response for the lazy instant variable arrives at the requesting node.
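Reading these definitions together, the modified remote term can be sketched as below. This is a reconstruction from the prose rather than a formula quoted from the paper, assuming that $\mathit{arr}_{x\to y}(\tau)$ denotes the arrival time at $\mu(y)$ of a message sent from $\mu(x)$ at time $\tau$:

```latex
\mathit{MTR}^{\mathit{lazy}}_{\mathit{rem}}(r_{t+w}) =
  \mathit{arr}_{r \to s}\Big(\max\big(\mathit{arr}_{s \to r}(t),\ \mathit{MTR}^{\mathit{lazy}}(r_{t+w})\big)\Big)
```

With constant link delays, $\mathit{arr}_{s\to r}(t) = t + \mathit{dist}_s^r$ and $\mathit{arr}_{r\to s}(t') = t' + \mathit{dist}_r^s$, which matches the synchronous instantiation described next.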

MTR
Instantiation to Synchronous. Again, we first consider the case where the delay of every link is constant throughout the execution. This constant simplifies the equations, but we now need to take into account that obtaining each remote value requires a request followed by a response. Again, $\mathit{dist}_r^s$ denotes the delay that every message takes from $\mu(r)$ to $\mu(s)$, independently of the time instant at which the message is sent. We use this to simplify $\mathit{MTR}^{lazy}_{rem}$ for synchronous networks: the value of the remote instant variable arrives when the response message arrives, $\mathit{dist}_r^s$ after it is emitted, and the response is emitted either when the request arrives, at $t + \mathit{dist}_s^r$, or when the remote value is resolved, at $\mathit{MTR}^{sync}(r_{t+w})$, whichever occurs later.
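The synchronous rule above is simple arithmetic and can be checked directly. In this sketch the function names are hypothetical, and the eager counterpart (value pushed $\mathit{dist}_r^s$ after it is resolved) is an assumption added only for comparison.

```python
def mtr_lazy_remote_sync(t: float, mtr_remote: float,
                         dist_s_r: float, dist_r_s: float) -> float:
    """Time at which the value of r_{t+w} reaches mu(s) under lazy communication.

    The request leaves mu(s) at t and arrives at mu(r) at t + dist_s_r; the
    response is emitted once the request has arrived and r_{t+w} is resolved
    (at mtr_remote), and then travels dist_r_s back to mu(s).
    """
    return max(t + dist_s_r, mtr_remote) + dist_r_s


def mtr_eager_remote_sync(mtr_remote: float, dist_r_s: float) -> float:
    """Assumed eager counterpart: the value is pushed as soon as it is resolved."""
    return mtr_remote + dist_r_s
```

With $t = 10$, one-way delays of 3 and the remote value resolved at 12, the lazy value arrives at 16 against 15 for eager; if the remote value is resolved only at 20, both arrive at 23, because the request is already waiting at $\mu(r)$.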
Aeternally Bounded Delays. Now we consider the case where we know a maximum delay $d$ that upper-bounds all delays in the network. Substituting $d$ in the equations for MTR, we obtain a constant upper bound on the MTR (although this value can be a gross over-approximation).
Temporarily Bounded Delays. Finally, we do not assume an aeternal bound on the network delays. Instead, we only look at what affects the computation of the instant variables, that is, the other instant variables they depend on and the network delays of the messages needed to compute them. We again take into account the window $\mathit{win}(s_t)$, the interval that contains all the instants at which values that influence $s_t$ are resolved and sent. This window always ends at most at $\mathit{MTR}(s_t)$. Inside this window we can find the worst delay of a message sent for computing the instant variable, $d_{worst}(s_t)$, which we use to bound $\mathit{MTR}(s_t)$ in the lazy case. Here $d_{worst}(s_t)$ is the delay of the worst message affecting the computation of $s_t$, so the window for obtaining this value considers both request and response messages. We use this value to bound both the request and the response. First, we obtain the later instant at which either the request arrives or the remote dependency is resolved, $\max(d_{worst}(s_t), \mathit{MTR}^{temp}_{lazy}(r_{t+w}))$, and then we add $d_{worst}(s_t)$, the time for the response message carrying the value to arrive.
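As a sketch of the aeternal case, substituting $d$ for both link delays in the synchronous form gives the following bound on the arrival of a lazy remote value (the superscript $\mathit{aet}$ is our label, not the paper's notation):

```latex
\mathit{MTR}^{\mathit{lazy}}_{\mathit{rem}}(r_{t+w}) \le
  \max\big(t + d,\ \mathit{MTR}^{\mathit{aet}}_{\mathit{lazy}}(r_{t+w})\big) + d
```

Since $d$ is a constant and the remote term is itself bounded in the same way, unfolding the bound along the finite dependency chain yields a constant, which is where the possible over-approximation comes from.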
This yields the moment at which the remote dependency is guaranteed to be resolved and its value to have arrived at the requesting network node.

Pruning the Resolved Storage.
We are finally ready to prune $R_n$ in the lazy case, because we now know when every instant variable will be resolved.
Theorem 5. Every unresolved lazy instant variable $s_t$ in $U_n$ is resolved at the latest at $\mathit{MTR}^{lazy}(s_t)$.
As soon as this moment is reached, and given that network delays are bounded, a confirmation message is sent to the monitors that compute the lazy instant variables on which the resolved instant variable depends, and this message arrives in bounded time. The receiving node can then prune the corresponding instant variables from its memory. We therefore extend the theorem with $\mathit{tconf}$, the time for the confirmation message to arrive: every unresolved $s_k = e$ in $U_n$ is pruned at the latest at $\max\{\mathit{MTR}^{lazy}(u_{k-w}) + \mathit{tconf}_u\}$, where $u_{k-w}$ ranges over the remote instant variables that contain $s_k$ in their equations and $\mathit{tconf}_u$ is the time that the confirmation message sent at $\mathit{MTR}^{lazy}(u_{k-w})$ takes to travel from $\mu(u)$ to $\mu(s)$. This message arrives at its destination in bounded time and the instant variable is pruned, because at that point the receiving node knows that the instant variable is no longer needed and can prune it even if it has not been resolved yet. With this mechanism, we can ensure that every instant variable stays in memory ($U_n$, $R_n$) only for a bounded amount of time. This implies that decentralized efficiently monitorable specifications in timed asynchronous networks can be monitored with bounded resources. The bound depends only linearly on the size of the specification, the diameter of the network and the delays among the nodes of the network.
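The pruning deadline and the effect of confirmation messages can be sketched as follows. This is an illustration under assumptions: the names are hypothetical, instant variables are `(stream, instant)` pairs, `needed_by` is an assumed bookkeeping map from a local instant variable to the remote instant variables whose equations mention it, and only $R_n$ is pruned here.

```python
from typing import Dict, Iterable, Set, Tuple

Var = Tuple[str, int]  # (stream, instant)


def prune_deadline(dependents: Iterable[Tuple[float, float]]) -> float:
    """max over remote u_{k-w} mentioning s_k of MTR_lazy(u_{k-w}) + tconf_u.

    `dependents` must be non-empty: one (mtr_lazy_u, tconf_u) pair per dependent.
    """
    return max(mtr + tconf for mtr, tconf in dependents)


def apply_confirmation(R: Dict[Var, object],
                       needed_by: Dict[Var, Set[Var]],
                       resolved_upto: Dict[str, int],
                       confirm: Tuple[str, int]) -> None:
    """Handle (confirm, u_k, ...): u is resolved up to k, so drop entries no longer needed."""
    u, k = confirm
    resolved_upto[u] = max(k, resolved_upto.get(u, -1))
    for var in list(R):
        deps = needed_by.get(var)
        # prune only entries tracked as remote dependencies whose dependents are all confirmed
        if deps and all(j <= resolved_upto.get(v, -1) for v, j in deps):
            del R[var]
```

For instance, if $s_3$ is needed only by $u_2$ and $s_4$ only by $u_3$, a confirmation of $u$ up to 2 removes $s_3$, and a later confirmation up to 3 removes $s_4$.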

Conclusions and Future Work
We have studied the problem of decentralized stream runtime verification for timed asynchronous networks, where messages can take an arbitrary amount of time to arrive. The problem starts from a specification and a network. Our solution consists of a placement of output streams and an online local monitoring algorithm that runs on every node. We prove the termination and correctness of the proposed algorithm. We have identified specifications and network assumptions (synchronous, aeternal and temporary bounds) that guarantee that the monitoring can be performed with constant memory, independently of the length of the trace, showing that our solution subsumes the previous synchronous algorithm. We report on an empirical evaluation of our prototype tool tadLola. Our empirical evaluation shows that placement is crucial for performance and suggests that in most cases careful placement can lead to bounded costs and delays. As future work we plan to extend our solution to disaster scenarios where some links may present an infinite delay, so that no message can traverse them. Our intuition is that redundancy in the specifications and the network topology could provide resilience against faulty network links while also providing better performance than simply replicating the timed asynchronous algorithm and running the copies in parallel, isolated from each other.

Fig. 1. Dependency graph for Example 2.
The evaluation graph EG is the unrolling of the dependency graph for all instants. Given $\varphi(I, O)$ and a trace length $M$ (or $M = \omega$ for infinite traces), the evaluation graph $G_{\varphi,M}$ has as vertices the set of instant variables $\{u_k\}$ for $u \in I \cup O$ and $0 \le k < M$, and has edges $u_k \to v_{k'}$ if the dependency graph contains an edge $u \xrightarrow{j} v$ and $k + j = k'$. The corresponding evaluation graph for $M = 5$ is shown in Fig. 2.
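The unrolling can be written down directly for a finite trace length. A minimal sketch, assuming dependency edges are given as `(u, j, v)` triples for $u \xrightarrow{j} v$; the function name `evaluation_graph` is hypothetical, and the infinite case $M = \omega$ is not covered.

```python
from typing import Iterable, Set, Tuple

Vertex = Tuple[str, int]  # instant variable u_k as (u, k)


def evaluation_graph(streams: Iterable[str],
                     dep_edges: Iterable[Tuple[str, int, str]],
                     M: int) -> Tuple[Set[Vertex], Set[Tuple[Vertex, Vertex]]]:
    """Unroll a dependency graph into G_{phi,M} for a finite trace length M.

    Each (u, j, v) in dep_edges is an edge u --j--> v; the unrolled graph has an
    edge u_k -> v_{k'} whenever k + j = k' and both endpoints lie in [0, M).
    """
    vertices = {(u, k) for u in streams for k in range(M)}
    edges = {((u, k), (v, k + j))
             for (u, j, v) in dep_edges
             for k in range(M)
             if 0 <= k + j < M}
    return vertices, edges
```

For two streams and a single edge with offset 1 over $M = 3$, this yields six vertices and the two edges $a_0 \to b_1$ and $a_1 \to b_2$; the edge from $a_2$ would point outside the trace and is dropped.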

Fig. 8 .
Fig. 7. Synchronous and asynchronous in an asynchronous network with details of asynchronous