Modeling and Verification of Symbolic Distributed Applications Through an Intelligent Monitoring Agent

Wireless Sensor Networks (WSNs) represent a key component in emerging distributed computing paradigms such as IoT, Ambient Intelligence, and Smart Cities. In these contexts, the difficulty of testing, verifying, and monitoring applications in their intended scenarios ranges from challenging to impractical. Current simulators can only be used to investigate correctness at source code level and with limited accuracy. This paper proposes a system and a methodology to model and verify symbolic distributed applications running on WSNs. The approach allows the distributed application code to be complemented at a high level of abstraction so that it can be tested and reprogrammed directly on deployed network devices. The proposed intelligent architecture enables the execution of distributed applications and the verification of the supplied correctness conditions. Through sample applications and quantitative experiments measuring the overhead introduced by the monitoring operations, this paper shows the feasibility and effectiveness of the proposed approach even when networks include resource-constrained nodes.

In particular, after a network has been deployed for an extended period of time, its characteristics may be substantially different from the testing condition during implementation and initial evaluation. Typical variations in the network behavior are due to nodes running out of power [24], suffering failures in their communication modules, or otherwise malfunctioning [25].

Moreover, WSNs are often deployed in inaccessible areas, or may comprise thousands of nodes. In both cases, manually checking that each node behaves properly may simply be unfeasible. Thus, post-deployment [26] monitoring systems are valuable tools to diagnose malfunctions without stopping the system and to acquire information about the WSN operation [27].

In this work, we present a WSN monitoring platform for the debugging of symbolic distributed applications. The main contributions and novelties of this work are:
• the proposed system does not require any extra pre-installed debugging code or hardware on the deployed nodes, reducing the burden on resource-constrained devices, and potentially prolonging the WSN lifespan and leaving more resources available for ordinary operations;

• the interactive approach enabled by symbolic programming allows multiple verification modes that can be added to the system long after the network has been deployed with greater flexibility than currently existing solutions;

• the symbolic computation model permits verification operations on heterogeneous devices unlike the platform-specific tools commonly used;

• knowledge on the network and modeling of the distributed application are used to automatically verify whether the application was executed correctly, without the need for human intervention in monitoring network logs.

The software platform used in this work is DC4CD [28]. This platform was proposed to enable the development of distributed applications on resource-constrained WSN nodes through executable high-level code exchange [28].

In this work, symbolic computation plays a key role in the evaluation of distributed applications during their execution on deployed devices. To this end a rule-based modeling and verification system is introduced.

The proposed system relies on:
1) a knowledge base that includes applications and network specifications and ties application operations to the corresponding verification code;
2) an intelligent agent that uses inference rules to concatenate snippets of symbolic code in the knowledge base to produce verification messages;
3) a communication module that sends application and verification code to deployed nodes;
4) a symbolic verifier that checks that results satisfy expectations and collects necessary metrics.

Development and verification of an application are both performed in terms of executable symbols that are exchanged among entities. Symbolic test programs are executed on the deployed devices as soon as they are received.

The rest of the article is organized as follows. Section II goes over some related works. Section III details the computational paradigm, the architecture of the modeling and verification system, and describes the system operation. Section IV presents case studies concerning various applications. Section V presents experimental results. Finally, Section VI reports our conclusions and discusses future research directions.

In this section we discuss related research on WSN monitoring and debugging platforms, and the use of symbolic

memory and 1 KB of data memory, making the debugging of large programs impossible on resource-constrained nodes.

• the data obtained from the nodes need to be analyzed and interpreted separately.

The underlying limitation from which most of these issues stem is that the development and deployment of WSNs usually follows the flashing-rebooting-reloading cycle. To overcome these issues, symbolic distributed computation has been proposed as a promising solution that naturally supports interactive programming.

The evaluation process follows the postfix notation.

The dictionary can be easily expanded with user-defined

The dictionary can be implemented as a linked list of word definitions allocated one after another in memory, each pointing to the previously defined one. This way the word lookup mechanism can be implemented as a simple backward search from the last to the first definition. Special symbols, called markers, can be used to create restore points in the word dictionary. When a marker word is defined, it is placed at the current end of the dictionary as any new user-defined word would be. The execution of a marker removes from the dictionary stack the marker itself and all the words that were defined after it, rolling back the dictionary to the state it had right before the definition of the marker. Variables are also stored in the dictionary; their execution leaves their memory address on top of the stack. Values can be read from a memory address through the fetch (@) word and written to a specified memory location using the store (!) word.
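The dictionary mechanism described above can be illustrated with a minimal Python sketch. It is not the DC4CD implementation: the class name `Dictionary`, its methods, and the word names used in the example are all hypothetical, and entries are modeled as Python dicts rather than memory cells.

```python
class Dictionary:
    """Word dictionary modeled as a singly linked list of definitions."""

    def __init__(self):
        self.last = None  # most recently defined entry

    def define(self, name, body, is_marker=False):
        # Each new entry links back to the previously defined one.
        self.last = {"name": name, "body": body,
                     "marker": is_marker, "prev": self.last}

    def marker(self, name):
        # A marker is placed at the end of the dictionary like any other word.
        self.define(name, None, is_marker=True)

    def lookup(self, name):
        # Backward search from the last definition to the first.
        entry = self.last
        while entry is not None:
            if entry["name"] == name:
                return entry
            entry = entry["prev"]
        return None

    def execute_marker(self, name):
        entry = self.lookup(name)
        if entry is None or not entry["marker"]:
            raise KeyError(name)
        # Drop the marker itself and every word defined after it.
        self.last = entry["prev"]


d = Dictionary()
d.define("temperature", "original")
d.marker("debug-session")                 # hypothetical marker name
d.define("temperature", "debug version")  # shadows the earlier definition
shadowed = d.lookup("temperature")["body"]
d.execute_marker("debug-session")
restored = d.lookup("temperature")["body"]
```

Because lookup always starts from the newest entry, a redefinition shadows the older word without deleting it, which is what makes the rollback through a marker possible.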

In order to support distributed applications, and to facilitate the exchange of symbolic code, the DC4CD platform natively provides support to distributed computing schemes through a special-purpose construct:

    tell: <symbolic code to be sent> :tell

These two words build IEEE 802.15.4-2003-compliant messages, as required by the node radio (see Section V), containing the symbolic code enclosed between them as payload of datalink level packets. The sequence of words to be sent is encoded as plain ASCII characters.

When the tell: word is executed, it consumes the value on top of the stack, and interprets it as the MAC address of the receiver node, placing that value in the destination address field of the packet. Upon receiving a message, the destination node immediately executes the received instruction without any further translation step. The same words might be defined differently depending on the underlying hardware of a node. Each node then executes the words received through messages using the definition in its own dictionary. Exchanging executable symbolic code thus abstracts the characteristics and the representation of target hardware, easing interoperability on networks composed of heterogeneous devices.
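As a rough illustration of this behavior, the following Python sketch models two receivers that define the same word differently and a sender that addresses them through the value on top of its stack. The `Node` class, the `tell` helper, the node addresses, and the two `temperature` definitions are assumptions made for the example; radio framing is reduced to an ASCII encode and decode.

```python
class Node:
    def __init__(self, mac, words):
        self.mac = mac
        self.words = words  # node-local dictionary: word -> callable(node)
        self.stack = []

    def execute(self, code):
        # Received ASCII code is executed word by word, with no translation step.
        for token in code.split():
            if token in self.words:
                self.words[token](self)
            else:
                self.stack.append(int(token))  # anything else is taken as a number


def tell(sender, network, code):
    """Model of tell: ... :tell: pop the destination MAC, deliver the payload."""
    dest = sender.stack.pop()
    payload = code.encode("ascii")  # words travel as plain ASCII characters
    network[dest].execute(payload.decode("ascii"))


# Hypothetical hardware: node 7 reads tenths of a degree and converts them,
# node 8 has a sensor that already reports whole degrees.
node6 = Node(6, {})
node7 = Node(7, {"temperature": lambda n: n.stack.append(230 // 10)})
node8 = Node(8, {"temperature": lambda n: n.stack.append(21)})
network = {n.mac: n for n in (node6, node7, node8)}

for dest in (7, 8):
    node6.stack.append(dest)  # destination address on top of the stack
    tell(node6, network, "temperature")
```

The sender transmits the same word to both nodes; each receiver resolves it against its own dictionary, so hardware differences stay local to the node.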

Inside the tell: :tell construct, the symbol tilde (~) is treated as a symbolic placeholder and replaced with the value currently on the top of the stack. This mechanism makes it possible to include computed values in outbound messages. For example, to tell a remote node with id 7 to measure the temperature, send the response back (reply), and make the requesting node print its value, the requesting node executes:

    7 tell: temperature reply tell: ~ . :tell :tell

FIGURE 2. Stack execution of the 7 tell: temperature reply tell: ~ . :tell :tell symbolic code on Node 6: a) the symbol 7 is recognized as a numeric value and placed on the top of the stack, then the outer tell: ...:tell construct uses this value as destination node address making Node 6 send the inner code (temperature reply tell: ~ . :tell) to Node 7; stack on Node 6 gets back to the initial state; b) Node 7 receives the code and executes it; temperature leaves a temperature reading (23) on the stack; reply pushes the address of the sender of the received message on the top of the stack; the inner tell: ...:tell construct uses this value to send Node 6 a message containing the reading (23), extracted by the tilde (~) placeholder, followed by the word dot (.); c) Node 6 receives 23 . ; the symbol 23 is again recognized as a numeric value and put on the top of the stack, then the word dot (.) uses this value to produce an output representation of the reading. The exchanged messages are shown in d).

Instead, nodes receive executable code from the monitoring agent to execute a single step of the distributed
application. The verification proceeds as in the on demand case. This strategy overcomes the difficulties of monitoring the state of the network while it is actively changing due to a running application.

VOLUME 10, 2022

bridge node has the same specifications as the other nodes in the network, and communicates with the CM through a wired serial interface. All the information exchanges between the monitoring agent and the WSN go through the bridge node. Using one of the nodes in the network as a bridge is convenient since it makes it possible to leverage the flexibility of symbolic code execution on this node too, as well as the other facilities of the development environment.

The Symbolic Code Producer (SCP) automatically produces symbolic verification code. The SCP is a decision agent that takes as input the network structure and the verification rules related to a distributed application to appropriately concatenate snippets of symbolic code. The verification code makes the sensor node perform some application-specific computation and send the results back to the monitoring agent at one or more points during the execution of the application. Verification is thus carried out by virtue of this declared association between symbolic high-level code describing high-level operations and similarly written code verifying the outcome of the former. To this purpose, rules of this type are defined in the KB:

in which Label specifies the operation to be verified, and VerificCode indicates the verification code to be transmitted by the monitoring agent to the bridge node so that its execution on arrival retrieves the desired results.

Once the SCP has produced the verification code, the CM starts the application execution by sending the initiating code to the network through the bridge node. Then it sends the verification code to the nodes according to one of the monitoring modes.

Before the application execution begins, during the initialization phase, the SCP can inject executable code in the network targeting some or all the nodes. This is useful when debugging a WSN to ensure reproducibility of the performed tests by explicitly setting the configuration of the nodes, and to ensure that some preconditions are verified before the application execution. When the monitoring agent sends the initialization code to the network, it may also override some application-specific words to include debug functionalities. Moreover, this code can also be defined differently for each node. By sending a marker before the new definitions, at the end of the application execution, it is possible to revert the dictionary to its previous state.

The core element of the system is the Symbolic Code Verifier (SCV), which examines the results of the execution and ascertains the correctness of the application execution using the verification rules in the KB. Metrics about the distributed application execution and verification time as well as exchanged data are also recorded.

The verification process is detailed in Fig. 4. For the on demand and stepwise strategies, which are based on fine-grained evaluation during application execution, all nodes are queried for verification data at the end of the process.

computation result will be, either because of some stochastic component in the application or because the measurement of physical quantities is involved. Depending on the result of this verification process, the monitoring agent can take appropriate actions, such as stopping the verification process to report failure or testing the validity of additional predicates.

The KB allows for defining nodes as reliable, that is nodes whose response and behavior are assumed to be always correct. Rules defining primitives for communication, such as messages, and their storage and transmission among nodes, are also defined in the KB as well as routes and transmission timings.

The KB also models the distribution within the network and placement of nodes using a two-dimensional Cartesian coordinate system. The network structure is also modeled in the KB in terms of topology and connectivity as facts for each node in the KB.
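A possible way to picture such facts is the Python sketch below. The actual KB is rule-based and its fact syntax is not reproduced here; the node positions, the links, the `hops` helper, and the hop limit are hypothetical, while `nodes_connected` borrows the name of a predicate used by the monitoring agent.

```python
from collections import deque

# Hypothetical facts: node id -> (x, y) placement, plus undirected links.
position = {"bridge": (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1)}
links = {("bridge", 1), (1, 2), (2, 3)}


def nodes_connected(a, b):
    """True when the facts record a direct link between a and b."""
    return (a, b) in links or (b, a) in links


def hops(src, dst):
    """Breadth-first search over the link facts; None if dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for other in position:
            if other not in seen and nodes_connected(node, other):
                seen.add(other)
                queue.append((other, dist + 1))
    return None


max_hops = 2  # assumed limit on the hops a verification message may take
to_verify = [n for n in position
             if n != "bridge"
             and (h := hops("bridge", n)) is not None and h <= max_hops]
```

Conditions of this kind let the agent restrict verification to nodes that satisfy a constraint derived from the modeled topology, here a bound on the distance from the bridge.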

Some of the predicates that can be used by the monitoring agent as conditions in its decision making process are shown in Table 1. These predicates take as input one or more node IDs and report some information regarding their status. This information can be used as conditions to determine the actions to pursue during the verification process. For instance, some nodes could be excluded from the verification process to avoid burdening nodes with low remaining charge or to respect a limit on the number of hops for the verification messages. Furthermore, in networks with heterogeneous devices, the available computational resources of each node and the available hardware peripheral can limit or expand the scope of the verification process.

Once the queried values are retrieved from the nodes, the KB is enriched with additional information that the monitoring agent can use in order to make decisions during the verification process. The predicates in Table 2

In order to test and validate the verification system, in this section, we present its adoption for developing some distributed applications of different complexity. For the sake of brevity, in the following we present some meaningful fragments together with the code provided for their verification. The proposed tool can monitor both code and network functionality. For the second application fragment only, which concerns network functionality, we describe in detail the logical reasoning process implemented for verification.

This application is a short fragment belonging to a more complex application for the distributed aggregation of physical quantities, in this specific case, instantiated to collect temperature data. The distributed application can be decomposed in the following steps:

The bcst keyword specifies that the message be broadcast to the network, update is an application-specific word;
2) each listening node receives the message and executes it. As specified, the two zeros are interpreted as numeric values and put on the stack. The update symbol is defined to pick these two values from the stack and store them in two of its local variables. The first variable (num) holds the number of nodes that already carried out the temperature measurement. The second (aggr) holds the current aggregate temperature value, which is the sum of the measurements communicated by the nodes so far. The execution of the message thus commands the nodes to perform the initialization of the application by resetting their local values. The complete definition of update, which includes the wait-and-reply symbol that actually implements the rest of the distributed procedure (step 3), is:

3) The wait-and-reply-nq symbol starts a timer letting the node idle for a time proportional to its ID. When the timer expires, the node broadcasts the update message:

    1 net-quality

4) as described before, the definition of net-quality is such that whenever a node receives the above message from any other node, it increments its own counter.
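Steps 3 and 4 of the network quality fragment can be mimicked with a short Python simulation. It is a sketch under stated assumptions: radio links are taken as symmetric and lossless, the neighbor sets are invented for the example, and the expected counter value (one message per neighbor) is an assumption about what a fault-free run would produce rather than a rule quoted from the KB.

```python
def run_net_quality(neighbors):
    """neighbors: node id -> set of node ids within radio range."""
    rcvd_msg = {node: 0 for node in neighbors}
    # Timers are proportional to the node ID, so broadcasts happen in ID order.
    for sender in sorted(neighbors):
        for receiver in neighbors[sender]:
            # Receiving "1 net-quality" makes the receiver bump its counter.
            rcvd_msg[receiver] += 1
    return rcvd_msg


# Hypothetical 4-node network in which node 4 only hears node 3.
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
counters = run_net_quality(neighbors)

# Assumed check: with no losses each counter equals the number of neighbors.
expected = {node: len(adj) for node, adj in neighbors.items()}
all_ok = counters == expected
```

A counter lower than the number of modeled neighbors would then point at a missing or degraded link around that node.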

Once the application execution is terminated, the monitoring agent starts the verification process. In order to monitor the connectivity of the network, verification is performed on the number of messages received by each node. To this end, the SCP will select the appropriate snippet of verification code for this application. According to Table 4 the agent will use the "rcvd-msg @" snippet to extract the required counter value from each node.

To make each node report the number of received messages, the SCP concatenates the appropriate communication primitives. After the verification code snippet, a tell construct is used to send back the computed values. For every node to be verified, the SCP merges the verification code into a tell construct addressed to the correct destination node. The code that the CM sends to the bridge node in order to query a node is the following:

    NodeID tell: rcvd-msg @ reply tell: ~ :tell :tell

with NodeID the address of the node the verification code is sent to.
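To trace how such a query travels, the following Python sketch implements a tiny interpreter for the handful of words involved (tell:, :tell, ~, reply, @, and numeric literals). It is an approximation written for this example, not the DC4CD interpreter: the `Network` and `Node` classes, the bridge address 0, and the stored counter value 5 are hypothetical, and the rule that only tildes at the current nesting level are substituted is an assumption inferred from the behavior described for Fig. 2.

```python
from collections import deque


class Network:
    def __init__(self):
        self.nodes, self.queue, self.log = {}, deque(), []

    def send(self, src, dest, code):
        self.log.append((src, dest, code))
        self.queue.append((src, dest, code))

    def run(self):
        while self.queue:
            src, dest, code = self.queue.popleft()
            self.nodes[dest].execute(code, sender=src)


class Node:
    def __init__(self, mac, net, variables=None):
        self.mac, self.net = mac, net
        self.stack, self.sender = [], None
        self.memory = dict(variables or {})  # variable name -> stored value
        net.nodes[mac] = self

    def execute(self, code, sender=None):
        self.sender = sender
        tokens, i = code.split(), 0
        while i < len(tokens):
            tok = tokens[i]
            i += 1
            if tok == "tell:":
                i = self._tell(tokens, i)
            elif tok == "reply":
                self.stack.append(self.sender)  # address of the message sender
            elif tok == "@":
                self.stack.append(self.memory[self.stack.pop()])  # fetch
            elif tok in self.memory:
                self.stack.append(tok)  # a variable leaves its address
            else:
                self.stack.append(int(tok))  # numeric literal

    def _tell(self, tokens, start):
        # Find the matching :tell, allowing nested constructs.
        depth, end = 1, start
        while depth:
            depth += {"tell:": 1, ":tell": -1}.get(tokens[end], 0)
            end += 1
        dest = self.stack.pop()  # destination MAC address
        out, level = [], 0
        for tok in tokens[start:end - 1]:
            level += {"tell:": 1, ":tell": -1}.get(tok, 0)
            if tok == "~" and level == 0:  # assumed: only this level's tildes
                tok = str(self.stack.pop())
            out.append(tok)
        self.net.send(self.mac, dest, " ".join(out))
        return end


net = Network()
bridge = Node(0, net)                    # hypothetical bridge address
node7 = Node(7, net, {"rcvd-msg": 5})    # hypothetical counter value
bridge.execute("7 tell: rcvd-msg @ reply tell: ~ :tell :tell")
net.run()
```

After the run, `net.log` holds the two exchanged messages and the counter value sits on the bridge stack, ready to be forwarded to the verifier.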

If the nodes_connected(bridge, NodeID) condition is not satisfied, meaning that in the KB there is no information reporting a direct link between the bridge and the target node, the SCP selects the forward construct instead of the tell communication primitive to ensure that messages can be correctly delivered.
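The selection logic can be sketched in Python as follows. The rule table, the label "net-quality", the link facts, and the function names are hypothetical stand-ins for KB content; only the tell form of the query is generated, since the syntax of the forward construct is not given in this section.

```python
# Hypothetical KB content: one verification rule and the direct bridge links.
verification_rules = {"net-quality": "rcvd-msg @"}
direct_links = {("bridge", 3), ("bridge", 5)}


def nodes_connected(a, b):
    return (a, b) in direct_links or (b, a) in direct_links


def choose_primitive(node_id):
    """Pick tell for direct neighbors of the bridge, forward otherwise."""
    return "tell" if nodes_connected("bridge", node_id) else "forward"


def build_tell_query(label, node_id):
    """Concatenate destination, verification snippet, and reply construct."""
    snippet = verification_rules[label]
    return f"{node_id} tell: {snippet} reply tell: ~ :tell :tell"


plan = {n: choose_primitive(n) for n in (3, 5, 9)}
query = build_tell_query("net-quality", 3)
```

Keeping the snippet lookup separate from the choice of primitive mirrors the idea that the same verification code can be wrapped in different communication constructs depending on the modeled connectivity.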

Once the nodes are queried, the SCV performs verification actions for all the nodes that satisfy the node_active condition in Table 1 through the following predicates:

The third sample application is based on the network discovery protocol described in [45] and used to construct a network topology tree. The application execution entails the following

In this last sample application we face some issues related to non-determinism. An appropriate monitoring of non-deterministic behaviors is a crucial issue since many applications, for instance those of IoT systems, are often characterized by uncertainties. IoT devices typically execute specific actions on the basis of their sensor readings. However, measurements can be inaccurate or actuators might malfunction. Furthermore, these applications are characterized by the unpredictability of message exchanges. Finally, these applications might not necessarily terminate after a definite time as, for instance, when implementing control loops for physical processes. All these considerations underline the difficulty of monitoring the correct behavior of IoT applications while minimizing undue interference. For these reasons, the stepwise monitoring mode, as described in Section III-B, can be a valuable tool.

The following application fragment implements a temperature control in a smart environment.
1) On startup all the nodes acquire a temperature sample;
2) Each node, having acquired its first sample, classifies it as: 1) belonging to the user predefined comfort range, 2) cold, or 3) warm, and broadcasts the classification value; this step is encoded as such:

    temperature @ classify bcst tell: ~ temperature-update :tell

3) All the nodes periodically perform the same measurements. Only when the classification of the newly acquired temperature is different from the previous one, the nodes broadcast a message with the new classification;
4) According to the classification emerging from the majority of nodes, a collector node sends appropriate commands to the HVAC system so as to guarantee that most of the nodes obtain readings in the comfort range.

Since for most of the tested applications the execution time depends on the node IDs, the ability of the monitoring agent to virtualize IDs was exploited to randomly assign MAC values in the range 0-65535. Since the ID randomization functionality is based on a fixed pool of seeds, each combination was evaluated on the same fifteen sets of IDs. The assigned IDs were set on the nodes through executable code sent by the verification system in the initialization phase. The overhead from these operations does not contribute to verification time measurements because an already deployed network does not generally require these steps.
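The effect of a fixed seed pool on reproducibility can be shown with a few lines of Python. This is only a sketch of the idea: the seed values, the helper name `id_sets`, and the network size of 10 are assumptions, and the generator used by the actual monitoring agent is not specified here.

```python
import random

# Assumed seed values: the text only states that the pool is fixed.
SEED_POOL = list(range(15))


def id_sets(num_nodes, seeds=SEED_POOL):
    """One set of distinct 16-bit IDs per seed; equal seeds give equal sets."""
    return [random.Random(seed).sample(range(65536), num_nodes)
            for seed in seeds]


first_run = id_sets(10)   # hypothetical network size
second_run = id_sets(10)
```

Because each set is derived from its own seeded generator, every combination of application, topology, and verification mode can be evaluated on exactly the same fifteen ID assignments.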

In the linear topology setting, randomizing the addresses presented an extra challenge. Before each experiment the routing tables in the whole network had to be set up with the randomized addresses. Due to the flexibility of the system, this was simply solved by having the monitoring agent send each node executable code defining its routing table in the initialization phase, before the start of the application. To ease the practicality of the experiments, during the initialization phase the nodes were gathered in close proximity to ensure that a correct configuration was quickly achieved. With no loss of generality regarding the evaluation of the overhead introduced by the monitoring agent, the topology was nonetheless linear from the point of view of the verification system.

Table 5 collects the results of all the tests:
• t_p is the time elapsed from the moment the message starting application execution is sent to the moment the application terminates and global verification can begin. It can be computed by the system before starting the application and is not influenced by the verification modality. It does not apply to the stepwise modality.

• t_p + t_v is the time from the moment the message starting the application execution is sent to the verification end.
• the messages column reports the number of exchanged messages during application execution and verification. The messages considered are only those needed to verify the application itself, not those exchanged by the nodes during the normal application execution, as those are not related to the verification tool. In multi-hop topologies each message forwarding is counted separately.

• the bytes column reports the bytes sent through serial line to the bridge node.

The targeted strategy, interrogating only a reliable remote node, is the most efficient. Nevertheless, besides the intrinsic difficulty of selecting a subset of the nodes as reliable, especially for long executions, not every application can be verified with knowledge about the state of a single node.

The global strategy can ensure that the application produces the correct final result but at the cost of higher verification time and number of exchanged messages. Moreover, if the final result is not correct, it may not be possible to determine what caused the failure.

The on demand strategy entails even more exchanged messages and may potentially introduce timing issues. However, this strategy provides more detailed information during the application execution, for instance about error conditions, which could be detected before the end of the process.

An application with more frequent message exchanges would be slowed down.

The HVAC control application was only tested in the stepwise modality because of its non-deterministic nature. The non-deterministic update mechanism does not allow for determining safe timings to send messages without collisions. For the sake of ease of testing, temperature readings were simulated.

The network quality monitoring application is meant to assess the state of the whole network; for this reason the targeted modality makes little sense. Moreover, verification is performed on the final results of the application, so we did not perform the on demand and stepwise verifications.

The relationship between the number of queries performed by the rule system and the other tracked metrics is reported in Fig. 7. As expected, the chart shows a noticeable linear correlation between the number of queries required for verification and the time spent to perform it.

The number of messages propagated through the network, on the other hand, shows high variability between the two 1-hop topologies (L-shaped and Home) and the linear one: in the latter, for each message sent from the monitoring agent several messages are generated by the nodes contributing to the overall message count. In the stepwise verification scheme for the HVAC control application the non-deterministic execution order of the nodes makes the number of exchanged messages highly variable.

Despite the noticeable influence of topology and application on the number of messages propagated through the network, a noticeable linear correlation between the number of queries performed by the monitoring agent and the t_v component of the verification time can be observed. In fact, the

Higher bitrates with such constrained resources would not be feasible. This is not a limitation of the monitoring agent. In fact, the actual throughput would be limited by the low computational power of the nodes and their tiny amounts of RAM for buffers that would trigger control-flow mechanisms anyway. All in all, the fact that our approach is feasible even when targeting such a resource-poor platform shows its effectiveness.

To assess the applicability of the proposed methodology in networks with more nodes we also performed numerical simulations for each verification scheme. The simulations were carried out with multiple network topologies:

• connected topologies where the bridge node could directly query each node;
• linear topologies with the bridge node at one end of the line;
• ramified topologies where each node could directly communicate with at least four other nodes.
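To give a feeling for how topology affects the message count, the Python sketch below compares the connected and linear cases for the network sizes used in the simulations. It is not the simulator used for the experiments: the cost model (one request and one reply per queried node, with every forwarding counted as a separate message) and all function names are assumptions, and the ramified case is left out because its hop distribution depends on how the graph is generated.

```python
def connected(n):
    """Bridge can reach each of the n nodes directly: 1 hop each."""
    return {i: 1 for i in range(1, n + 1)}


def linear(n):
    """Bridge at one end of the line: node i is i hops away."""
    return {i: i for i in range(1, n + 1)}


def messages_for_global_query(hops):
    # Assumed model: a request and a reply per node, each hop counted once.
    return sum(2 * h for h in hops.values())


sizes = (10, 20, 50, 100)
table = {n: (messages_for_global_query(connected(n)),
             messages_for_global_query(linear(n)))
         for n in sizes}
```

Under this assumed model the connected topology grows linearly with the number of nodes while the linear topology grows quadratically, since a query to the farthest node is relayed by every node in between.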

The networks were generated in different sizes: 10, 20, 50, and 100 nodes. Fig. 8 reports t_v and performed queries for all the performed simulations showing the linear relationship: each query to a node had a cost of ∼15 s. The simulation results for networks of size 10 closely match the tests performed on the deployed networks.

Fig. 9 summarizes the simulation results, and shows that the previously identified relationships hold even at increased network size. In particular, the impact of the transmissions of the generated messages on t_v is negligible when compared to the communication through serial line with the bridge node. Since the rate at which queries were performed was far lower than the maximum throughput of the network, the verification process minimally interfered with the WSN operations. Moreover, the number of bytes sent through the serial interface for each query was constant, thus the burden on the bridge node was the same regardless of the topology and the application under test.

Open Access funding provided by 'Università degli Studi di Palermo' within the CRUI CARE Agreement