A Novel Low-Power Time Synchronization Algorithm Based on a Fractional Approach for Wireless Body Area Networks

Time synchronization is a topic of interest for any distributed system and it is of particular relevance in Wireless Sensor Networks (WSN), especially when it is necessary to keep a strong level of time synchronization among the data coming from different nodes, which are then employed to perform a posteriori data-fusion and data-merging operations. A special case of WSN is constituted by Wireless Body Area Networks (WBAN). The paper introduces the Fractional Low-power time Synchronization Algorithm (FLSA), a lightweight and ultra-low-power time synchronization algorithm conceived for Wireless Body Area Networks. The core of the proposed approach is the fractional-time concept, borrowed from Phase-Locked Loops theory, that allows achieving fine timer corrections. Moreover, an heuristic routine managing the on/off switching of the radio section of the device allows to dramatically decrease the power consumption. The mathematical discussion, along with a set of experiments is presented, proving the benefits associated with the proposed algorithm.

monitoring purposes, therefore the tasks of the WBANs are to measure the vital signs and report them to the medical staff. In the second case, WBANs can have multiple possible tasks in entertainment, gaming, consumer electronics, non-medical emergency handling (typically interfacing with outer sensor networks) and emotion detection [4].
WSNs and WBANs are important examples of distributed systems. Clock synchronization is a critical issue in every distributed environment [5]. In fact, in this scenario each node is governed by its own physical clock and, given the fact that each of them can be subject to some artifacts, there is the need for a common reference in order to avoid possible time mismatches. Time synchronization is important for many reasons in WSNs. For example, if we aim to merge and correlate data coming from different sensing sources, it is important to ensure their time coherency. A common example where time consistency is essential are fault-recovery applications.
Time synchronization accuracy varies as a function of the different applications. During the years, many synchronization protocols and algorithms have been developed to solve the clock coordination problem in a distributed environment [6]- [11].
However, in the case of WSNs and WBANs, their specific constraints do not allow to employ the classical techniques. In fact, the limited power available requires protocols that reduce at its minimum the number of messages associated to time coordination, and to the processing time and effort associated to process them. Moreover, since the radio channel provides an unreliable transmission medium, it is mandatory to employ protocols that are as simple as possible, even at the cost of losing packets, due to the presence of interferers or absorbers.
In this paper, we introduce a novel time synchronization algorithm for WBANs. The algorithm is capable of both time offset and time skew compensation. In particular, as it will be explained in the next sections, the proposed approach is based on a novel fractional-time correction technique relying on a dual modulus division, and an heuristic routine that balances the power-consumption and the precision of the system timer according to thresholds that are customizable with respect to the application.
The paper is organized as follows: Section II describes the background and the analysis of other techniques present in literature. In particular, it focuses on the definition and modeling of clock artifacts, the characterization of delays in message exchanging and the most known and employed algorithms and protocols for time synchronization in WSNs, with a particular emphasis on the WBAN case. Section III describes the theory behind the classic integer clock implementation. Section IV introduces the main motivations for which the described approach was devised. Then, in Section V, the algorithm theory and operations are presented in detail. Section VI describes some experiments conducted for validating the approach for both a precision and power efficiency. Finally, conclusions close this work.

II. THEORETICAL BACKGROUND AND RELATED WORKS
In this section, after having briefly recalled WBAN design requirements and topologies, we will introduce some basic background concepts to understand how time synchronization algorithms and protocols operate.
Later on, we will discuss the advantages and the disadvantages of some of the most common algorithms and approaches for time synchronization.

A. WIRELESS BODY AREA NETWORKS
The classical definition of WBAN has been already presented in the introduction [3]. WBANs have been subject to standardization from the IEEE 802.15 working group and their main technical requirements are specified in the IEEE 802.15.16 standard draft [12]. Critical issues associated to the development of a WBAN are [4] the minimization of the communication delay in the network and the network itself maximum throughput rate. Also, it is important to guarantee good network lifetime (i.e., the resiliency to network failures must be taken into account), and minimum power consumption. The nodes in a WBAN are usually battery powered, for portability reasons.
The IEEE 802.15.16 draft considers the star topology as the main network topology for a WBAN [12], in other words, it is the central node that coordinates the entire network. Star topology can be of the single or multi-hop type. However, to guarantee resiliency to network failures, and facilitate sensors/actuators addition or removal, the single-hop star topology is more efficient. This paper will focus on the single-hop topology.

B. A MODEL FOR LOW-COST DEVICES CLOCK
Usually in low-cost devices, the clock is generated by crystal oscillators that ensure reasonably stable operations with the benefit of reduced costs.
Let us define χ i (t) a function that allows to determine the time value for the device i at the time instant t, starting from the clock circuit operating at frequency f SRC . In perfect conditions, for each node i: where α(t) is a proper scaling factor. In practice, real clocks experience some non-idealities, due to temperature variations [13], aging [14] and load capacitance mismatches [15]. In a distributed environment, these eventually lead to have different time values for each node. Because of the relativity of time computation, if we aim to model those artifacts we have to use a common reference clock value. Hence, if i is the generic node, equation (1) must be corrected as follows: being ε i (t) = α i (t)/(f i SRC ) and ϑ i being the initial offset. Therefore, if we suppose j to be the reference node, we have Because of its time invariant nature, ϑ ij can be compensated in a calibration phase. Then ∀t there is a non-static error due to the different crystal oscillation frequencies, commonly called skew. If we write 1/(f i SRC ) = t i SRC and suppose t i with e ij (t) modeling the instantaneous time error due to frequency drifts. The implementation of a clock correction can be difficult and expensive at the hardware level, (expecially in terms of power consumption), and, thus, it is usually performed in software. To correct the drift, we can vary the proportionality coefficients. So, in order to correct i (i.e. to obtain χ ij (t) = 0) we should have

C. ALGORITHMS AND PROTOCOLS FOR WSN TIME SYNCHRONIZATION
Being the WBAN a particular type of WSN, the protocols employed for time synchronization are usually inherited from this latter operative context. Different operative scenarios call for diverse protocols and approaches. Almost every protocol has a single node acting as the time reference for the whole WSN, therefore each node must be synchronized to its timing. We will refer to this node as the root-node, time coordinator node or time-reference node as in [16]. Note that, in the WBAN case, this node is, actually, the star-center node. We can so refer to this kind of communication paradigm as a master-slave approach, with the root-node acting as the master of the synchronization process and the other nodes being the slaves. When aiming to perform data merging from the different sensors of the network, in many applications it is not necessary for the master's time to be absolutely correct (e.g. with respect to the UTC). In fact, what is relevant is that the nodes agree on a common time.
Depending on the quantities that they correct, clock correction and synchronization protocols and algorithms can be divided in offset-only correction algorithms and joint skew-offset correction algorithms. Offset-only solutions generally require a higher number of messages, because the time-variant term leads to the impossibility of a single correction. Hence, joint skew-offset solutions are eligible to be lower power-intensive techniques.
In those systems in which we expect a huge impact of frequency drifting, a greater complexity for the algorithms is required because it is necessary to predict how the drifting will evolve, in order to reduce the frequency of the synchronization messages. To do that, regression and machine learning techniques can be employed [17]- [19]. In very large WSNs, there are many approaches to simplify and speed up the synchronization phase. One example is presented in [20] in which the network is divided into broadcast subdomains and the synchronization message dissemination is performed thanks to the creation of a spanning tree. Spanning trees are also employed in [19] and [21] as a way to manage multi-hop. In this way it is also possible to manage time-coordinator failures.
For those scenarios in which the links between the nodes are not stable, a resistive-network inspired approach like the one proposed in [17] can be very helpful because of its flexibility. However, given the computational weight associated with the steps, the effort could be justified only for very large WSNs.
To further simplify the complexity of distributed protocols, another trend is furnished by overhearing-based protocols, such as [22], [23], in which neighbor nodes capture the synchronization packet during an ad hoc synchronization phase (usually pair-wise) of a couple of nodes.
In the wide literature currently available, some protocols are taken as a reference because of their large diffusion. One of those is the Reference Broadcast Algorithm (RBS) [18], used to synchronize a set of receivers to the coordinator. RBS does not employ dedicated synchronization messages and adopts relative time synchronization. Once the root-node has sent the reference_packet, the receivers log the local time of reception associated with that packet. Then, the receivers exchange their observation and, once they have collected the whole network observations, they average and reach a consensus on the average time of reception. Note that RBS is not optimized for low-power contexts. However, in [24], a revised RBS approach for low-power energy-harvested WSN is proposed, in which a linearly incremental switch-off time for the radio section is employed when the synchronization error decreases. Otherwise, the off-time is reduced referring to the slope of the interpolated error samples (or, equivalently, the ''time points'' received from the master).
The Timing synchronization Protocol for Sensor Networks (TPSN) [21] is a dynamic pair-wise synchronization approach in which synchronization-request senders are synchronized to receivers. It has two main phases. The first one is a discovery phase, initiated by the root node. At this stage, a broadcast message is sent by every node to establish a spanning tree hierarchy based on level number exchange. Then, in the synchronization phase, each node belonging to level i sends a synchronization_pulse to the one on level i−1 (assuming the root-node has level 0), which answers with an acknowledge (ACK) message.
Another example is constituted by the Flooding Time Synchronization Protocol (FTSP) [19]. FTSP synchronizes the sender time with multiple receivers using message timestamped at MAC layer. The approach is simple, i.e. the central node floods the entire network with a broadcast packet containing its local MAC time, so that each receiver logs the reception time and adjusts its own MAC time to it.

D. A MODEL FOR THE TIME DELAY IN MESSAGE EXCHANGE
The time elapsed for the synchronization message from when it is generated up to when it is processed at the receiver's side can be decomposed into 5 parts [19]: • Send Time (t S ): time needed for the message to be prepared and sent to the network interface; • Access Time (t A ): waiting time for the shared communication channel to be free; • Transmission Time (t TX ): delay needed for the sender to put the message on the communication channel; • Propagation Time (t P ): time needed for the electromagnetic signal to travel from the transmitter to the receiver; • Reception Time (t R ): It is the dual of t S , and it goes from receiving the synchronization message to the end of its processing. The overall delay τ can be expressed as: Note that some of the contributions to the overall time delay have a random nature (i.e. t S , t A , t R ). However, this model can be simplified under some hypotheses.
As previously discussed, in time synchronization there is commonly a reference node and every node synchronizes to it.
Since we are assuming a star topology, we can consider that the central node uses messages of the broadcast type to communicate its time. In this specific case, it is possible not to consider t S , t A and t TX for all the slaves. Since a single message is broadcast to all the nodes there is only a unique send-time to wait. Channel access time is also the same for every node, as a single t A is needed to be waited before sending the synchronization message to all the other nodes. For what concerns t TX , since it is related to the message length, for a broadcasted message it represents a quantity that it is equal for each slave. Therefore, the possible inter-slave delay expression reduces to: If the considered network has a small spatial extension, as in the case of a WBAN, t P can be neglected, because, considering the air as propagating medium (c = 3 · 10 8 m/s), t P ranges from 10 to 100 nanoseconds for a distance of 1 − 20 m. Hence, we can accordingly simplify the expression of τ : Thus, to further optimize the overall delay, τ , we have to reduce t R . However, due to t R random nature, a good reception queue management policy can also lead to significant performance improvements.

III. INTEGER CLOCK IMPLEMENTATION
Usually, in a low-cost system, the System Timer (ST) is supplied by a Real-Time Clock (RTC) module. For low-power applications, the RTC modules usually integrate a set of registers (counter register, capture/compare register, prescaling register) and a tick event generator, as shown in Fig. 1. The relationship between the frequency value of the clock source, f SRC , and the increment rate f RTC of the counter register R COUNT , is: where R P is the value of the prescaler register. The easiest way to implement a ST is to set its resolution, t ST , by properly programming the prescaler. In this case, the ST resolution is equal to that of the RTC, t RTC : The ST value can be directly obtained by multiplying the R C register value and the ST resolution value: The maximum value that can be stored in the ST, ST MAX_VALUE , represents the only limit for such implementation: where N is the length in terms of bits of the counter register.
Commonly employed register lengths are 24 bits for the counter and 12 bits for the prescaler register. The used crystals operate at a frequency f SRC = 32.768 kHz [25]. As a consequence, t ST is: 30.517 µs ≤ t ST ≤ 125 ms (14) so that the maximum ST value ST MAX_VALUE is such that: Once the desired resolution has been set, the maximum value that the ST can manage without overflowing and resetting the register R COUNT to zero can be computed. The dependence between resolution and maximum ST value can be overcome using the internal RTC module feature called compare. When the R COUNT register value reaches the value stored in the comparison register, R C , an interrupt is issued. The interrupt routine manages some operations linked to the ST itself at a higher level. Under this condition, the ST frequency, f ST , becomes: where f ST represents the ST refresh frequency and R C is the value of the imposed comparison value. Note that the compare register length is usually the same as the length of the counter register. The ST resolution is given by: so that it is possible to find Despite having decreased the ST resolution by a factor R C with respect to the RTC resolution, it can still be varied in t RTC steps, so the granularity of the ST continues to be very low. A practical example helps to gain a better insight. Let us assume a system employing a f SRC = 32.768 kHz RTC source and with a f ST = 1 kHz target ST frequency. A possible implementation that allows to successively refine in steps of the ST resolution, 30.517 µs, consists in setting the R P to zero and the register R C to: The example illustrates that the approximation introduced starting with this setting is about 7.08 µs/ms. In the forthcoming section we will illustrate the effect of this choice on the overall system precision.

IV. MOTIVATIONS TO THE PROPOSED APPROACH
The most intuitive strategy to deploy in synchronized master-slave systems with a star topology is to calibrate the unit ST according to that of the master.
This strategy assumes that the slaves correct their ST and calibrate the whole system to keep a stable synchronization. The operation is performed by every single slave using the synchronization messages received by the master. In this case, the great advantage is that each slave autonomously decides whether to receive every synchronization packet turning the radio on or to skip some synchronization slots as a function of its capability to keep the synchronization with the master. The slaves can accelerate or decelerate their ST according to the needs and this is done by changing the value in the R C register, that is, the number of ticks, t RTC , determining the t ST . Let us evaluate how the effects of a δ ST variation in the t ST affect the overall ST. δ ST can be either a positive or a negative number, depending on the need to accelerate or decelerate the global ST and it is an integer multiple of t RTC .
Let T M be the repetition rate of the master synchronization message toward the slaves. The ST variation in each T M period, ST, is: Thus, the ST variation is directly proportional to the repetition rate of the synchronization message, T M , and inversely proportional to the ST time resolution, t ST . Using (16), we obtain: In an integer clock representation, the slave can only change the R C value of a quantity k. Hence, the minimum insertable step during a synchronization period ST MIN is obtained with |k| = 1 and it is given by To clarify the concept, let us consider an example. Let us choose to keep t RTC as low as possible, that is equal to 30.517 µs. The results are reported in Table 1. Table 1 illustrates that the desired ST MIN values can be reached by changing T M and/or R C . This implementation does not allow obtaining all the possible values for ST MIN since R C is an integer value. This is a common problem in those contexts in which integer dividers are employed. To obtain also fractional values, one of the classical techniques is the dual-modulus division [26] that consists of employing two counters and two registers. In this way, if we call the two register values R 1 C and R 2 C we obtain: with {I i } i=1,2 being the times each register value should be taken into account during the observation period T M . The ideal condition is obtained when t ST strictly depends on the application. Let us assume, for instance, an Inertial Measurement Unit (IMU) whose data need to be sampled at a f s = 200 Hz rate that is, t s = 5 ms. Let us further suppose the IMU is equipped with a RTC whose f RTC = 32.768 kHz. In this case, the t ST must be either equal to the IMU sampling period, 5 ms, or, to gain some safety margin, to one of its submultiples, for  (23), T M cannot be greater than 820 ms and 33 ms for a ST resolution of 5 ms and 1 ms respectively (see the information in Table 1). Such values cause an excessive power consumption due to the intensive radio channel use and to the huge message exchange.
To ensure a reduced message exchange while retaining a high accuracy level, it is apparent that a new synchronization algorithm based on the fractional approach should be introduced.

V. ALGORITHM
The novel algorithm proposed is called FLSA and its core is based on four sub-routines, each one of them being in charge of a specific task related to the global procedure. A high-level diagram of the algorithm is shown in Fig. 2. The diagram emphasizes the interactions between the algorithm's sub-routines. The first sub-routine initializes the fractional timing process. The second one regulates the processing priorities of the messages received by each slave, making a distinction between the synchronization-related messages and other messages. The third sub-routine employs the data coming from the other subroutines to implement the slave time correction and synchronization to the master. The last sub-routine optimizes the power consumption through a time-slot skipping strategy.

A. FRACTIONAL TIMING FUNDAMENTALS
We already mentioned the possibility of improving the granularity of the minimum insertable ST increment by adopting a couple of registers/counters instead of a single one. In order to simplify the notation, we will now refer to the number of ticks rather than to time values. The required number of ticks to form a synchronization slot is: where · represents the rounding at the closest integer number. The first fundamental relationship is: and for any variation should be imposed.

B. FRACTIONAL TIMING INITIALIZATION
Given T M and t ST , the system of equations to be solved to evaluate the initialization values is:

VOLUME 9, 2021
If we use only comparators and counters of the R/R + 1 type, that is: it is possible to rewrite (27) as: Therefore, taking into account (16), we can write Thus, (29) becomes: By substituting and by using (26), we obtain: and, by substituting and simplifying: Solving the system (35) leads to a single solution that represents the initialization vector for the fractional timer.

C. RECEIVED MESSAGES QUEUE MANAGEMENT (RMQM) ROUTINE
Every T M , the master broadcasts to the slaves a synchronization packet containing its ST. Each slave analyzes and processes the received packet according to the algorithm shown in Fig. 3.
In Section II-D it was shown that when analyzing the case of a WBAN with broadcast timing message dissemination, the message delay reduces to the expression (8). Given its random nature, to reduce the reception time t R at its minimum, the RMQM routine processes the synchronization packet with the highest priority with respect to every other possible type of message in the queue. Once the synchronization message is received, the slave j first evaluates the value: that represents the equivalent accumulated time difference between the master and the slave in a single slot T M . Therefore, the slave considers in the evaluation the number of skipped-slots S S (j) = 0, 1, 2, .., k ∈ N in which the radio section was turned off according to the slot-skipping routine. The absolute value of the obtained quantity T M is compared with two reference values. The first, T M,min , represents the fixed lower bound for the synchronization error upon which a calibration is necessary and is determined by the required precision. In particular, the master and the slaves are considered to be synchronous when and the slot skipping routine is executed. The second value, T M,max , is set equal to T M /2. When the slave exceeds such value it simply updates its internal ST with the received value, since it is evident that the time difference is so high that the calibration process would be too expensive. In fact, this step is particularly useful in the early synchronization stage, when the values can be quite different.
In this sense, FLSA operates as an offset compensation algorithm. However, it will be clarified later that the algorithm actually operates also a skew compensation. This is achieved by changing the registers and counters values.

D. SLOT-SKIPPING ROUTINE
When the condition (38) occurs, it is possible to consider the slave to be synchronous to the master. Under this hypothesis, it is possible to save energy by switching off the radio section of the node, according to the slot skipping routine in Fig. 4. The routine operates in terms of synchronization slots. First, the number of slots in which the node can be considered to be synchronous to the master, S sync (j), it is updated by adding the current slot to the previous value. The skipped slots (i.e. the ones in which the radio section is turned off) are considered to be synchronous by hypothesis. To lighten the notation, in this paragraph we will consider each variable referred to the node j. Because of the synchronization condition (38), the following inequality is always met: During the slot skipping phase the radio is turned off, and at the end of each synchronization slot, the ST is adjusted taking into account the computed AVG wd [k], since it represents the average error for each synchronization slot, without calling the calibration routine. The computed AVG wd [k] is subtracted in each synchronization interval skipped and the skipped slot count is refreshed. The routine ends with the radio being switched on for a new synchronization packet, at the reception of which, the RMQM routine is executed. It is possible to see how the skipped slot number linearly increases when the systems are synchronous. For example if we consider the node to be already synchronous, and the threshold S min sync = 5, the radio is switched on for 5 slots, off for 5 more, on for 1 more, off for 6 more, on for 1 more, off for 7 more, and so on. The more the system stays synchronous, the more evident the advantages are.

E. SLAVE TIMER CORRECTION ROUTINE
Let us suppose we are in the case that a correction is needed and the value T M (j) can be expressed equivalently in terms of number of ticks referred to the ticks that compose the time interval between two synchronizations as with t RTC (j) being the RTC-time of node j. The calibration stage is governed by the set of equations: with R i C,new = R i C + δR i C and I i new = I i + δI i . Considering that I A + I B = T M t ST , we can rewrite (43) as: Doing the calculations, (44) becomes: and, by substituting (26): The system has ∞ 1 solutions. To find one of the possible solutions, an iterative procedure can be used, starting by imposing: and verifying that the boundary conditions are satisfied. Therefore, The new parameter to be set is available only when the system has a solution,i.  step described in (35) is then modified and executed as follows: It is very important to highlight that the method detects the t ST minimum deviation while the correction precision is equal to t RTC , that is, the minimum allowed value determined by the specific hardware implementation. It is also evident that a correction on both registers and counters is equivalent to a skew correction. Referring to the model described in Section II-B, changing those values is equivalent to set the new value α i referring to α j . Note that, as already mentioned, the timer correction routine is not called-back when the slot-skipping routine is executed.

VI. VALIDATION OF THE ALGORITHM
The algorithm validation has been performed using the IMUboard-HW-1-0-0 unit described in [14]. The unit was designed and assembled integrating 9 DoF MPU-9250 nine-axis MEMS MotionTracking TM device, produced by TDK-InveSense. This component is a System in Package (SiP) that combines two chips: the MPU-6500, which contains a 3-axis gyroscope, a 3-axis accelerometer, and an onboard processor, and the AK8963, a 3-axis digital magnetometer. The chips are mutually connected using the AUX-I2C of the MPU-6500 [27].  ISM 2.4 GHz. The unit allocates a micro-SD card for data logging, controlling it in SPI bus mode using the FAT32 as file system to allow the direct download of the data on a PC. A couple of LEDs (green and red) represent the user interface along with two push-button switches for direct interaction and one main power switch. The unit is powered either by a CR2450 coin cell battery. The IMUboard-HW-1-0-0 unit is depicted in Figure 6. We experimentally proved both the low-power and the high accuracy of the proposed technique using a long time observation interval. We ran two types of experiments. In the first type of experiment, we considered a master that periodically sent its ST to a slave. In turn, the slave outputs the synchronization results to a PC connected via a serial port. For the second experiment, to consider different initial conditions and their effects on the overall synchronization accuracy and speed, we used three different slaves, each of them connected to the PC via a serial port. For both experiments, a seven days measurement campaign has been performed and during this long observation period, the slaves sent on the serial connection all the information regarding the algorithm implementation and, in particular, the number of skipped slots (when perfect synchronization with the master was achieved) and the register values regarding the fractional timer implementation. In Table 3 it is possible to find the summary of the boards and algorithm parameters that have been used to run the experiments (f IMU is the sampling frequency of the IMU sensors).

A. POWER CONSUMPTION
The power consumption analysis has been performed using the Power Profiler Kit (PPK) [28], which is a low cost, highly accurate power measurement tool from Nordic Semiconductor. The PPK measures the current drained by the custom board. All measurements refer to the same power supply voltage of 3V DC. The current drained by the unit has been measured while receiving the packet, during the processing phase and with the radio switched-off (stand-by phase). These scenarios correspond to the main states of the algorithm. Fig. 7 and Fig. 8 show the corresponding measurements. Fig. 7a shows the slave current monitored over a T M = 20 s interval, with a single reception event, while Fig. 7b zooms over the synchronization packet processing phase. The measured values averaged over the whole T M = 20 s synchronization interval give a 7.045 µA mean drained current. By averaging the drained current during the synchronization phase (whose duration is 34.783 ms), when the radio is switched on a 2.1 mA. Figure 8 shows that with the radio switched-off between two packet receptions the average current is 3.515 µA. These measured values and the data collected during the previous measurements campaign were processed using a MATLAB script to evaluate the algorithm performances.
The maximum number of skipped slots is 53, as it will be shown in next section. Fig. 9 reports the instantaneous power consumption evaluation of the slave when the algorithm is not executed. In particular, Fig. 9a shows the global view, while Fig. 9b shows the drained current at 20 s intervals. Fig. 10 reports the current drained during the same time interval of Fig. 9a but with the algorithm in use. For each iteration the radio activation is delayed by a quantity always greater than the skipped intervals as explained in the Section V-D. Reduction in radio resource access brings to a significant power consumption reduction.   Comparison of the average current drained when the algorithm is adopted with respect to periodic synchronization packet transmission.
What previously discussed is even more evident in Fig. 12 that reports the algorithm efficiency as a function of the skipped slots in the form of percentage power reduction with respect to the case in which the algorithm is not used. The algorithm effects are immediately apparent and there is a 50% saving as soon as the algorithm starts up. The saving rapidly increases as the number of skipped slots increase, and asymptotically tends to be constant when this number further increases. Fig. 11 e Fig.12 report the results after the first 5 skipped slots. The algorithm settings are listed in Table 3.

B. SYNCHRONIZATION PERFORMANCE
To validate the synchronization performance in terms of accuracy and speed of the method, we used 3 slaves, each of them starting by different initial conditions. Every slave and the master were hosted on nominally identical boards. Table 4 reports the ideal initial values (IV) computed by the master  and the slaves. The master took into account these values all along the experiment. Table 4 also shows the values computed by the slaves to synchronize themselves with the master.
The first consideration is that nominally identical units exhibit temporal mismatches even when operating in the same conditions [29]. Moreover, it is worth to underline that only two synchronization intervals were needed to obtain the calibrated values, demonstrating that the proposed algorithm is quite effective, as the synchronization to the master timer is achieved with only two synchronization messages. In particular, in this phase, the first message allowed for a first (coarse) synchronization, while the second one brought to a finer synchronization.
Algorithm performance and precision are a function of the adopted parameters. Let us refer to the values listed in Table 3. For each of the three slaves, we extracted the maximum, minimum and average number of skipped slots listed in Table 5, during the 7-days measurement campaign. These values were taken as a reference to derive the algorithm precision. In fact, if we consider, for instance, Slave #1, the maximum number of skipped slots before requiring a new calibration is equal to 53. Thus, for 1474 consecutive intervals (i.e., 29480s) the time mismatch was lower than T min = 500 µs, according to what we discussed in Section V. The worst case was reported with Slave #1 that during one of the sessions skipped only 12 slots. However, that means that in the worst case, the minimum time interval in which the whole network was synchronized over a 7-days long experiment was 1600 s. The power drained during this measurement campaign along with the number of skipped slots is reported in Table 5. It is evident that the minimum percentage power saving obtained on the whole network amounts to 85%, and it corresponds to the Slave #1 worst synchronization case. Yet, the data show that the maximum power saving slightly varies around the 96% for each slave of the network.

VII. CONCLUSION AND FUTURE WORK
The paper introduced the FLSA which allows performing the time synchronization between the nodes of a WBAN. The algorithm is based on the master-slave paradigm and it ensures an extremely low-power consumption without loss of accuracy, even on a long time period, as proven by the reported measurement campaigns.
One of the advantages of the proposed algorithm is that it can be easily implemented in any low-cost and low-power system thanks to its very low computational complexity. As an example, the paper proposes an implementation on a SoC integrating a low-power ARM R Cortex TM -M4. It has been experimentally demonstrated that at start-up, even if nominally identical boards start from different conditions, the algorithm quickly reaches synchronization and maintains it for long periods of time, being able to exploit the maximum precision made available through the hardware.
Future work can be done evaluating more complex schemes of fractional implementation such as tri-modulus and in general multi-modulus, evaluating the impact of the possible evolutions in terms of accuracy and computational complexity. The use of a multi-modulus systems would also make the randomization of modulus control more effective.
Another topic subject to future work will be the multi-hop expansion of the proposed approach.