Guiding measurement protocols of connected medical devices using digital twins: A statistical methodology applied to detecting and monitoring lymphedema

This article describes the preparation of a clinical study on a connected medical device dedicated to facilitate the prevention of lymphedema for patients treated for breast cancer after axillary lymph node dissection (ALND), and to alert them of any manifestation or resurgence in order to prevent its aggravation. A simulator of the entire physical process called digital twin has been developed for this purpose, in a hardware-in-the-loop framework. Statistical calibration of its input parameters, by stochastic inversion and using sensitivity studies, led to establish one or more measurement protocols allowing to capture the signal on a mobile device (phone or tablet) and to detect signal breaks that are physically signiﬁcant. The measured signal makes it possible to report quickly on the worsening of the patient’s condition and to warn the therapists within a very reasonable period of time. The general methodology of this work seems to us to be easily reproducible in the preparation of clinical studies of other types for connected devices, which aim to develop measurement protocols limiting the often signiﬁcant cost of such studies. One of the immediate prospects of this work is the initiation of a clinical study on different patients who have been treated by surgery for breast cancer, after improving the robustness of the design of the prototype to take into account the uncertainties affecting the measurements.


I. INTRODUCTION
The development of wearable, connected health devices is growing considerably in parallel with the autonomous computing capabilities of mobile information reception and management tools.Any certification of these objects is based on a certification of the protocol for their use and on a scientific proof of its efficiency and safety.Such evaluations The associate editor coordinating the review of this manuscript and approving it for publication was Shunfeng Cheng.
require clinical studies to be engaged [1].Because such studies are often extremely costly, it is of major interest to pre-determine the conditions of use to be tested in order to avoid exploring unnecessary situations.Consider as an example a non-invasive connected object, such as a bracelet, watch or headband.The need to carry out physical signal denoising under possibly non-stationary conditions requires the prior availability of as many measurements as possible.We can assume that the bracelet is worn once a day on a regular basis, several times a day, or continuously during the day, or even night and day.Despite being non-invasive, such a device imposes a certain discomfort on the user and its use can cause anxiety, especially when the phenomenon it seeks to detect can occur at any time, is quick, and potentially irreversible if not detected and treated on time.In practice, a trade-off has to be made between acceptance and optimisation for medical needs.
This type of situation characterizes, among other things, the problem of detecting the occurrence and measuring the growth of arm lymphedema, a frequent secondary effect of ALND [2].It is associated with an important public health problem, for which no preventive solution exists today, apart from microsurgical healing, specific exercises and risk-reducing behaviors such as avoiding taking blood pressure or wearing tight clothing [3], [4].Indeed, lymphedema can occur abruptly and very randomly up to two years after surgery, and significantly impact a patient's life [5].Numerous techniques have been proposed to detect the onset of lymphedema, or even to quantify its evolution (generally globally, by measuring the evolution of the volume of upper limb).The most common is the method of anatomical measurements made with a tape measure (Section II), possibly accompanied with observations on the tactile aspects of the skin [6].Some alternative approaches have been proposed, based on water displacement [7]- [9] (that suffer from high uncertainties and can be invasive), optoelectronics [10], infrared laser perometry [11], tissue tonometry [12], or absorptiometry [13], among others.Independently of their precision, they remain difficult, if not impossible, to use in the context of outpatient treatment, as they usually require extensive equipment and very controlled conditions.In the current context where technological development allows the production of computer vision algorithms that can be embedded, this type of approach seems promising for daily and repeated measurements of arm variations [14], [15], allowing for the detection of such an onset.However, at the present time, no truly portable solution seems to be really available, while the problem of undetected serious lymphedema remains unresolved by medical advances [16].
This problem motivated the development of a non-invasive connected device allowing the detection and monitoring of arm lymphedema.Such an approach imposes to verify the capacity to use the signal produced by the device, and to solve the difficulties posed by the selection of one or several measurement protocols.This article deals precisely with the latter issue.
Indeed, to our knowledge there is no generic methodology to efficiently design measurement protocols for a clinical study on such connected devices.An empirical approach would be to perform various measurements on healthy subjects, in order to design more appropriate protocols.Even considering an ''agile'' approach to the design of such measurement protocols [17], the costs involved in trial and modification would certainly be very high, and extrapolation beyond the pre-tested perimeter remains hazardous.
Outside the medical field, this type of problem is not new; in the industrial world, computer-aided design (CAD) is commonly used to generate models to represent a physical phenomenon.The dynamics underlying this phenomenon (for example, fluid mechanics) are modelled using theoretical tools such as partial differential equations.Estimated at the centroids of a CAD mesh, the resulting numerical model defines a digital twin of the phenomenon of interest, i.e. a simulator that can generate, based on physical input parameters, a wide variety of physically plausible situations [18].These simulations are valuable tools for design and guide real-world testing procedures.In the case of simulating the response of a human body to certain stresses, numerical biomechanical models based on Behaviour Centred Design (BCD) principles [19] and finite elements methods are increasingly used to design and test protective equipment (such as airbags, protective suits, etc.) [20], [21] or other medical purposes [22], [23].However, the development of such numerical models is often extremely costly, and they are sometimes very demanding in terms of computational resources [23].Developing such models, in a so-called hardware-in-the-loop approach [24], can be prohibitive, if not impossible.
Above all, such models are based on a mechanistic vision of the world, which is inherently opposed to the purely data-based (empirical) vision that governs clinical studies and certification processes.
In certain situations, however, the behaviour of the connected devices depends on a small number of parameters.Therefore it is possible to rapidly reproduce the hybrid process that mixes the physical phenomenon of interest and the measurement process.The principle of a connected health device is to capture and send information to a processing tool (e.g., a smartphone or tablet), which itself will enable dialogue with the patient, his or her physician, and possibly trigger recommendations or alerts.A simulation model should make it possible to convey the numerous noises linked to the collection and transfer of the measurement to the processing centre.These specifications particularly differentiate the case of connected objects to other simulated systems, and require the implementation of principles inherited from the VVUQ approach (Verification, Validation and Uncertainty Quantification) which has become standard in the exploration and use of numerical simulation models [25].Note that, since such models are not intended to be a direct diagnostic aid, they do not have to be extremely precise, although their validation, based on real data, is indispensable.Indeed, these simulation tools make it possible to test several virtual measurement protocols and to prioritize those for which a future clinical study is expected to be of maximum informational gain.
Through the case study of arm lymphedema, this article therefore presents a methodology adapted to the design of a measurement protocol for a connected health device.In the present case, this object is composed of a connected sleeve and a related digital processing software operated on a smartphone [26], [27].This article will first describe the case study that motivated this research work (Section II) as well as the development of a mathematical model of the ''(biological stimulus)-(sensor)-(processor)'' system, whose computer implementation defines a digital twin (Section III).The statistical study of this simulation model requires the recalibration of certain sources of uncertainty based on patient data, using a stochastic inversion procedure, which is described in Section V. A sensitivity study is then carried out, which makes it possible to rank the sources of uncertainty and to judge the need to carry out measurements on a certain time scale (Section VI).Finally, several virtual measurement protocols are tested in Section VII.A discussion section closes this article, which proposes several avenues for future research.

II. MOTIVATING APPLICATION: LYMPHEDEMA
Lymphedema is a swelling of the arm caused by an accumulation of lymphatic fluid between the connective tissues, which can be congenital or (most often) can be induced by the surgical or radiosurgical treatment of cancer.This pathology currently affects more than one hundred million patients around the world, according to the Lymphatic Education and Research Network [28].The present article focuses on upper limb lymphedema developed after axillary cleaning during breast cancer treatment (ALND), following a breast removal operation (see Fig. 1 and [5]).In the situation we are considering, it can appear on the breast side a few weeks after axillary cleaning or after a few months if not years.With physiotherapy, the swelling may be limited, but the risk of a resurgence is certain when care stops.The progression from pitting edema to progressive structural distortion characterizes the natural history of lymphedema, which often extends over the course of weeks to months but in some cases years [29].More precisely, approximately 90% of the lymphedema occur during the first 24 months after treatment, and the remainder occur years to decades later [30].For this reason, it is necessary to remain vigilant over time in order to report the first symptoms as soon as possible, and to monitor and treat the patient very regularly if lymphedema is reported.
The detection of lymphedema is commonly practiced using rudimentary means: measurements are taken at certain positions of the arm, mainly four, shown in Fig. 2, using a tape measure.According to the procedure used at the Senology Unit of the Strasbourg University Hospital (HUS), the arm likely to be impacted must show a difference in circumference of at least 2 cm compared to the so-called contralateral arm.In the early stages, however, it is almost impossible to detect lymphedema.It should also be noted that certain actions, which are still relatively unknown, can promote the onset of lymphedema (for example, sports activities or typing on a computer keyboard).Late detection does not clearly link the onset to a certain type of activity, the restriction of which may help patients.
These detection and monitoring are all the more important since the effects of lymphedema can be devastating for the patient, from a clinical as well as psychological or social point of view [31].Women undergoing lymphedema have a degraded quality of life, with significant physical and psychological consequences [31].Clinically, the significant swelling of the limb due to lymphedema reduces mobility and dexterity, and can cause pain to the patient.It can be a serious obstacle to normal professional activity, requiring professional reclassification or even permanent work stoppage.In 20 to 40% of lymphedema, infectious complications (erysipelas type, cellulitis) appear, resulting in fever, general malaise, severe pain and sometimes inflammation.In the advanced stages, sclerosis appears, causing skin lesions and alterations of ligaments and tendons.For women with dyed skin, lymphedema is a source of psychological and social difficulties: disruption of body image, disruption of identity landmarks, loss of self-esteem, anxiety (up to depressive episodes) [32].
In addition, lymphedema is underrecognized and underdocumented [6], so the currently accepted rates of incidence and prevalence are underestimated.Estimates of the risk of lymphedema after breast cancer treatment vary widely from 14 to 40% [33].The frequency of lymphedema is 15% to 28% after ALND and 2.5% to 6.9% after a sentinel lymph node biopsy (SLNB); the frequency may increase to 44% when the surgical procedure is combined with radiotherapy [34].There are different risk factors: the number of lymph nodes removed, external radiotherapy, obesity, mastectomy, different physical practices.Obesity is an aggravating factor with a high prevalence: 75% of patients with lymphedema have a BMI greater than 40 kg/m 2 [35].Another problem is the variability in the objectively defined threshold of volume excess required for diagnosis of lymphedema [34].Besides, large uncertainties remain regarding what tools are the most efficient for disease detection [6], some of them being too invasive or costly.
In the face of these difficulties, sentinel-node sampling techniques reduce the estimated risk of breast cancer-associated lymphedema to 6 to 10% [36].Besides, a prospective study has shown that after breast cancer surgery, surveillance (with aggressive risk-reduction measures taken) is associated with a lower observed incidence of lymphedema (36.4% in the control group vs. 4.4% with surveillance) [37].Although randomized and controlled trials (RCTs) of surveillance are lacking, there is a growing belief that surveillance leads to earlier diagnosis and to treatment strategies that are both more efficient and more cost-effective than the previous situation with no surveillance practices [38].Setting up systematic surveillance remains, however, a complicated task.
Today, the postoperative diagnosis of arm lymphedema in a breast cancer patient is initially made by the patient's personal presumption, as she notices a clear swelling of her upper limb.Secondly, it is the medical opinion that confirms or invalidates the presumption of lymphedema of the upper limb, following a consultation.The diagnosis is therefore not instantaneous; there may be significant latency between the onset of lymphedema, suspicion of lymphedema, and final diagnosis.
At the HUS Senology unit, treatment involves various physiotherapeutic interventions: decongestive lymphatic therapy relies heavily on manual lymphatic drainage, a massage technique that stimulates lymphatic contractility through gentle, directed stretching of the skin.In addition, decongestive lymphatic therapy includes skin care, serial application of multilayer bandaging, exercise (gentle, repetitive contraction of the musculature beneath the bandages) and mechanical pressotherapy.Typically, the patient participates in 10 sessions over 2 weeks.Then, the patient can be transitioned to self-care.Some authors recommend a surveillance program that includes a quarterly assessment of arm measurement during the two years after treatment, when most cases appear.There should also be prompt use of compression garments and decongestive physiotherapy for symptoms or for worrisome and significant changes in surveillance measurements.
The described context and the treatment difficulties clearly motivate the search for practical, non-invasive solutions that favor the early detection and follow-up of patients suffering from lymphedema, finally allowing an objective correlation between symptoms and certain activities.The following section describes the outcome of such research.

B. A CONNECTED SYSTEM TO DETECT ARM VARIATIONS 1) GENERAL PRINCIPLE
In 2018, following a long-term research collaboration between the HUS and Quantmetry on breast cancer [39], [40], both entities developed a connected sleeve (Fig. 3) devoted to help prevent and analyze lymphedema: the lymphometer [26], [27].The main task of this connected device is to enable the patient, through a very simple process (less than five min), to monitor on a daily basis (this routine will be detailed later on), the size of her arm at a level of accuracy that is better than a visual inspection by the patient herself.More precisely, this sleeve was designed to take measurements of a patient's arm circumference at the four different locations described in Fig. 2, mimicking the usual clinical practice at the HUS, thus generating a time series of dimension four.As evoked previously, a major stake is to limit the user's discomfort.For this reason, the sleeve should be worn, if possible, only on the arm potentially affected by lymphedema.Collected data is then sent on a health data hosting certified cloud server (HDS).Datasets are aggregated and compiled and then transmitted to the patient and the physician who follows her personally, as well as to the scientific community anonymously (if the patient agrees).The stored datasets are protected and secured, and the anonymity of patients is thus preserved.
The complete acquisition and treatment system, which includes the connected sleeve itself, visualisation tools and an alert subsystem, is represented on Fig. 4 and more thoroughly described in the following subsection.

2) DESIGN AND WORKING PRINCIPLE
The connected sleeve, which is the data acquisition device, consists of two main parts: (a) A stretchable fabric; (b) An electronic device comprising the measurement circuit (sensor + conditioner) and a communication module.See Fig. 5 for a complete view of this measurement chain.The sensor, a stretchable piece of rope that translates linearly its length into an electrical resistance, measures a variation in sleeve    stretch due to the variation in arm circumference.Hooks on both sides enable to connect it to the fabric of the sleeve and to the electronic PCB (Fig. 6).
Since a microprocessor cannot directly measure an electrical resistance, a conditioner circuitry is designed to convert the resistance value into an electric voltage.This is done by measuring the loading time (time to reach a certain level of electric voltage) of a capacitor through the electrical resistance of the sensor (Fig. 7).The circuit is advantageously simple and the relation between the sensor value and the measured time is linear.If R is the resistance of the sensor and C the capacitor (here a constant), the loading time is proportional to the time constant τ = R * C. The measurement relative precision can be easily derived from the sensor and capacitor precision: Once resistance is found, the sensor's transfer function is inverted to find the value of the stretch.
To communicate the sensor value to the server, the first hardware version used WiFi.The main issue with WiFi is that the chip warms up when transmitting data.Unfortunately, lymphedema can worsen with heat.Eventually, Bluetooth 5.0 Low Energy (BLE) was chosen, since it consumes very little energy and does not warm up at all.In addition, this versatile technology provides robust short-range communication for a large range of devices (see [41] for a recent review of its features).The main difference between Bluetooth and WiFi is the following: with WiFi the connected sleeve connects directly to the setup box to send the data, with Bluetooth the device connects to the patient's Smartphone and uses the Smartphone's connection to send the data.The two prototypes of the electronic board are shown in Fig. 8 and Fig. 9.
On the Smartphone's side, an application was developed to connect to the sleeve, display measurements and alerts to the patient, and send data to the practitioner's dashboard.Screenshots of patient Smartphone's mobile application can be seen on Fig. 10.

III. DIGITAL TWIN MODELING
In this section, we describe the steps leading to the definition of a numerical model M of the device simulating, at each instant t and for a position i along the arm, a noisy measurement y (i) * t that is transmitted to the smartphone via Bluetooth.We assume that the smartphone is held at a distance of less than five meters from the sleeve, without obstacles, so that the noise and delays affecting the signal transmission can be considered negligible [41].Other sources of possible interference in the 2.4 GHz band, such as WiFi [42], are not taken into account in this modeling.The main notations introduced in this section are summarized in Table 1 for the reader's convenience.
The numerical model M aims to mimic the effects of the biological stimulus that makes an arm grow at certain locations.It should allow us to generate a wide range of plausible trajectories of values of y (i) t at each measured position i along the arm.This model was defined as the sum of models M 1 and M 2 where M 1 is a first stochastic process simulating the real growth of arm circumference y (i) t , and M 2 is a second stochastic process that affects y (i) t with the measurement noise.These two sub-models are detailed in the two following subsections The statistical estimation of M 1 and M 2 is based on a clinical dataset described in Section IV, which is the subject of Section V.

A. STOCHASTIC MODELING OF ARM GROWTH
A simple growth model for the theoretical arm circumference at position i and time t, based on the growth function f , can be defined as follows: where y (i) 0 is the average arm circumference at position i before the operation, the position parameter t 0 indicates the time when the growth starts and ε (i) t can be interpreted as the epistemic error between the real and predicted circumference at time t and position i.A minimal parameterization for the growth function f should incorporate a local growth rate r (i) > 0 and an upper limit value (physically plausible) y (i) ∞ > y i,0 for the circumference [43].Several growth functions f usually encountered in biological problems are considered for the study, and described in Appendix A-A.
Obviously, the parameters y ∞ are correlated.Note that f was always chosen to be an increasing function of t, since the therapeutic effects cannot completely reverse the growth process [6].
While y (i) 0 can be assessed by averaging all measures {z (i) t } t on the contralateral arm, parameters y (i) ∞ and r (i) can be considered as random, with corresponding distributions π(y (i) ∞ ) and π(r (i) ), as they are different for each patient.The predictable low number of such observations around the first growing steps, due to the fact that most patients are measured after a first suspicion of lymphedema, induces to choose a parametric framework for the assessment of π(y This requires to reparameterize the problem.Indeed, in this framework, most usable results, in terms of statistical theory and algorithms, are based on the assumption that the random input parameters are Gaussian and take their values on the real line [44]- [48].Denoting 1,k are independent Gaussian variables: where the unknown parameters {µ 1,k } k must be estimated for each location.This practical Gaussian choice appeared to allow for sampling relevant trajectories of arm growth.
Parameter t 0 can besides be considered as random too, for the same reason.A rationale for choosing a lognormal stochastic model, namely defining the new parameterization: where (µ 1,c ) are to be estimated too, is based on the empirical distribution of arrival times t 0 of patients in hospital in the early stages of lymphedema, which are such that t 0 t 0 (t 0 can be statistically understood as a left-censoring variable for t 0 ).The dataset used for estimation is named D 1,A further in the text ( § IV).As shown in Fig. 11, it appears relevant to choose a lognormal distribution for t 0 .The proximity between t 0 and t 0 motivates to choose this same distribution for t 0 .This choice is discussed in the final section of this article.

B. STOCHASTIC MODELING OF MEASUREMENT NOISES
Due to its design, the connected sleeve necessarily introduces measurement noises that are summarized by model M 2 .
More precisely, the placement of the sleeve on the patient's arm introduces a first series of contextual errors.These contextual errors propagate in a second step to the electronic circuit which will in turn introduce errors (uncertainties, sampling, electromagnetic disturbances).Modeling and combining these noises can be done in controlled laboratory experiments.Under the assumptions made above, two sources of noise affecting each measurement can be determined: 1) The context-sensitive noise ε c , which is induced by the positioning of the sleeve.Appendix A-B describes how this noise can be assessed based on geometrical considerations.At a given location i and time t, from ( 13) the resulting noise ε c is a function of the circumference y t , two fixed parameters φ and d 0 and an unknown random vector X 2 = ( h, X θ ) that are independent on i and t.While X θ lives in IR and is such that E[X θ ] = 0, h has the sense of a small height offset concentrated around 0. Thus a centered Gaussian distribution appear as a reasonable hypothesis for the following vector of random parameters: with unknown diagonal covariance matrix 2) The sensor noise ε s , affecting the so-called sensor law, namely the relation between the stretching length, which is assimilated to the arm circumference y t at position i, and the capacitor resistance of the sensor.According to the manufacturer this relation is linear: A series of tests on healthy experimenters was conducted to check for the validity of this assumption and to assess the features of ε s .We found that where ξ ∼ N (0, 1) and ( α, γ ) is a reparametrization of (α, γ ).Details about the statistical validity of this assessment and the values of ( α, γ ) are provided in Appendix A-C.Finally, summing up the separate sources of noise, notice that the sensor actually measures This error propagates with the noise from the electronic circuit: at each location i on the arm, a statistical model for the measured value y * of the circumference transmitted to a smartphone can be formalized as where t , X 2 is given by the forms of ε s and ε c being provided by ( 6) and ( 13), respectively.

C. GLOBAL STOCHASTIC MODEL
From (2-3) and ( 7), a complete writing of the global stochastic model follows.At position i on the arm affected by lymphedema, at time t ≥ t 0 , the stochastic model of an observation of the true circumference is where ε ∼ N 0, σ (i)2 and M (i) is the sum of two instances of models M 1 and M 2 : 1,c defined by ( 2) and (3), and X 2 defined by (4).

IV. CLINICAL DATA
The available dataset, named D 1 , is a cohort of 20 patients operated from breast cancer and experiencing arm growth due to lymphedema (Fig. 12 and Fig. 13).As a reminder, we denote t 0 the delay between the operation and the first visit at the hospital, where a lymphedema is truly detected.From t 0 , each patient receives curative treatment based on compression and massage techniques at regular appointments (every 5 days) for an average period of one month.At each appointment at time t ≥ t 0 , measures {y t } of the circumference of both arms are taken with measuring tape at several positions i every 5 centimeters; z (i) t denotes the measure of the contralateral arm, which is not affected by the lymphedema.The noises resulting from the measurements were considered as negligible since these measurements are repeated several times for each position and averaged.Note that the weights were measured too, in order to check that length variations were only due to the growth of the lymphedema.
A treatment sequence reduces the speed at which the circumference of the arm grows and sometimes diminishes it, but it cannot completely eliminate the lymphedema (Fig. 14).Stopping a treatment sequence usually results in new growth of the arm until the next treatment sequence.This is due to the biological mechanism that causes lymphedema, which is not affected by the treatment [6].Denoting t i the end of the ith occurence of a treatment sequence, it is therefore possible to divide D 1 into two subgroups: Thus, each patient trajectory is divided between one trajectory belonging to D 1,A , for which t 0 is unknown, and others belonging to D 1,B , for which t 0 is known and corresponds to the last value t i (i ≥ 1): they correspond to a set of ''virtual patients'' that experiment the same biological dynamics of arm growth, but do not share similar growth parameters because of the effects of treatments.This assumption, on the basis of biological knowledge [6], is very similar to a relaxation of the so-called As Bad As Old (ABAO) concept characterizing the reliability of complex mechanical systems submitted to corrective maintenance [49].

V. STATISTICAL MODELING AND STOCHASTIC INVERSION A. STATISTICAL FRAMEWORK
To simplify the notations, let t = 0 denote the time of operation for each patient belonging to the D 1 cohort.The period of observation for a given patient j ∈ {1, . . ., with: • t 0,j being the time of first return to the hospital (after the lymphedema has begun at unknown time t 0,j ); • t +1,j being the time at which the +1 sequence of curative treatment ends for the patient j, and consequently the time at which the growth of the lymphedema restarts freely; • t +1,j > t +1,j being the time at which the +2 sequence of curative treatment begins; • t m,j is the time of last observation for the patient j.Within each range [t ,j , t +1,j ], the patient's arms are measured at several times {t ,k,j } k with k ∈ {0, . . ., q − 1}, such that t ,0,j = t ,j and t ,q,j = t +1,j .The time period t ,k+1,j − t ,k,j is fixed by the protocol and independent on j (but not always regular).
For a position i ∈ {1, . . ., d} on the arm affected by lymphedema, the observed circumference for patient j at known time t ,k,j , for ≥ 0, is explained by (using the notation of ( 9))

Note that t (i)
,j is a missing (random) observation when = 0 (subcohort D 1,A ): in this case t (i) ,j is the time at which the lymphedema begins.When > 0 (for each member of subcohort D 1,B ), ,j is made equal to t (i) ,j .A second observation equation arises from s ∈ {1, . . ., S} independent, repetitive measures on the contralateral arm: With S = 5, since the small natural variation in arm circumference described by ε in (1) was found to be well approximated by N (0, 0.1), the classical empirical estimator t ,j,s has small variance σ (i)2 t ,j /S 0.02, which can be considered as negligible.t ,k,j } i, ,,k ∈ (IR + * ) d×m×q can be written in a more compact way as Y * j = H X j , d j + U j (12) where X j = {X (i) 1,j , X 2,j } i ∈ IR 3d+2 is a Gaussian vector of missing observations, U j = {ε (i) t ,k,j ,j } i, ,k ∈ IR d×m×q is a centered Gaussian noise, assumed to be independent on X j , is a fixed known parameter vector, and H : IR 3d×4 × IR d×m → (IR + * ) d×q×q is a nonlinear computational model.For the sake of simplicity, in the following we denote Y * and X the fully observed and missing observations, and summarize the set of unknown Gaussian parameters of X by .
The statistical assessment of , linked to observable values Y * j through (12), is similar to the stochastic inversion of the nonlinear model H ; this framework was described and well studied by [44]- [46], among others.Following [45], a Stochastic Expectation Maximization (SEM) algorithm can be used to solve this problem.In an iterative approach, it maximizes the likelihood of the complete data Z = (Y * , X ) by sampling the missing data X according to a current value of the set of parameters , and producing an irreductible Markov chain 1 , . . ., M , . . .that converges consistently towards the maximum likelihood estimator (MLE) of [50].More precisely, the (k +1) iteration of the SEM algorithm has 3 steps: • E step: Compute the conditional density p(.|Y * ; (k) ) of missing data X (k) , given Y * and a current estimate (k) .
• S step: Complete the sample Z (k)  = (Y * , X (k) ) by sampling X (k) from p(.|Y * ; (k) ).• M step: Update (k+1) as the current MLE using Z (k) .We let the interested reader consult the article [45], which provides all the implementation details in our Gaussian case, and describes in particular the use of an MCMC (Hastings-Metropolis) method to carry out step S [51].Following theoretical and practical recommendations [45], [50], [52], [53], estimation results are provided as an average value of computed over the last (stationary) iterations of the SEM algorithm, plus a confidence interval computed either by estimating the Monte Carlo variance or by non-parametric bootstrap (with similar results).
However, it was necessary, to ensure the good behavior of the algorithm, to hybridize it with a specific estimation approach of the distribution of the location parameter t 0 , in order to struggle against the lack of data information about t 0 .Such a problem can frequently appear when dealing with location parameters [54].To overcome this problem, profile maximized likelihoods [55] were computed with SEM by fixing q regular values t 0,j over n 1 intervals (defined between 0 and t 0,j ).The values t 0,j corresponding to the 30 highest values of the q × n 1 profile MLE were selected and used to estimate the distribution of t 0 .
The relevance of the whole estimation process was checked by a simulation approach summarized in Appendix A − D.

C. RESULTS
For the logistic growth function, the estimation results of are placed in Tables 2 and 3, accompanied with standard deviations computed by Monte Carlo.
Estimated growth curves (without noises) are plotted on Fig. 15, while their noisy equivalent are simulated over Fig. 16.Such simulations were used to check that growth models were versatile enough to represent the circumference  growth over time for a given patient, for at least four positions on the arm.While the logistic growth seemed to be nearly better than the other choices, all these functions have been kept for the rest of the analysis.

VI. SENSITIVITY STUDY
Once stochastic inversion is conducted, the uncertainty quantification framework requires to check if the uncertainty affecting Y * is mainly dependent on the uncertainty affecting the meaningful input parameters and not on the various noises introduced in the model of Y * , following a principle formalized in [56].Ranking the effects of input uncertainties is the major task of global sensitivity analysis [57], and conducting  6), while ''Epsilon'' corresponds to the small natural variation ε ∼ N (0, 0.1) in arm circumference, related to Equation (1).such an analysis is therefore required after the assessment.In the current framework, as the estimated Gaussian multivariate distribution is assumed to have diagonal covariance, the use of Sobol' indices for functional output [58], [59] appears relevant.The basic principle of these indices, which determine the part of the variance of the output signal (at each time step) by input variances, is recalled in Appendix A-E.They require to sample from the input parameter distributions estimated in the previous section.
The first-order functional Sobol indices associated with the random input parameters are shown in Fig. 17.They allow to compare at a first order the influences of these parameters on the variance of the output signal, by time steps.Results displayed in these figures arise from the use of the logistic growth function, but the results were found to be similar for the other functions.
As expected, and fortunately, the results show the major influence of the parameters guiding real growth (X 1,a and X 1,b , depending on y ∞ and r), which means that uncertainties do not overwhelm information.The influence of the growth rate r decreases over time, which also seems intuitive.
A key result is that as the model tries to reproduce behavior away from the shoulder, the importance of the sensor noise ξ ∼ N (0, 1) from Equation ( 6) increases significantly.Keeping in mind that the numerical model is a simplified representation, this result however leads to additional efforts in order to ''freeze'' the sensor positioning and make the sleeve threading procedure more robust, e.g. by providing position marks or by taking a photo that validates or invalidates the threading.Finally, we note that the impact of the other parameters involved in contextual noise and sensor-related measurement noise remain very limited.This result allows us to consider some of these parameters as fixed in a simulation step (next section).It also reinforces our idea that the proposed design is efficient.

VII. DESIGNING MEASUREMENT PROTOCOLS
Obtaining a calibrated simulation model thus feeds a digital twin of the experiment.This twin should incorporate, as an additional set of parameters, a protocol for the use of the connected device.It also provides a test strategy to assess the appearance of lymphedema.Indeed the use of statistical tests to detect ruptures in time series is well documented.In particular, non-parametric tests which do not assume any distribution on the data are well-suited for our case.Three popular tests relying on rank statistics have been chosen: Mann-Whitney test [60], Mann-Kendall test [61], [62] and Pettitt test [63].The first one assesses if two populations are equally distributed, the second one is a trend test and the third one searches for a rupture in a time series.These tests have been used over a sliding window, i.e.only making use of the last n w measurements.Although n w cannot be too large as it would induce a big delay in the detection, it cannot be too small either in order to avoid too many false positives.
A sample of 1000 Monte Carlo runs was used to explore the ability of these statistical tests to detect significant magnifications of the arm, using two thresholds of statistical significance (α = 5% and (α = 1%).They are light in terms of implementation and computational cost, so that they can be embedded on a mobile device.Conducted for each of the four sensor positions and the Monte Carlo distribution, the tests determined the mean time distributions for detection of clinically significant magnification for several measurement strategies: • Per day, one to three measurements are made; • A measurement can be defined as a succession of repetitive acquisitions (via a switch), and the median of these repetitions is kept, but should remain lower than 5 minutes.We quickly empirically converged towards the strategies presented over Fig. 23 and Fig. 24.According to the most sensitive Mann-Whitney test, three measurements per day, with several repeated acquisitions, always appear necessary to provide a 90% chance of detecting a significant change in arm circumference in a time window of less than a week, without the ergonomic complication related to the use of the sleeve on the contralateral arm.Incorrect detections, corresponding to simulations for which growth is wrongly detected (before actual growth), were estimated on average at 9% for α = 5% and 12% for α = 1%.Such false positives are annoying, but less so than false negatives that falsely indicate a lack of regime change.Such false negatives were estimated at less than 3%.Note that the significant changes detected in circumferences were always found to be lower than the clinical repair of 2 cm evoked thereinbefore.In practice, five acquisitions are possible in less than five minutes, which pleads for selecting the strategy summarized on Fig. 24.
Clearly, more than three occurrences of a measurement from one repetition would allow the average detection time to fall below one day.But from an ergonomic point of view, such a strategy seems very cumbersome for a patient.We therefore recommend adopting a protocol based on these five acquisitions and the use of the Mann-Whitney and Mann-Kendall tests using a 5% significance threshold.

VIII. DISCUSSION
This paper describes the main steps of a methodology mixing numerical modeling and statistical analysis of the effect of uncertainties that we propose for the preparation of clinical studies associated with connected devices.These steps are part of a more general VVUQ methodology, related to the use of numerical twins in an industrial setting, but to our knowledge it has not been adapted or implemented so far in the case of such connected devices.We show its feasibility and its usefulness to guide the choice of clinical protocols, which would allow to decrease the usually very important cost of empirical trials.The first tests carried out on the Lymphometer device proved satisfactory in terms of technical feasibility.Moreover, such an approach seems to us fundamental in order to iteratively improve the design of these connected devices.Thus, the impact of noise on the signal can be evaluated a priori, and the influence of parameters related to controllable constraints, such as those related to positioning, can be highlighted and adjusted.
In this article, we made the choice of simplifying assumptions about the uncertainties to be taken into account in the analysis.These hypotheses are reflected in the choice of a frequentist statistical approach by likelihood maximization.A Bayesian approach to inversion, which assumes that the unknown parameter vector is itself random, would undoubtedly make it possible to better apprehend these uncertainties and to propose a more relevant study by simulation [46].Markov Chains Monte Carlo (MCMC) methods or RTO (Randomize-Then-Optimize [64]) methods should ideally be used instead of the SEM approach, while some techniques are available to elicit priors on [48].The latter could incorporate constraints on the components of and increase its dimensionality (as, for instance, the relations between θ and φ in Appendix A-B).In this sense, the methodology presented in this paper can certainly be amplified.
Several assumptions could also be relaxed in direct relation with the use case, so as to improve the relevance of the overall model, without changing the essence of the methodology.We cite three of them here.
First, it is conceivable to allow the input Gaussian random variables to be correlated, and to adapt the inversion algorithm for this purpose.In doing so, sensitivity analysis can no longer be conducted using Sobol' indices (which have a less clear meaning in this framework [65]), but other tools such as Hilbert-Schmidt information criteria (HSIC; [66]) or Shapley indices [65], [67] can be used.
Second, several other features of the model could be improved.For instance, in the geometrical analysis conducted in Appendix A-B, it is expected that the horizontal noise is close to be centered, but this assumption is difficult to check.This suggests that a bias could be incorporated, which would play a role in the simulation tasks.
Third, the choice of a lognormal distribution for the time t 0 associated with the beginning of a lymphedema can be criticized on the basis of the literature dedicated to risk and lifetime analysis.Poisson-distributed arrivals are, for instance, an usual hypothesis of such analyses, and deserve to be tested.
Finally, while it is outside the scope of this article, it is likely that this kind of study should insert additional noise due to the possible change of wireless networking technology.Despite its robustness to the range of devices and its good behavior at weak distances, Bluetooth technology is known to have some security weaknesses [68].Its use for collecting personal health data must be questioned according to legislation.Besides, other sources of noise that were not taken into account in this study, as the WiFi [42], could certainly be added to the simulation model to further improve the validity of this model.Nonetheless, the main features of the methodology proposed in this article remain unchanged.

IX. CONCLUSION
Several research avenues are opening up before us, now that we can use simulation to build clinical study protocols for this connected device.This work is undoubtedly a necessary prerequisite to further improve the prototype and it use.Note that the methodological principles apply to other connected objects (only the data generation model changes).Therefore it seems possible to test several technologies in this way, more quickly, and also improve our methodology.For example, the significant progress made in computer vision in recent years [14], [15] may suggest that an alternative solution to the connected sleeve could involve the use of videos taken by cell phones, taken in such a way as to perform a 3D reconstruction of the arm.However, such technologies are still consuming in computational time, so our current ambition is to improve the design of the prototype, while at the same time pool the resources for the clinical study.
All the more so since discussions are underway at HUS and within the Senologic International Society, between many practitioners around the world, that aim to collect and compare a wide range of lymphedema monitoring data.The objective of such a collection would be to specify the characteristics of lymphedema growth patterns that take into account morphotypes and types of surgery.The specialization of growth models could allow numerical twins, such as the one proposed in this paper, to better represent the evolution of lymphedema and improve the quality of use of measuring devices.

A. DETAILS ON GROWTH FUNCTIONS
Frequently encountered in statistical biological growth studies [43], [69], the growth patterns summarized in Table 4 were used to represent the appearance of a biological stimulus causing an increase in arm circumference.They have no real biological value in the present context, but are rather toy  models allowing us to study the robustness of the simulated device, thanks to the variety of situations they allow to represent.While the biological mechanism at the origin of arm enlargement is still completely unsolved [70], the initiation and dynamics of growth remain extremely poorly known, as the vast majority of observations are made once significant growth has occurred.
Using the parameterization in (1), top of Fig. 18 illustrates the difference between the growths, for y 0 = 0.1, y ∞ = 1 and r = 1, while the bottom illustrates the effect of varying the speed of growth for the logistic case.

B. CONTEXTUAL NOISE
The contextual noise (or error) c is caused by the approximate placement of the sleeve on the arm.A first source of error is due to a bad axial placement of the sleeve.While one would want to localize the sensor at arm height h, it is actually   located at height h + h.Then a second source of error is induced by rotating the sensor by an angle θ around the axis perpendicular to the arm.
In order to estimate c , in the vicinity of a sensor the arm is geometrically represented by a cone.The latter is defined by an apex and a perimeter, which is in this case assimilated to the circumference of the arm in the vicinity of the sensor.The apex can be interpreted as a fixed point on which the end of the arm would come to rest.This type of representation that ''freezes'' degrees of freedom is usually used in motion animation, for example to reproduce sign language [71]).Fig. 22 gives an overview of this geometric problem, and  A bad axial placement of the sleeve can then be formalized as the difference in circumference induced by the rotation and the offset evoked hereinbefore.It is recalled that the intersection of a plane with a cone is an ellipse.Moreover, if the plane is perpendicular to the axis of revolution, the intersection is a circle.The cone is entirely determined by the angle φ, which depends on each patient.
The circumference to be measured is P 0 = πd 0 .The height offset places the sensor at a height where the circumference of the arm is P 1 = πd 1 .Finally, the rotation of the device by an angle θ induces a measurement of the circumference which is equal to the perimeter of the ellipse arising from the intersection between the cone and the inclined plane of the angle.We want to determine this circumference P 2 , and c is therefore equal to P 2 − P 0 .One can determine with elementary geometry (Fig. 22) the width a and height b parameters of the ellipse.Referring to the figure, we will call d 1 = BC = d 0 + 2 h tan φ.We will also name a 1 = AD and a 2 = AE.Thus the width parameter a = 1 2 (a 1 + a 2 ).To determine a 1 , we use the three following equations Resolving these equations, we obtain In the same way, we can determine a 2 with the following equations The point I is the middle of the segment AD so a = EI = DI .Although it is not quite straightforward, one can realize that b = HI .In the same way we determined d 1 , we get Once again, we use elementary geometry to determine IJ and AJ : Then we can approximate the circumference of the ellipse using a formula owed to Euler: P 2 = π 2(a 2 + b 2 ).The final noise expression is given by We point out the fact that it is easy to see on Fig. 22 that the angle θ must satisfy the constraint: |θ| < π 2 − φ.Hence tan θ tan φ < 1 and the previous parameters are well defined.Denote X θ = tan π/2 π/2−φ θ which belongs to R and the noise can be considered as a function of (d 0 , h, X θ , φ).
Among these four parameters, d 0 can be directly known from the measurement of the circumference z of the contralateral arm in the stastistical estimation (Section V), using the formula d 0 = z/π, or fixed in Monte Carlo studies (Sections VI and VII).Furthermore, the unknown angle φ is theoretically different for each patient, and should be considered as random.However, note that φ measures the deviation between a straight (theoretical) arm and a real arm, each pressing on the same fictitious point.Based on this observation, preliminary laboratory experiments, conducted on different real arms, showed that the value of φ π/2, is very close to 0 and almost constant (φ 0.065), and can be estimated as follows.Assuming to have N s repeated measures y j of the circumference at a given location at various heights, an estimator of φ is given by φ For this reason, this parameter has been considered deterministic in this study, without loss of methodological generality.Finally the considered contextual noise depends only on the two unknown (considered as random) parameters ( h, X θ ).Besides, note that since φ π/2, then  X θ tan(θ).Since, geometrically, the intuitive expectation of θ is 0, this is the same for X θ .The same rationale applies for the expectation of h , although this parameter is physically bounded by unknown bounds.However, based on the fact that (again for physical reasons) h is highly concentrated around 0, a Gaussian assumption for this random parameter appears as a mild hypothesis.

C. SENSOR LAW
Controlled, repeated measurements made with the connected devices were conducted in laboratory, in order to check the relevance of the stochastic model placed on the sensor noise ε s and to quantify its parameters (see § III-B).A dataset of 168 sensor stretch measurements was obtained by healthy experimenters at Quantmetry in laboratory experiments, following the protocol summarized in Table 5.
Data was collected by measuring the impedance value when the sensor is stretched from its initial length and repeating the measure.The measures and linear regression results are plotted in Fig. 19.Pearson's linear correlation coefficient was found to be 0.95, which agrees with the statement of the manufacturer.

D. CHECKING FOR ESTIMATION VALIDITY
In this section the validity of the statistical estimation described in § V is checked.M = 500 datasets similar to the D 1 observations are repetitively sampled based on the estimates of obtained in § V-C (Tables 2 and 3), for each growth function considered in Appendix A-A.The corresponding MLE were computed.The results of these estimations are summarized in Table 7 for two positions i, that shows the relative Mean Square Error (MSE) associated to each parameter.For such models, the maximum likelihood estimation proposed in § V globally performs correctly (although the estimation for the Von Bertanlanffy model appears slightly less good), which makes us confident in the results presented hereinbefore.

E. SOBOL' INDICES
Sensitivity analysis is used to study how the uncertainty in the output of a model can be related to the input uncertainties [57].Specifically, it allows the modeler to identify the input variables that have a major influence on the output and to prioritize them.The approach proposed by Sobol [72], which can be extended into the functional framework [59], is the most common.We provide here some basic explanations and the general idea, for a better readability.Suppose that f is an integrable square function defined on a d−dimensional space.Hoeffding's decomposition [73] states that it is possible to decompose f as a sum of elementary functions which is unique if 1 0 f i 1 ...i s (x i 1 , . . ., x i s )dx i k = 0, 1 ≤ k ≤ s, {i 1 , . . ., i s } ⊆ {1, . . ., d}, f 0 being constant [72].Assuming that the input variables X are independent, the variance of These indices are between 0 and 1, and their interpretation is straightforward: they express at the first order the percentage of variance explained by the interaction under consideration.As the number of indices grows exponentially with the dimension, the analysis is most often restricted to indices of orders 1 or 2. Statistical methods to estimate these indices are, among others, reviewed in [74].

FIGURE 1 .
FIGURE 1. Examples of lymphedema.Images collected at the Strasbourg University Hospital.

FIGURE 2 .
FIGURE 2. Measurement positions and layout of lymphometer sensors.

FIGURE 5 .
FIGURE 5. Measuring chain of the connected sleeve.

FIGURE 7 .
FIGURE 7. Electronic circuit based on an RC conditioner.

FIGURE 10 .
FIGURE 10.Screenshot of the patient's smartphone application.The final version of this application modifies the alert text to Possible lymphedema detected.

FIGURE 11 .
FIGURE 11.Histogram of arrival times t 0 of patients at the hospital in the early stages of lymphedema.While 17 (on a sample of 20) patient times were collected, 13 arrival times were extracted (< 6 years) since the four others were collected in a historical period during which the alert of lymphedema first stages was less accurate.

FIGURE 12 .
FIGURE 12.Changes of the gap (cm,in absolute value) between the circumferences of the two arms of patients belonging to the D 1 cohort, for position i = 10 cm.The straight line indicates the limit value for which a lymphedema is clinically detected.Circles denote the times at which physiotherapy is practiced.

FIGURE 13 .
FIGURE 13.Changes of the gap (cm,in absolute value) between the circumferences of the two arms of patients belonging to the D 1 cohort, for position i = 20 cm.The straight line indicates the limit value for which a lymphedema is clinically detected.Circles denote the times at which physiotherapy is practiced.

FIGURE 14 .
FIGURE 14. Evolution of the arm circumference (in cm) for two patients belonging to the D 1 cohort.The x−axis corresponds to a range of positions on the arm.
), the complete model for then 1 −sample of output observations Y * j = {y (i) *

FIGURE 15 .
FIGURE 15.Estimated growth curves (without noise, using a logistic function) for 10 patients belonging to the D1 cohort, for positions on arm at 10 (top), 40 cm (middle) and 50 cm (bottom).

FIGURE 16 .
FIGURE 16.Simulated noisy growth curves for a patient belonging to the D1 cohort, for positions on arm at 10 cm (top), 40 cm (middle) and 50 cm (bottom).

FIGURE 18 .
FIGURE 18.Growth model comparison (top) and growth speed comparison for the logistic model (bottom).

FIGURE 19 .
FIGURE 19.Measurements of the stretching as a function of impedance.

FIGURE 20 .
FIGURE 20.Evolution of the standard deviation of σ (y ) as a function of the stretching.

FIGURE 21 .
FIGURE 21.Normal QQ-plot of the linear regression of the standard deviations.

FIGURE 22 .
FIGURE 22. Diagram of the circumference error induced by rotation on a cone.

FIGURE 23 .
FIGURE 23.Distribution of the delays (in days) between the onset or resurgence of lymphedema, generated by simulation, and its detection through measurements performed on four positions of the arm, for a strategy of repeated measurements three times a day.Each measurement is the median of three repeated acquisitions (via a switch).The tests used are Mann-Withney (MW), Mann-Kendall (MK) and Pettitt's change-point detection test, for two significance thresholds (5% and 1%).''Delta'' indicates the maximum difference in circumference found on 1000 simulated samples.The rate of correct detection was between 88% and 93%.

FIGURE 24 .
FIGURE24.Distribution of the delays (in days) between the onset or resurgence of lymphedema, generated by simulation, and its detection from measurements performed on four positions of the arm, for a strategy of repeated measurements three times a day.Each measurement is the median of five repeated acquisitions (via a switch).The tests used are Mann-Withney (MW), Mann-Kendall (MK) and Pettitt's change-point detection test, for two significance thresholds (5% and 1%).''Delta'' indicates the maximum difference in circumference found on 1000 simulated samples.The rate of correct detection was between 89% and 93%.

TABLE 1 .
Main notations used in the modeling process.

TABLE 2 .
Maximum likelihood estimation of the position-dependent components of obtained by SEM, for several positions i , using a logistic growth function.Standard deviations are provided between parentheses.The values for (the more readable parametrizations) E[t 0 ] and Var[t 0 ] are provided in days.

TABLE 3 .
Maximum likelihood estimation of the position-independent components of obtained by SEM, using a logistic growth function.Standard deviations are provided between parentheses.

TABLE 4 .
Growth functions f t , t 0 , y 0 , x, y used for the study (after reparameterization).

TABLE 5 .
Measurement protocol of sensor stretching (m is for ''measurement(s)'' and No for ''number of'').

TABLE 6 .
Estimated linear regression parameters (with standard deviation between parentheses).

Table 6 .
It was noticed that the standard deviation σ (y t ) of ε s 18VOLUME 9, 2021

TABLE 7 .
Relative mean square errors (in %) of the MLE computed over 500 replicated datasets.increaseslinearly with the circumference y