Feature Selection, Construction, and Validation of a Lightweight Model for Foot Function Assessment During Gait With In-Shoe Motion Sensors

Identifying and monitoring overpronation and oversupination in the long term during activities of daily living are essential for people’s ambulatory health. Using an in-shoe motion sensor (IMS) with power-saving functions is a potential solution. In this study, we challenged the development of an estimation model of foot function using the foot center of pressure excursion index (CPEI) as an index via linear multivariate regression, which is sufficiently light for this type of IMS. Data collected from 65 and 17 participants were involved in model construction and validation, respectively. We validated ten scenarios simulating daily living activities, including walking on different surfaces, using different shoes, with or without carrying a bag, and indoors and outdoors. We applied statistical parametric mapping (SPM) to determine significant predictors and performed our original feature selection algorithm, leave-one-subject-out least absolute shrinkage and selection operator (LASSO), to compress the volume of the predictors. We successfully discovered significant sex-specific predictors for foot function estimation from foot motion and constructed large effect-sized sex-specific foot function estimation models that achieved high-precision CPEI estimation. In the validation, the model successfully estimated a maximum of 99.0% and 100.0% males’ and females’ data under the same experimental conditions with the training data and 92.8%–100.0% and 85.8%–100.0% data in different scenarios. The constructed models are effective and possible to provide applications for long-term foot function monitoring by using an IMS.

shown that foot shape significantly affects foot function, and abnormal foot structures even pose the risk of lower limb injuries [1], [2]. Overpronation and oversupination, which are directly related to low and high foot-arch types, are the main categories of extreme foot functions, as shown in Fig. 1. Prior studies have suggested a strong association between these extreme foot functions, highlighting foot symptoms, fall risks, and disabilities such as hallux valgus, shin splints, and even knee osteoarthritis [3], [4], [5], [6], [7], [8].
In daily living, people tend to overpronate or oversupinate their feet to deal with ill-fitting shoes and uneven ground surfaces during walking or running. Overpronation or oversupination may induce exacerbation due to overuse or weakness of several muscles in the lower extremities. In contrast, most people do not recognize it independently during daily activities [9], [10], [11]. Fortunately, previous studies has shown that abnormal foot functions can be corrected by purchasing motion-control shoes or insoles or by providing appropriate physical training if people know their current foot function [9], [12], [13]. Therefore, identifying and monitoring people in the long term during daily living activities should be essential for people's ambulatory health.
Conventionally, foot-type assessments relied on clinicians' examinations or expensive instruments in experimental environments [1]. Foot pronation/supination was commonly assessed through methods such as measuring the rearfoot angle in the frontal plane using a goniometer, X-ray photography, or motion capture marker [10], [14], [15]; morphologically rating the feet using the foot posture index [16]; and calculating the center of pressure excursion index (CPEI) from a dynamic footprint measured by a pressure sensor array [17], [18]. Compared with other measurement methods, overpronation or oversupination assessment using the CPEI was considered more promising because it is highly reliable and removes reliance on clinical experts [17], [18]. Low or negative CPEI values indicate overpronation, and extremely high values indicate oversupination.
Smart wearable sensors have been developed to avoid environmental constraints and reduce measurement costs. Following the concept of CPEI-based assessment, smart instrumented footwear insoles with multiple flexible pressure sensors to classify foot types were proposed as a solution [19], [20]. In addition, smart shoes/insoles with motion sensors inside were also considered promising in various healthcare applications via daily gait analysis, such as Parkinson's disease, gait rehabilitation, frailty detection, and even foot deformity detection [21], [22], [23]. Here, we refer to this smart motion sensor as an "in-shoe motion sensor (IMS)." Compared to foot pressure devices, by processing inertial measurement unit (IMU) signals, in addition to common spatiotemporal gait parameters (GPs), an IMS enables to spontaneously obtain more abundant information, including instantaneous linear and rotational motion and 3-D foot angular posture [21], [24], [25], which are intimately related to gait kinematics.
In general, IMSs can or wirelessly transmit detailed waveforms to a smartphone or server for further analysis with high power consumption. The wireless transmission requires IMSs to be routinely and frequently charged, thus losing usability. In our previous study, we developed a new type of IMS that is small, lightweight, and can be attached to the insoles.
We demonstrated that with optimally designed power-saving operation sequences for IMS and operation mode in practical applications, IMSs could achieve high usability for long-term daily measurement without needing charge the battery for one year [24]. One key point of power saving is that our IMS can finish simple data processing and calculate simple parameters on the edge device. We produced this type of IMS system as a product named A-RROWG. 1 The features mentioned above enable this kind of IMS to collect daily gait data over long periods, regardless of location and time, without the user being aware of the sensor being attached. We considered installing an assessment model on this kind of IMS would be a solution for convenient long-term monitoring of foot function.
Therefore, we challenged the development of a lightweight foot function estimation model that is feasible for an IMS via linear multivariate regression. We chose the CPEI as the foot function reference index and hypothesized that the actual state of the foot function should be expressed via gait. Lots of potential predictors were proposed in previous studies [26], [27], [28], [29], [30]. Previous reports have shown potential predictors for the foot function assessment model, including weight or body mass index (BMI) [26] and the gait kinematics calculated from multiple segments of the lower limbs. Regarding gait kinematics, Chuter [27] discovered that the maximum rearfoot eversion in subjects with overpronated feet is more apparent than normal ones during gait. In contrast, Zhang et al. [28] additionally discovered that overpronated subjects also have greater rearfoot eversion at heel strike (HS) and toe-off (TO) than normal ones. The temporal series kinematics studies of Houck et al. [29] and Levinger et al. [30] indicated that pronated feet have more eversion and medial tibia tilt during the stance phase. However, since the relationship between foot function and motion remains to be clarified, and studies concerning foot function assessment using only foot motion during gait are still rarely seen, more effective predictors need to be determined from foot motion.
Gait is a periodic motion. We considered that the same foot motions concerning foot function are also repeated at the same stage in every gait cycle (GC). Hence, predictors can be determined from gait patterns during specific gait phases associated with foot function. We previously presented the correlation between foot function and inertial foot motion signals via linear correlation analysis and constructed the primary model for foot function estimation at the 2021 IEEE Biomedical Circuits and Systems Conference (BioCAS) [31]. In this article, we improved our model via the approaches.
1) Conducting statistical parametric mapping (SPM) has proven effective in biomechanical studies [32], [33] to determine effective predictors of foot motion. 2) Using demographic data, including age, height, weight, BMI, and several GPs [24] as auxiliary predictors to improve model precision. 3) Using an original method called "leave-one-subjectout least absolute shrinkage and selection operator" (LOSO-LASSO) to reduce redundant predictors and select appropriate predictors.
We considered our approach first makes the model lightweighted enough to fit IMS for long-term healthcare applications, second helps us to elucidate the correlation between foot motion and foot function that makes the model biomechanically interpretable, and third makes model robust against the individual difference of gait. After establishing the model, we also validated it in simulated practical use case scenarios. Different ground surfaces, shoes, and carrying weight influence gait kinematics [34], [35], [36]. For practical use cases in which data can be collected at any time, place, and status, we selected common scenarios, including using different shoes, walking on different surfaces, and carrying different bags to validate the model and the precision in different use cases. Considering the musculoskeletal structure differences between sexes, we will construct a CPEI estimation model for males and females, respectively. Please note that at this study's current feasibility study stage, we conduct all the data processing and model construction tasks in an offline environment (on a personal computer).

II. COMPARED WITH RELATED WORKS
Utilizing the biomechanics knowledge above, Miezal et al. [37] proposed a foot function assessment method based on multiple IMUs mounted on different lower limb segments. Compared with the method proposed in [37], for the IMSs, when we assumed the mid-foot and hindfoot connected as a rigid body, they measured foot motions that originated from only a single segment. Thus, models for using multiple sensors were not suitable for the IMS.
Chen et al. [38] suggested that with the aid of foot pressure sensors, oversupination can be estimated via fast Fourier transforming (FFT) approximate five-second-long walking gait signal patterns measured by a motion sensor only mounted on foot to generate predictors and inputting them to a support vector machine model by then. However, on one side, the type of sensor like A-RROWG does not rely on any other auxiliary sensor devices and requires the computation volume of any GP to be sufficiently light making their approach unsuitable to the IMS. On the other side, in practical use, interpretable models were required in potential healthcare and clinical applications. Therefore, rather than machine learning models, which are always a black box and contain complicated data processing and computations, we considered a linear multivariate regression model should be a feasible candidate.
There are plenty of techniques for feature selection, including LASSO [39], Bayesian methods, e.g., Bayesian LASSO [40], and deep learning methods for sparse learning [41] which have been proposed. Compared to LASSO or Bayesian LASSO, deep learning methods need relatively large training data to tune each weight in the hidden layer, while biomedical datasets usually have smaller sizes. Bayesian LASSO accounts for uncertainty in the choice of regularization parameter and may lead to more accurate prediction than LASSO. However, in contrast, it may require more expertise to interpret.
In LASSO, cross-validation approaches [42] were commonly used for choosing the value of the LASSO tuning parameter. However, conventionally, training sets and validation sets in cross-validation were randomly decided without considering individual differences existed in data. Therefore, to deal with individual differences and ensure the model robust against other subjects, we combined a LOSO process with LASSO. Conceptually near the jackknife resampling method [43], we obtained multiple LASSO analysis results by looping the LOSO process for all subjects. By statistically analyzing these results, we can approximate the nature of the population estimator and thus make the LASSO analysis more robust against individual differences.

A. Participants
We recruited two groups of participants of different sexes, ages, heights, and weights; all of their attributes were recorded before the experiments. We successfully collected data from 65 participants (33 males and 32 females), forming the training data for model construction. Hereafter, we refer to this group as Group T. We also successfully collected the data from 17 separately recruited participants (eight males and nine females), forming the other group whose data were used for model validation. Hereafter, we refer to this group as Group V.
All participants could walk independently without assistive devices such as a cane, crutches, or orthotic devices. They had a normal or corrected-to-normal vision, no history of neuromuscular or orthopedic disorders, and no communication impairments. Before the experiments, the experimental procedure was explained to all participants, and then, we obtained their informed consent. The NEC Ethical Review Committee for Life Sciences approved the study on the December 1, 2021 (Approval number: LS2021-015).
B. Other Recommendations 1) Experiment for Training Data Collection: Considering that users could walk at any speed in the practical use case, participants in Group T were asked to walk in a 15-m indoor straight vinyl flooring path at a slower, regular, and faster speed for three round trips (six trials) at each speed. The order of required walking speed was shuffled for each participant. We asked the participants to wear the sports shoes we prepared, which we defined as standard shoes. 2) Experiment for Validation Data Collection: We selected ten different scenarios, which are listed in Table I. We asked the participants to prepare the shoes they were accustomed to for the experiment. For example, sports-type shoes, sneakers, business-type shoes, and high-heeled pumps (heel height less than 3 cm) were allowed. We prepared three bag types with different weights: a rucksack with a 5 kg weight, a shoulder bag with a 2 kg weight, and a handbag with a 2 kg weight. To carry the shoulder bag and handbag, participants could decide which side they felt comfortable with. Participants in Group V were asked to walk in a 15-m straight path under different experimental conditions. They were also asked to walk three round trips indoors and were allowed to choose one, two, or three round trips outdoors to reduce any burden induced by hot or cold temperatures.

C. Devices and Systems for Experiment
The experimental setup is shown in Fig. 2. Two IMSs with the same circuit structure as A-RROWG were embedded in an insole placed under both feet arches near the calcaneus side to ensure that participants could walk naturally when measuring foot motion. The IMS also contained a six-axis IMU (BMI 160, Bosch Sensortec, Reutlingen, Germany), an ARM Cortex-M4F microcontrol unit (MCU) (nRF52832, CPU: 64 MHz, RAM: 64 kB, ROM: 512 kB, Nordic Semiconductor, Trondheim, Norway), an EEPROM (S-24C32C, 32k bit, ABLIC, Tokyo, Japan), a real-time clock (RTC) (RX8130CE, EPSON, Suwa, Japan), and a 3-V lithium-coin battery (CR2430, 300 mAh) [ Fig. 2(a)]. The MCU includes a Bluetooth low energy (BLE) module. The device is lightweight (10 g, including the coin battery) and small (29 × 40 × 7 mm) enough to be placed at the foot's arch. Our IMSs enable to output nine types of foot motion signals with the data sequence numbers of every measurement. The signals include directly measuring three axes of acceleration, i.e., linear motions A x , A y , and A z , three axes of angular velocity, i.e., rotating motions G x , G y , and G z , and three axes of foot-sole posture expressed by Euler angles E x , E y , and E z , which were calculated using a Madgwick filter [44]. The measurement range of acceleration was ±16 G, and the angular velocity was ±2000 • /s. We stopped the powersaving function at this feasibility study stage and recorded the detailed foot motion waveforms in real time by transferring them to a smartphone via Bluetooth universal asynchronous receiver/transmitter (UART) mode. Due to the packet loss induced by limitations of Bluetooth communication capacity and conditions, partial data loss occurred in the recorded foot motion waveforms. Please note that in the practical use case, the raw waveform will be only processed inside the edge device without transferring to the smart phone, and only calculated final results will be transferred to the smartphone. There should be no losing data inside the edge device.
The CPEI was measured using the foot pressure measurement system, Pedar 2 (novel GmbH, Munich, Germany, referred to as "Pedar" hereafter). Pedar consists of a pair of insole-shaped 2-D foot pressure sensor arrays, a data recording terminal, and batteries. The pressure sensors were laid on the IMS-equipped insole in shoes [ Fig. 2 ; the data recording terminal and batteries were attached to a participant's waist [not shown in Fig. 2(b)]. The data were transferred to the terminal by a cable and recorded onto an SD memory card. Pedar recorded the distribution of foot pressure and directly output the center of pressure (CoP) trajectories, composed of xand y-CoP coordinates in ".fgt" files, foot pressure data of every pixel of the sensor in ".asc" files, and every frame of the dynamic footprint in ".sol" files using the data processing software of Pedar. In the dynamic footprint image, the resolution of each pixel is 5 mm. To calculate the CPEI in accordance with the method in [18], we manually measured every participant's foot width and the trisection of the foot length from the toe from the dynamic footprint image [ Fig. 2(b)]. Both IMSs and Pedar were operated at a sampling frequency of 100 Hz.

D. Signal Processing
We conducted all data processing and computations on MATLAB (Mathworks, Natick, MA, USA) in this study.
In each dependent walking trial, the CoP waveforms were divided into strides by detecting HS events using a thresholding method, and then in each stride, partial waveforms in the stance phase were extracted with the aid of the system-synchronized total foot pressure waveform [45]. The first and last strides of each trial were excluded first to ensure that all strides were derived from a uniform walking speed. Next, we temporally normalized the CoP waveform to apply a functional depth outlier detection method [46] to exclude extreme strides from all strides of one foot. After that, the CPEIs of the remaining strides were calculated in accordance with prior work [18]. Finally, the CPEIs of the remaining strides were averaged, respectively, in corresponding trials, and the average CPEI of one trial was treated as one measurement where each participant had a total of 18 measurements for both feet.
In every walking trial, the defective parts of foot motion waveforms were first linearly interpolated in accordance with the losing data sequence numbers. Interpolated data points were labeled. The acceleration values were then corrected to the global coordinates in each independent trial. After that, the waveforms were partitioned into individual strides by detecting HS events, and the stance and swing phases in each stride were divided by detecting TO events. The HS and TO detection on the IMS signal was based on the gait event detection algorithm demonstrated in our prior work, which was aided by A y signals [47]. In addition, the following 11 GPs in one stride were used, where the first four concern foot eversion postures: maximum foot sole posture in the frontal plane (E y ) (GP1), average E y during the stance phase (GP2), E y at HS (GP3), E y at TO (GP4), stride length (GP5), gait speed (GP6), maximum E x in dorsiflexion and plantarflexion directions (GP7 and GP8), maximum circumduction (GP9), maximum foot height (GP10), and toe in/out angle in the transverse plane (GP11) [24] [ Fig. 2(c)]. Note that GP5, GP9, and GP10 were normalized by each participant's height. We referred to the posture in which the foot sole completely touched the ground (around 21 percent gait cycle (%GC)-25%GC) as "foot-flat," a neutral posture. The average signals during the foot-flat of each stride were treated as offsets. The offsets were subtracted from all signals in each stride. Furthermore, we normalized the amplitude of acceleration and angular velocity signals in each stride by using the corresponding maximum instantaneous walking velocity during the swing phase, which was calculated by integrating A y from a foot-flat to the end of the stride.
The first and last strides of each trial in that participants did not walk under a uniform speed were excluded. In the next step, we excluded strides whose proportion of interpolated data points in one stride was over 25%. Then, we temporally normalized the remaining strides using a fixed subdivision of the gait phase that varies the stance phase from 1%GC to 60%GC and the swing phase from 61%GC to 100%GC, which changed one stride into a 100 × 9 matrix. Next, as the step of an outlier stride exclusion, we also used a similar functional depth outlier detection [46] method to exclude extreme strides from each trial. GPs in the same excluded stride were eliminated as well. In the final step, the average normalized waveform and GPs of one trial were calculated from the remaining strides, and these data formed a 100 × 9 × N data cube, where N is the number of dataset. For males and females, N = 1132 and 1087, respectively. Note that when all strides in one trial were excluded, the average of this trial is no longer recorded.

E. Feature Selection and Model Construction
In this study, we conducted SPM to analyze the correlation between foot function and foot motion in detailed gait phases. If significant %GCs existed, the foot motion signals in these %GCs could be used to create predictors for foot function assessment. Fig. 3 shows the process of feature selection and model construction.
During the SPM analysis, it was implemented in a hierarchical manner, similar to the analysis of variance (ANOVA) with a post hoc t-test. In our case, one stride of foot motion was first treated as a time-series multidimensional vector field from 1%GC to 100%GC, and the vector in which the %GCs significantly correlated with foot function was analyzed statistically. Specific vector components (scalar field), e.g., A x and G z , in those %GCs with statistical significances that reached the vector field level should then be tested post hoc.
As shown in Fig. 3(b), because the CPEI is an analog quantity, a canonical correlation analysis (CCA) with SPM (SPM-CCA) was applied [34]. For each sex, the data cube and CPEI were the paired input into the CCA. The %GCs whose test statistic of the CCA exceeded a critical test statistic threshold calculated in accordance with the random field theory (RFT) [48] were determined as significant %GCs. The level of significance was set as p < 0.05. As a post hoc test, only data in significant %GCs were further investigated by Pearson's correlation (PC) analysis with SPM (SPM-PC) in each component of the signal vector. For each component, similarly, the %GCs whose test statistic of the PC exceeded an RFT-based critical test statistic threshold were judged as the final CPEI-correlated significant %GCs in each component. Because there were nine components in the vector, we conducted Šidák correction [49] on a level of significance in which p c < 0.0057.
After the SPM analysis, partial CPEI-correlated significant %GCs continuously appeared, acting as clusters, which we called gait phase clusters (GPCs). Incidentally, because foot motion is temporally successive, we considered that the integral average value could represent foot motion during those GPCs. We could use every signal in the GPC as a predictor variable. However, as previously mentioned, to make sure the assessment model is sufficiently light, we compressed the data by treating each GPC as a whole, where the integral average of the signal amplitude in the GPC, i.e., the average motion intensity of each GPC, was output as a single predictor, as expressed by the following equation: where C i means the ith IMS predictor, T s and T e mean the start and end of %GCs of GPCs, respectively, and W means the waveform of the foot motion signal where In the other case where CPEIcorrelated significant %GCs appeared discretely, the signal amplitude in these %GCs was directly guided as predictors [see Fig. 3(b)]. Here, we called these IMS predictors. Similarly, we preferentially used a linear regression method for prediction to lighten the model. Accompanying the IMS predictors, GPs and individual physical attributes (IPA), including age, height, weight, and BMI, were also conducted as candidate predictors. In the LOSO-LASSO processing, as shown in Fig. 3(c), in each LOSO process, the data of the uth participant were first excluded, then the remaining data were subjected to LASSO analysis. After completing the LOSO process, we can obtain total U LASSO analysis results and statistically analyze these results to approximate the nature of the population estimator. Here, U means the total number of participants.
We used the "lasso" function in MATLAB for LASSO computation. After inputting a fixed 1 × 100 regularization coefficient vector as one of the function outputs, we obtained a LASSO coefficient matrix with a size of C × 100, where C is the total number of candidate predictors. The regularization coefficient vector can be preliminarily obtained by inputting all datasets into the "lasso" function. Here, the elements in the regularization coefficient vector were geometric sequences, where the largest coefficient made the last column of the LASSO coefficient matrix all zeros. The vector elements were determined by performing the "lasso" function on partial training data in a preliminary stage. By looping the LOSO process, we obtained U label matrices, Λ u s, which were formed by substituting nonzero elements in LASSO coefficient matrices with 1. Here, Λ u means the label matrix obtained in the uth loop. By summing all Λ u , a matrix with a total counter Λ 0 was obtained, where the elements over 95% × U (males case: 32 and females case: 31) in this matrix were substituted with 1 while the remaining were substituted with 0, forming a final label matrix . The 100 column vectors ( 1 -100 ) mean the 100 LASSO-recommended predictor combinations for different regularization coefficients selected, where 1 and 0 means the predictor was and was not selected, respectively. Combinations 1 -100 were then conducted to construct 100 candidate models H 1 -H 100 . As the regularization coefficient increases, the number of selected predictors becomes fewer. In particular, when the elements of the regularization coefficient vector were ordered from smallest to largest, the elements in 1 were all ones, while those in 100 were all zeros. Consequently, the next task should be to determine the optimal assessment model for the candidates.

F. Model Evaluation and Validation
All candidate models H 1 -H 100 were evaluated via leaveone-subject-out cross-validation (LOSOCV), and Akaike's information criterion (AIC) [50] was applied as the evaluation index. Each H i had an AIC value, and that with the lowest AIC value was chosen as the optimal one, denoted as M o [ Fig. 4(a)]. In the LOSOCV of M o , Case 2 intraclass correlation coefficient, ICC(2,1), and ICC(2, k) [51] were used to evaluate the degree of absolute agreement between the true and estimated CPEI in a single rater case, k. Additionally, the mean absolute error (MAE) and coefficient of correlation (r ) of the true versus estimated value were also used as the evaluation indices of LOSOCV. The coefficient of determination (R 2 ) of the multivariate regression models using all training data (not LOSOCV) was also evaluated. The guidelines for interpreting the ICC interrater agreement were excellent (>0.749), good (0.600-0.749), fair (0.400-0.599), and poor (<0.400) [52], those for interpreting r were none (<0.100), small (0.100-0.299), medium (0.300-0.499), and large (>0.499), and those for interpreting R 2 were none (<0.020), small (0.020-0.129), medium (0.130-0.259), and large (>0.259) [53]. For comparison, models derived by optimizing three other patterns of predictor combinations in the same process as M o , M 1 : IPA only, M 2 : M 1 plus GP1-GP4, and M 3 : M 2 plus GP5-GP11 were also constructed and evaluated [ Fig. 4(b)].
Bland-Altman (BA) plots [54], [55] were applied as the tool to evaluate the limit of agreement (LoA) between the Pedar method and the proposed method. Both the samplebased LoA and the confidence limits of LoA in the population were calculated. If the differences and averages between the two methods followed a normal distribution, which was preliminarily tested by a Kolmogorov-Smirnov test, we applied a t-test and PC test to examine whether there was a fixed and proportional bias. The LoA of the 95% confidence interval was determined from the perfect agreement (PA) line ± 1. In the model validation stage, we evaluate the validity by the ratio of validation data in Group V, whose BA plots were within the agreement range, i.e., the success rate of measurements, denoted as Q A . The optimistic agreement range, i.e., the range between UULoA and LLLoA, was applied here. Because the validation data size was limited, we utilized a probability-distribution-based method to estimate Q A to eliminate the randomness.
We hypothesized that the residual of BA plots of training and validation data to the PA line, denoted as R A and R V , followed the normal distribution, Here, µs and σ s mean the averages and SD, respectively. Because the model was based on multivariate regression, theoretically, µ A ≡ 0. Furthermore, because of the limited data size, we calculated the 95% confidence levels of µ V , σ A , and σ V and obtained their upper and lower limits, (µ VL , µ VU ), (σ AL , σ AU ), and (σ VL , σ VU ), respectively. Hence, if we use an optimistic agreement range, the agreement range of the residual should be fixed as −1.96σ AL to 1.96σ AU . By then, Q A should be in the area of N (µ V , σ 2 V ) inside the interval of −1.96σ AL to 1.96σ AU . Because µ V and σ V are independent of each other, the largest and smallest areas for N (µ Vi , σ 2 Vi ) subject to µ Vi ∈ [µ VL , µ VU ] and σ Vi ∈ [σ VL , σ VU ] would be the upper and lower limit of Q A , denoted as Q AU and Q AL , which can be expressed by the following equations: Q AL = min min In the other statistical analyses, t-tests and ANOVA were used to compare the differences between two groups and among three or more. All the levels of importance were set as p < 0.05. In the post hoc of ANOVA, Šidák correction was also applied to correct the p-values.

A. Participants' Characteristics and Pedar-Measured CPEIs
The participants' characteristics in Groups T and V are shown in Tables II and III, respectively. Moreover, the results of statistical analyses concerning CPEIs in various conditions are shown in Table IV. For all, the statistics of males and females were collected, respectively.
From Tables II and IV, the Pedar-measured CPEIs of males were significantly bigger than those of females ( p < 0.001), which indicates females pronated their feet more in gait. For both males and females, CPEIs significantly varied in different gait speed categories according to ANOVA. The CPEIs at normal speed had the maximum mean values, while those at fast speed had the minimum ones. In addition, the CPEIs at the slow speed had the minimum SD, while those at the fast speed had the maximum. Specially, the CPEIs at the fast speed were significantly smaller than those at normal speed. Comparing the results between Groups T and V under the same conditions (total in Table II versus S1 in Table III), there were no significant differences in both sexes (Table IV), which suggested the stability of our measurements.
The mean and SD of CPEIs of all ten scenarios are listed in Table III, and the results of the comparison are shown in Table IV. In Table IV, we included several representative pairs to introduce whether different walking conditions would affect CPEI values. ANOVA and its post hoc suggested that even walking under different assumed usual conditions, the CPEIs did not perform differently in females, and we did not find any significant differences between listed pairs for males in multiple comparisons. Although not shown in Table IV, significant differences were found in males between S4 and S10 (walking on vinyl with rucksack versus on lawn without bag), and S2 and S10 (walking on vinyl with handbag versus on lawn without bag).
In addition, we also examined the impact of a single bodyside-weighted condition on left-to-right CPEI difference. As a result, for both sexes, there were no significant differences in left-to-right CPEI difference whether with a bag on a single side of the body or not, which suggested that limited weight on a single side of the body will not affect the characteristics of foot function.

B. SPM Analysis
Compared with the males and females, their average waveform appeared approximately similar. In contrast, the SD of waveforms, particularly in the frontal and horizontal plane (G y , G z , E y , and E z ), appeared to have more different shapes (Fig. 5).
The results of SPM-CCA depicted that for both sexes, the foot motion signal vectors in all GCs significantly correlated with the CPEIs. The statistic SPM{t} curves of the post hoc SPM-PC analysis depicted the strength of the positive (+) and negative (−) correlation between CPEIs and each type of foot motion signal. The sections of curves that exceeded critical thresholds depicted the significant GC intervals that Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. correlated with the CPEIs, defined as GPCs here. The GPCs in acceleration signals appeared more fragmentary because the smoothness of the acceleration waveform was smaller than  that of the angular velocities and sole-to-ground Euler angles. The shape of the statistic SPM{t} curves and the location of GPCs between males and females were different, which may indicate that foot function affects males and females differently (Fig. 5). As a result, 69 and 45 GPCs, i.e., IMS predictors, were obtained for males and females, respectively.

C. Feature Selection
To obtain the final optimal predictor combination M o , including IPA and GP predictors, for males and females, a total of 84 and 60 candidate predictors were input to LOSO-LASSO, respectively. By finding the lowest AIC, we determined M o for males and females, with 18 and 10 predictors finally selected, respectively (Fig. 6). The final predictors and their correlation analyses with the CPEI are listed in Tables V and VI. Regarding the IPA and GP predictors, the heightnormalized circumduction (GP9) was selected for both sexes (r = 0.162 and 0.271, both with a small effect size), while age, BMI, and average foot sole posture in the frontal plane (GP2) were additionally selected for males and only weight was additionally selected for females. GP2 had the highest correlation with the CPEI in males' predictors (r = −0.367, medium effect size). The results suggested that BMI and weight positively correlated with the CPEI, meaning that participants with higher body mass or weight had less overpronated foot function.
The signal type origins and IMS predictor gait phases can be referenced from the "Detail" column. Spatially, foot motions in the frontal (xz) and horizontal (xy) planes were suggested to be essential for foot function assessment (C m5 -C m8 , C m11 -C m18 , C f 3 -C f 10 , and GP10) for both males and females. Temporally, these GPCs interspersed over many gait periods were seen in both sexes, whereas the distribution of GPCs was different. In particular, we, respectively, observed that partial predictors originating from the accelerations and angular velocities signals were distributed before preswing only in males (C m5 , C m6 , C m9 , C m10 , C m11 , and C m15 ) while concentrated in the terminal stance phase. In contrast, partial predictors originating from the foot-sole posture were distributed only in the early stance phase in females (C f 8 and C f 10 ) (Fig. 7).
By referencing the mean value and linear correlation coefficient of the predictor with the CPEI, the direction of foot motions during these phases and its changing trend with the CPEI increasing (supination trend) can be determined. For males, compared with overpronated feet, normal feet mainly had less anterior deceleration (C m9 , positive value) at early loading responses, a higher internal rotation velocity (C m15 ) during the early mid-stance, and more medial acceleration and fewer eversion postures during the terminal stance (C m5 and C m16 ). For females, C f 8 , which has the biggest correlation coefficient (r = 0.320, medium effect size), originated from E y during 1%GC-22%GC. Because we set foot-flat as a neutral position, its negative mean value and positive correlation coefficient showed that the foot with a bigger range of motion in the inverse direction (the difference from E y at 21%GC-25%GC to E y at HS is negative) had a bigger CPEI. Compared with overpronation, normal feet also mainly have more internal rotation postures during the early stance phases (C f 10 ) and have fewer eversion postures during the late stance phases (C f 9 ).
We additionally found a number of significant predictors in foot motion during the swing phase. Our results suggested that normal participants tend to have larger eversion rotation velocities (females, C f 5 ) during the initial swing, small internal (or big external) rotation velocities (females, C f 7 ), or larger lateral decelerations during the mid-swing (males, positive value, C m8 ), larger eversion velocities (females, C f 6 ), larger inversions (males, negative value, C m17 ), and external rotation (males, C m18 ) postures before HS.
Furthermore, we also listed the coefficients of predictors and their p-values in a multivariate regression model in the tables. Although the linear correlation coefficient with the CPEI of all predictors only had a small to medium effect size, the constructed models of both males and females achieved a large effect size (R 2 = 0.469, p < 0.001 and R 2 = 0.407, p < 0.001, respectively).  as a single rater, and had a good ICC agreement with the true values when using IMS and other raters, e.g., Pedar to assess the foot function spontaneously. These results suggested that foot function can only be estimated well by applying IMS predictors.

D. Model Evaluation
According to the BA plots, the average of true and estimated values linearly correlated with their differences, i.e., there were proportional biases ( p < 0.001, < 0.001) between the two raters. Note that in Fig. 8, the lines of LoAs were very near to their confidence limit lines because of the relatively big data size.

E. Model Validation
According to the BA plots shown in Fig. 9, the model successfully assessed a maximum of 99.7% and 100.0% of the data of males and females, respectively, in Group V under the same experimental conditions of the training data (S1), which suggested that the models had high robustness to different participants. In the indoor environment, no matter how the participants held the baggage, the models wellestimated the validation data in that there were no apparent negative changes in the Q A of measurements (S2-S4). In the outdoor environment (S6-S10), the upper limit of the Q A of males and females maintained high levels (males: 99.0%-100.0%; females: 90.8%-97.8%) despite walking on different ground surfaces, even in the wild paths (S6-S7 and S9-S10).
We observed that the Q A s of females decreased in the scenarios right after shoe shifting (S4-S5 and S7-S8) although it recovered in the following scenario, while this phenomenon was not observed in males.

A. Applying the Model to an IMS
In this study, we developed a simple and effective model for CPEI estimation from straight walking foot motion with high precision using the MAE and large effect sized R 2 via multivariate regression. According to the ICC values for both sexes, the model achieved a "fair" agreement when considering using IMS as a single rater and achieved a "good" agreement when considering using both IMS and Pedar as raters in the same CPEI estimation. Referencing the MAE values, the model precision of males and females enabled discrimination of 0.65× and 0.63× SD of the training data. This precision satisfies the requirement for overpronation/normal/oversupination classification when following the proposal to categorize the foot function using quintiles [18].
In the validation, the model successfully estimated 92.8% to 100.0% and 85.8% to 100.0% (both are the ranges of the upper limit of Q A in S1-S8) data in the current precision, respectively. These data were collected by using different shoes, walking indoors or outdoors, and with or without baggage. This suggests that the proposed model has a high potential for daily living measurement. This model only involves a few steps of simple addition and multiplication, which can be afforded by the calculation capacity and easily integrated into the conventional algorithm of our A-RROWG.
A-RROWG automatically detects walking bouts, i.e., continuous walking motions, and excludes motions deviating from common walking, e.g., turning, walking backward, running, and jumping. A model constructed from straight walking foot motions should be sufficient for this type of sensor. Next, we will discuss how the model was used in A-RROWG during daily living measurement. On the basis of our previously developed power management algorithm [24], to extend battery life, there are three stages of operation sequences for one measurement cycle of the IMS. In Stage 1, the IMS was set to sleep mode, and measurements were only performed within a predetermined period, e.g., from 9 to 11 A.M. During that time, the IMS moved to Stage 2, in which it was woken up by the RTC and entered a suspended mode waiting for the arrival of walking signals. Otherwise, it would return to Stage 1. If walking signals were detected and repetitive walking was confirmed, the IMS moved to Stage 3, which ran at full sampling rate to record walking signals in the IMS buffer. Otherwise, it would also return to Stage 1. After a certain number of strides were analyzed, the IMS moved back to Stage 1 and waited for the next predetermined measurement time [ Fig. 10(a)].
Previously, at Stage 3, the IMS could calculate GP5-GP11 as well as GP1 of the ith stride by running an online foot-soleangle-based stride-segmentation algorithm [24]. In this algorithm, the indices of the first and second valley ( j i v and j i+1 v ) and peak ( j i p and j i+1 p ) of E x in the buffer were detected and then a part of the signals between two middle points around the foot-flat points, 1/2( j i v + j i p ) and 1/2( j i+1 v + j i+1 p ), was segmented for parameter calculation. To calculate GP2-GP4 and construct IMS predictors, we integrated a previously developed online gait event detection algorithm by detecting the specific signal features from A y [39] into the IMS operation sequences. In this algorithm, the IMS detected the indices of the first and second HSs ( j i HS and j i+1 HS ) between j i v and 1/2( j i v + j i p ) and between j i+1 v and 1/2( j i+1 v + j i+1 p ), and then detected the TO in this stride between 1/2( j i v + j i p ) and j i p [ Fig. 10(b)]. Note that the unit for all start and end points of GPCs for IMS predictor construction shown in Tables V and VI was depicted as %GC, which was derived from temporally normalizing the waveforms of one stride. Therefore, in this case, we must convert the %GC number to the distance from j i HS on the real-time axis to obtain the indices of the start and end points of GPCs in the buffer to construct IMS predictors.
By inputting the mandatory predictors into the constructed model installed in the IMS, the CPEI was finally obtained in the IMS. Note that predictors not inputted into the model can also be utilized for other purposes [ Fig. 10(b) and (c)]. We considered our IMS can be applied to long-term daily foot healthcare applications, as well as be feasible for oneshot measurement for ambulatory care and foot measuring by shoe fitters, as long as we preliminary tune the frequency of the measurement cycle in one day, e.g., from three times a day to once a minute.

B. CPEI Obtained via Pedar
According to a previous study by Menz et al. [18] using the Framingham Foot Study Cohort involving big sample sizes of participants (males: 1477 and females: 1901), males have a different distribution and a higher average CPEI than females. The results in this study agreed with their findings, which also means the measurements of the CPEI in our studies were effective. However, the average CPEI of our participants was lower than that of their cohorts in both sexes. According to the findings in [56], [57], and [58], the reason might be that Asians' feet tend to have lower CPEIs, i.e., more pronated feet than other races because they have a wider forefoot width and lower longitudinal arch. This indicates that although the CPEI can reflect the degree of overpronation or oversupination, the cutoff value for categorizing foot function needs to be observantly determined in line with sex and race in the applications.
We observed that the CPEI levels and variations changed with the self-controlled speed (Table II). In particular, when participants were asked to walk fast, the CPEI was significantly smaller than walking at a comfortable speed in both sexes, which agreed with the previous findings that a higher walking speed increases foot eversion and shifting the loading on the forefoot from the fifth to the first metatarsophalangeal joint side [59], [60]. However, we also observed that the average CPEI in the self-controlled slow speed was lower than that in the comfortable speed but higher than that in the fast speed. There are few studies concerning the selfcontrolled slower speed on the CPEI to the best of our knowledge. A possible reason is that instructed slow walking may have affected the nature of participants' gaits, which needs further investigation. The phenomenon mentioned above might suggest that the walking speed should be standardized or the CPEI should be corrected by the walking speed to assess foot function in accordance with the CPEI.
People always carry baggage with them in daily living. We were interested in whether having a load on the body, especially on the unilateral side, would affect the CPEI estimation in daily living because, under these states, the gait symmetry would be changed by the load [61]. This study found that having a load on the body, even on the unilateral side (rucksack: 5 kg and shoulder/handbag: 2 kg), did not significantly change the CPEI on each foot nor the difference of the CPEI between two feet. Furthermore, we observed that regardless of baggage, Q A did not change. However, whether heavier loads change the state of the CPEI needs to be further studied.
Unlike conventional studies that use a floor-mat-type plantar pressure sensor to measure the CPEI from barefoot [18], [57], we used an insole-shaped in-shoe-type sensor in this study. This plantar pressure sensor helped us achieve the spontaneous measurement of foot motion and CPEI, as well as outdoor measurements. However, because the sensors were inserted into shoes, sensors were first separated from the ground by the sole of the shoes, which was like laying on a cushion. The sensor insole may become curved or distorted along with the distortion of the shoes. Especially for high-heeled pumps, the foot in the shoes does not always step in the same place during walking. We must tolerate the impact on measurement precision induced by these limitations using the in-shoe type plantar sensor. In this study, we hypothesized that the CPEI reflected the reality of foot function, and the foot function was expressed via walking. Based on this hypothesis, regardless of the walking condition, the CPEI should be a fixed measurement value (here, the average of each walking trail) representing a specific foot function. Also, the variation induced by the walking speed, shoe type, or ground surface was considered a random variance according to the one-way random effect model of Case 1 [51]. However, CPEIs changed significantly when walking fast. As a result, the test-retest reliability of true values, ICC(1,1) of 18 CPEI measurements, was limited to 0.697 ("good" level), which was the other factor limiting model precision.

C. Correlation Between Predictors and CPEI
In this study, the final selected predictors for the optimal model can follow previous findings. Scott et al. [62] and Menz [63] indicated that older people tended to have more pronated feet, which was also observed in our female participants.
As IMSs were located under the foot arch near the calcaneal side, we assumed the mid-foot and rearfoot to be connected as a rigid body. Most of the IMS predictors were found in the frontal and horizontal plane, which was a good result since foot functions were depicted in the frontal plane as well as in the mediolateral (x) direction. The foot sole Euler angle, i.e., foot posture in the frontal plane during the stance phase (C m16 , C f 8 , and C f 9 ), was a significant predictor for both males and females. Although the foot-sole angle was not biomechanically equal to the subtalar joint angle because the tibia tilt in the frontal plane to the ground slightly changed during the stance phase [30], we considered that these predictors indeed partially reflected the overall status of participants' talocrural and subtalar joints. We also observed sex differences in the stance phase predictors (C m5 , C m6 , C m7 , C m9 to C m11 , C m13 , C m15 , C m16 , C f 3 , C f 4 , C f 8 to C f 10 ). In particular, females demonstrated that the CPEI correlated more with the foot posture (angle), while males correlated more with the motion (acceleration and angular velocity). Males and females have different foot anthropometric characteristics [64]. Previous studies [65], [66] indicated there are sex-based variations in CoP trajectories, and talocrural and subtalar joint kinematics in the frontal and horizontal plane during early stance phases have different impact attenuations and adaptations to surface strategies since females tend to have more ligamentous laxity than males [67]. The interosseous talocalcaneal ligament can be considered the central pivot of rotatory stability as well as mediolateral stability [68]. The higher ligamentous laxity of females could be a reason that the impact of weight at HS makes foot deformities much easier to be expressed during the stance phase. In contrast, for males, the overpronation or oversupination deformity was finally expressed only under multiple actions of weight impact and natural pronation motion during the terminal stance and preswing.
We also found several significant predictors in foot motion during the swing phase. Structurally, people swing their lower limbs laterally during the initial swing, then to their maximum position during mid-swing, and pull back medially for landing during the terminal swing [69]. Although many previous studies paid more attention on directly linking the foot function to the kinematics during the stance phase, a previous study on flat foot [70], which was associated with overpronated feet [18], indicated that people with severer flat feet tend to have more supination motion during the initial swing and small external rotation motion during mid-swing, which can be connected with C f 5 , C f 7 , and C m8 . Moreover, during the terminal swing, C m18 agreed that normal participants had more external rotation motion. From the study of Levinger et al. [30], immediately before HS, we can also observe that regardless of whether the participant had flat feet, the tibia tilt to the ground had the same level, whereas flat feet had less inversion hindfoot as the tibia angle and a smaller slope (eversion angular velocity) in eversion, which means flat feet also had less inversion foot posture and less eversion speed. This can connect well with C f 7 and C m17 .
Foot function is highly correlated with the alignment of the subtalar joint, which is the main weight-bearing surface and is mostly found in a valgus position under weight-bearing conditions [71]. Previous studies indicated that high body weight or obsess people presented more pronated feet, i.e., a smaller CPEI [72]. However, in this study, we observed contradictory results that CPEIs had a positive correlation with weight or BMI in our participants of both sexes (r = 0.181 and 0.220, p < 0.001 and < 0.001). Although over 1000 strides were included in the training data, because of the small participant size, we still cannot deny the contingency in them, which suggests we should include more participants for training in the future. However, previous studies indicated that runners have more chance of suffering from foot overpronation because their feet overpronate during running [10], [60]. Participants with a normal BMI may have made a great effort to keep their bodies fit through daily exercise, e.g., jogging. However, in this study, we did not aggregate questionnaires on exercise habits, which should be considered to improve model precision in the future.

D. Limitations of This Study
In this section, we discuss several limitations of this study. Our model has room for improvement in precision and usability if these limitations can be solved in the future.
This study successfully demonstrated the feasibility of estimating foot function by the IMS. However, the ICCs and MAEs showed room for improvement in precision in our model. We only take the integral average of the GPCs' amplitude to construct IMS predictors. This means that the information of waveform segments in GPCs was compressed into a point. Eskofier et al. [21] suggested that gait patterns were essential in many applications where the relevance of adjacent points on the temporal axis should also be considered. To achieve that in the future, we will use more predictors to express the waveform shape in the GPCs into the model. These new predictors can be constructed from, e.g., principle components or autocorrelation coefficients, under the premise that the IMS capacity can afford the calculation.
In the validation, we conducted the experiments one after the other when changing shoes (from S4 to S5 and S7 to S8). However, apparent decreases in hit rate from S4 to S5 and S7 to S8 in females were observed. This might reflect the impact of changing shoes on gait because we needed to give participants more time to acclimatize to the changed shoes. Unlike the male participants' shoes, we observed that the female participants' own shoes had various shapes, e.g., sneakers or pumps with/without high heels that was far different from the shape of the standard shoes. It was possible that participants needed time to adjust their senses to adapt to the change of shoes. After they had adapted, their gait returned to normal, which was considered the reason that the hit rate resumed from S6 to S7 and S9 to S10 despite the ground surface differing from the last experimental scenarios. This phenomenon also agrees with the previous study of Melvin et al. [73], which indicated that unfamiliar or new-bought shoes in daily living should be addressed in practical use. Fortunately, the previous study suggested that people can acclimatize to their new shoes by only continuously walking over 200 strides. Therefore, we may need users to record the start date of using their newbought shoes in the application and mask the data measured on that day in the database from the gait analysis.
We avoided conducting the outdoor experiments on rainy, snowy, sweltering, and cold days and did not consider the impacts of weather and climates on gait. However, these environmental factors, which were reported to affect gait [74], [75], must be considered in the practical daily measurements in the future. To ensure the reliability of the estimation values in practical use, as a possible solution, we may first conduct the weather and GPs information into the gait analysis application to filter those data obtained outdoors in the weather condition mentioned above, second treat the filtered data as defective time-series data, and finally interpolate them via a Kalman filter approach [76].

VI. CONCLUSION
In this study, we successfully developed a simple and highprecision foot function estimation model by merging biomechanical knowledge. The model only included a few steps of simple addition and multiplication, which is feasible for an IMS with a power-saving function for long-term monitoring.
By analyzing the correlation between foot motion and CPEI via SPM, we discovered significant sex-specific predictors for foot function assessment, most of which exist in the foot motion in the frontal plane and can connect well with previous biomechanical findings. The model of males and females achieved large effect-sized R 2 , MAEs of 6.05 and 3.30 of CPEIs and, a fair ICC agreement when using IMS as a single rater. Compared with the model without IMS predictors determined from foot-function-correlated gait phases, the model's precision with them was noticeably improved, suggesting that foot function cannot be estimated well without predictors integrating biomechanical knowledge.
For validation, the model successfully estimated a maximum of 99.0% and 100.0% males' and females' data under the same experimental conditions with the training data, and 92.8%-100.0% and 85.8%-100.0% data in the combinations of different ground surfaces, different shoes, and with/without baggage conditions. These results suggest that our model is reliable and effective in daily foot function assessment.
In the future, we will focus on constructing new predictors to improve model precision, improve robustness in more different daily living environments, validate the model on older adults or children, validate long-term monitoring, and validate intervention studies on correcting overpronation and oversupination.