Kinematic and Kinetic Validation of an Improved Depth Camera Motion Assessment System Using Rigid Bodies

The study of joint kinematics and dynamics has broad clinical applications, including the identification of pathological motions or compensation strategies and the analysis of dynamic stability. High-end motion capture systems, however, are expensive and require dedicated camera spaces with lengthy setup and data processing commitments. Depth cameras, such as the Microsoft Kinect, provide an inexpensive, marker-free alternative at the sacrifice of joint-position accuracy. In this work, we present a fast framework for adding biomechanical constraints to the joint estimates provided by a depth camera system. We also present a new model for the lower lumbar joint angle. We validate key joint position, angle, and velocity measurements against a gold standard active motion-capture system on ten healthy subjects performing sit to stand (STS). Our method showed significant improvement in mean absolute error and intraclass correlation coefficients for the recovered joint angles and position-based metrics. These improvements suggest that depth cameras can provide an accurate and clinically viable method of rapidly assessing the kinematics and kinetics of the STS action, providing data for further analysis using biomechanical or machine learning methods.


I. INTRODUCTION
MUSCULOSKELETAL disorders of the spine and knee lead to approximately 39 million visits to clinical care facilities each year in the United States [1]. Despite the prevalence of these conditions, there remains a lack of scalable, accessible, and quantitative assessments for whole-body biomechanics in the clinic. The current clinical gold standard for documenting functional spine impairment is the measurement of Cobb angles in flexion and extension [2], [3], or the Sagittal Vertical Axis (SVA) from radiographs [4]. Such radiographs are inexpensive and offer a precise measurement of vertebral range of motion, but they only assess static postures. During daily functional activities such as sit-to-stand (STS), the strategy used to stand can vary [5], [6], [7], potentially changing the loads experienced by the joints. This results in both inconsistencies in patient care throughout the recovery process and challenges in understanding the relationship between static observations and functional abilities.
Full-body motion analysis can provide insight into pathological motions and compensation strategies. This analysis is performed in biomechanics labs using gold-standard techniques such as motion capture, force platforms, and surface electromyography. These data can be processed using full-body biomechanics software such as Anybody [8] or OpenSIM [9]. While these systems are a staple in obtaining high-resolution kinematic, force, and muscular measurements, their application to regular clinical practice is limited by the time required to set up these measurements, the cost of the equipment, the required expertise, and the need for a dedicated motion-capture space.
This has resulted in a dichotomy in analysis: patients are assessed with static measures focused on a particular body segment, while biomechanics labs are able to track and analyse the dynamic motion of the whole body. Some researchers have explored the use of specialised wearable sensing systems for tracking spine function. Marras developed an exoskeletal tracking system for the lumbar spine to identify motions during occupational tasks, and to identify differences in individuals with low back pain [10], [11]. This system was shown to provide a quantitative kinematic measure of dysfunction based on a specific set of flexion tasks. Taylor and Consmüller developed a system for non-invasive back measurement using flexible strain gauges to measure the curvature of the spine [12], [13]. This system was shown to provide a reliable quantitative assessment of spine shape and range of motion when compared to X-ray. While these systems have been shown to provide good estimates of spine motion and can discriminate between subjects with pain and asymptomatic subjects, they only track spine motion and so cannot assess changes in full-body motion.
Depth cameras such as the Microsoft Kinect have been used as a marker-less method for assessing function. Unlike the prior motion capture strategies, no hardware (markers, sensors, etc.) needs to be attached to the subject. This allows for rapid testing and simplifies clinical deployment. One of the disadvantages of depth cameras is the method used to identify subject landmarks. As no markers are placed on the subject, the location of a subject's joint centres (Fig. 1) relies on machine learning to label the pixels corresponding to each body segment. The intersection between body segments is then taken to be the estimated joint location [14]. This form of joint centre data from a depth camera is not unique to the Kinect; alternative depth camera sensors (Orbec, Intel RealSense, VicoVR, Depthsense, PMD, SIC), as well as skeletal tracking systems (Nuitrack, OpenNI), are commercially available. As there is no underlying rigid-body model, the estimated joint centres may be biologically inconsistent. This can lead to errors at the ankle, knee, and hip which complicate the use of depth sensors for later analysis [15]. Researchers have found that retro-reflective markers could be used to supplement the recovery process [16]. However, the addition of these markers increases the experiment setup time and makes the results sensitive to the accuracy of marker placement.
An important distinction between this work and the work performed in the computer vision community is the underlying assumptions and goals of the final system. We develop a tool for rapid clinical assessment by applying a biomechanically realistic model to impose constraints on unconstrained estimates of joint position for a controlled task and environment. In contrast, the problem tackled by a number of these other works is the estimation of human poses across a wide range of tasks while being robust to real-world situations and environments [17].
Two approaches are generally taken when performing pose estimation: creating a skeletal model with a prior on the associated surface geometry, or generating a direct map between camera inputs and pose using machine learning. Pavlakos [18] uses convolutional neural networks to estimate the likelihood that a voxel contains a joint. This method resulted in an average 3D joint error of 9.6 cm for the Human3.6M sitting down motion and an average marker reconstruction error of 5 cm. This outperforms a number of other deep learning methods [19], [20] yet still highlights the inherent challenges in joint estimation, particularly in self-occluding tasks such as sitting. This is consistent with the work by Mehta [21], who adopted a similar approach at the pixel level, providing a real-time (30 fps) system, but with a mean joint position error of 14-15 cm for the sit-down task. While these methods are promising for versatile estimation of human motion, the current joint estimation error is high relative to the surface fitting methods.
Surface fitting methods usually use a simplified approximation of human shape, consisting of scaled cylinders or ellipsoids that are adjusted to a subject's body morphology. This simplified model is then used to estimate pose by relating these volumes to camera depth data. Recent advances have involved the use of Gaussian models to approximate body shape [22], with Ding [23] developing a method that can estimate joint centre position at 20 fps with an associated position error of 3.5 cm. Shuai [24] used spherical harmonic decomposition rather than Gaussians to track subjects with multiple depth cameras. The resulting model exhibited low marker re-projection error, though this error increased in actions with self-occlusion such as sitting. Zhang [14] used a full-body skinned mesh model in conjunction with multiple depth cameras and force-sensing shoes to estimate kinematic and dynamic state. The resulting system was slow but accurate, with a mean joint error of 3.8 cm at 6 fps. Unfortunately, no results were published for any sitting or standing actions, but the authors do state that the system performance decreased on self-occluding activities. The lower errors and the potential for these methods to run in real time suggest that they may be suitable for clinical use, but the need for initial calibration of the shape model by performing an explicit calibration motion [22], [14], [23] or through manual labelling [24] detracts from their use. Similarly, the use of multiple cameras suggests a requirement for a dedicated motion capture space where the system can be set up and left undisturbed between sessions.

A. Contributions
This paper assesses the feasibility of using a single depth camera as a clinical assessment tool for whole-body kinematic and kinetic assessment. As such, this paper prioritises: 1) Accurate anatomical joint centre locations and joint angles which are needed for clinical assessment and future dynamic/musculoskeletal modelling. 2) Fast computation time to allow for immediate review by the clinician. 3) Ease of use by non-specialists in a clinical environment to perform a rapid motion assessment.
To these aims, we present a simple, fast method for taking any pre-estimated joint centre locations, automatically scaling skeletal parameters based on the subject height, and recovering kinematic and kinetic measures from the biomechanical model. This system provides accurate, reproducible, and consistent estimates of anatomical joint centre locations, with a mean joint position error of 2.63 cm. An additional estimate of L5/S1 location is added to the kinematic model, allowing for assessment of the lower back, an important site of analysis in clinical and occupational health scenarios. The proposed system is used as a post-processing step on the raw Kinect 2 skeleton, with a mean computation speed of 524 frames per second. This suggests that the method can be incorporated into many existing real-time methods without a significant drop in frame rate. Only a single RGB-D camera is used, allowing for deployment in a clinical space without the need for a dedicated, calibrated motion capture space. The extraction of kinematic states requires only that the user specify the subject's height, without any manual model tuning or joint labelling. The entire process of setting up the camera, coaching the subject to perform the STS action, collecting the data, and recovering the kinematics takes under 1 minute.

II. MODELLING FRAMEWORK
Rigid-body models are commonly used in biomechanics research to estimate joint kinematics and loading [25], [26], [27], [28], [29]. The mathematical formulations for the kinematics, kinetics, and dynamics of these systems can be taken from the robotics literature [30], providing a versatile method for analysing arbitrary rigid-body systems. In this work, we present and evaluate two rigid-body models:
1) Floating rigid-body model: constrained body segment lengths. This form of model is typically used in motion analysis, with no environmental constraints.
2) Fixed-ankle rigid-body model: constrained body segment lengths and ankle-ground contact. As the ankles do not move in the sit-to-stand action, a kinematic constraint on ankle position can be used to determine the effect on the recovered kinematic and kinetic measures.
The models are driven by the raw Kinect shoulder, hip, knee, and ankle joint centre positions.

[Figure panel titles: Raw Kinect Model; Floating Pelvis Model]
A. Model I: Floating rigid-body model
1) Model structure: The human body is commonly modelled as a floating tree system, consisting of a pelvic baselink with serial chains that terminate at the head, hands, and feet [31]. In this work, we study the kinematics of the lower limbs and trunk during STS, neglecting the motion of the arms. We consider a 3D rigid-body model with six segments: (left and right) lower leg, (left and right) upper leg, pelvis, and torso (Figure 1). The corresponding joint centres are at the ankle, knee, hip, and lower-lumbar joints. The knee joint is modelled as a cylindrical joint. The ankles, hips, and lower-lumbar (L5/S1) joint are modelled as spherical joints with three successive rotations. The order of these rotations is based on relations to common range of motion measures [32]. Segment lengths are determined by recommended height-scaled, sex-specific, allometric relations [27]. These relations provide the estimated link lengths for the upper and lower leg ($l_{UL}$ and $l_{LL}$), the shoulder width and hip width ($w_S$ and $w_P$), the length between the midpoint of the hip centres and L5/S1 ($h_P$), and the length between L5/S1 and the midpoint of the shoulder centres ($h_T$).
2) Kinematic Formulation: The following mathematical formulation uses relative coordinate frame transformations to relate the observed Kinect joint centre positions to the corresponding joint angles in the rigid-body model. Local coordinate frames are defined in Fig. 2. The pelvic frame ($P$) is located at the midpoint between the hip joint centres. All other frame origins are located at the joint centre, with the Z axis pointing along the segment length in the sagittal plane. We represent the position and orientation of the pelvic frame as an X, Y, Z translation ($t_P$) and three sequential rotations about the X, Y, and Z axes ($\theta_{PX}$, $\theta_{PY}$, $\theta_{PZ}$). This is formulated as the homogeneous transformation between the World and Pelvic frames, $g_{W,P}$, built from $t_P$ and the standard rotation matrices $R_X$, $R_Y$, and $R_Z$ about the X, Y, and Z axes, respectively. The relative transformations between each adjacent segment are defined in the same notation. For example, the transformation between the Pelvis and Torso ($T$) frames, $g_{P,T}$, is written using the coordinate frames and segment lengths defined in Figure 2. These homogeneous pose matrices are used to estimate the World-frame locations of the left and right shoulder centres from their local positions and relative frame transformations. This process can be repeated for each joint centre to create the observation model for all joints, $h_{obs}(\eta, X_I)$, where $\eta$ are the model parameters and $X_I \in \mathbb{R}^{17}$ is the state vector containing the corresponding translations and rotations.
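The chaining of homogeneous transformations described above can be illustrated with a minimal sketch. The rotation order $R_X R_Y R_Z$, the frame offsets, and the local shoulder position used here are assumptions made for illustration only; the actual frame definitions are those of Figure 2, and the function names are hypothetical.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def homogeneous(t, angles):
    """4x4 transform from a translation t and X, Y, Z angles (radians).
    Assumed composition: R = R_X * R_Y * R_Z."""
    ax, ay, az = angles
    r = matmul(matmul(rot_x(ax), rot_y(ay)), rot_z(az))
    return [r[0] + [t[0]], r[1] + [t[1]], r[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(g, p):
    """Map a 3D point p from the child frame to the parent frame of g."""
    ph = [p[0], p[1], p[2], 1.0]
    return [sum(g[i][k] * ph[k] for k in range(4)) for i in range(3)]

# Illustrative values only (metres, radians); not taken from the paper.
g_wp = homogeneous([0.0, 0.0, 0.9], [0.0, 0.0, 0.0])                 # World <- Pelvis
g_pt = homogeneous([0.0, 0.0, 0.1], [math.radians(20.0), 0.0, 0.0])  # Pelvis <- Torso
g_wt = matmul(g_wp, g_pt)                                            # World <- Torso
left_shoulder_world = transform_point(g_wt, [0.18, 0.0, 0.45])
```

Repeating the last step for every joint centre gives one row block of the observation model per joint, which is what the inverse kinematics then inverts.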

B. Model II: Rigid-body model with fixed-ankle
Our second rigid-body model introduces an additional constraint by fixing the position of the ankle joint centres. The raw joint centres from the depth camera are not constrained by the ground plane, allowing the ankle to phase through the floor or hover above the floor while the person is standing. During STS, we assume the position of the ankle remains fixed and can be constrained at a fixed position throughout the motion. To implement this constraint, we select the base link to be one of the feet and fix this position to the ground. The mathematical formulation of the observation model is similar to Section II-A, with the model starting at one foot and moving up the leg, before branching at the pelvis into the torso and second leg branches.
The state vector $X_{II} \in \mathbb{R}^{14}$ for the fixed-ankle model has three fewer states than the floating pelvis model, owing to the addition of the ipsilateral ankle rotation $\theta_{A,ipsi} \in \mathbb{R}^{3}$ and the removal of the pelvis translation and orientation ($\mathbb{R}^{6}$). The subscripts ipsi and contra refer to the ipsilateral and contralateral sides to the base foot. In our model, the ankle joint centre is fixed at a position based on the observed motion. The X and Z coordinates are taken to be the mean observed position throughout the motion. The Y coordinate is fixed to be equal to the mean anterior-posterior position of the knee at a standing posture.

C. Inverse Kinematics
The kinematic recovery process allows for the estimation of joint angles from observations of joint position. We use two methods of kinematic recovery: Non-linear Least Squares (NLS) and Unscented Kalman Filtering (UKF).
1) Non-linear Least Squares (NLS): The error between the observed joint centres $q$ and the expected joint centres $h_{obs}(\eta, X)$ is minimised for each frame $k$.
2) Unscented Kalman Filtering (UKF): While the NLS method allows for the estimation of the state at each frame, it does not enforce any relationship between sequential states. The UKF balances inaccuracies in measurement with an estimate of the change in state between two successive states [33], [34]. Using the notation for Kalman filters, every observed joint centre at frame $k$ is written as the observation model evaluated at the true state $\bar{X}_k$ that underlies each observation, plus a sensor noise term $v_k$. The sensor noise is taken to be white noise, $v_k \sim \mathcal{N}(0, R_k)$, where $R_k$ is the covariance matrix of the Kinect. This observation model is combined with a process model $f_{proc}$, which relates the previous estimate of the true state $\bar{X}_{k-1}$ to the current true state, with a process noise term $w_k$ that is also taken to be white noise, $w_k \sim \mathcal{N}(0, Q_k)$. To set limits on the variation of each of the states between samples, the process covariance $Q$ is fixed to be the expected change due to the velocities $\sigma_V$. This allows the process covariance to be written explicitly as a diagonal matrix in terms of $\sigma_V$ and $\Delta t$, the time between samples, and the process model to be taken as the identity matrix.
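For reference, one form of the recovery equations that is consistent with the description above is sketched below. The symbols $q_k$, $h_{obs}$, $\eta$, $\bar{X}_k$, $v_k$, $w_k$, $R_k$, $Q_k$, $\sigma_V$, and $\Delta t$ follow the text; the squared-norm cost, the additive noise terms, and the exact diagonal form of $Q$ are assumptions and may differ from the original formulation.

```latex
\begin{align}
\hat{X}_k &= \arg\min_{X} \; \left\lVert q_k - h_{obs}(\eta, X) \right\rVert^{2}
  \\
q_k &= h_{obs}(\eta, \bar{X}_k) + v_k, \qquad v_k \sim \mathcal{N}(0, R_k)
  \\
\bar{X}_k &= f_{proc}(\bar{X}_{k-1}) + w_k = \bar{X}_{k-1} + w_k,
  \qquad w_k \sim \mathcal{N}(0, Q_k)
  \\
Q &= \operatorname{diag}\!\left( (\sigma_{V,1}\,\Delta t)^{2}, \dots,
  (\sigma_{V,n}\,\Delta t)^{2} \right)
\end{align}
```

Here the first line is the per-frame NLS problem, the second and third lines are the UKF observation and (identity) process models, and the last line expresses the idea that each state may change by roughly its expected velocity multiplied by the sample interval.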

D. Planarisation
The recovered 3D kinematic data is planarised for analysis of the sagittal kinematics. A plane is fit to the motion of the Kinect joint centres and the data is projected onto the plane. For the symmetric joint centres (ankles, knees, hips, shoulders), the mean of the sagittal plane positions is taken.
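The projection and averaging steps can be sketched as follows. This is a minimal illustration that assumes the plane (a point on it and its normal) has already been fitted; the plane-fitting step itself is omitted, and the function names and coordinates are hypothetical.

```python
import math

def normalise(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def project_to_plane(p, origin, normal):
    """Orthogonal projection of point p onto the plane through `origin`
    with normal vector `normal`."""
    n = normalise(normal)
    d = sum((pi - oi) * ni for pi, oi, ni in zip(p, origin, n))
    return [pi - d * ni for pi, ni in zip(p, n)]

def midpoint(a, b):
    """Mean of two points, used here for left/right joint pairs."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]

# Illustrative example: plane normal along the medio-lateral (X) axis.
origin, normal = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
left_knee = project_to_plane([0.15, 0.40, 0.50], origin, normal)
right_knee = project_to_plane([-0.17, 0.42, 0.48], origin, normal)
sagittal_knee = midpoint(left_knee, right_knee)
```

Projecting first and averaging second mirrors the order described above; for an orthogonal projection the two operations commute, so the order does not change the result.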

E. Lower lumbar joint (L5/S1) estimation
The raw Kinect skeleton provides a single joint centre along the spine. We found the position of this joint centre to be inconsistent between subjects and within single trials. Due to this unreliability and lack of relation to an anatomical landmark, we disregard the mid-spine marker in our kinematic analysis and consider an alternate method for determining a joint between the hip and shoulders in the sagittal plane.
From marker-based motion capture data, the position of the lower lumbar joint, located at L5/S1, can be estimated from pelvic landmarks. An allometric model for the position of L5/S1 in a pelvic frame is presented in Reed et al. [35]. Unfortunately, the pelvic orientation is not observable from the Kinect data, so we cannot apply this method.
A model for lumbosacral orientation using knee flexion and trunk inclination is presented by Anderson et al. [36]. In that work, a quadratic model was trained on four subjects in multiple static lifting postures. This model was not assessed on any test data. Using active motion capture data, we tested the Anderson model against the marker-based Reed method. We found that the model did not accurately predict the sacral orientation during STS.
In this work, we present a new regression model for KHL5, the angle formed by the knee, hip, and L5/S1 joints, driven by KHS, the angle formed by the knees, hips, and shoulders (joints present in the Kinect data). This model assumes that coordination between the hip and L5/S1 joints follows a predictable pattern across subjects.
The model is trained using marker-based motion capture data (protocol detailed in Section III-B). We define the pelvic frame by the anterior and posterior superior iliac spine (ASIS and PSIS) markers shown in Fig. 5. The location of the L5/S1 joint centre is based on the model presented in Reed, in which the L5/S1 joint centre is given in a frame defined by the ASIS and pubic symphysis (PS) landmarks. The PS landmark cannot be marked on a clothed subject and is not easily observable from motion capture data. Using dry pelvis data from Reynolds et al. [37], we re-derived the position of the L5/S1 joint in the ASIS-PSIS pelvic frame, scaled by the pelvic width $PW$, the distance between the left and right ASIS landmarks. From the observed L5/S1 joint centre, we compute the joint angles in the sagittal plane. A linear model was fit to the data. In our recovery framework, this L5/S1 model is used after the kinematic recovery and planarisation steps are performed.
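The way such a model is applied can be sketched as follows: KHS is computed from planarised joint centres and passed through a linear map to predict KHL5. The slope and intercept below are placeholders and are not the fitted coefficients of this work, which are not reproduced here; the function names and the 2D coordinate convention (anterior-posterior, vertical) are likewise assumptions.

```python
import math

def angle_at(vertex, a, b):
    """Unsigned angle in degrees (0 to 180) at `vertex` between the rays
    towards points a and b, all given as 2D sagittal-plane coordinates."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

def predict_khl5(khs_deg, slope, intercept):
    """Linear map from KHS to KHL5 (both in degrees)."""
    return slope * khs_deg + intercept

# Placeholder coefficients for illustration only (NOT fitted values).
PLACEHOLDER_SLOPE = 1.0
PLACEHOLDER_INTERCEPT = 0.0

# Upright sitting: thigh horizontal, trunk vertical.
knee, hip, shoulder = (0.40, 0.50), (0.00, 0.50), (0.00, 1.00)
khs = angle_at(hip, knee, shoulder)
khl5 = predict_khl5(khs, PLACEHOLDER_SLOPE, PLACEHOLDER_INTERCEPT)
```

Because the mapping depends only on joints already present in the depth-camera skeleton, it can be evaluated frame by frame after planarisation without any additional markers.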

III. EXPERIMENTAL VALIDATION
The modelling and kinematic recovery methods introduced in Section II were tested experimentally and validated against marker-based motion capture data on non-clinical subjects.

A. Experimental Protocol
Ten subjects (3F/7M, age: 30.9 ± 9.6 years, height: 1.76 ± 0.12 m, mass: 67.4 ± 11.2 kg) were recruited under informed consent (UCSF IRB 16-21015). Subjects wore close-fitting exercise clothing (sports bra, exercise shorts). The chair height was adjusted so that the subject's thighs were parallel to the ground and their knees were directly above their ankles during natural sitting. Subjects were asked to perform STS with their arms folded across their chest, hands touching the opposite elbow. The standing action was otherwise non-coached, with subjects performing the action naturally. Three trials, each consisting of three STS, were recorded for each subject.

B. Active Motion Capture Model
An 8-camera active motion capture system was used in this study to provide a ground-truth estimate of the position and orientation of each body segment. Motion data of STS was simultaneously recorded from the Kinect and the motion capture system. The Kinect camera was located 2.5 m directly in front of the subject. The Kinect joint centres were streamed at 30 Hz and saved with a UNIX timestamp onto a desktop computer. Each trial consisted of 883 ± 87 frames of Kinect depth data, and 14224 ± 1548 frames of Phasespace data for three successive stand-sit-stand motions (around 30 seconds). The Kinect and motion capture systems were time synchronised using a network time protocol server.
Thirty-two LED markers (Phasespace, San Leandro, CA) were recorded at 480 Hz with an associated UNIX timestamp. Kinematic recovery was performed offline in MATLAB. The markers were placed onto the subject's skin using adhesive Velcro based on the Plug-in-Gait marker set [38] (Figure 5). Additional markers were placed on the medial elbow, knee, and ankle positions to allow for estimates of joint centre from the medio-lateral marker pairs. In cases where the subject's shorts or sports bra obscured the ASIS, PSIS, or XP landmarks, a clip was used to secure the marker to the clothes band at the desired landmark.
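Since both streams carry UNIX timestamps from synchronised clocks, frames from the 30 Hz and 480 Hz systems can be paired for comparison. The text does not state how this pairing was done; the sketch below assumes a simple nearest-timestamp match, and the function names are hypothetical.

```python
from bisect import bisect_left

def nearest_index(sorted_times, t):
    """Index of the entry in `sorted_times` (ascending) closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # Prefer the earlier sample when t is exactly halfway between two.
    return i if (after - t) < (t - before) else i - 1

def align(kinect_times, mocap_times):
    """For each Kinect timestamp, the index of the nearest mocap frame."""
    return [nearest_index(mocap_times, t) for t in kinect_times]

# Illustrative timestamps in seconds: 480 Hz mocap, 30 Hz Kinect.
mocap_times = [k / 480 for k in range(49)]
kinect_times = [0.0, 1 / 30, 2 / 30]
matched = align(kinect_times, mocap_times)
```

At these rates there are 16 motion capture samples per Kinect frame, so the worst-case pairing offset with nearest-neighbour matching is about 1 ms, which is small relative to the 33 ms Kinect frame interval.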
In addition to the STS protocol, a dataset was collected for identifying the functional joint centres for each segment using the Recap2 protocol [39]. Subjects were asked to move each joint through its full range of motion three times, starting with the wrists, elbows, and shoulders, before moving the ankles, knees, and hips. The Recap2 protocol was only used to find the functional centres for the ground truth motion capture model. NLS (Section II-C1) was used to recover the instantaneous position and orientation of each limb segment in 3D coordinates. Each limb segment was recovered independently without any modelling of the connection between connected limbs.
The rigid-body models used for each segment are shown in Figure 5. The coordinate system is based on Wu [40], with the exception of the pelvis segment, where the origin is located at the midpoint of the ASIS and PSIS markers. The labelling of the coordinate axes was also modified to simplify plotting and analysis in MATLAB. NLS was used to estimate the marker positions in the local coordinate frame for each subject.
The joint centres for the ground-truth model were recovered using functional methods (hip and shoulder), and marker-based methods (ankle, knee, and L5/S1). Geometric sphere fitting for the hip was chosen based on the recommendation by the ISB [40] and as all subjects were able to move sufficiently [41]. The inter-malleolar point was selected for the ankles from Wu [40], the inter-epicondyle point for the knee [42], and L5/S1 from the allometric model described in Section II-E. The recovered joint-centres were planarised and the relative angles were determined at each frame.

C. Data Analysis
All data processing was performed on previously stored Kinect 2 data on an Intel i7-5820K processor, with 32GB of RAM running Windows 7 Enterprise. Each trial of three stand-sit-stand actions consisted of roughly 880 frames and was post-processed at 524 ± 140 fps. A graphics card was not used to aid computation.
The joint angles recovered from each method were filtered and numerically differentiated to obtain joint velocity estimates. A first-order, low-pass Butterworth filter at 5 Hz was applied to the active motion capture and both rigid-body Kinect models [43], [44]. The raw Kinect data was filtered more heavily, using a first-order, low-pass Butterworth filter at 2 Hz. This was to account for significant noise in the raw joint angles, which otherwise led to unrealistic velocity estimates.
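The filtering and differentiation step can be sketched as below. This is a minimal illustration that assumes a single forward pass of a first-order Butterworth filter designed by the bilinear transform, followed by central differences; whether the original processing used forward-only or zero-phase filtering, and which differentiation scheme was used, is not stated, and the function names are hypothetical.

```python
import math

def butter1_lowpass(x, cutoff_hz, fs_hz):
    """First-order low-pass Butterworth filter (bilinear transform with
    frequency pre-warping), applied once in the forward direction.
    The filter state is initialised to the first sample."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b = k / (k + 1.0)            # b0 = b1
    a1 = (k - 1.0) / (k + 1.0)
    y = []
    x_prev = x[0]
    y_prev = x[0]
    for xn in x:
        yn = b * xn + b * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def central_difference(y, fs_hz):
    """Numerical derivative: central differences in the interior and
    one-sided differences at the two ends (needs at least 2 samples)."""
    n = len(y)
    v = [0.0] * n
    for i in range(1, n - 1):
        v[i] = (y[i + 1] - y[i - 1]) * fs_hz / 2.0
    v[0] = (y[1] - y[0]) * fs_hz
    v[-1] = (y[-1] - y[-2]) * fs_hz
    return v

# Illustrative use: a synthetic joint angle (degrees) sampled at 30 Hz.
fs = 30.0
angle = [20.0 * math.sin(2.0 * math.pi * 0.5 * i / fs) for i in range(90)]
angle_filtered = butter1_lowpass(angle, 5.0, fs)
velocity = central_difference(angle_filtered, fs)
```

A single forward pass introduces a phase lag; applying the same filter a second time in reverse is a common way to remove it, at the cost of a steeper effective roll-off.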
We compute the horizontal distance between the shoulder and hip joint centres at each frame, as well as its velocity. This is a surrogate for the Sagittal Vertical Axis (SVA), a metric for spinal alignment measured on static radiographs as the distance between C7 and L5/S1 [4]. We also compare recovered peak values for several metrics during STS: flexion and extension velocities of the torso, torso inclination angle, and SVA.
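The SVA surrogate and its peak value can be sketched as follows. The sign convention (positive when the shoulder is anterior to the hip), the 2D coordinate layout (anterior-posterior, vertical), and the function names are assumptions for illustration.

```python
def sva_surrogate(shoulder, hip):
    """Signed horizontal (anterior-posterior) offset of the shoulder
    centre relative to the hip centre, in the sagittal plane."""
    return shoulder[0] - hip[0]

def peak_magnitude(values):
    """Largest absolute value over a trajectory."""
    return max(abs(v) for v in values)

# Illustrative planarised joint centres (metres) over four frames.
shoulders = [(0.02, 1.40), (0.15, 1.30), (0.30, 1.20), (0.10, 1.45)]
hips = [(0.00, 0.90), (0.00, 0.90), (0.00, 0.92), (0.00, 0.95)]

sva = [sva_surrogate(s, h) for s, h in zip(shoulders, hips)]
peak_sva = peak_magnitude(sva)
```

The same peak extraction applies to the torso inclination angle and to the flexion and extension velocities once those trajectories have been computed.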
We consider each combination of sensor and model (raw Kinect, floating rigid-body Kinect, fixed-ankle rigid-body Kinect, and active motion capture) to be a different rater, allowing for the use of inter-rater reliability assessment methods. Three statistical measures were used to analyse the performance of the raw and rigid-body Kinect models against the active motion capture ground truth:
1) Mean Absolute Error (MAE): identifies the raw position or velocity error between methods.
2) Concordance Correlation Coefficient (CCC).
3) Intraclass Correlation Coefficient (ICC): reported as ICC(2,1) for absolute agreement and ICC(3,1) for consistency.
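As an illustration of the first two measures, the sketch below computes the MAE and a concordance correlation coefficient between an estimated and a reference trajectory. It assumes that CCC denotes Lin's concordance correlation coefficient computed with population (1/n) moments; the function names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def mae(estimate, reference):
    """Mean absolute error between two equally long trajectories."""
    return mean([abs(e - r) for e, r in zip(estimate, reference)])

def ccc(x, y):
    """Lin's concordance correlation coefficient (assumed definition),
    using 1/n variances and covariance."""
    mx, my = mean(x), mean(y)
    sxx = mean([(a - mx) ** 2 for a in x])
    syy = mean([(b - my) ** 2 for b in y])
    sxy = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)

# Illustrative joint angles (degrees): the estimate tracks the reference
# perfectly in shape but carries a constant 4 degree offset.
reference = [10.0, 20.0, 30.0, 40.0]
estimate = [14.0, 24.0, 34.0, 44.0]

error = mae(estimate, reference)
agreement = ccc(estimate, reference)
```

In this example the MAE equals the offset, while the CCC falls below 1 even though the two traces are perfectly correlated, because the CCC penalises any deviation from the line of identity.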

IV. RESULTS
MAE, CCC, and ICC statistics are given for joint center positions (Table I), joint trajectories (Table II), velocity trajectories (Table III), and selected peak metrics (Table IV). A representative motion capture trace is shown in Figure 6.
Both rigid-body Kinect models (floating and fixed-ankle) achieved significantly lower MAE than the raw Kinect for all joint angle and position measures (Tables I, II). In comparison to the floating model, the fixed-ankle model had significantly less error in the ankle and knee positions and angles, and comparable error in all other measures.
Higher CCC and ICC values indicate greater reliability, relative consistency, and absolute agreement. For the position measures, the fixed-ankle model has higher CCC and ICC values than the raw Kinect in all cases. The floating pelvis model was better than the raw Kinect model, but performed poorly in recovering the ankle positions and angle.
The MAEs of the velocity trajectories shown in Table III are comparable between the raw, floating, and fixed-ankle models. This similarity in performance was also seen in the CCC and ICC values, with the knee, hip, trunk, and SVA velocities showing high agreement and repeatability for all methods. The estimated ankle velocities had lower ICC and CCC values across the methods, but showed an improvement using the fixed-ankle model. The recovery of the angles and angular velocities at L5/S1 was consistently worse than that of the other joints, classified as good-excellent instead of excellent.
The peak measures in Table IV show the performance of the different methods in extracting candidate performance metrics from the trajectory data. The MAEs for the floating and fixed-ankle models are significantly lower than those of the raw Kinect for the peak SVA and peak flexion velocity, but significantly higher for the flexion angle and the extension angular velocity. The peak trunk flexion angle and extension velocity were found to be consistent (ICC(3,1): excellent), but with lower agreement (ICC(2,1): good/fair).

V. DISCUSSION
The introduction of segment length constraints in the floating rigid-body model resulted in significant improvement in all joint position and angle measures. The fixed-ankle model, which combined segment length constraints with an ankle contact constraint, further improved the recovery of the ankle and knee joint angles. Accurate lower-limb recovery is essential for performing whole-body dynamic analysis. This model had excellent estimates of joint position and velocity trajectories when compared to the gold-standard motion capture. The peak metrics associated with the position data were found to provide good to excellent agreement and consistency. These improvements are also seen in the peak metrics obtained from the floating-pelvis model, though its MAE, CCC, and ICC values for the ankle, knee, and hip are comparable to or worse than those of the fixed-ankle model. In contrast, the raw Kinect had higher MAE in recovered joint angles, notably at the ankle, knee, and hip, and poor-fair agreement for the peak position metrics.
The mean joint position error of 2.63 cm (in the sagittal plane) is substantially lower than those seen in more generalised camera methods. Among the joint errors on the Human3.6M dataset presented by Mehta [21], the lowest mean joint position error for the sit-down action was 10.4 cm, reported by Pavlakos [18]. The approach proposed by Shuai [24] for the use of multiple depth cameras resulted in marker residuals of approximately 3.5 cm for the sit down then stand up action (MHAD action 9 [51]), but requires the use of three synchronised Kinect 2 cameras.
The trunk angle trajectory and peak trunk flexion angle were found to be consistent (ICC(3,1): 0.95, 0.84) for the fixed-ankle model, but with lower absolute agreement (ICC(2,1): 0.95, 0.64). This indicates that there may be a consistent offset between the motion-capture and the Kinect models. The MAE for these values suggests that there is an approximately 4 degree offset in the trunk angles between the active motion capture model and the fixed-ankle model. This offset may be due to the difference in the location of the shoulder centre between the motion capture and the Kinect models. The motion capture model defines the shoulder as the functional centre of the arm, which is then fixed in the torso frame. This functional centre was estimated from the subject performing arm windmills in the sagittal plane. This motion also includes motion of the scapula, translating the location of the gleno-humeral centre (GHC). The estimated shoulder centre is likely to be in the centre of that space. During the sit-to-stand action, the subject's arms were placed across their chest, with their hands touching the opposite elbows. This action protracts the scapula, moving the GHC anteriorly. The Kinect, however, estimates the location of the GHC at each frame, accounting for this new location of the GHC.
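The link between a constant offset and the gap between ICC(3,1) and ICC(2,1) can be illustrated numerically. The sketch below assumes the standard two-way ANOVA single-measure definitions of ICC(2,1) and ICC(3,1) (Shrout and Fleiss); the data and function name are hypothetical.

```python
def icc_two_way(ratings):
    """ratings[i][j] is the score of target i (e.g. a trial) by rater j.
    Returns (ICC(2,1), ICC(3,1)) from a two-way ANOVA decomposition."""
    n = len(ratings)          # number of targets
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    return icc21, icc31

# Illustrative peak trunk angles (degrees): the second rater reads a
# constant 4 degrees higher than the first on every trial.
reference = [10.0, 20.0, 30.0, 40.0]
estimate = [r + 4.0 for r in reference]
icc21, icc31 = icc_two_way([[r, e] for r, e in zip(reference, estimate)])
```

With a pure offset, the residual term vanishes and ICC(3,1) equals 1, while the between-rater term lowers ICC(2,1); this is the pattern that motivates interpreting the gap between the two coefficients as a systematic bias rather than random error.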
The errors in angular velocities were comparable across the raw Kinect and rigid-body methods. To obtain the velocities, the raw Kinect position data was heavily filtered. There is a notable trade-off in the performance of the models between the joint angle and angular velocity MAEs. In particular, the angular velocities of the floating-pelvis model outperform those of the fixed-ankle model for all joints other than the ankle. As the floating model is not constrained, it is able to respond rapidly to changes in observed position, resulting in lower velocity errors. In contrast, these rapid changes are moderated by the constraints imposed by the fixed-ankle model. The improved position accuracy from the models, combined with accurate angular velocities, suggests the suitability of the fixed-ankle model for further dynamic analysis. This is not possible using the original raw Kinect data due to the inaccurate joint centre positions and corresponding angle errors.
It is important to note that this study was conducted with the person directly in front of the depth camera. In a previous study [52], the authors found that the estimated joint centre position error increases when the camera is set at increasing angles from the subject. These joint centre errors were found to be higher in the lower limbs (+1.5 cm) in both standing and sitting activities, with higher error seen in the limb distal to the camera. While this issue could arise in the clinic, especially in cases where the system is set up rapidly, the 30 degree offset used in that study is larger than a reasonably expected set-up error. Furthermore, the addition of constraints at the ankle may improve recovery performance in the lower limbs.

VI. CONCLUSION
This work presents a framework for improving kinematic recovery from depth-camera data through the use of rigid-body modelling. We validated the performance of our proposed method and raw Kinect data through comparison against a ground-truth active motion capture system. The use of a rigid-body model and contact constraints significantly improves the accuracy of joint angles measured by a depth camera. This framework enhances the utility of a depth camera for quantitative motion analysis. Accurate kinematic and kinetic measurements allow for expansion to dynamic analysis of joint torques. The proposed system has low cost, space, and time requirements and can be easily deployed in the clinic, with the total time to set up, collect, and process the motion data taking less than a minute.
While the validation analysis was performed on sagittal plane measures during STS, the modelling framework and kinematic recovery are performed in 3D. This framework will be extensible to actions outside of the plane, but further validation must be performed on the 3D recovery.