Introduction
The countermovement jump (CMJ) is commonly used to measure lower-body explosive power and is characterised by an initial downward movement of the centre of mass (COM), known as the countermovement, before toe-off [1]. Performance assessment with the CMJ often involves motion capture and the measurement of metrics such as peak velocity and vertical jump height. Traditionally, motion capture is performed using wearable sensors, optical motion capture (OMC) equipment, and force plates, which are highly accurate but, compared to smartphones, relatively expensive, not readily portable, and dependent on some level of technical instruction to operate. In addition, OMC requires physical body markers, which can be affected by skin and clothing artefacts. Moreover, wearable sensors, physical markers, and the awareness of being under observation may alter the real performance of subjects [2], [3].
Recent advances in computer vision research have enabled markerless motion capture (MMC) from videos. MMC often relies on human pose estimation (HPE) algorithms such as AlphaPose [4], OpenPose [5], and DeepLabCut [6]. These MMC techniques have shown potential to replace OMC, especially since smartphones are ubiquitous. However, there is still a lot to be done in evaluating the accuracy and usability of MMC.
Existing MMC approaches can be categorised by capture plane (2D or 3D) and number of cameras (multi- or single-camera). 2D monocular (single-camera) techniques have been used to quantify limb kinematics during underwater running [7] and sagittal-plane kinematics during vertical jumps [8]. However, these works rely on deep learning approaches, whose generalisation ability depends on the size and diversity of the training data and on the model architecture; trained athletes, casual trainers, and rehabilitation patients, for example, exhibit different performance ranges. Since collecting large quantities of representative data is difficult, we instead take a quantitative approach and focus on ease of deployment and ease of use in practice. The MyJump2 [9] app has been deployed for measuring jump height with a single smartphone, but it requires manual selection of the jump start and end frames. Previous researchers have performed 3D MMC using multiple cameras [10], [11]. However, the 3D multi-camera approach requires careful calibration and reconstruction of 3D poses from multiple 2D camera views, which is not feasible for wide deployment in practice.
Therefore, we evaluated a single-smartphone-based MMC in measuring bilateral and unilateral countermovement jump height. Our main contributions are:
We use a simple setup with a single smartphone, with no strict requirements on view perpendicularity and subject's distance from the camera. This is a more realistic application setting where MMC is used outside the lab, without specialised equipment.
We show how to exploit gravity as reference for pixel-to-metric conversion as proposed in [12], removing the need for reference objects or manual calibration.
We analyse how accurately MMC measures jump heights compared with OMC and force plates.
We discuss situations in which MMC could be potentially useful.
Materials and Methods
A. Participants
Sixteen healthy adults (mean age: 30.87 years) participated in the study.
B. Tasks
After a five-minute warm-up, each participant performed three repetitions each of CMJ bilateral (BL) and unilateral (UL) while simultaneous motion capture was performed using force plates, OMC, and MMC (Fig. 1).
Experiment setup showing simultaneous motion capture, preprocessing, and comparison with ground truths.
C. Apparatus
1) Force Plate
AMTI force plates sampling at 1000 Hz were used as the first ground truth. The jump height h (in cm) was obtained from the flight time T_f (in s) as
\begin{equation*}
h = 100gT_{f}^{2}/8 \tag{1}
\end{equation*}
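As a minimal sketch, Eq. (1) can be implemented directly. The function name and the default g = 9.81 m/s² are illustrative choices, not part of the original apparatus:

```python
def jump_height_cm(flight_time_s, g=9.81):
    """Jump height in cm from flight time via h = 100 * g * T_f^2 / 8 (Eq. 1).

    Assumes the COM height at take-off equals that at landing, so the
    ascent and descent each last T_f / 2.
    """
    return 100.0 * g * flight_time_s ** 2 / 8.0
```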
2) Optical Motion Capture
Optical motion capture was performed using four CODA 3D cameras sampling at 100 Hz, synchronised with each other and with the force plate. Four clusters, each consisting of four light-emitting diode (LED) markers, were placed on the left and right lateral sides of the thigh and shank (Fig. 3). Moreover, six LED markers were placed on the iliac crest (anterior and posterior superior) and greater trochanter (left and right). Three LED markers were attached to the lateral side of the calcaneus and on the first and fifth metatarsals of the dominant foot.
Examples of noise in unilateral jumps during pose estimation as seen by observing the limb heatmap colors. (a) In frame 2, the left and right limbs are swapped. In frame 3, the right limb is wrongly detected as two limbs. (b) A failure case showing movements that are not characteristic of countermovement jumps.
For a motor task with duration
3) Markerless Motion Capture
Markerless motion capture was performed in the side view using one Motorola G4 smartphone camera with a resolution of 720p and a frame rate of 30 frames per second (fps). The smartphone was placed on a tripod perpendicular to the dominant foot of the participant. We placed no strict requirements on camera view perpendicularity and distance to the participant. However, we ensured that the camera remained stationary and participants remained fully visible in the camera view.
To obtain motion data from the recorded videos, we performed 2D HPE using OpenPose [5]. For each video frame, the HPE algorithm outputs the 2D pixel coordinates of a fixed set of body keypoints together with per-keypoint confidence scores, yielding a motion time series for each keypoint across frames.
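As an illustrative sketch of this step (not the authors' exact pipeline), the per-frame JSON documents written by OpenPose, in which `pose_keypoints_2d` is a flat `[x, y, confidence, ...]` list, can be collected into a per-keypoint time series. The helper name `keypoint_series` is hypothetical:

```python
import json

def keypoint_series(frame_jsons, keypoint_idx):
    """Extract one keypoint's (x, y, confidence) per frame from OpenPose
    JSON output. Returns None for frames with no detected person."""
    series = []
    for doc in frame_jsons:
        people = json.loads(doc).get("people", [])
        if not people:
            series.append(None)
            continue
        kp = people[0]["pose_keypoints_2d"]  # flat [x1, y1, c1, x2, y2, c2, ...]
        i = 3 * keypoint_idx
        series.append((kp[i], kp[i + 1], kp[i + 2]))
    return series
```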
D. Data Preprocessing
During preprocessing, we performed denoising, segmentation, resampling, and rescaling.
1) Denoising
As shown in Fig. 3(a), occasional false detections in pose estimation appear as spikes on the motion time series. In most cases, these spikes could be removed by smoothing. However, 19 unilateral jumps such as Fig. 3(b) showed uncharacteristic movements and were removed as failure cases. To avoid filtering out important motion data, we performed smoothing of the OMC and MMC time series using z-score smoothing [15], proposed specifically for spike removal in motion sequences, and a second-order Savitzky-Golay [16] (Savgol) filter. The Savgol filter is known to smooth data with little distortion [17], and we chose a window size of 21 to preserve the main maxima and minima of the time series for accurate segmentation.
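A minimal sketch of this two-stage denoising is below. `despike_zscore` is a simplified stand-in for the z-score smoothing of [15] (here, samples whose first-difference z-score exceeds a threshold are replaced by linear interpolation), while the Savgol stage uses the window size 21 and order 2 stated above:

```python
import numpy as np
from scipy.signal import savgol_filter

def despike_zscore(x, z_thresh=3.0):
    """Replace samples whose first-difference z-score exceeds z_thresh
    by linear interpolation from neighbouring valid samples."""
    x = np.array(x, dtype=float)
    d = np.diff(x, prepend=x[0])
    z = (d - d.mean()) / (d.std() + 1e-12)
    bad = np.abs(z) > z_thresh
    if bad.any():
        good = np.flatnonzero(~bad)
        x[bad] = np.interp(np.flatnonzero(bad), good, x[good])
    return x

def smooth(x, window=21, order=2):
    """Second-order Savitzky-Golay smoothing with window size 21."""
    return savgol_filter(x, window_length=window, polyorder=order)
```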
2) Segmentation and Resampling
Each jump repetition is characterised by a dominant peak corresponding to the maximum vertical height attained by the hip (Fig. 4). Using these peaks as reference, we segmented each jump with a window
Segmentation of jump repetitions. (a) Raw hip vertical motion signal with peaks and selected jump windows. (b) Segmented and synchronised jumps based on selected windows.
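The peak-based segmentation described above can be sketched as follows. The minimum peak separation, prominence threshold, and half-window length are illustrative values, not the exact parameters used in this study:

```python
import numpy as np
from scipy.signal import find_peaks

def segment_jumps(hip_y, fps=30, min_separation_s=2.0, half_window_s=1.0):
    """Locate the dominant peak of each jump repetition in the hip
    vertical-position signal and cut a fixed window around each peak."""
    hip_y = np.asarray(hip_y, dtype=float)
    peaks, _ = find_peaks(
        hip_y,
        distance=int(min_separation_s * fps),          # one peak per repetition
        prominence=0.5 * (hip_y.max() - hip_y.min()),  # keep only dominant peaks
    )
    w = int(half_window_s * fps)
    segments = [hip_y[max(p - w, 0):p + w] for p in peaks]
    return peaks, segments
```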
3) Rescaling
Two approaches were taken to rescale MMC from pixels (px) to a metric scale, namely reverse minmax (RMM) and pixel-to-metric (PTM).
Reverse MinMax (RMM) involved using OMC as reference to rescale MMC into metric mm. This was done by applying MinMax normalisation to both OMC and MMC, and then rescaling MMC into mm using the scaling factor obtained from OMC. Let the vectors \mathbf {p_{mm}} and \mathbf {q_{px}} denote the OMC time series (in mm) and the MMC time series (in px), respectively, each of length N. The normalised MMC series is
\begin{equation*}
\mathbf {q^*} = \left\lbrace \frac{\mathbf {q_{px}}_{i} - \textsc {min}(\mathbf {q_{px}})}{\textsc {max}(\mathbf {q_{px}}) - \textsc {min}(\mathbf {q_{px}})} \right\rbrace \tag{2}
\end{equation*}
\begin{align*}
\mathbf {q}_{mm} = & \lbrace \mathbf {q^*_{i}}[\textsc {max}(\mathbf {p_{mm}})-\textsc {min}(\mathbf {p_{mm}})]\\
& +\textsc {min}(\mathbf {p_{mm}}) | i=1,\ldots, N\rbrace \tag{3}
\end{align*}
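Eqs. (2) and (3) amount to a MinMax normalisation followed by a mapping onto the metric range of the OMC reference, as in this sketch (the function name is illustrative):

```python
import numpy as np

def reverse_minmax(q_px, p_mm):
    """Rescale the MMC series q_px (pixels) to mm using the OMC series
    p_mm as reference (Eqs. 2-3)."""
    q_px = np.asarray(q_px, dtype=float)
    p_mm = np.asarray(p_mm, dtype=float)
    q_star = (q_px - q_px.min()) / (q_px.max() - q_px.min())  # Eq. (2)
    return q_star * (p_mm.max() - p_mm.min()) + p_mm.min()    # Eq. (3)
```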
Pixel-to-Metric (PTM) Conversion was performed based on the ‘free-fall’ of the centre of mass during a vertical jump. PTM uses the distance fallen under gravity over a known time interval as a metric reference. After the apex of the jump (where the vertical velocity is zero), the vertical position of the COM follows the equation of motion
\begin{equation*}
d(t) = d_{0} + v_{0}t + \frac{1}{2}gt^{2} \tag{4}
\end{equation*}
\begin{equation*}
(\text{500} \, T^{2} \, g)mm = |d_{0} - d_{T}|px \tag{5}
\end{equation*}
\begin{equation*}
\mathcal {R} = \frac{\text{500} \, T^{2} \, g}{|d_{0} - d_{T}|} \tag{6}
\end{equation*}
\begin{equation*}
\mathbf {q}_{mm} = \lbrace \mathcal {R}\,\mathbf {q_{px}}_{i} \mid i=1,\ldots, N\rbrace \tag{7}
\end{equation*}
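Eqs. (5)-(7) translate to a few lines: over T seconds of free fall from the apex (where v_0 = 0), the COM drops 500 g T² mm, which is observed as |d_0 - d_T| pixels. A sketch, with illustrative function names:

```python
def ptm_ratio(d0_px, dT_px, T_s, g=9.81):
    """Pixel-to-metric scale factor R in mm/px (Eqs. 5-6): the metric
    free-fall distance 500 * g * T^2 mm divided by the observed pixel
    displacement |d0 - dT|."""
    return (500.0 * g * T_s ** 2) / abs(d0_px - dT_px)

def rescale_ptm(q_px, ratio):
    """Apply the scale factor to the whole pixel series (Eq. 7)."""
    return [ratio * q for q in q_px]
```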
E. Quantifying Jump Height
We measured jump heights directly from the OMC and rescaled MMC time series as the maximum vertical displacement of the fifth metatarsal (small toe). We believe this approach is more straightforward than basing measurements on the flight time of the centre of mass, which may vary based on jump strategy.
Analysis and Results
Table I shows the average jump heights recorded for participants across all three repetitions of each task. Each MMC measurement was obtained using the reverse-minmax (RMM) and pixel-to-metric (PTM) approaches as described in Section II-D3. The mean
A. Analysis
We consider all jump repetitions from all participants as individual measurements, thereby recording 6 jumps per participant and 96 jumps in total, of which 77 (48 bilateral and 29 unilateral) were valid and used for analysis. For quantitative comparison, we use the intraclass correlation coefficient [19] (ICC) and Bland-Altman analysis [20] (BA). ICC and BA are often used for comparing new methods of measurement with a gold standard [9], [14], [21].
1) Intra-Class Correlation (ICC)
We took the simultaneous capture of each jump repetition by FP, OMC, PTM and RMM each as a rating. Using the ICC
We also computed the intra-rater ICCs to obtain the intra-session test-retest reliability of each measuring technique (shown in Table II) across the three repetitions for each participant. We obtained the ICCs using the intraclass_corr function of the Pingouin [23] package.
2) Bland-Altman Plots
Bland-Altman plots are often used in clinical settings to visualise the agreement between two methods of measurement based on bias and limits of agreement (LOA) [24]. The bias
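The quantities underlying each Bland-Altman plot reduce to the mean of the paired differences (bias) and, for 95% limits of agreement, bias ± 1.96 times their standard deviation. A minimal sketch of this standard computation:

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two measurement methods:
    bias = mean(a - b), LOA = bias +/- 1.96 * SD(a - b)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```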
Bland-Altman plots for (a) bilateral; (b) unilateral; (c) all CMJs. Each data point in each scatterplot represents a single jump repetition.
B. Results
In this section, we analyse the level of agreement of MMC with OMC and we put this work in context with similar approaches based on ICC, bias, LOA, and simplicity of setup (Table III).
1) MMC vs OMC
First, the accuracy of MMC in quantifying jump height is evaluated with OMC as ground truth. Both MMC and OMC are measured using the vertical displacement of the toe. As shown in Table III, both MMC
2) MMC vs Force Plate
The jump height measured from the force plates is taken as the main ground truth in this section. As shown in Table III, MMC
Discussion
In this study, we have evaluated 2D markerless motion capture with a single smartphone in quantifying vertical jump height during countermovement jumps. Optical motion capture (OMC) was performed using CODA, and markerless motion capture (MMC) was performed using OpenPose with a single smartphone camera. Jump heights obtained from force plate flight times were used as the first ground truth for evaluating jump height, while OMC was used as the second ground truth. We found that MMC can quantify jump heights with ICC between 0.84 and 0.99 without manual segmentation or camera calibration. For all jumps, the greatest agreement is found between OMC and MMC
Although our proposed methods achieve comparable results, the acceptability of LOA will depend on measures similar to the minimally important difference [26] (MID) in each application context. In order to be acceptable, the LOA should be smaller than the MID. For example, the MID in an elite sports context with high accuracy and precision requirements would be considerably smaller than the MID in recreational athletes.
There are some limitations to our approach. For example, the pixel-to-metric conversion requires a calibrating jump, and movements towards or away from the camera during each task change the pixel-to-metric scale. In general, the main sources of error we identify in MMC are:
Video quality. The quality of the video and the amount of clutter in the background affect the confidence of detected keypoints during pose estimation.
Video viewpoint. Accurate detection of body parts is affected by video viewpoint. For example, pose estimation sometimes fails when used for unilateral CMJ in the side view (Fig. 3). Future studies will explore other views for the unilateral CMJ.
Noise in HPE output. The noise level could be influenced by HPE model accuracy, background clutter, and lighting conditions.
Approximations. Preprocessing steps such as smoothing, segmentation, MMC scaling and pixel-to-metric conversion involve approximations, introducing errors.
Conclusion
The results of the analyses in this study suggest that markerless motion capture with a single smartphone is promising. However, its use case will depend on the domain-specific minimally important differences (MID). For example, for applications with very small MID, monocular MMC could provide enhanced feedback and/or augmentation for body-worn sensors and markers. On the other hand, for applications such as measuring countermovement jump height, MMC frame-by-frame tracking accuracy is not critical for the method used in this study. Hence, as shown in this study, 2D monocular MMC could potentially replace sensors and physical markers for such applications.
This study focuses on two variants of one motor task with sixteen participants. Future studies will focus on improving and generalising the techniques used to cover a comprehensive range of motor tasks. In addition, the videos used in this study were captured in the side view. Future studies will consider other views and their effects on capture techniques.