Online Signature Verification Using Locally Weighted Dynamic Time Warping via Multiple Fusion Strategies

Online signature verification is the process of verifying a writer's identity using a dynamic signature verification system. It can serve, for example, as a security system for verifying entrance applications and substituting for passwords, and as a forensic tool to support experts' investigations. This study proposes a novel online signature verification system based on a single-template strategy to improve performance in real-world scenarios. It uses discriminative mean signature template sets as well as fusion strategies of multiple local weighting and warping schemes for dynamic time warping (DTW). First, we create a set of user-specific mean signature templates for each feature using a recent time-series averaging method, i.e., Euclidean barycenter-based DTW barycenter averaging. Then, we estimate local weights from local stability sequences based on multiple and direct matching points between the mean signature templates and references for dependent and independent DTW. Moreover, we derive fusion strategies that calculate locally weighted DTW sets and concatenate them into a feature vector for each warping, and we construct a support vector machine (SVM) classifier for each warping. Finally, in the verification phase, we employ the single-template technique to obtain a discriminative fused score using the SVMs between the mean template sets and a query sample. The effectiveness of the proposed method is demonstrated by extensive experimental results on three public online signature datasets: SVC2004 Task1, SVC2004 Task2, and MCYT-100.


I. INTRODUCTION
Handwriting is a common means of communication in our daily lives, and signatures are socially and legally accepted as a form of individual authentication based on the behavioral characteristics and unique features of each person. Recently, online signature verification systems have been used in biometrics [1]-[3] and forensics [4]-[7] because of the widespread adoption of consumer electronics applications and products (e.g., tablets, phablets, and cell/mobile phones).
Parametric and functional approaches are used to extract a set of features from dynamic signatures. The former represents signatures as a set of parameters or vectors (e.g., total signature duration, number of pen ups/downs, and average/maximum speed), whereas the latter represents signatures as time functions (e.g., pen position trajectory, pressure, and velocity). The functional approach has consistently outperformed the parametric approach [1]-[3]; therefore, this research focuses on the functional approach.
The functional approach has been adopted for template matching using a distance measure such as dynamic time warping (DTW). Template matching can be classified into multiple- and single-template strategies. The former compares a query pattern with each reference and summarizes the respective distances using a statistical measure (e.g., max, median, min, or mean), whereas the latter compares the query with a single template chosen or generated from the reference set. The single-template strategy has numerous advantages, such as speed, security, and tolerance, all of which stem from one-to-one matching. However, its accuracy is lower than that of the multiple-template strategy [8].
A novel single-template strategy using a time-series averaging method, namely, Euclidean barycenter-based DTW barycenter averaging (EB-DBA) [8], and locally/globally weighted DTW (LG-DTW), has been recently proposed [9], [10]. The study [9] adopted multiple matching points (MMPs) [11] for the local weighting estimate and the variable importance obtained using gradient boosting (GB) [12] for the global weighting estimate, whereas the other study [10] used direct matching points (DMPs) [13] instead of the MMPs for the local weighting estimate in LG-DTW. Based on the recent demand for high-speed systems in the big data era, these single-template strategies can decrease calculation complexity while achieving efficient performance, which results in an accuracy improvement.
However, some challenges remain regarding verification performance in real use: (1) Using MMPs and DMPs independently reduces their complementary effects and may result in the loss of detailed local stability information that exists between the mean signature templates and reference sets. (2) The individual use of MMPs and DMPs makes the system difficult to adapt to changes in writing conditions (e.g., device and signal types), template aging, and skilled forgery attacks, all of which occur frequently in real-world scenarios. To mitigate these challenges, we propose the following solutions: (1) We introduce a modified local weighting scheme for DTW using both MMPs and DMPs (namely, LM-DTW and LD-DTW, respectively) for dependent and independent warping to incorporate more detailed and flexible local stability information and to minimize intra-class discrepancies effectively. (2) To enhance the inter-user variability, we apply multiple fusion strategies: representation-level fusion, which concatenates LM-DTW and LD-DTW into a single vector (namely, F-DTW) for each warping, followed by score-level fusion, which combines the scores from multiple support vector machine (SVM) classifiers [14] constructed for each warping. Notably, this study is an extension of our previous research [15]. We revised the previous method and conducted additional experiments as outlined below: (1) We replaced the global weighting scheme with GB by score-level fusion with SVM to enhance the discriminative power of F-DTW and to further improve performance. (2) In addition to the previous experiments conducted using the SVC2004 Task1 dataset [16], we conducted comprehensive experiments using two public datasets, SVC2004 Task2 [16] and MCYT-100 [17], to confirm the generalization performance of the proposed method.
The remainder of the paper is organized as follows. In Section II, we review recent online signature verification methods and distance measures relevant to this study. In Section III, we present the proposed online signature verification method. In Section IV, we explain the experimental methods and results. In Section V, we discuss the findings and their applicability in real-world scenarios. Finally, in Section VI, we present the conclusion.

A. ONLINE SIGNATURE VERIFICATION
In the past decade, numerous online signature verification systems have been proposed [1]-[3]. We can classify the systems into two main matching methods: model-based and distance-based approaches. Model-based approaches describe data distribution by employing generative models (e.g., Gaussian models [18] and hidden Markov models (HMMs) [19]) and discriminative models (e.g., SVMs [20], convolutional neural networks (CNNs) [21], and recurrent neural networks (RNNs) [22]). Distance-based approaches match query signatures with reference sets by employing distance measures such as DTW [23]. Distance-based approaches are superior in forensic situations with limited data availability for enrollment because a model-based approach would face overfitting problems.
Among various distance-based systems, template matching is commonly used for online signature verification [18]. Template matching includes single- and multiple-template strategies. The single-template strategy has more advantages, such as speed, security, and tolerance, which are in high demand in the current digital era. However, it does not perform as well as the multiple-template strategy [18].
A recent study [8] proposed an effective single-template strategy that uses mean signature templates created using a novel time-series averaging method called EB-DBA to tackle the difficulties. The template creation method expanded the possibility of template matching in online signature verification, as in [9]-[11], [13], [24].
In template matching techniques, distance measures play an important role in calculating the dissimilarity between the templates and the query signatures with different lengths.

B. DISTANCE MEASURES
The distance measures can be categorized into lockstep and elastic distance measures [25]. The lockstep measures are computed by strictly aligning the time-series indices with one-to-one mappings, e.g., Euclidean distances. However, these measures are sensitive to noise, outliers, and basic shape variations with irregular lengths. To overcome these drawbacks, elastic measures such as DTW [23] have been proposed for optimally aligning the indices of a time series with a one-to-many mapping based on dynamic programming.

1) DTW
For K-dimensional multivariate time series, DTW can be calculated using dependent or independent warping [26]. DTW with independent warping ($\mathrm{DTW}_I$) is calculated individually for each time sequence, treating each DTW as a distance measure for a one-dimensional trajectory in one-dimensional Euclidean space. DTW with dependent warping ($\mathrm{DTW}_D$) is derived directly as a single DTW over the set of time sequences, treating the K-dimensional time series as a one-dimensional trajectory in K-dimensional Euclidean space. Recent online signature verification studies [8]-[10], [24] show that $\mathrm{DTW}_D$ and $\mathrm{DTW}_I$ have different and complementary discriminative powers. The details of the DTW calculation are presented below.
Let A and B be two K-dimensional multivariate time series of different lengths, I and J, respectively, where $a^k(i)$ and $b^k(j)$ denote the values of the k-th dimension of A at time i and of B at time j. Then, $\mathrm{DTW}_I$ and $\mathrm{DTW}_D$ can be computed as follows.
a: DTW with independent warping ($\mathrm{DTW}_I$)
First, an I × J cost matrix is constructed using the cost function d(·, ·) between two time points (Eq. (1)). Then, a warping path $W = \{w_z\}_{z=1}^{Z}$ with $\max(I, J) \le Z \le I + J - 1$ is derived based on the cost matrix, satisfying the boundary, continuity, and monotonicity conditions set forth in [23].
Finally, the k-th dimensional $\mathrm{DTW}_I^k$ is defined from the distances $d(w_z)$ along the warping path, where $d(w_z) = d(a^k(i), b^k(j))$ corresponds to i and j at position z in the warping path, by recursively calculating the cumulative distance. In a similar way to $\mathrm{DTW}_I$, $\mathrm{DTW}_D$ is defined by calculating it with dependent warping to obtain a single distance from the set of time sequences, where d(·, ·) in Eq. (1) is replaced with the K-dimensional cost function in Eq. (5). As a result, DTW can search for the best alignment and attempt to minimize the distances between time sequences of varying lengths; thus, it has been widely used in online signature verification [2], [18].
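The two warping modes can be illustrated with a minimal sketch. It assumes a squared-difference local cost and an unnormalized cumulative distance with the usual three-way recursion; both are assumptions for illustration, and the function names are hypothetical rather than taken from the cited works.

```python
def dtw(a, b, cost):
    """Cumulative DTW distance between sequences a and b for a local cost(a_i, b_j)."""
    n_a, n_b = len(a), len(b)
    inf = float("inf")
    # D[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (n_b + 1) for _ in range(n_a + 1)]
    D[0][0] = 0.0
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
            )
    return D[n_a][n_b]


def sq_scalar(u, v):
    # Assumed local cost for one dimension: squared difference.
    return (u - v) ** 2


def sq_vector(u, v):
    # Assumed local cost for K dimensions: squared Euclidean distance.
    return sum((p - q) ** 2 for p, q in zip(u, v))


def dtw_independent(A, B):
    """One DTW per dimension; A and B are lists of K feature sequences."""
    return [dtw(a_k, b_k, sq_scalar) for a_k, b_k in zip(A, B)]


def dtw_dependent(A, B):
    """A single DTW over K-dimensional points sharing one warping path."""
    return dtw(list(zip(*A)), list(zip(*B)), sq_vector)
```

Independent warping returns K distances (one warping path per feature), whereas dependent warping returns a single distance, which is why the two can carry complementary information.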
However, the DTW is sensitive to noise and outliers in time sequences because it needs to pair all elements of a sequence. To compensate for such drawbacks, some weighting methods for DTW have been proposed.

2) Weighting Schemes for DTW
The weighting schemes for DTW can be classified into local and global weighting schemes. The details are summarized below.
a: Local weighting scheme
The local weighting scheme applies a weighting function to the DTW cost function between matching points.
A previous study [27] proposed a weighted DTW (WDTW), which adds a multiplicative weight penalty based on the distances between the points in the warping path. The cost matrix is updated using this approach to incorporate a modified logistic weight function that assigns an additional weight to the DTW cost function between the reference and test points. A sliding window DTW (SW-DTW) has also been proposed as a relevant approach [28]. SW-DTW adopted the modified DTW cost function using a window function to consider the context by incorporating a weighted average of the neighboring distances.
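As an illustration of the WDTW idea, the sketch below uses a logistic weight that grows with the index gap |i - j| between matched points and multiplies the squared difference by it. The parameter names `g` and `w_max`, the midpoint choice, and the squared-difference cost are assumptions made for this sketch, not settings taken from [27].

```python
import math


def logistic_weights(length, g=0.05, w_max=1.0):
    """Weight for each possible index gap 0..length-1 (larger gap, larger penalty)."""
    mid = length / 2.0
    return [w_max / (1.0 + math.exp(-g * (gap - mid))) for gap in range(length)]


def wdtw(a, b, g=0.05):
    """DTW whose local cost is scaled by a logistic penalty on the index gap."""
    w = logistic_weights(max(len(a), len(b)), g)
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = w[abs(i - j)] * (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[len(a)][len(b)]
```

A larger `g` makes the penalty rise more sharply around the midpoint, which pushes the alignment toward the diagonal.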
Some studies have incorporated the weighting scheme in the DTW matching to estimate local stability domains in online signature verification. The study [29] proposed a stability-modulated DTW (SM-DTW) to incorporate the most similar parts between a pair of signatures into the distance measure computed by DTW. Other studies [30], [31] analyzed the most stable domains for each signer using a weighted DMP to incorporate information contained in the DTW matching.
These studies bring to light the potential for local weighting methods to be applied to DTW. However, most of the previous approaches needed a multiple-template strategy and/or adequate parameter optimization, which results in high computational complexity. To compensate for these difficulties, we have proposed the locally weighted DTW [11], [13], where we calculate the local stability of the mean signature template set through MMPs or DMPs and apply the weights to the DTW cost functions.

b: Global weighting scheme
The global weighting scheme incorporates feature weighting/selection and applies it to DTW or its variants. Canonical time warping (CTW) [32] combines DTW with canonical correlation analysis (CCA) to calculate spatial projections and determine the linear combinations of variables between two different multivariate sequences. CTW can incorporate feature weighting/selection and a dimensionality reduction mechanism for aligning signals of different dimensions. Another study [24] proposed a novel single-template strategy that uses a global weighting scheme to combine the multiple DTWs while weighting them with the variable importance obtained through GB.
These studies highlighted the limited performance of the single-template strategy in online signature verification. However, the separate/independent use of local or global weighting results in the limited discriminative power of DTW. Therefore, we proposed a novel single-template strategy to overcome these challenges.

III. PROPOSED METHOD
A. OUTLINE
Figure 1 shows the outline of the proposed online signature verification method.
After online signature input, we applied preprocessing to improve quality and extract the common function-based features. We implemented a single-template strategy in the enrollment phase, including mean signature template generation based on the EB-DBA and fusion strategies of multiple local weighting and warping schemes for DTW using the reference set. In the verification phase, we computed the fused score between a test sample and the mean signature templates of a purported user. Finally, the system labels the test sample as genuine or forged depending on whether the fused score is below or above, respectively, a designated threshold for each user.
• Three original features are used: horizontal and vertical pen coordinates x(t), y(t), and pen pressure p(t).
• Four additional features are derived from the original x(t) and y(t): path-tangent angle θ(t), path velocity magnitude ν(t), log curvature radius ρ(t), and total acceleration magnitude α(t).
The derivatives of discrete-time signals are computed by a second-order regression that removes small noisy variations [33]. It should be noted that some digital devices (e.g., cell/smart phones) provide original signals without pen pressure. In that case, we select the six of the above seven function-based features that do not include p(t).
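The derived features can be sketched as follows. The formulas used here are the forms commonly found in the function-based signature literature and are assumed rather than taken from this text: θ = atan2(ẏ, ẋ), ν = sqrt(ẋ² + ẏ²), ρ = log(ν / θ̇), and α = sqrt(ν̇² + (ν θ̇)²), with a two-point regression derivative. Border clamping and the small constant `eps` are additional assumptions made only to keep the sketch runnable.

```python
import math


def regression_derivative(f, order=2):
    """Second-order regression derivative; indices are clamped at the borders."""
    n = len(f)
    denom = 2 * sum(e * e for e in range(1, order + 1))
    out = []
    for t in range(n):
        num = 0.0
        for e in range(1, order + 1):
            ahead = f[min(t + e, n - 1)]
            behind = f[max(t - e, 0)]
            num += e * (ahead - behind)
        out.append(num / denom)
    return out


def derived_features(x, y, eps=1e-8):
    """Return (theta, vel, rho, acc) sequences computed from pen coordinates."""
    dx, dy = regression_derivative(x), regression_derivative(y)
    theta = [math.atan2(b, a) for a, b in zip(dx, dy)]   # path-tangent angle
    vel = [math.hypot(a, b) for a, b in zip(dx, dy)]     # path velocity magnitude
    dtheta = regression_derivative(theta)
    dvel = regression_derivative(vel)
    # Log curvature radius; eps and abs() guard against division by zero and
    # negative arguments (an assumption of this sketch).
    rho = [math.log((v + eps) / (abs(w) + eps)) for v, w in zip(vel, dtheta)]
    # Total acceleration: tangential and centripetal components combined.
    acc = [math.hypot(dv, v * w) for dv, v, w in zip(dvel, vel, dtheta)]
    return theta, vel, rho, acc
```

This sketch does not unwrap the angle sequence before differentiating it, so jumps at ±π would need extra handling in practice.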

D. SINGLE-TEMPLATE STRATEGY
The proposed single-template strategy comprises three steps: 1) the mean signature templates, 2) local weighting estimates, and 3) fusion schemes (Fig. 2). The details of each step are described in the following subsections, referring to the definitions in Section II-B1.

1) Mean Signature Templates
The single-template strategy uses user-specific mean signature templates (i.e., multiple prototypes corresponding to each user's feature) created through EB-DBA [8] to consider intra-user variations among all reference samples.
EB-DBA is an effective time-series averaging method that iteratively refines a Euclidean barycenter (EB) sequence to minimize its DTW distance to the target sequences, based on an expectation-maximization scheme. Concretely, we created an EB sequence from the N references after resampling each of them to their average length I. We then ran DBA [34] on the original N references, using the EB sequence as the initial sequence.
From K function-based features (i.e., K = 6 for the SVC2004 Task1; K = 7 for the SVC2004 Task2 and the MCYT-100 in this study), we obtained K mean signature templates of lengths I through EB-DBA for each user.
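The template creation for a single feature can be sketched as below. This is a simplified illustration, assuming linear-interpolation resampling, a squared-difference DTW cost, and a fixed number of refinement iterations; the function names and the `n_iter` parameter are hypothetical.

```python
def dtw_path(a, b):
    """Optimal warping path between a and b as a list of (i, j) index pairs."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
            )
    path, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step back to the cheapest predecessor cell.
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    return path[::-1]


def resample(seq, length):
    """Linear-interpolation resampling of seq to the given length."""
    if length == 1:
        return [seq[0]]
    out = []
    for t in range(length):
        pos = t * (len(seq) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def eb_dba(refs, n_iter=10):
    """Mean template: Euclidean barycenter initialization, then DBA refinement."""
    length = round(sum(len(r) for r in refs) / len(refs))
    resampled = [resample(r, length) for r in refs]
    avg = [sum(col) / len(refs) for col in zip(*resampled)]  # EB sequence
    for _ in range(n_iter):
        assoc = [[] for _ in avg]
        for r in refs:
            # Collect, for each template point, the reference values aligned to it.
            for i, j in dtw_path(avg, r):
                assoc[i].append(r[j])
        avg = [sum(v) / len(v) for v in assoc]
    return avg
```

Calling `eb_dba` once per feature on the N reference sequences of a user would yield the K mean templates of that user.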

2) Local Weighting Estimates
We estimate local stability regions in signatures to detect local fluctuations and incorporate intra-user variations into the distance measure. This study adopted MMPs and DMPs to estimate the complementary local stability of the mean signature templates (Fig. 3).

a: MMPs
MMPs [11] detect multiple matching points in DTW trajectories where the mean signature template set and the references are significantly distorted. Consequently, the MMP sequence indicates the local instability of the mean signature template sequence, and the inverse of the averaged MMP sequence $\{mmp_i\}_{i=1}^{I}$ can be considered as the local stability.
FIGURE 2. Process of the proposed single-template strategy: (1) mean signature template set creation per feature (pen coordinates "X" and "Y," pen pressure "P," path-tangent angle "Ang," path velocity magnitude "Vel," log curvature radius "Logcr," and total acceleration magnitude "Tam") through EB-DBA (solid black lines) using the five original reference sequences (dashed lines in different colors); (2) local weighting estimates with MMPs and DMPs for independent and dependent DTW; (3) fusion schemes in representation- and score-level fusions through the SVM models. Finally, we obtain the $\mathrm{Score}_{ID}$ in the verification phase.
For the sake of simplicity, let us assume an I-length univariate time sequence corresponding to the mean signature template A and the original set of N references $B = \{B_n\}_{n=1}^{N}$, where each $B_n$ is a $J_n$-length univariate time sequence. Then, the estimation process of the MMP-based local stability can be outlined as follows.
(1) We first calculate the standard DTW for each warping between A and each reference in B, and obtain a set of N optimal warping paths W(A, B).
(2) Next, we compute N MMP sequences from W(A, B) and obtain the averaged MMP sequence $mmp_i = \frac{1}{N}\sum_{n=1}^{N} c_i^n$, where $c_i^n$ is the cardinality, represented as card{·}, of the set of matching points belonging to the ith point of A in the nth warping path.
By following the steps outlined above, we finally obtained I-length local weight sequences $LM_I = \{LM_I^k\}_{k=1}^{K}$ for independent warping and $LM_D$ for dependent warping.

b: DMPs
DMPs [13] detect the DTW trajectories' averaged matching points where one-to-one matching relations exist between the mean signature template set and all references.
The estimation process of the DMP-based local stability is described as follows.
(1) In the same way as for the MMPs, we first calculate a set of N optimal warping paths, where a $Z_n$-length warping path is described by $W_n(A, B_n) = \{(p_z^n, q_z^n)\}_{z=1}^{Z_n}$ with $1 \le p_z^n \le I$, $1 \le q_z^n \le J_n$, and $\max(I, J_n) \le Z_n \le I + J_n - 1$.

VOLUME 4, 2016
(2) Next, we compute the N DMP sequences from the set of warping paths. The multiplicity of the warping relation for each component is defined as the number of consecutive occurrences of the component index in $W_n(A, B_n)$, which gives the multiplicities $m_i^n$ and $m_j^n$ corresponding to the respective matching components $p_z^n$ and $q_z^n$. The ith point of A at which the multiplicities simultaneously satisfy both $m_i^n = 1$ and $m_j^n = 1$ is defined as a DMP. By following the steps outlined above, we finally obtained I-length local weight sequences $LD_I = \{LD_I^k\}_{k=1}^{K}$ for independent warping and $LD_D$ for dependent warping.
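Both local stability estimates can be sketched from a set of warping paths between the mean template and the N references. The sketch assumes that each path is a list of 0-based (i, j) pairs, that the MMP-based weight is the plain inverse of the averaged matching count, and that the DMP-based weight is the fraction of references for which a template point is a direct matching point. Any further normalization is not specified here, so these choices and the function names are assumptions.

```python
from collections import Counter


def mmp_weights(paths, length):
    """Inverse of the averaged number of reference points matched to each template point."""
    counts = [0] * length
    for path in paths:
        for i, _ in path:
            counts[i] += 1
    mmp = [c / len(paths) for c in counts]
    # Every template index occurs at least once per valid path, so mmp >= 1.
    return [1.0 / m for m in mmp]


def dmp_weights(paths, length):
    """Fraction of references in which each template point is matched one-to-one."""
    hits = [0] * length
    for path in paths:
        mult_i = Counter(i for i, _ in path)
        mult_j = Counter(j for _, j in path)
        for i, j in path:
            if mult_i[i] == 1 and mult_j[j] == 1:
                hits[i] += 1
    return [h / len(paths) for h in hits]
```

For example, with the two paths `[(0, 0), (1, 1), (1, 2), (2, 3)]` and `[(0, 0), (1, 1), (2, 2)]`, the middle template point is stretched in the first path, so it receives a lower weight than its neighbors under both estimates.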

3) Fusion Schemes
After calculating $LM_I$ and $LM_D$ with MMPs and $LD_I$ and $LD_D$ with DMPs, we applied fusion schemes to obtain a discriminative score while maximizing the inter-user variations. This study used two fusion strategies [5]: representation- and score-level fusions.

a: Representation-level Fusion
For representation-level fusion, we obtained a locally weighted DTW with MMPs and DMPs (i.e., LM-DTW and LD-DTW, respectively), followed by concatenating them into a single vector, F-DTW.
To obtain LM-DTW, the cost function d(·, ·) between two points of the considered time series, as defined in Eqs. (1) and (5), is weighted by the corresponding local weight sequences, $LM_I$ and $LM_D$, respectively. As with LM-DTW, to obtain LD-DTW, the cost function d(·, ·) in Eqs. (1) and (5) is weighted using $LD_I$ and $LD_D$. As a result, we obtained $\{\text{LM-DTW}_I^k\}_{k=1}^{K}$ and $\{\text{LD-DTW}_I^k\}_{k=1}^{K}$ for independent warping, and $\text{LM-DTW}_D$ and $\text{LD-DTW}_D$ for dependent warping.
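The locally weighted distances for independent warping can be sketched as below, assuming that the weight of the template point i multiplies a squared-difference cost and that the representation-level vector simply concatenates the MMP-weighted and DMP-weighted distances of all K features. The function names are hypothetical.

```python
def locally_weighted_dtw(template, query, weights):
    """DTW whose local cost at template index i is scaled by weights[i]."""
    inf = float("inf")
    D = [[inf] * (len(query) + 1) for _ in range(len(template) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(template) + 1):
        for j in range(1, len(query) + 1):
            cost = weights[i - 1] * (template[i - 1] - query[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[len(template)][len(query)]


def fused_vector_independent(templates, query, lm, ld):
    """Concatenate K MMP-weighted and K DMP-weighted distances into a 2K vector."""
    lm_part = [locally_weighted_dtw(a, b, w) for a, b, w in zip(templates, query, lm)]
    ld_part = [locally_weighted_dtw(a, b, w) for a, b, w in zip(templates, query, ld)]
    return lm_part + ld_part
```

The dependent-warping counterpart would use a K-dimensional cost with a single weight sequence per scheme, giving a two-element vector.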
Finally, we obtained F-DTW with independent and dependent warping (namely, $\text{F-DTW}_I$ and $\text{F-DTW}_D$, respectively) by concatenating the corresponding LM-DTW and LD-DTW values into a single vector for each warping.

b: Score-level Fusion
For score-level fusion, we constructed two SVM classifiers using $\text{F-DTW}_I$ and $\text{F-DTW}_D$, respectively, and obtained the final score by fusing the scores derived from the classifiers. SVM [14] is a well-known machine learning classifier that is widely used in writer and signature verification systems [4], [5], [20]. Geometrically, an SVM builds a maximum-margin hyperplane based on the principle of structural risk minimization from statistical learning theory.
When constructing an SVM model, we used positive instances (the intra-user variations between the target user's mean signature template set and reference set) and negative instances (the inter-user variations between the target user's mean signature template set and the other users' mean signature template sets) for each user. For example, by using five genuine signatures as the reference set, we obtained 5 positive instances and 39 (SVC2004 Task1/Task2) or 99 (MCYT-100) negative instances (Section IV). Then, we employed a linear SVM with the L2-norm penalty and the squared hinge loss, using a cost-sensitive learning method to handle the imbalanced class distributions. A grid search was applied to tune the SVM parameter (i.e., the penalty constant C).
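The classifier described above can be illustrated with a small stand-in. The sketch below is not a library solver: it minimizes an L2-penalized, class-weighted squared hinge loss by plain gradient descent, with class weights inversely proportional to class frequency as one common form of cost-sensitive learning. The learning rate, iteration count, weighting formula, and function names are assumptions, and the grid search over C is omitted.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, n_iter=2000):
    """Gradient descent on 0.5*||w||^2 + C * sum_i c_i * max(0, 1 - y_i*f(x_i))^2.

    X is a list of feature vectors and y a list of labels in {+1, -1}.
    """
    n, dim = len(X), len(X[0])
    n_pos = sum(1 for t in y if t > 0)
    n_neg = n - n_pos
    # Assumed "balanced" weights: the rarer class gets the larger cost.
    class_weight = {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}
    w, b = [0.0] * dim, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = list(w), 0.0  # gradient of the L2 penalty is w itself
        for x, t in zip(X, y):
            slack = 1.0 - t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if slack > 0.0:
                coef = -2.0 * C * class_weight[t] * slack * t
                grad_w = [g + coef * xi for g, xi in zip(grad_w, x)]
                grad_b += coef
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_score(w, b, x):
    """Signed score proportional to the distance of x from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

In practice, an established linear SVM implementation with the same penalty and loss would normally be preferred over this hand-written loop.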
In the enrollment phase, we constructed two SVM classifiers using $\text{F-DTW}_I$ and $\text{F-DTW}_D$ (namely, $\mathrm{SVM}_I$ and $\mathrm{SVM}_D$, respectively). When a query sample is input to $\mathrm{SVM}_I$ and $\mathrm{SVM}_D$, we obtain the confidence scores, $\mathrm{Score}_I$ and $\mathrm{Score}_D$, respectively, each of which is proportional to the signed distance of that sample to the hyperplane.
Finally, we obtained the final score, $\mathrm{Score}_{ID}$, by combining the two scores, $\mathrm{Score}_I$ and $\mathrm{Score}_D$.

E. OUTPUT
After evaluating the scores between the mean signature template sets of the purported user and the test samples in the verification phase, the system outputs an accept or reject result depending on whether the extent of dissimilarities is below or above the user-specific threshold. In this study, we defined the threshold by analyzing the equal error rate (EER) (Section IV-A2).

A. METHODS
In real scenarios, skilled forgery detection is a challenging task, especially for forensic document examiners (FDEs) [6], [7]. To address this challenge, we conducted experiments using three public online signature datasets: SVC2004 Task1/Task2 [16] and MCYT-100 [17]. These datasets contain various stylized signatures with highly skilled forgeries collected from other contributors who had sufficient training time to produce valid forgeries. These conditions are consistent with the addressed challenge; therefore, we adopted the three datasets in these experiments.

1) Signature Datasets
a: SVC2004 Task1 and Task2
The SVC2004 Task1 and Task2 datasets each contain 1,600 signatures, including Western and Asian signatures, from 40 users (for a total of 3,200 signatures from 80 users). For each user, both datasets contain 20 genuine and 20 skillfully forged signatures. To avoid privacy issues, the original writers were advised to provide simple, invented signatures as genuine signatures after sufficient practice. The SVC2004 Task1 includes horizontal and vertical pen coordinates with time stamps and pen up/down status, all captured using a digitizing tablet at a sampling rate of 100 Hz. The SVC2004 Task2 additionally includes pen pressure, azimuth, and inclination signals. Among the seven function-based features (Section III-C), only the six features without the pen pressure feature are derived from the SVC2004 Task1.

b: MCYT-100
The MCYT-100 dataset consists of 5,000 Western signatures gathered from 100 users. The data include horizontal and vertical pen coordinates, pressure, azimuth, and inclination with time stamps, all of which were captured by a digitizing tablet at a sampling rate of 100 Hz. Each user is represented by 25 genuine and 25 skillfully forged signatures.

2) Evaluation
We evaluated the signature verification performance by analyzing the EER, i.e., the error rate at which the false rejection and false acceptance rates are equal, with a user-dependent threshold.
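A minimal sketch of the EER computation for one user is given below. It assumes dissimilarity-type scores (a sample is accepted when its score is at or below the threshold) and approximates the EER at the candidate threshold where the two error rates are closest; the function name and this approximation are assumptions.

```python
def equal_error_rate(genuine, forged):
    """Approximate EER from genuine and forged dissimilarity scores of one user."""
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine) | set(forged)):
        frr = sum(s > thr for s in genuine) / len(genuine)   # genuine rejected
        far = sum(s <= thr for s in forged) / len(forged)    # forgeries accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

Averaging this value over users and over repeated random reference selections would give a dataset-level figure under these assumptions.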
According to the experiments conducted in previous studies and real scenarios, we randomly selected N = 5 genuine signatures as the reference set in each experiment. For the test samples in the verification phase, the remaining 15 genuine signatures and 20 skillfully forged signatures were used on the SVC2004 Task1 and Task2; the remaining 20 genuine signatures and 25 skillfully forged signatures were used on the MCYT-100.
To prevent selection bias, we repeated all experiments five times on these three datasets. Finally, we obtained the average EERs.

1) Overall Performance
To confirm the effectiveness of the proposed method in template matching, we compared various combinations of templates and distance measures using three datasets under the same experimental conditions (Section IV-A).
The sets of templates and distance measures used are described below:
• Template strategies:
(1) "MT(Mean)": a multiple-template strategy with a mean measure, which was found to perform best among statistical measures [8], after the distances between a test sample and all the references were calculated.
(2) "ST(Rep)": a single-template strategy in which a representative template set was chosen directly from the reference set using the minimum average distance from the other samples.
(3) "ST(MST)": a single-template strategy with a mean signature template set created through EB-DBA.
• Distance measures:
(1) "DTW": the traditional DTW [23] with no weighting for the cost function.
(2) "G-DTW": the previous DTW [24], applying global weighting to combine the multiple DTWs through GB.
(3) "LM-DTW": the recent DTW [11], with the MMP-based local stability sequence applied as the weights for the cost function.
(4) "LD-DTW": the recent DTW [13], with the DMP-based local stability sequence applied as the weights for the cost function.
(5) "LG-DTW": the relevant measure [15], applying local and global weighting to the DTW. Concretely, after obtaining the F-DTW, we estimated the global weighting factors $\{\alpha_u\}_{u=1}^{2K}$ and $\{\beta_v\}_{v=1}^{2}$ through GB, which satisfy $\sum_{u=1}^{2K} \alpha_u = 1$ and $\sum_{v=1}^{2} \beta_v = 1$, respectively, followed by computing the LG-DTW with these weights.
(6) "$\mathrm{Score}_{ID}$": the proposed method, applying the single-template technique to obtain a discriminative fused score through SVMs constructed using F-DTW.
Figure 4 shows the overall performance of the proposed method in terms of EER. As shown in Fig. 4, we observed the following:
• Among DTW measures, performance using mean signature templates ("ST(MST)") is considerably better than the conventional "ST(Rep)" and competitive with the multiple-template strategy ("MT(Mean)").
• In "ST(MST)," the performance is further improved by applying weighting schemes for DTW. Notably, the independent use of LM-DTW and LD-DTW reveals data dependency in performance among the datasets; thus, it is rational to use both LM-DTW and LD-DTW in the proposed method. In fact, LG-DTW and $\mathrm{Score}_{ID}$, both of which use F-DTW, outperform the independent use of LM-DTW and LD-DTW.
• Among methods using F-DTW, performance using the score-level fusion with SVM ($\mathrm{Score}_{ID}$) is better than the global weighting scheme with GB (LG-DTW).
• Overall, the proposed single-template strategy ($\mathrm{Score}_{ID}$ with "ST(MST)") achieves the lowest EERs across all datasets.
To confirm the statistical significance of the differences between the proposed $\mathrm{Score}_{ID}$ and the other seven methods, we applied statistical hypothesis tests. After confirming that the global hypothesis test was significant on all datasets (i.e., the Friedman test [35] with a significance level of less than 0.001), we applied the Matched-Pairs test [36] along with the Holm method [37]. The Matched-Pairs test determines whether the difference in errors between two methods tested for equivalent subjects on the same dataset is statistically significant. The Holm method is used to adjust the pre-defined significance level for the multiple comparisons. As a result, $\mathrm{Score}_{ID}$ outperformed four methods (i.e., all three DTWs and G-DTW) on all datasets at a significance level of less than 0.05 (i.e., 1.58e-08 ≤ p-value ≤ 1.37e-03). Hence, we can conclude that there is a significant difference between the results of the proposed method and these four methods.
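The Holm adjustment used for the multiple comparisons can be sketched as a step-down procedure: the p-values are sorted in ascending order, the smallest is compared with α/m, the next with α/(m - 1), and so on, stopping at the first comparison that fails. The function name and the example p-values below are hypothetical and not taken from the experiments.

```python
def holm_reject(p_values, alpha=0.05):
    """Return, per hypothesis, whether it is rejected under Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject
```

For the hypothetical p-values `[0.01, 0.04, 0.03, 0.005]` with α = 0.05, the first and last hypotheses are rejected and the other two are retained.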
On the other hand, we could not consistently confirm statistical significance between $\mathrm{Score}_{ID}$ and LM-DTW or LD-DTW, because the outcome depends on the dataset. This result demonstrates that the individual use of LM-DTW and LD-DTW is susceptible to the writing conditions; therefore, it is reasonable to use both LM-DTW and LD-DTW in the proposed method. Additionally, we could not confirm statistical significance between $\mathrm{Score}_{ID}$ and LG-DTW, both of which use F-DTW, on all datasets. This result indicates that the proposed F-DTW provides sufficient discriminative power under either fusion scheme; thus, it is meaningful to use the score-level fusion with a linear SVM in $\mathrm{Score}_{ID}$, which needs only a few parameters and has low computational complexity compared to the global weighting scheme with GB in LG-DTW.
These results confirm that the proposed single-template strategy provides an effective template-matching approach for online signature verification.

2) Comparative Analysis of Weighted DTW
To further confirm the effectiveness of the proposed method, we compared its performance with the previous weighted DTW methods [27], [28], [32] by applying the single-template strategy under the same experimental conditions.
The baselines of the weighted DTW are shown below:
• "WDTW": applying the modified logistic weight function as the weight for the cost function [27].
• "SW-DTW": modifying the cost function by incorporating a weighted average of the neighboring distances using a window function with a width δ ∈ N while weighting with a constant α ∈ [0, 1] between the cost in amplitude and first-order derivative [28].
• "CTW": combining DTW and CCA to allow feature weighting/selection and dimensionality reduction [32].
Notably, WDTW and SW-DTW were calculated using dependent warping based on the findings in [8], [26], and their parameters were selected according to previous studies. Figure 5 compares the EERs of the proposed method with those of the previous weighted DTW methods using the single-template strategy with the mean signature templates. As a baseline, Fig. 5 also displays the results of the conventional DTW, for which we confirmed statistically significant differences from $\mathrm{Score}_{ID}$ (Section IV-B1). As shown in the figure, the proposed method ($\mathrm{Score}_{ID}$) performs consistently regardless of the dataset and provides the lowest EERs compared with the conventional methods for all datasets.
To confirm the statistical significance of the performance differences between the proposed method and the recent WDTW, SW-DTW, and CTW, we applied the statistical hypothesis tests following the previous experiments (Section IV-B1). As a result, the proposed method outperformed all the recent methods on all datasets at a significance level of less than 0.05, and in most of the experiments at a significance level of less than 0.001 (i.e., 2.60e-12 ≤ p-value ≤ 1.99e-02). Hence, we can conclude that there is a significant difference between the results of the proposed method and each of the recent WDTW, SW-DTW, and CTW.
[Figure panels: (a) A mean signature template and a genuine signature. (b) A mean signature template and a skilled forgery.]
These results confirm that the proposed method provides an effective measure, especially for the single-template strategy in online signature verification.

3) Comparative Analysis of State-of-the-Art Systems
To assess the effectiveness of the proposed single-template strategy in online signature verification, we compared the EERs of the proposed method with those of state-of-the-art systems.
Tables 1-3 present the results obtained using the SVC2004 Task1/Task2 and MCYT-100 datasets, respectively, where only genuine signatures were used for the enrollment phase and both genuine and skillfully forged signatures were used for the verification phase. In these tables, we display the EERs of the proposed method as a representative of the single-template strategy, following the previous experimental results (Section IV-B1). It should be noted that the comparative analysis on the SVC2004 Task1/Task2 datasets was set up using not only N = 5 but also N = 10 reference signatures for fair comparison with the previous studies.
These tables show that the proposed single-template strategy outperforms the other recent systems in the literature on all datasets. The results confirm the effectiveness of the proposed method for online signature verification, even in skilled forgery scenarios.
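Because the comparisons above are all reported as EERs, a minimal sketch of how an EER can be computed from verification scores may be helpful. It assumes that a higher score means "more likely genuine" and that a single threshold is swept over all observed scores; the function name `equal_error_rate` and this thresholding protocol are assumptions for illustration and may differ from the evaluation protocol used in the experiments.

```python
def equal_error_rate(genuine, forgery):
    """Approximate the EER from two lists of scores (higher = more genuine).

    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine) | set(forgery))
    # Extra threshold above every score, i.e. "reject everything".
    thresholds.append(thresholds[-1] + 1.0)
    best_gap, eer = None, None
    for t in thresholds:
        # FRR: genuine signatures rejected because their score is below t.
        frr = sum(s < t for s in genuine) / len(genuine)
        # FAR: forgeries accepted because their score reaches t.
        far = sum(s >= t for s in forgery) / len(forgery)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With a finite number of scores, FAR and FRR rarely coincide exactly, which is why the sketch reports their average at the closest point rather than an interpolated value.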

V. DISCUSSION
In this study, we proposed a novel single-template strategy that simultaneously provides lower calculation complexity and higher verification performance than the multiple-template strategy. The main contributions of this study are summarized as follows:
• It adopts the mean signature template creation method with EB-DBA to incorporate the intra-user variations within the reference samples.
• It provides locally weighted DTWs using both MMPs and DMPs (i.e., LM-DTW and LD-DTW, respectively) derived from the intra-user variations between the mean signature template and reference samples for independent and dependent warping to incorporate detailed and flexible local stability information and to minimize intra-class discrepancies effectively.
• It employs multiple fusion strategies to improve inter-user variability: the representation-level fusion, which concatenates LM-DTW and LD-DTW into a single vector, F-DTW, for each warping; and the score-level fusion, which combines each score through SVMs constructed using F-DTW for each warping.
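The difference between dependent and independent warping with local weights can be sketched as follows. This is only an illustration under stated assumptions: the local weights are taken as given inputs (how they are derived from MMPs or DMPs is not reproduced here), the local cost is a weighted squared difference, and the function names `weighted_dtw`, `dependent_dtw`, and `independent_dtw` are hypothetical.

```python
def weighted_dtw(template, query, weights, dist):
    """DTW in which the local cost at template index i is scaled by weights[i]."""
    n, m = len(template), len(query)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = weights[i - 1] * dist(template[i - 1], query[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dependent_dtw(template, query, weights):
    """One joint warping path shared by all features (points are tuples)."""
    def sq(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return weighted_dtw(template, query, weights, sq)


def independent_dtw(template, query, weights_per_feature):
    """One warping path per feature; the per-feature distances are summed."""
    total = 0.0
    for k in range(len(template[0])):
        t_k = [p[k] for p in template]
        q_k = [p[k] for p in query]
        total += weighted_dtw(t_k, q_k, weights_per_feature[k],
                              lambda a, b: (a - b) ** 2)
    return total
```

With identical weights for every feature, the independent distance in this sketch can never exceed the dependent one, because the joint path is also a valid path for each feature on its own.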
Unlike recent black-box modeling strategies such as deep learning algorithms, which require high computational complexity and many training samples, the proposed approach is advantageous, especially in forensic situations with limited available data [4]- [7].
The proposed method also relies on explainable stepwise methods that help FDEs explore differences and similarities between signatures and explain the rationale behind their assessment to legal professionals in the decision-making process. For example, it can provide detailed matching between the mean signature template and a query signature (Figs. 1 and 2).
Furthermore, the proposed method can provide visual investigation tools to assist FDEs in forensic analysis in an explainable way. For instance, we can visualize F-DTW D matching between a mean signature template obtained through EB-DBA and query signatures in the spatiotemporal domain (Fig. 6). In this figure, the larger the distance between matching points with higher local weighting, the higher the chance that the query signature is a forgery.
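The per-match quantities behind such a visualization can be sketched by backtracking the warping path and recording the weighted local distance of every matched pair. The example below is an assumption-laden illustration for one-dimensional sequences: the function name `weighted_matches`, the squared-difference cost, and the given `weights` are hypothetical and do not reproduce the F-DTW computation itself.

```python
def weighted_matches(template, query, weights):
    """Return (template_idx, query_idx, weighted_cost) along the optimal path."""
    n, m = len(template), len(query)
    inf = float("inf")
    cost = [[weights[i] * (template[i] - query[j]) ** 2 for j in range(m)]
            for i in range(n)]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost[i - 1][j - 1] + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    matches = []
    while i > 0 and j > 0:
        matches.append((i - 1, j - 1, cost[i - 1][j - 1]))
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)  # follow the cheapest predecessor
    matches.reverse()
    return matches
```

Sorting the returned matches by their weighted cost highlights the regions of the query that deviate most from stable parts of the template, which is the kind of evidence an examiner could inspect.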
Therefore, it is particularly useful for applications, such as forensics and security, in which fairness, accountability, and transparency are critically important.

VI. CONCLUSION
To increase the performance of online signature verification, we devised a unique single-template technique for DTW based on mean signature template sets and fusion strategies of several local weighting and warping schemes.
In the enrollment phase, we used EB-DBA to obtain user-specific mean signature template sets, taking into account intra-user heterogeneity across reference samples. Then, for independent and dependent DTW, we calculated local weighting estimates by analyzing MMPs and DMPs between the mean signature template sets and reference samples, respectively, to incorporate detailed and flexible local stability information and to effectively minimize intra-class discrepancies. To improve inter-user variability, we used the representation-level fusion to concatenate LM-DTW and LD-DTW calculated with MMPs and DMPs into a single vector, F-DTW, for each warping, followed by the score-level fusion to combine each score through SVMs constructed using F-DTW for each warping.
The experimental results on the public online signature datasets SVC2004 Task1/Task2 and MCYT-100 demonstrated the usefulness of the proposed method for online signature verification. The proposed explainable stepwise strategy can help bridge the divide between biometrics and forensics.