A Complete Deep Support Vector Data Description for One Class Learning

In recent years, Deep Support Vector Data Description (Deep SVDD) has emerged as a leading method in the field of anomaly detection. However, its parameters are solved inexactly, which limits both its accuracy and its efficiency. To address this issue, we propose a new method, called Complete Deep Support Vector Data Description (CD-SVDD). CD-SVDD is constructed with a traditional deep neural network and utilizes a modified SVDD as its last layer. Its parameters are solved by an alternate iteration algorithm that ensures both high precision and fast convergence of solutions. With the network weights fixed, we solve for the center and radius of the modified SVDD through its convex dual optimization problem. With the exact center and radius, we then update the parameters of the neural network by backpropagation. In contrast to the existing deep SVDD, all parameters of our method are solved precisely, which is why we call the method “complete”. This approach enables us to maintain the ν-property found in shallow SVDD, which is beneficial for parameter selection and model interpretability. To evaluate the performance of CD-SVDD, we conducted extensive numerical experiments against five existing methods on two image datasets, CIFAR-10 and CIFAR-100, as well as five recorded benchmark datasets. Our results demonstrate that CD-SVDD achieves superior accuracy and efficiency in the detection of anomalies.


I. INTRODUCTION
Anomaly detection is a widely researched field in machine learning and data mining, with the aim of identifying data that deviate from most instances. It includes point anomaly, contextual anomaly, and collective anomaly [1], [2], [3]. Point anomaly detection, in particular, has been extensively researched, and its methods can be categorized into classification-based, clustering-based, and nearest neighbor-based approaches [1]. Among these methods, one-class classification models are commonly used and have shown admirable performance in various applications.
One-class classification models are trained using normal data to detect abnormal instances in prediction [4]. The classical approach is the one-class support vector machine (OCSVM) [5], [6], which assumes the origin is an abnormal point and learns a hyperplane to separate normal data from it. Although OCSVM has achieved great performance in various applications, it is limited by the use of a hyperplane to separate the data. Support vector data description (SVDD) [7], as a successful extension of OCSVM, separates normal and abnormal data by learning a hypersphere instead of a hyperplane. SVDD is more flexible and shows outstanding prediction performance.
The associate editor coordinating the review of this manuscript and approving it for publication was Davide Patti.
To enhance the handling of high-dimensional data, some traditional dimension reduction methods have been employed. For example, in [8], the author utilizes principal component analysis (PCA) to reduce the dimension in image anomaly detection. In [9], Shravan et al. propose a document classifier based on PCA and OCSVM. In [10], Shen et al. employ PCA and SVDD for non-linear process monitoring. However, despite their convenience, these dimension reduction methods may significantly impact prediction accuracy. Moreover, traditional kernel techniques may have limited adaptability when dealing with complex data structures.
1) Neural networks are often used for data preprocessing, followed by training of the anomaly detection model. For example, Alfeo et al. use an autoencoder for data dimension reduction before training the anomaly detection model in [16]. In [17], Wang et al. propose an unsupervised deep learning method based on an autoencoder combined with OCSVM for anomaly detection. These approaches are referred to as mixed models, where the two stages are carried out separately. However, similar to traditional dimension reduction techniques, differences between normal and abnormal data are not directly detected in data preprocessing. As a result, the performance of anomaly detection cannot be guaranteed.
2) The traditional anomaly detection model has been extended to the deep learning framework, where neural networks and traditional models are often trained alternately. Ruff et al. proposed a deep SVDD by extending ν-SVDD to the deep learning framework in [18]. Similarly, in [19], a fully deep model, called a one-class neural network (OCNN), was proposed by extending OCSVM. These methods aim to improve the performance of anomaly detection by incorporating deep learning techniques. The fully deep models mentioned above generally perform better in predicting anomalies on large and complex datasets. However, the deep SVDD training process has a limitation: the center c of the hypersphere is manually fixed and cannot be updated, which negatively affects the prediction performance. Moreover, the squared hypersphere radius $R^2$ is substituted with a quantile of the squared distance from c to the mapped samples, which lacks a theoretical foundation and hinders the convergence speed and prediction accuracy of deep SVDD.
To address the issue of imprecise solutions and to enhance model interpretability, we introduce a novel approach called the Complete Deep Support Vector Data Description (CD-SVDD). We also demonstrate the theoretical property of the parameter ν in CD-SVDD and propose an efficient algorithm for implementing CD-SVDD. Compared to the existing models, all parameters of our model are solved precisely. Therefore, we call it ''complete''.
The main contributions of this paper are summarized as follows.
1) We propose the CD-SVDD. It adapts well to situations where the data are high-dimensional and the distribution is complex. The CD-SVDD has favorable model properties and achieves higher prediction accuracy.
2) We further develop an efficient joint alternate algorithm for CD-SVDD, based on the strong duality of the optimization problem and the backpropagation method. This algorithm is more efficient than the original deep SVDD algorithm and converges faster.
3) We demonstrate the ν-property of CD-SVDD, which provides valuable guidance for parameter selection and enhances the interpretability of the model.
The rest of this paper is outlined as follows. Section II briefly introduces the related work. In Section III, CD-SVDD is proposed, the corresponding efficient algorithm is developed, and the relevant ν-property is demonstrated. Section IV compares CD-SVDD with other existing methods. In Section V, extensive experiments are conducted to verify the validity of our method. Section VI gives the conclusion.

II. RELATED WORK
In this section, we review the classical SVDD [7] and the deep SVDD [18].

A. SUPPORT VECTOR DATA DESCRIPTION
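Let $\mathcal{X}$ be the input space, and let $X = \{x_1, x_2, \ldots, x_l\}$ be a training set that is drawn independently from $\mathcal{X}$. Let $\mathcal{H}$ be a reproducing kernel Hilbert space (RKHS) associated with a Mercer kernel $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ that is continuous, symmetric, and positive semidefinite [20]. Let $\phi: \mathcal{X} \to \mathcal{H}$ be the associated feature map that satisfies $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ for all $x_i, x_j \in \mathcal{X}$. SVDD [7] finds a hypersphere that contains all normal data as the separation boundary. It can be summarized as the following optimization problem:
$$\min_{R,\, c,\, \xi} \; R^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, l. \tag{1}$$
Here, R is the radius of the hypersphere, c is the hypersphere center, $\xi_i$ is the relaxation factor, the hyperparameter C is the penalty parameter, and l is the sample size. The solution of the primal problem (1) is usually obtained by solving the following dual problem (2):
$$\min_{\alpha} \; \alpha^{\top} Q \alpha - \sum_{i=1}^{l} \alpha_i Q_{ii} \quad \text{s.t.} \quad e^{\top} \alpha = 1, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \ldots, l, \tag{2}$$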
where $Q_{ij} = K(x_i, x_j)$, α is the vector of Lagrangian multipliers, and e is a vector of ones with appropriate dimension. Additionally, by replacing C in Eq. (1) with $1/(\nu l)$, ν-SVDD has been proposed in [18]. The corresponding optimization problem is formulated as:
$$\min_{R,\, c,\, \xi} \; R^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le R^2 + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, l. \tag{3}$$
Compared with C in SVDD, ν in ν-SVDD has a practical interpretation [21]. ν ∈ (0, 1] is proved to be not only an upper bound on the proportion of observations outside the hypersphere, but also a lower bound on the proportion of support vectors [18].

B. DEEP SUPPORT VECTOR DATA DESCRIPTION
Deep SVDD is inspired by ν-SVDD [18], and uses a neural network instead of the traditional kernel function. Soft-boundary deep SVDD is proposed to minimize the volume of the hypersphere that contains the outputs of the neural network. It can be formulated as:
$$\min_{R,\, W} \; R^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \max\bigl\{0,\; \|\phi(x_i, W) - c\|^2 - R^2\bigr\} + \frac{\lambda}{2} \sum_{j=1}^{L} \|W_j\|_F^2. \tag{4}$$
Here, R is the hypersphere radius and c is the hypersphere center. $W_j$ is the parameter matrix of the j-th layer of the neural network, and L is the number of layers. $\phi(x_i, W)$ represents the output of the neural network for the original input $x_i$. λ and ν are hyperparameters. Furthermore, a simplified version of the deep SVDD, named one-class deep SVDD, is put forward as follows:
$$\min_{W} \; \frac{1}{l} \sum_{i=1}^{l} \|\phi(x_i, W) - c\|^2 + \frac{\lambda}{2} \sum_{j=1}^{L} \|W_j\|_F^2. \tag{5}$$
The above models extend the shallow SVDD to the deep learning framework to improve the prediction performance. However, in their solution algorithms, the parameters $R^2$ and c are only roughly calculated. Specifically, $R^2$ is merely estimated by a quantile of the squared distance from c to the mapped data. This limits the performance of the deep SVDD.
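To make the quantile heuristic concrete, the following minimal sketch estimates $R^2$ from the squared distances of mapped samples to a fixed center. The quantile level 1 − ν and the linear-interpolation rule are assumptions made for illustration, and the helper names are hypothetical rather than taken from [18].

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def squared_distances(points, center):
    """Squared Euclidean distance of every mapped point to the center."""
    return [sum((p - c) ** 2 for p, c in zip(x, center)) for x in points]


def quantile_radius_sq(points, center, nu):
    """Heuristic R^2: the (1 - nu)-quantile of the squared distances (assumed level)."""
    return quantile(squared_distances(points, center), 1.0 - nu)


# Toy mapped outputs with a fixed center at the origin.
mapped = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
center = [0.0, 0.0]
r2 = quantile_radius_sq(mapped, center, nu=0.5)
```

The estimate depends only on the ranking of distances, so it is not tied to any optimality condition of the SVDD problem, which is the weakness the text points out.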

III. A COMPLETE DEEP SUPPORT VECTOR DATA DESCRIPTION
As discussed in Section II-A, formulation (3) of ν-SVDD was proposed to improve the interpretability of the model. However, the non-convexity of the optimization problem with respect to R and the lack of guaranteed strong duality make it challenging to solve $R^2$ exactly [22]. In deep SVDD, the distance quantile is used as an approximate representation of $R^2$. As a result, the theoretical properties of ν in ν-SVDD cannot be guaranteed during deep SVDD training, which impedes the model's interpretability.
In this section, we propose the Complete Deep Support Vector Data Description (CD-SVDD). The parameters in CD-SVDD can be solved accurately, and the model has stronger theoretical properties.
Define the input space $\mathcal{X} \subseteq \mathbb{R}^d$ and an output space in $\mathbb{R}^b$, with $\phi_\theta$ mapping the former into the latter. Here, $\phi_\theta$ is constructed by a neural network and θ denotes its weight parameters. l is the size of the training dataset $X = \{x_1, x_2, \ldots, x_l\} \subseteq \mathcal{X}$. The aim of CD-SVDD is to jointly learn the neural network parameters θ together with minimizing the volume of a hypersphere containing the normal data. The formulation of CD-SVDD is referred to as optimization problem (6). This model constructs a hypersphere that contains normal data in the output space of the neural network. $R_\theta$ is the square of the radius of the hypersphere, $c_\theta$ is the center of the hypersphere, and $\phi_\theta(x_i)$ is the mapping result of $x_i$. The second term is a penalty for training data outside the hypersphere. The third term is the regularization for the parameters of the neural network. Here, it is assumed that the neural network has K layers and $\theta_k$ denotes the weights of the k-th layer. ν ∈ (0, 1] is a hyperparameter. Fig. 1 illustrates the workflow of CD-SVDD. First, the original data in $\mathbb{R}^d$ are transformed into a low-dimensional space $\mathbb{R}^b$ by a neural network. Then, CD-SVDD finds the minimum volume hypersphere that encloses the normal data. During training, only normal data are expected to be mapped inside the hypersphere. In the prediction process, any data point outside the hypersphere is treated as an outlier.
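One formulation of problem (6) that is consistent with the term-by-term description above can be sketched as follows. The hinge form of the penalty and the regularization coefficient λ/2 are assumptions borrowed from soft-boundary deep SVDD, not details confirmed by the text.

```latex
\min_{\theta,\, R_\theta,\, c_\theta} \;
R_\theta
+ \frac{1}{\nu l} \sum_{i=1}^{l} \max\bigl\{0,\; \|\phi_\theta(x_i) - c_\theta\|^2 - R_\theta \bigr\}
+ \frac{\lambda}{2} \sum_{k=1}^{K} \|\theta_k\|_F^2
\quad \text{s.t.} \quad R_\theta \ge 0
```

Because $R_\theta$ stands for the squared radius, the objective is linear in $R_\theta$ once θ is fixed, which is what makes the subproblem in $R_\theta$ and $c_\theta$ convex.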
Denote the maps of the neural network from the first layer to the K-th layer by $\phi_{\theta_1}, \phi_{\theta_2}, \ldots, \phi_{\theta_K}$. The feed-forward process of CD-SVDD composes these maps and then outputs a label according to whether the mapped sample falls inside the hypersphere. Here, a value of ''+1'' indicates normal, and a value of ''−1'' implies abnormal. That is, samples located inside the hypersphere are regarded as normal samples, while those located outside the hypersphere are abnormal samples.
In our proposed CD-SVDD, the optimization problem (6) is solved exactly, which gives CD-SVDD greater accuracy and convergence speed. At the same time, the exact solution of the parameters preserves the theoretical properties of the parameter ν, which gives CD-SVDD good interpretability.

A. MODIFIED SVDD
The parameters of optimization problem (6) can be solved jointly and precisely. Firstly, assuming the parameters of the neural network are fixed, $R_\theta$, $c_\theta$ and $\phi_\theta$ can be abbreviated as R, c and φ. Then, (6) degenerates to the following optimization problem:
$$\min_{R,\, c,\, \xi} \; R + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le R + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, l, \;\; R \ge 0. \tag{7}$$
Here, ν ∈ (0, 1] is a hyperparameter. Problem (7) is convex and satisfies the Slater condition, so strong duality holds [23]. This implies that the parameters R and c in the optimization problem (7) can be solved with precision. We call Eq. (7) a modified SVDD in this paper.
The modified SVDD satisfies Theorem 1, which explains the relationship between the optimal solution of the optimization problem (7) and the parameter ν.
(b) If ν = 1, then at least one optimal solution has R = 0.
The proof is similar to Theorem 3 in [22], so we omit it here.
For (7), when ν = 1, there is at least one optimal solution with R = 0 according to Theorem 1. Then, the problem (7) is equivalent to
$$\min_{c} \; \frac{1}{l} \sum_{i=1}^{l} \|\phi(x_i) - c\|^2. \tag{8}$$
This case easily leads to a hypersphere collapse in neural networks [18]. Therefore, the setting of ν = 1 is not recommended.
When ν ∈ (0, 1), the constraint R ≥ 0 in (7) will always be satisfied. The problem (7) can then be written as
$$\min_{R,\, c,\, \xi} \; R + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad \|\phi(x_i) - c\|^2 \le R + \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, l. \tag{9}$$
Note that if ν is set too close to 0, each $\xi_i$ will tend to 0.
According to the optimization problem (7), $R_\theta$ and $c_\theta$ in the CD-SVDD can be accurately solved. However, to update the parameters of the neural network more accurately and efficiently, we solve R and c in the optimization problem (7) via its corresponding dual problem [24].
The Lagrange function [23] of (9) leads to the dual problem (20) in the multipliers α.
From its solution $\alpha^*$, the optimal center $c^*$ is given by Eq. (21). Denote by $i_1, i_2, \ldots, i_p$ the indices that satisfy $0 < \alpha_i^* < 1/(\nu l)$ in $\alpha^*$; the optimal squared radius $R^*$ is then obtained from these indices by Eq. (22).
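The steps from the primal problem to the dual can be sketched as follows, taking problem (9) to be the minimization of $R + \frac{1}{\nu l}\sum_i \xi_i$ subject to $\|\phi(x_i) - c\|^2 \le R + \xi_i$ and $\xi_i \ge 0$. The multiplier name $\gamma_i$ and the averaging over the indices $i_1, \ldots, i_p$ in the last line are assumptions; any single such index yields the same value at an exact optimum.

```latex
\begin{aligned}
L(R, c, \xi, \alpha, \gamma) &= R + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i
  - \sum_{i=1}^{l}\alpha_i\bigl(R + \xi_i - \|\phi(x_i) - c\|^2\bigr)
  - \sum_{i=1}^{l}\gamma_i \xi_i, \qquad \alpha_i \ge 0,\; \gamma_i \ge 0, \\
\partial L/\partial R = 0 &\;\Rightarrow\; \sum_{i=1}^{l} \alpha_i = 1, \\
\partial L/\partial c = 0 &\;\Rightarrow\; c = \sum_{i=1}^{l} \alpha_i \phi(x_i), \\
\partial L/\partial \xi_i = 0 &\;\Rightarrow\; \alpha_i + \gamma_i = \frac{1}{\nu l}
  \;\Rightarrow\; 0 \le \alpha_i \le \frac{1}{\nu l}, \\
\text{dual:}\quad & \min_{\alpha}\; \alpha^{\top} Q \alpha - \sum_{i=1}^{l} \alpha_i Q_{ii}
  \quad \text{s.t.}\quad e^{\top}\alpha = 1,\; 0 \le \alpha_i \le \frac{1}{\nu l},
  \qquad Q_{ij} = \langle \phi(x_i), \phi(x_j) \rangle, \\
c^{*} &= \sum_{i=1}^{l} \alpha_i^{*} \phi(x_i), \qquad
R^{*} = \frac{1}{p}\sum_{j=1}^{p} \|\phi(x_{i_j}) - c^{*}\|^2 .
\end{aligned}
```

Substituting the three stationarity conditions into L removes R and ξ and leaves $\sum_i \alpha_i Q_{ii} - \alpha^{\top} Q \alpha$, whose maximization is the minimization written in the dual line.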
Then, (6) can be rewritten as formulation (23), which serves as a loss function to update the parameters in the neural network. Notably, R and c in (23) can be precisely solved through the dual problem (20). In comparison, deep SVDD in (4) and (5) only provides an approximate estimation of $R^2$ and c through quantiles. Hence, our proposed method offers stronger theoretical support and yields a more accurate solution. Additionally, R and c in (23) are represented by φ(x) based on the KKT conditions. This enables the direct optimization of the hypersphere volume as part of the loss function, resulting in a smaller hypersphere volume. Consequently, it becomes easier to separate abnormal samples from the hypersphere, further enhancing the accuracy of the model. Furthermore, the precise solution method accelerates convergence and enhances the computational speed of the model.
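A loss of the kind described here can be sketched by expressing the center and the squared radius through the network outputs with the dual solution $\alpha^*$ held fixed. The hinge penalty, the coefficient λ/2 and the averaging over the indices $i_1, \ldots, i_p$ with $0 < \alpha_i^* < 1/(\nu l)$ are assumptions for illustration.

```latex
\begin{aligned}
c(\theta) &= \sum_{i=1}^{l} \alpha_i^{*}\,\phi_\theta(x_i), \qquad
R(\theta) = \frac{1}{p}\sum_{j=1}^{p} \|\phi_\theta(x_{i_j}) - c(\theta)\|^2, \\
\mathcal{L}(\theta) &= R(\theta)
 + \frac{1}{\nu l}\sum_{i=1}^{l}\max\bigl\{0,\; \|\phi_\theta(x_i) - c(\theta)\|^2 - R(\theta)\bigr\}
 + \frac{\lambda}{2}\sum_{k=1}^{K}\|\theta_k\|_F^2 .
\end{aligned}
```

Since $c(\theta)$ and $R(\theta)$ are differentiable functions of the network outputs, gradients of the hypersphere volume flow back to θ directly instead of passing through a fixed center or a quantile.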
After the optimal parameters of CD-SVDD are obtained, an anomaly score can be defined for any sample x.
A prediction for x is then given by the decision function applied to this score. In Algorithm 1, the final $R^*$ and $c^*$ are calculated by Eqs. (21) and (22). Algorithm 1 can be easily extended to the mini-batch case, in which each iteration only needs to solve a smaller optimization problem. Additionally, because $R^*$ and $c^*$ are calculated precisely in each batch, the whole algorithm can converge faster than deep SVDD.
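The dual step and the subsequent scoring can be sketched in plain Python on already-mapped outputs. The pairwise coordinate update below is a simple stand-in for the block coordinate descent solver used in the experiments, the score is assumed to be the squared distance to the center minus the squared radius, and all function names are hypothetical.

```python
def solve_svdd_dual(Z, nu, tol=1e-9, max_iter=10000):
    """Minimise a'Qa - sum_i a_i Q_ii  s.t. sum(a) = 1, 0 <= a_i <= 1/(nu*l), 0 < nu < 1."""
    l = len(Z)
    upper = 1.0 / (nu * l)
    Q = [[sum(p * q for p, q in zip(zi, zj)) for zj in Z] for zi in Z]
    alpha = [1.0 / l] * l                       # feasible start because nu <= 1
    Qa = [sum(Q[i][j] * alpha[j] for j in range(l)) for i in range(l)]
    for _ in range(max_iter):
        grad = [2.0 * Qa[i] - Q[i][i] for i in range(l)]
        up = [i for i in range(l) if alpha[i] < upper - 1e-12]    # may increase
        down = [j for j in range(l) if alpha[j] > 1e-12]          # may decrease
        if not up or not down:
            break
        i = min(up, key=lambda t: grad[t])
        j = max(down, key=lambda t: grad[t])
        if grad[j] - grad[i] < tol:             # no violating pair left
            break
        curv = 2.0 * (Q[i][i] + Q[j][j] - 2.0 * Q[i][j])
        room = min(upper - alpha[i], alpha[j])
        step = room if curv <= 1e-12 else min(room, (grad[j] - grad[i]) / curv)
        alpha[i] += step                        # move mass from j to i, keeping sum(a) = 1
        alpha[j] -= step
        for t in range(l):
            Qa[t] += step * (Q[t][i] - Q[t][j])
    return alpha


def center_and_radius(Z, alpha, nu, eps=1e-8):
    """c* = sum_i a_i z_i; R* averaged over indices with 0 < a_i < 1/(nu*l)."""
    l, dim = len(Z), len(Z[0])
    upper = 1.0 / (nu * l)
    c = [sum(alpha[i] * Z[i][d] for i in range(l)) for d in range(dim)]
    dist = [sum((z - cd) ** 2 for z, cd in zip(Z[i], c)) for i in range(l)]
    free = [dist[i] for i in range(l) if eps < alpha[i] < upper - eps]
    if free:
        return c, sum(free) / len(free)
    # Fallback (assumption): largest distance among points not at the upper bound.
    return c, max((dist[i] for i in range(l) if alpha[i] < upper - eps), default=0.0)


def anomaly_score(z, c, R):
    """Assumed score: squared distance to the center minus the squared radius."""
    return sum((a - b) ** 2 for a, b in zip(z, c)) - R


def predict(z, c, R):
    """+1 for normal (inside the hypersphere), -1 for abnormal."""
    return 1 if anomaly_score(z, c, R) <= 0 else -1


Z = [[0.0], [1.0], [4.0]]                       # toy one-dimensional network outputs
alpha_star = solve_svdd_dual(Z, nu=0.5)
c_star, R_star = center_and_radius(Z, alpha_star, nu=0.5)
```

On this toy input the two extreme points carry all the dual mass, so the center is their midpoint and the squared radius is their common squared distance to it.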

D. ν-PROPERTY IN CD-SVDD
In this section, we demonstrate a useful property of the parameter ν in CD-SVDD. For convenience, define the abstract sample sets A, B and C as follows.
Theorem 2. For CD-SVDD, ν ∈ (0, 1) is not only the upper bound of the proportion of the mapped observations outside the hypersphere, but also the lower bound of the proportion of the support vectors.
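Therefore, the proportion of mapped observations outside the hypersphere is at most ν, and the proportion of support vectors is at least ν. When a mini-batch strategy is employed, a result similar to Theorem 2 can be derived.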
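The reasoning behind these two bounds can be sketched from the KKT conditions of problem (9). The index sets $\mathcal{O}$ and $\mathcal{S}$ and the multiplier $\gamma_i$ of the constraint $\xi_i \ge 0$ are notation introduced here for illustration and need not coincide with the sets A, B and C.

```latex
\begin{aligned}
\mathcal{O} &= \{\, i : \|\phi(x_i) - c^{*}\|^2 > R^{*} \,\}, \qquad
\mathcal{S} = \{\, i : \alpha_i^{*} > 0 \,\}, \\
i \in \mathcal{O} &\;\Rightarrow\; \xi_i^{*} > 0 \;\Rightarrow\; \gamma_i^{*} = 0
  \;\Rightarrow\; \alpha_i^{*} = \frac{1}{\nu l}, \\
1 = \sum_{i=1}^{l} \alpha_i^{*} &\ge \sum_{i \in \mathcal{O}} \alpha_i^{*} = \frac{|\mathcal{O}|}{\nu l}
  \;\Rightarrow\; \frac{|\mathcal{O}|}{l} \le \nu, \\
1 = \sum_{i \in \mathcal{S}} \alpha_i^{*} &\le \frac{|\mathcal{S}|}{\nu l}
  \;\Rightarrow\; \frac{|\mathcal{S}|}{l} \ge \nu .
\end{aligned}
```

Both bounds rely on $\alpha^*$ being an exact dual solution, which is why solving the dual precisely matters for the ν-property.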
The ν-property gives CD-SVDD greater interpretability and helps guide parameter selection during model training.

IV. COMPARISONS WITH OTHER EXISTING METHODS
In this section, we give a comparative discussion between CD-SVDD and other existing methods, including classical neural networks and anomaly detection methods.

A. COMPARISON WITH EXISTING CLASSICAL NEURAL NETWORK METHODS
The hidden layer structure of CD-SVDD is inherited from traditional neural networks. For instance, it could be made up of convolution and pooling layers. Most importantly, our CD-SVDD is designed for the specific task of anomaly detection, while traditional CNNs (convolutional neural networks) and GANs (generative adversarial networks) are intended for ordinary classification tasks or generative tasks. One of the main structural differences of our CD-SVDD is the loss function, which is tailored for anomaly detection. The CD-SVDD generates a hypersphere in the mapped feature space as an anomaly detector.

B. COMPARISON WITH EXISTING ANOMALY DETECTION APPROACHES
The comparisons of CD-SVDD with shallow one-class methods, mixed deep anomaly detection approaches, and full deep methods are discussed.
1) The shallow one-class methods, such as OCSVM and SVDD, use a kernel function to map the original data into a high-dimensional space [5], [7]. However, this approach has weaker adaptability, especially when faced with complex data distributions. In contrast, CD-SVDD learns a neural network to map the original data into a feature space, thereby improving model performance. The data representation ability of neural networks is more powerful than that of kernel functions, which results in CD-SVDD generally having better prediction performance than shallow models.
2) In mixed deep anomaly detection approaches, the neural network training process and anomaly detection are usually carried out independently. This limits the prediction performance. In contrast, CD-SVDD trains the parameters of the neural network and the modified SVDD together, which is more conducive to improving prediction performance.
3) The existing deep SVDD model approximates some of its parameters using a quantile approximation, which may lead to imprecise results. In contrast, our CD-SVDD uses an alternating iteration training strategy. First, the parameters of the modified SVDD are obtained through dual optimization. Then, the parameters of the neural network are updated using backpropagation. This allows for precise updates to the model parameters during training, resulting in higher accuracy. The training process is also faster as the loss function converges more quickly. Furthermore, the parameter ν in CD-SVDD has better interpretability, which can provide useful guidance for parameter selection.

V. NUMERICAL EXPERIMENTS

A. EXPERIMENTAL SETUP
To verify the advantages of CD-SVDD, numerical experiments are conducted on two image datasets, i.e., CIFAR-10 and CIFAR-100, and five recorded benchmark datasets, i.e., a9a [26], codrna [27], epileptic [28], htru2 [29] and ijcnn1 [30], from the UCI Machine Learning Repository and LIBSVM. Their statistics are given in Tables 1 and 2, respectively. Additionally, we verify the ν-property in CD-SVDD on the five recorded benchmark datasets. The CIFAR-10 and CIFAR-100 datasets contain rich physical images [31]. Some examples are shown in Fig. 2. These two datasets are provided with predefined training and test splits. For the other five benchmark datasets, we randomly take 80% for training and the remaining 20% for testing. In particular, one-class methods are trained with samples of a single class only. For multiclass data, we take turns using each category of training samples to build the models.
All experiments are implemented in Python 3.10 on Windows 11, running on a PC with an Intel Core i5-1140 CPU at 2.70GHz and 16GB of RAM.
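The data protocol described above can be sketched as follows. The fixed random seed, the (input, label) pair representation and the function names are assumptions made for this illustration.

```python
import random


def split_80_20(samples, seed=0):
    """Random 80%/20% split of a list of (input, label) pairs."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = (4 * len(samples)) // 5
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


def one_class_tasks(train, test):
    """Yield one task per class: train on that class only, test on everything.

    Test labels follow the +1 (normal) / -1 (abnormal) convention.
    """
    for normal in sorted({y for _, y in train}):
        x_train = [x for x, y in train if y == normal]
        x_test = [x for x, _ in test]
        y_test = [1 if y == normal else -1 for _, y in test]
        yield normal, x_train, x_test, y_test


# Toy data: ten one-dimensional samples with two alternating classes.
samples = [([float(i)], i % 2) for i in range(10)]
train, test = split_80_20(samples)
tasks = list(one_class_tasks(train, test))
```

Each class in turn plays the role of the normal class, which is why a dataset with several classes produces several one-class experiments.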
1) Isolation Forest (IF) [32] is an anomaly detection method. It judges normal and abnormal data according to the average path lengths needed to ''isolate'' a data point. It holds that abnormal data can usually be ''isolated'' by short average path lengths on the trees.
2) OCSVM [5] is a one-class classifier.It assumes that the origin is an outlier and searches for a hyperplane farthest from the outlier as the separation boundary.The non-linear kernel is introduced to improve the model accuracy.
3) ν-SVDD [7] is a one-class classifier. It separates normal and abnormal data by learning a hypersphere that contains all normal data.
4) Soft-boundary Deep SVDD [18] is a fully deep anomaly detection model proposed based on ν-SVDD. A neural network is used as a mapping tool to train a hypersphere with the smallest volume.
5) One-class Deep SVDD [18] is a simplified version of the soft-boundary deep SVDD.Its objective function is changed to minimize the average distance between normal samples and its center.
In the experiments, IF is implemented by the ''ensemble.IsolationForest'' class in the sklearn package. Both OCSVM and ν-SVDD adopt the Gaussian kernel function [33], and the best hyperparameters are selected by grid search. σ is from the set $\{2^i \mid i = -7, -6.5, \ldots, 7\}$ and ν is from the set {0.01, 0.02, . . ., 0.99}. For the deep learning methods, that is, soft-boundary deep SVDD, one-class deep SVDD, and CD-SVDD, we fix ν = 0.1 following [18], and the batch size is 200. On the CIFAR-10 and CIFAR-100 datasets, the architectures of their neural networks are CIFAR10-LeNet [18]. On the other five benchmark datasets, their architectures are fully connected feedforward neural networks. We initialize network weights by uniform Glorot weights [34]. They are implemented through the torch package. The convex optimization problem involved in CD-SVDD is solved by the block coordinate descent method [35]. In order to verify the ν-property in CD-SVDD, we obtain the relationship between the CD-SVDD training error ratio, support vector ratio, and ν, where ν takes values in {0.05, 0.1, . . ., 0.95}.
Taking into account the case of imbalanced test samples (one of the classes is regarded as the normal class, and the others as abnormal), the value of AUC (area under curve) under optimal parameters is used to evaluate the prediction performance of each model [36]. Furthermore, we calculate the F1 scores under the optimal parameters on the five recorded benchmark datasets.
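The hyperparameter grids described above can be generated as in the sketch below. The selection loop takes a hypothetical `evaluate(sigma, nu)` callback that is assumed to return a validation score such as AUC; the toy scoring function exists only to exercise the loop.

```python
import math

sigma_grid = [2.0 ** (i / 2) for i in range(-14, 15)]   # 2^-7, 2^-6.5, ..., 2^7
nu_grid = [k / 100 for k in range(1, 100)]              # 0.01, 0.02, ..., 0.99
nu_property_grid = [k / 20 for k in range(1, 20)]       # 0.05, 0.10, ..., 0.95


def grid_search(evaluate):
    """Return the (sigma, nu) pair with the highest score under `evaluate`."""
    return max(((s, n) for s in sigma_grid for n in nu_grid),
               key=lambda pair: evaluate(*pair))


def toy_score(sigma, nu):
    """Stand-in for a real validation score; peaks at sigma = 2, nu = 0.1."""
    return -(math.log2(sigma) - 1.0) ** 2 - (nu - 0.1) ** 2


best_sigma, best_nu = grid_search(toy_score)
```

The shallow kernel methods search 29 × 99 combinations, whereas the deep models use the single setting ν = 0.1.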
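Both metrics can be computed with a few lines of standard-library Python, as sketched below. Treating anomalies as the positive class and higher scores as more anomalous is an assumption about the evaluation convention.

```python
def auc(is_anomaly, scores):
    """Probability that a random anomaly scores above a random normal sample.

    Ties between an anomaly and a normal sample count as one half.
    """
    pos = [s for s, a in zip(scores, is_anomaly) if a]
    neg = [s for s, a in zip(scores, is_anomaly) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 of the positive class (label 1) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

AUC depends only on the ranking of the scores, so it needs no decision threshold and is not distorted by the class imbalance of the test sets.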

B. RESULTS OF PREDICTION AND EFFICIENCY
To evaluate the accuracy and effectiveness of our proposed CD-SVDD method, we compare it with five other methods on two image datasets and five benchmark datasets.
1) CIFAR-10. It consists of images of common vehicles and animals. As shown in Table 3, CD-SVDD achieves higher AUC values than the other models in 8 of 10 categories, which means that the model has greater accuracy. This suggests that CD-SVDD can more accurately identify common vehicles and animals. OCSVM and SVDD follow it.
Besides, the training iterations and computational time of the three deep models, i.e., soft-boundary deep SVDD, one-class deep SVDD and CD-SVDD, are shown in Table 4. ''Iterations'' corresponds to the iterations of updating the model parameters, and ''Time'' represents the total training time until the model reaches the optimal AUC value. ''DP'' represents the time to solve the dual problem, and ''Network'' denotes the time to update the parameters of the neural network. Table 4 shows that CD-SVDD achieves the fewest iterations for 4 categories and the shortest time for 5 categories. It can also be found that the computational time of CD-SVDD is shorter even when it needs more iterations. This indicates that CD-SVDD has advantages in computation speed compared to the other two models.
2) CIFAR-100. It comprises images of animals, plants, and household products, providing an extremely diverse range of objects for classification. As shown in Table 5, CD-SVDD outperforms the other models in 90 of 100 categories in terms of optimal AUC values, indicating its higher accuracy in identifying anomalies. Moreover, based on the AVERAGE result, CD-SVDD significantly outperforms the other models. These results indicate that CD-SVDD can be effectively applied to real-world anomaly detection tasks. The corresponding iterations and the results of the computational time are shown in Table 6. CD-SVDD achieves the lowest number of iterations for 45 categories and has the shortest computational time for 54 categories. Additionally, we provide a box plot in Fig. 3 to observe the overall computational performance in all 100 categories of CIFAR-100. The quartiles of iterations and corresponding time for CD-SVDD are significantly smaller than those for the other two deep models. This suggests that CD-SVDD can converge to the optimal AUC value more quickly, demonstrating the effectiveness of using optimization methods to solve the problem (9) and minimize the objective function (6). The outliers observed in the one-class deep SVDD solution process also suggest that CD-SVDD has a more stable solving process.
3) Five recorded benchmark datasets. We verify the advantages of our CD-SVDD in terms of prediction accuracy and computational efficiency. For prediction accuracy, we compare the optimal AUC value, average AUC value, and F1 score of each model on these datasets. Additionally, we carried out a sign test and a Friedman test on the experimental results. For computational efficiency, we compare computational time and iterations when the models achieve desirable and stable AUC values. Considering that the performance of neural networks is often sensitive to initial conditions, we repeat the training three times and compare the corresponding optimal and average AUC values, respectively. As shown in Table 7, in terms of optimal AUC value, CD-SVDD has outstanding performance in 9 of 13 categories. As shown in Table 8, in terms of the average AUC value, CD-SVDD performs well in 10 of 13 categories. Since the training results of OCSVM and SVDD on the same data are stable, we train these models only once. For the shallow models IF, OCSVM and SVDD, the class '0' of dataset codrna cannot be detected properly at all, which reflects the limitations of shallow models.
VOLUME 11, 2023 117501
We conduct a sign test [37] to further verify the prediction performance of CD-SVDD according to the optimal AUC values in Table 7. The results are shown in Table 9. The total number of sample categories is denoted as ''N'', while ''Node'' represents the number of categories where the efficiency of the two models is equal. ''S+'' and ''S−'' are the numbers of categories in which CD-SVDD is superior or inferior to the other model, respectively. ''p'' indicates the result of the sign test. As shown in Table 9, at a significance level of 0.05, CD-SVDD outperforms the other five models in terms of prediction performance. This implies that the predictive performance of CD-SVDD is effective.
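The p-value of a sign test follows from the binomial distribution of the win count under the null hypothesis of equal performance. The sketch below uses a two-sided test and discards ties before counting, both of which are assumptions; the counts in the example are illustrative and are not taken from Table 9.

```python
from math import comb


def sign_test_p(s_plus, s_minus):
    """Two-sided sign test p-value; tied categories are assumed to be excluded."""
    n = s_plus + s_minus
    k = min(s_plus, s_minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Illustrative counts: one method wins 9 categories and loses 1.
p_value = sign_test_p(9, 1)
```

With 9 wins out of 10 decided comparisons the p-value is below 0.05, while an even split gives a p-value of 1.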
The Friedman test is conducted according to the average AUC values in Table 8. As shown in Table 10 and Table 11, there are obvious differences in the prediction accuracy of each model.
Subsequently, we conducted a sign test to assess the disparity in prediction accuracy between CD-SVDD and each model according to Table 8. Details of the results are presented in Table 12, all of which demonstrate statistical significance at a significance level of 0.05. This implies that the superior prediction accuracy exhibited by CD-SVDD is universal.
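The mean ranks and the Friedman statistic can be computed as in the following sketch, where each row of the table holds the scores of all models on one dataset category. The chi-square form of the statistic without tie correction is an assumption, and the numbers in the example are illustrative rather than taken from Table 8.

```python
def average_ranks(row):
    """Rank 1 goes to the highest value; tied values share their average rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def friedman(table):
    """Return (mean rank per model, Friedman chi-square statistic)."""
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(r) for r in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(m * m for m in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, stat


# Illustrative scores: three categories (rows) and three models (columns).
toy_table = [[0.90, 0.80, 0.70], [0.95, 0.85, 0.60], [0.80, 0.70, 0.65]]
mean_ranks, statistic = friedman(toy_table)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom, where k is the number of models.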
The optimal F1 score for each model is shown in Table 13.Among the 13 categories, CD-SVDD is significantly superior to other models in 9 categories.The one-class deep SVDD follows it.This indicates that CD-SVDD not only achieved success on image datasets but also has excellent performance on recorded datasets.This demonstrates the universality of CD-SVDD for data types.
The iterations and time of soft-boundary deep SVDD, one-class deep SVDD, and CD-SVDD are shown in Table 14. CD-SVDD achieves the fewest iterations and the shortest time for 5 categories of benchmark datasets, which verifies the efficiency of our method. We conduct a sign test [37] to further verify the computational efficiency of CD-SVDD. As shown in Table 15, at a significance level of 0.1, the convergence rate of CD-SVDD is significantly better than that of soft-boundary deep SVDD. At a significance level of 0.05, the computational efficiency of CD-SVDD is also significantly better than that of one-class deep SVDD. These results demonstrate the effectiveness of applying convex optimization methods to the solving process.
It is important to note that, based on the results presented in Table 4, Table 6, and Table 14, the time required to solve the convex optimization problem (20) in our CD-SVDD is minimal and can be considered negligible. Nevertheless, this optimization process plays a crucial role in our method by allowing for higher prediction performance and faster training speeds through an exact solution procedure. These findings further highlight the validity and effectiveness of utilizing convex optimization to calculate R and c in our CD-SVDD method.

C. RESULTS ON ν-PROPERTY OF CD-SVDD
We verify the ν-property of Theorem 2 using five recorded benchmark datasets.In the interest of brevity, we present results for only one class of training samples, labeled 0 for each dataset.The training error ratio and the support vector ratio curves, which change with the parameter ν, are shown in Fig. 4.
As the value of ν increases, the support vector ratio and training error ratio generally rise, indicating that ν controls the volume of the hypersphere.A larger value of ν may result in more training samples lying outside the hypersphere.It is obvious that the support vector ratio is never less than ν, and the error ratio is never greater than ν.This observation confirms Theorem 2. Other datasets have similar conclusions.
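The two ratios plotted against ν can be computed from a dual solution as in the sketch below. The numerical tolerance and the toy values are assumptions; the values correspond to mapped outputs 0, 1 and 4 on a line with ν = 0.5, for which the dual solution puts weight 0.5 on each extreme point, the center is 2 and the squared radius is 4.

```python
def nu_property_ratios(alpha, sq_dist, R, eps=1e-8):
    """Return (training error ratio, support vector ratio).

    Training errors are mapped samples strictly outside the hypersphere;
    support vectors are samples with a positive dual multiplier.
    """
    l = len(alpha)
    error_ratio = sum(d > R + eps for d in sq_dist) / l
    sv_ratio = sum(a > eps for a in alpha) / l
    return error_ratio, sv_ratio


nu = 0.5
alpha = [0.5, 0.0, 0.5]          # dual solution of the toy problem
sq_dist = [4.0, 1.0, 4.0]        # squared distances to the center c = 2
error_ratio, sv_ratio = nu_property_ratios(alpha, sq_dist, R=4.0)
```

Repeating this computation over a grid of ν values produces curves of the kind shown in Fig. 4.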

VI. CONCLUSION
In this paper, we introduce the Complete Deep Support Vector Data Description (CD-SVDD) and propose an efficient solving algorithm that accurately computes each parameter using optimization methods with fast computational speed. By training the parameters of the neural network and the modified SVDD jointly and updating them alternately, we achieve faster convergence of the objective function and higher prediction accuracy. Additionally, we demonstrate the ν-property of CD-SVDD: ν is not only an upper bound on the proportion of mapped observations outside the hypersphere, but also a lower bound on the proportion of support vectors, which improves parameter selection and model interpretability. Our numerical experiments on two image datasets and five recorded benchmark datasets demonstrate the superior performance of CD-SVDD in prediction and computational efficiency, as well as the ν-property. However, the performance of CD-SVDD is still closely tied to the structure of the neural network, and it remains a challenge to find a structure that can consistently perform well on all datasets. One of our future goals is to explore the use of more complex neural network architectures, such as long short-term memory (LSTM), to enhance the ability of our method to handle sequential data.

FIGURE 1. A schematic diagram of CD-SVDD. In solving the CD-SVDD, $R_\theta$ and $c_\theta$ are represented by the output results of the neural network, which can be directly updated by the neural network parameter θ. That is why we define our method to be ''complete''.

C. A JOINT ALTERNATE ALGORITHM
Firstly, R and c are updated by solving the dual variable α in optimization problem (20) with the initial parameter θ of the neural network. After that, θ is updated according to (23). Parameters are trained alternately until the primal problem (6) converges. The pseudo code is given in Algorithm 1.

Algorithm 1 Joint Alternate Algorithm of CD-SVDD
Input: training data X, hyperparameter ν
Output: R*, c*, θ*
1: Initialize neural network parameter θ
2: while objective function (6) does not converge do
3:   α* ⇐ solving convex optimization problem (20) with φ_θ(X)
4:   θ ⇐ back propagation algorithm to minimize the loss (23)
5: end while
6: R*, c* ⇐ calculating by Eq. (21) and (22)

FIGURE 2. Some examples of image datasets. The left is from CIFAR-10. The right is from CIFAR-100.

FIGURE 3. Iterations and time (in seconds) of three deep models on CIFAR-100 dataset.

FIGURE 4. The changing curves of training error ratio, support vector ratio with different values of ν.

TABLE 1. Statistics of two image datasets.

TABLE 2. Statistics of five recorded benchmark datasets.

TABLE 3. The optimal AUC values (in percentage) of six methods on CIFAR-10 dataset.

TABLE 4. Computational iterations and time (in seconds) of three deep models on CIFAR-10 dataset.

TABLE 5. The optimal AUC values (in percentage) of six models on CIFAR-100.

TABLE 6. Iterations and time (in seconds) of three deep models on CIFAR-100 dataset.

TABLE 7. The optimal AUC values (in percentage) of six models on five recorded benchmark datasets.

TABLE 8. The average AUC values (in percentage) of six models on five recorded benchmark datasets.

TABLE 9. Sign test of model precision in seven datasets.

TABLE 10. The mean rank of AUC value of six models on five recorded benchmark datasets.

TABLE 11. The statistics of Friedman test.

TABLE 12. Sign test of model precision in seven datasets.

TABLE 13. The optimal F1 score (in percentage) of six models on five recorded benchmark datasets.

TABLE 14. Iterations and time (in seconds) of three deep models on five recorded benchmark datasets.

TABLE 15. Sign test of model efficiency in seven datasets.