A New Class of Bessel Kernel Functions for Support Vector Machine

In this paper, we construct a class of Bessel kernels for Support Vector Machines (SVMs). These kernels are proved to be continuous and to satisfy Mercer’s condition. The presented Bessel-class kernels degenerate to the Gaussian kernel in the infinitely smooth limit. Compared with other kernels, the Bessel kernels can be applied flexibly to classification and regression with fewer constants to be adjusted. Additionally, four simulation experiments, covering classification and regression, have been carried out to demonstrate the good performance of these Bessel-class kernels.


I. INTRODUCTION
Support Vector Machine (SVM) is one of the competitive methods for classification and regression in machine learning (ML), and has been widely used in areas such as pattern recognition, speech recognition, and text classification [1], [2], [3], [4]. SVM is based on the structural risk minimization principle and on a capacity concept with purely combinatorial definitions [5], [6], [7]. Compared with traditional methods that minimize the empirical training error, SVM avoids local minima by solving a convex quadratic programming problem with a linear set of constraints [8], [9]. Furthermore, the quality and complexity of the SVM solution do not depend on the dimension of the input space [5], [6], [7].
Based on statistical learning theory, SVM belongs to the class of kernel methods. By choosing different mapping functions ϕ(x), one can map the training data into a higher-dimensional feature space and find an optimal separating hyperplane in that space. An advantage of SVM is that it does not need the mapped patterns ϕ(x) explicitly; it only needs a kernel function K(x_k, x_l) that evaluates the dot products of these patterns. The kernel function can therefore be regarded as a similarity measure between the input objects, and it must satisfy Mercer's condition [10]. Evaluating the inner products in the feature space through the kernel, without computing the mapping explicitly, is known as the ''kernel trick''. By choosing different kernels, SVM can be tailored to specific problems. Table 1 lists some traditional kernels, which can be applied to most general problems for SVM. Besides the linear kernel, the other kernel functions listed in Table 1, based on the polynomial, the sigmoid function, and the Gaussian radial basis function (RBF), have been successfully applied to many science and engineering problems.
TABLE 1. Some traditional kernels [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Jon Atli Benediktsson.
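The four traditional kernels mentioned above can be sketched in Python as follows. Table 1 itself is not reproduced in this text, so the parameterizations used here, polynomial (x·z + t)^d, sigmoid tanh(κ x·z + θ) and Gaussian exp(−||x − z||²/(2σ²)), together with the parameter names `t`, `d`, `kappa`, `theta` and `sigma`, are assumptions chosen for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(x: Vector, z: Vector) -> float:
    """Euclidean inner product x . z."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x: Vector, z: Vector) -> float:
    # K(x, z) = x . z
    return dot(x, z)


def polynomial_kernel(x: Vector, z: Vector, t: float = 1.0, d: int = 2) -> float:
    # K(x, z) = (x . z + t)^d  (assumed parameterization)
    return (dot(x, z) + t) ** d


def sigmoid_kernel(x: Vector, z: Vector, kappa: float = 1.0, theta: float = 0.0) -> float:
    # K(x, z) = tanh(kappa * x . z + theta); not positive semi-definite for every kappa, theta
    return math.tanh(kappa * dot(x, z) + theta)


def gaussian_kernel(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


if __name__ == "__main__":
    x, z = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(x, z))      # 11.0
    print(polynomial_kernel(x, z))  # 144.0
    print(gaussian_kernel(x, x))    # 1.0
```

The Gaussian form with 2σ² in the denominator is the one that matches the relation σ = √(2v) used later for the Bessel-class kernels.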
The performance of SVM relies heavily on the kernel. Therefore, the choice of the kernel function and its parameters is a key problem for an SVM [12]. To the authors' knowledge, there is no established theory for selecting the best kernel function, and most kernel selections in SVMs depend on the experience of researchers without theoretical guidance. At present, many studies have examined the performance of SVM with different kernels. Guo and Zhang et al. [13] proposed a method using a kernelized multi-class support vector machine with a fast version of recursive feature elimination; their feature selection algorithm was efficient and worked well for problems with a large number of features. Wang and Zhang et al. [14] proposed a kernel function selection mechanism for SVM under sparse representation, in which a composite kernel function suitable for the given data is selected according to the selection mechanism. Based on the idea of enlarging the spatial resolution to increase the separability between classes, Amari and Wu [15] proposed a method of modifying a kernel function, which improved the performance of a support vector machine classifier. They also extended the method in Ref. [15] to improve the performance of SVMs by conformally transforming kernel functions in a data-dependent way [16]. Based on Gaussian RBF and polynomial kernels, their simulations on two artificial data sets showed that the method is effective.
The most commonly used kernels are those listed in Table 1, or combinations of them. Linear kernels are suitable for linearly related data. Polynomial kernels generally capture the global characteristics of the samples and have strong generalization ability. Gaussian RBF kernels have a strong local effect on the training data, but they capture the overall characteristics of the data less well. Moreover, the parameters of polynomial and Gaussian RBF kernels can be difficult to tune in some cases. Based on the features of the above kernels, one can also construct a mixed kernel, instead of a single kernel, for complex situations. If none of these approaches yields a satisfactory kernel, SVM may seem unsuitable for a specific ML problem. In the authors' opinion, such poor performance is caused by an unsuitable kernel; when a good kernel is used, SVM can achieve good results. Therefore, it is necessary to search for new kernel types for SVMs. In this paper, we present a class of Bessel kernels for SVM in ML problems. These kernels are continuous and satisfy Mercer's condition [17]. They degenerate to the Gaussian kernel in the infinitely smooth limit. Additionally, this class of Bessel kernels can be used flexibly: for a fixed order v, only one parameter needs to be adjusted, which can greatly simplify the computation in the ML procedure.
The rest of this paper is organized as follows. In Section II, some basic formulas of SVMs for classification and regression are briefly reviewed for completeness. In Section III, a new class of kernels based on Bessel functions is presented and its properties are proved. In Section IV, four simulation experiments, covering classification and regression, are carried out to verify the validity of the new kernels. Conclusions are drawn in Section V.

II. SVM FORMULAS
In this section, some basic formulas of SVM for classification and regression are briefly reviewed. For further details, see [5], [6], [7].

A. FORMULAS OF CLASSIFICATION
SVM is formulated within convex optimization theory. In the primal weight space, the nonlinear SVM classifier is defined as
$$ y(x) = \operatorname{sign}\left[w^{T}\phi(x) + b\right], \tag{1} $$
where w is the weight vector in the feature space, b ∈ R is the bias, and ϕ(x) is a mapping function that maps x into a higher-dimensional feature space, which can be infinite dimensional.
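Consider a set of given training vectors x_k ∈ R^n (k = 1, …, N) in two classes, and the label (indicator) vector y ∈ R^N with y_k ∈ {−1, 1}. The primal SVM for classification can be written as follows
$$ \min_{w,b,\xi} \; J(w,\xi) = \frac{1}{2} w^{T} w + c \sum_{k=1}^{N} \xi_k \quad \text{s.t.} \quad y_k\left[w^{T}\phi(x_k) + b\right] \ge 1 - \xi_k, \;\; \xi_k \ge 0, \;\; k = 1, \dots, N, \tag{2} $$
where ξ_k are slack variables and c is a positive real tradeoff constant. By applying the Lagrangian, Eq.(2) can be solved via the dual problem
$$ \max_{\alpha} \; -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l K(x_k, x_l)\, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k \quad \text{s.t.} \quad \sum_{k=1}^{N} \alpha_k y_k = 0, \;\; 0 \le \alpha_k \le c, \tag{3} $$
where α_k are Lagrange multipliers and K(x_k, x_l) is the kernel function
$$ K(x_k, x_l) = \phi(x_k)^{T} \phi(x_l). \tag{4} $$
After the dual problem is solved, the vector w is given by the primal-dual relationship
$$ w = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k). \tag{5} $$
Consequently, Eq.(1) can be written as
$$ y(x) = \operatorname{sign}\left[\sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b\right]. \tag{6} $$
5358 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.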
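The primal-dual relationship can be checked on a toy case. The sketch below evaluates the dual classifier sign[Σ_k α_k y_k K(x, x_k) + b] and, for the linear kernel (where ϕ(x) = x), compares it with the primal form of Eq.(1) after building w = Σ_k α_k y_k ϕ(x_k). The support vectors, multipliers and bias are hypothetical values chosen to satisfy Σ_k α_k y_k = 0; they are not the output of an actual QP solve.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    return sum(a * b for a, b in zip(x, z))


def dual_decision(x: Vector, sv: List[Vector], y: List[int],
                  alpha: List[float], b: float, kernel: Kernel) -> float:
    """sum_k alpha_k y_k K(x, x_k) + b, the argument of sign[.] in the dual classifier."""
    return sum(a * yk * kernel(x, xk) for a, yk, xk in zip(alpha, y, sv)) + b


def primal_weights(sv: List[Vector], y: List[int], alpha: List[float]) -> List[float]:
    """w = sum_k alpha_k y_k phi(x_k) with phi(x) = x, i.e. the linear-kernel case."""
    dim = len(sv[0])
    return [sum(a * yk * xk[i] for a, yk, xk in zip(alpha, y, sv)) for i in range(dim)]


def classify(score: float) -> int:
    # sign[.]; a score of exactly 0 is mapped to +1 by convention
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    sv = [[1.0, 1.0], [-1.0, -1.0]]   # hypothetical support vectors
    y = [1, -1]
    alpha = [0.25, 0.25]              # hypothetical multipliers, sum_k alpha_k y_k = 0
    b = 0.0
    x = [2.0, 0.5]
    w = primal_weights(sv, y, alpha)  # [0.5, 0.5]
    primal = linear_kernel(w, x) + b
    dual = dual_decision(x, sv, y, alpha, b, linear_kernel)
    print(w, primal, dual, classify(dual))
```

For a nonlinear kernel, only `dual_decision` remains usable, because w lives in the (possibly infinite-dimensional) feature space.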

B. FORMULAS OF REGRESSION
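For a given set of training data {x_k, y_k} (k = 1, …, N), the regression model in the standard form of SVM can be written as
$$ y(x) = w^{T}\phi(x) + b, \tag{7} $$
which is similar to Eq.(1). The quadratic programming (QP) problem for Eq.(7) is given by
$$ \min_{w,b,\xi,\xi^{*}} \; J(w,\xi,\xi^{*}) = \frac{1}{2} w^{T} w + c \sum_{k=1}^{N} (\xi_k + \xi_k^{*}) \quad \text{s.t.} \quad \begin{cases} y_k - w^{T}\phi(x_k) - b \le \varepsilon + \xi_k, \\ w^{T}\phi(x_k) + b - y_k \le \varepsilon + \xi_k^{*}, \\ \xi_k, \xi_k^{*} \ge 0, \end{cases} \tag{8} $$
where ε > 0 is the accuracy tolerance for function estimation, and ξ_k and ξ_k^* are slack variables. The dual problem for Eq.(8) is
$$ \max_{\alpha,\alpha^{*}} \; -\frac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^{*})(\alpha_l - \alpha_l^{*}) K(x_k, x_l) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^{*}) + \sum_{k=1}^{N} y_k (\alpha_k - \alpha_k^{*}) \quad \text{s.t.} \quad \sum_{k=1}^{N} (\alpha_k - \alpha_k^{*}) = 0, \;\; \alpha_k, \alpha_k^{*} \in [0, c], \tag{9} $$
where α_k and α_k^* are the Lagrange multipliers. The vector w is then given by
$$ w = \sum_{k=1}^{N} (\alpha_k - \alpha_k^{*}) \phi(x_k). \tag{10} $$
In the dual space, the function y(x) can be written as
$$ y(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^{*}) K(x, x_k) + b. \tag{11} $$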
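At the optimum of Eq.(8), each pair of slack variables reduces to the ε-insensitive loss, ξ_k + ξ_k^* = max(0, |y_k − y(x_k)| − ε), so deviations inside the ε-tube cost nothing. The sketch below evaluates that loss together with the dual-space predictor y(x) = Σ_k (α_k − α_k^*) K(x, x_k) + b. The coefficients, bias and inputs in the usage lines are hypothetical, and the Gaussian kernel is only a convenient stand-in.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def eps_insensitive(y_true: float, y_pred: float, eps: float) -> float:
    """xi_k + xi_k^* at the optimum: max(0, |y - y(x)| - eps)."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x: Vector, train_x: List[Vector], coef: List[float],
                b: float, kernel: Kernel) -> float:
    """Dual-space predictor; coef[k] plays the role of alpha_k - alpha_k^*."""
    return sum(c * kernel(x, xk) for c, xk in zip(coef, train_x)) + b


def gaussian(x: Vector, z: Vector, sigma: float = 1.0) -> float:
    d2 = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    train_x = [[0.0], [1.0]]          # hypothetical training inputs
    coef = [0.5, -0.5]                # hypothetical alpha_k - alpha_k^* (they sum to 0)
    y_hat = svr_predict([0.0], train_x, coef, b=0.1, kernel=gaussian)
    print(y_hat, eps_insensitive(0.4, y_hat, eps=0.1))
```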

III. KERNELS AND KERNEL TRICK
A. CHARACTERISATION OF KERNELS
It is known that Vapnik extended the linear SVM to a nonlinear technique in 1995 by introducing a kernel function K(x_k, x_l), which is the inner product of the mapping function ϕ(x), as shown in Eq.(4). Fortunately, it is not necessary to know the explicit expression of the mapping, since the inner products of the mapped vectors in a Hilbert space can be evaluated by the following result. According to Hilbert space theory, for a symmetric, continuous function K(•, •) satisfying Mercer's condition [17], there exists an expansion
$$ K(x, z) = \sum_{k=1}^{\infty} \lambda_k \phi_k(x)\, \phi_k(z), \tag{12} $$
where x, z ∈ R^N and λ_k > 0.
Mercer's condition requires that
$$ \iint K(x, z)\, g(x)\, g(z)\, dx\, dz \ge 0, \tag{13} $$
where g(•) is any square-integrable function.
Here the integral is taken over a compact subset of R^N, and the kernel function can be written in the form of an inner product, as shown in Eq.(4). Furthermore, we have the following proposition, which is equivalent to Eq.(13).
Proposition 1: Let X = {x_1, …, x_M} be a finite input space and K(x, z) a symmetric function on X. Then K(x, z) is a kernel function if and only if the matrix
$$ \mathbf{K} = \left[K(x_k, x_l)\right]_{k,l=1}^{M} \tag{14} $$
is positive semi-definite (has non-negative eigenvalues).
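Proposition 1 suggests a simple numerical check: build the matrix [K(x_k, x_l)] on a finite set of points and test whether it is positive semi-definite. The sketch below does this with a Cholesky factorization of K + δI, where the small jitter δ (an assumed value, 1e-10) absorbs round-off so that a PSD matrix is accepted while a matrix with a clearly negative eigenvalue is rejected. It is a finite-sample check, not a proof of Mercer's condition.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def gram_matrix(points: List[Vector], kernel: Callable[[Vector, Vector], float]) -> Matrix:
    """The matrix of Proposition 1: entry (k, l) is K(x_k, x_l)."""
    return [[kernel(p, q) for q in points] for p in points]


def is_psd(K: Matrix, jitter: float = 1e-10, sym_tol: float = 1e-12) -> bool:
    """Symmetric K with a Cholesky factor for K + jitter * I.

    In exact arithmetic this succeeds iff every eigenvalue of K exceeds -jitter.
    """
    n = len(K)
    for i in range(n):
        for j in range(i):
            if abs(K[i][j] - K[j][i]) > sym_tol:
                return False                      # a kernel matrix must be symmetric
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = K[j][j] + jitter - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            return False                          # non-positive pivot: not PSD
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    gauss = lambda p, q: math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / 2.0)
    print(is_psd(gram_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], gauss)))
    print(is_psd([[1.0, 2.0], [2.0, 1.0]]))      # eigenvalues 3 and -1
```

The jitter is what lets a rank-deficient but valid Gram matrix (for example, the linear kernel on collinear points) pass, which a plain Cholesky test for strict positive definiteness would reject.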

B. BESSEL KERNELS
The kernel trick is a powerful tool for handling problems that are not linearly separable, by mapping the data into a higher-dimensional space. Although many types of kernels exist for SVMs, as mentioned before, there is in general no best choice for all problems. A low-order polynomial kernel or the Gaussian kernel is generally a good first choice. The present study focuses on a new class of kernels based on Bessel functions,
$$ K_v(x, z) = \Gamma(v + 1) \left(\frac{2}{\|x - z\|}\right)^{v} J_v(\|x - z\|), \tag{15} $$
where J_v(•) is the Bessel function of the first kind of order v and Γ(•) is the Gamma function; at x = z the kernel is defined by its limiting value.
Lemma 1: The function given by Eq.(15) is continuous, has maximum value K_v(x, x) = 1 at x = z (||x − z|| = 0), and decays to zero as ||x − z|| → ∞.
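Proof: From the Poisson integral, the Bessel function can be written as [18]
$$ J_v(r) = \frac{(r/2)^{v}}{\Gamma(v + \frac{1}{2})\,\Gamma(\frac{1}{2})} \int_{-1}^{1} (1 - t^{2})^{v - \frac{1}{2}} \cos(rt)\, dt, \quad v > -\frac{1}{2}. \tag{16} $$
With the help of Eq.(15), one has
$$ K_v(x, z) = \frac{\Gamma(v + 1)}{\Gamma(v + \frac{1}{2})\,\Gamma(\frac{1}{2})} \int_{-1}^{1} (1 - t^{2})^{v - \frac{1}{2}} \cos(\|x - z\|\, t)\, dt. \tag{17} $$
From the above equation, K_v(x_0, z_0) obviously exists for arbitrary ||x_0 − z_0||. Consider that
$$ \left|K_v(x, z) - K_v(x_0, z_0)\right| \le \frac{\Gamma(v + 1)}{\Gamma(v + \frac{1}{2})\,\Gamma(\frac{1}{2})} \int_{-1}^{1} (1 - t^{2})^{v - \frac{1}{2}} \left|\cos(\|x - z\|\, t) - \cos(\|x_0 - z_0\|\, t)\right| dt. \tag{18} $$
If x → x_0 and z → z_0, then ||x − z|| → ||x_0 − z_0|| holds. With the help of Eq.(18), one has
$$ \lim_{x \to x_0,\; z \to z_0} K_v(x, z) = K_v(x_0, z_0), \tag{19} $$
so K_v(x, z) is continuous. Since (1 − t²)^{v−1/2} ≥ 0, we have
$$ \left|K_v(x, z)\right| \le \frac{\Gamma(v + 1)}{\Gamma(v + \frac{1}{2})\,\Gamma(\frac{1}{2})} \int_{-1}^{1} (1 - t^{2})^{v - \frac{1}{2}}\, dt = K_v(x, x). \tag{20} $$
Eq.(20) implies that K_v(x, z) attains its maximum value at x = z (||x − z|| = 0). Furthermore, we also have
$$ K_v(x, x) = \frac{\Gamma(v + 1)}{\Gamma(v + \frac{1}{2})\,\Gamma(\frac{1}{2})}\, B\!\left(\tfrac{1}{2}, v + \tfrac{1}{2}\right) = 1, \tag{21} $$
where B(•, •) is the Beta function. The Bessel function has the following expansions [18]
$$ J_v(r) = \sum_{k=0}^{\infty} \frac{(-1)^{k}}{k!\,\Gamma(v + k + 1)} \left(\frac{r}{2}\right)^{2k + v}, \tag{22} $$
$$ J_v(r) \sim \sqrt{\frac{2}{\pi r}} \cos\left(r - \frac{v\pi}{2} - \frac{\pi}{4}\right), \quad r \to \infty. \tag{23} $$
Thus, according to Eqs.(15), (22) and (23), one obtains
$$ \lim_{\|x - z\| \to 0} K_v(x, z) = 1, \qquad \lim_{\|x - z\| \to \infty} K_v(x, z) = 0. \tag{24} $$
□
Theorem 1: The function given by Eq.(15) satisfies Mercer’s condition and is a kernel function.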
There are two ways to prove Theorem 1.
Proof 1: From Eq.(15), the function obviously satisfies K_v(x, z) = K_v(z, x). Furthermore, K_v(x, z) is continuous by Lemma 1. The function K_v(x, z) can be related to Hankel transforms as follows [18], where the subscript ||ω|| = 1 denotes the surface integral over the unit sphere in R^N.
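Proof 2: For a training set {x_1, …, x_M}, the kernel matrix [K_v(x_k, x_l)]_{k,l=1}^M can be written as follows
$$ \left[K_v(x_k, x_l)\right]_{k,l=1}^{M} = \begin{pmatrix} 1 & K_v(x_1, x_2) & \cdots & K_v(x_1, x_M) \\ K_v(x_2, x_1) & 1 & \cdots & K_v(x_2, x_M) \\ \vdots & \vdots & \ddots & \vdots \\ K_v(x_M, x_1) & K_v(x_M, x_2) & \cdots & 1 \end{pmatrix}. $$
From Lemma 1, one has K_v(x_k, x_k) = 1. With the help of the Hurwitz theorem, the determinant and the leading principal subdeterminants det([K_v(x_k, x_l)]_{k,l=1}^P) > 0 (P = 1, …, M) hold. When another training sample x_{M+1} is considered, the training set becomes {x_1, …, x_M, x_{M+1}}. We can force the (M + 1)-th vector to move from ∞ to its actual position. At ∞, Lemma 1 gives K_v(x_k, x_{M+1}) = 0 (k = 1, …, M), so det([K_v(x_k, x_l)]_{k,l=1}^{M+1}) = det([K_v(x_k, x_l)]_{k,l=1}^{M}) > 0, and det([K_v(x_k, x_l)]_{k,l=1}^P) > 0 (P = 1, …, M + 1) hold as the (M + 1)-th vector moves to its actual position.
Based on the above mathematical induction, the matrix [K_v(x_k, x_l)]_{k,l=1}^M is positive definite. □
From the above proofs, one can see that the function given by Eq.(15) is a kernel function. Furthermore, Proof 2 shows that the kernel matrix of function (15) is not just positive semidefinite, but indeed positive definite. This suggests that the proposed Bessel-class kernel has some favorable properties compared with other kernels.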
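Lemma 1 and Proof 2 can be spot-checked numerically. The sketch below evaluates Eq.(15) through the series of Eq.(22), which after cancelling Γ(v + 1)(2/r)^v becomes K_v = Σ_k (−r²/4)^k / (k!(v + 1)(v + 2)⋯(v + k)) with r = ||x − z||; for v = 1/2 it reduces to sin(r)/r, which gives an easy correctness check. It then computes the leading principal minors det([K_v(x_k, x_l)]_{k,l=1}^P), the quantities used with the Hurwitz theorem in Proof 2. The plain series, its stopping rule, the choice v = 1 and the 2-D sample points are assumptions for illustration; the series loses accuracy for large r, so a library Bessel routine is preferable in practice, and one positive result is a check, not a proof.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def bessel_kernel(x: Vector, z: Vector, v: float,
                  tol: float = 1e-16, max_terms: int = 500) -> float:
    """K_v(x, z) = Gamma(v+1) (2/r)^v J_v(r), r = ||x - z||, summed as a power series.

    Series: sum_k (-r^2/4)^k / (k! (v+1)(v+2)...(v+k)); requires v > -1.
    Plain summation loses accuracy for large r, so use it for moderate distances.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(x, z))
    term, total = 1.0, 1.0                      # k = 0 term, hence K_v(x, x) = 1
    for k in range(1, max_terms):
        term *= -r2 / 4.0 / (k * (v + k))       # ratio of consecutive series terms
        total += term
        if k > math.sqrt(r2) and abs(term) < tol * max(1.0, abs(total)):
            break
    return total


def leading_minors(K: Matrix) -> List[float]:
    """det of the leading P x P blocks, P = 1..M, as running products of pivots.

    Gaussian elimination without pivoting preserves each leading block's
    determinant, so the P-th leading minor equals the product of the first P pivots.
    """
    A = [row[:] for row in K]
    n = len(A)
    minors: List[float] = []
    det = 1.0
    for p in range(n):
        det *= A[p][p]
        minors.append(det)
        if A[p][p] == 0.0:
            break                               # elimination cannot continue
        for i in range(p + 1, n):
            f = A[i][p] / A[p][p]
            for j in range(p, n):
                A[i][j] -= f * A[p][j]
    return minors


def is_positive_definite(K: Matrix) -> bool:
    """Leading-principal-minor criterion for a symmetric matrix."""
    m = leading_minors(K)
    return len(m) == len(K) and all(d > 0.0 for d in m)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]]  # arbitrary 2-D points
    K = [[bessel_kernel(p, q, v=1.0) for q in pts] for p in pts]
    print(leading_minors(K), is_positive_definite(K))
```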

C. RELATION BETWEEN BESSEL KERNEL AND GAUSSIAN KERNEL
Because the Gaussian kernel has good properties, it is widely used in SVMs. The Bessel-class kernels constructed in this paper share these properties. We now prove that the Gaussian kernel is the limiting case of the Bessel-class kernels.
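Theorem 2: The Bessel-class kernel K_v(x, z) recovers the Gaussian kernel as v → ∞, i.e.,
$$ K_v(x, z) \simeq \exp\left(-\frac{\|x - z\|^{2}}{2\sigma^{2}}\right), \quad v \to \infty, $$
where σ = √(2v).
Proof: From Eqs.(15) and (22), one has
$$ K_v(x, z) = \sum_{k=0}^{\infty} \frac{\Gamma(v + 1)}{k!\,\Gamma(v + k + 1)} \left(-\frac{\|x - z\|^{2}}{4}\right)^{k} \approx \sum_{k=0}^{\infty} \frac{1}{k!} \left(-\frac{\|x - z\|^{2}}{4v}\right)^{k} = \exp\left(-\frac{\|x - z\|^{2}}{2\sigma^{2}}\right), $$
where Γ(v + 1)/Γ(v + k + 1) ≈ v^{−k} for v → ∞ is applied. □
Theorem 2 shows that the Gaussian kernel is the infinitely smooth limit of the Bessel-class kernels. Fig. 1 also shows that the parameter v controls the kernel shape: the curve becomes flatter as v increases.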
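Theorem 2 can also be checked numerically: with σ = √(2v), the gap between the Bessel-class kernel and the Gaussian kernel at a fixed distance shrinks as v grows; by the series argument above it decays roughly like ||x − z||²/(4v²) for large v. The sketch below evaluates Eq.(15) by its power series in r = ||x − z||; the distance r = 1.5 and the list of v values are arbitrary choices for illustration.

```python
import math


def bessel_kernel_r(r: float, v: float, tol: float = 1e-16, max_terms: int = 500) -> float:
    """Gamma(v+1) (2/r)^v J_v(r) via sum_k (-r^2/4)^k / (k! (v+1)...(v+k)), v > -1."""
    term, total = 1.0, 1.0
    for k in range(1, max_terms):
        term *= -r * r / 4.0 / (k * (v + k))   # ratio of consecutive series terms
        total += term
        if k > r and abs(term) < tol * max(1.0, abs(total)):
            break
    return total


def gaussian_r(r: float, sigma: float) -> float:
    """exp(-r^2 / (2 sigma^2))."""
    return math.exp(-r * r / (2.0 * sigma ** 2))


if __name__ == "__main__":
    r = 1.5
    for v in (1.0, 16.0, 32.0, 1000.0):
        sigma = math.sqrt(2.0 * v)             # the relation sigma = sqrt(2 v)
        gap = abs(bessel_kernel_r(r, v) - gaussian_r(r, sigma))
        print(f"v = {v:7.1f}  sigma = {sigma:6.2f}  |K_v - Gaussian| = {gap:.2e}")
```

The pairs (v = 16, σ ≈ 5.66) and (v = 32, σ = 8) in this loop are the same correspondences used in the experiments below.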

IV. SIMULATION EXPERIMENTS
To evaluate the performance of the presented Bessel-class kernels in SVM, four experiments, covering classification and regression, are presented in this section. In the simulations, 5-fold cross-validation is used, and the optimal parameters are searched using Grid Search (GS) and Particle Swarm Optimization (PSO) based on cross-validation. In practice, PSO was found to be faster than GS. Furthermore, the features of every sample are normalized.
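The parameter search described above can be sketched as a 5-fold cross-validated grid search over the tradeoff constant c, with v fixed for a Bessel-class kernel. The scoring function `score_fn` is hypothetical: it stands for training an SVM on the training folds and returning a validation score. The power-of-two grid is an assumed choice; several of the reported values of c (0.5, 32, 4096, 32768) are powers of two, which is consistent with such a grid, though this is only an inference. PSO would replace the exhaustive loop with a swarm-based search.

```python
import math
from typing import Callable, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def k_fold_indices(n: int, k: int = 5) -> List[Fold]:
    """Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs.

    Contiguous folds keep the sketch short; shuffled or stratified folds are common in practice.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds: List[Fold] = []
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds


def grid_search(n: int, grid: Sequence[float],
                score_fn: Callable[[float, List[int], List[int]], float],
                k: int = 5) -> Tuple[float, float]:
    """Return (best c, its mean k-fold score); score_fn(c, train_idx, val_idx) is hypothetical."""
    folds = k_fold_indices(n, k)
    best_c, best_score = grid[0], -math.inf
    for c in grid:
        score = sum(score_fn(c, tr, va) for tr, va in folds) / k
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


if __name__ == "__main__":
    grid = [2.0 ** p for p in range(-5, 16)]   # assumed grid 2^-5 ... 2^15
    # Toy score peaking at c = 2^3; a real score_fn would train and validate an SVM.
    toy_score = lambda c, tr, va: -abs(math.log2(c) - 3.0)
    print(grid_search(700, grid, toy_score))
```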

A. RAISIN GRAINS CLASSIFICATION (2 CLASSES)
In this subsection, a benchmark problem of raisin grain classification is considered. The dataset is taken from the website [19]. The data consist of 900 instances with 7 attributes, which are used to make a binary decision on the variety of raisin: Kecimen (450 samples) or Besni (450 samples).
To perform the classification by SVM, the data are selected sequentially into 700 training samples and 200 testing samples, with equal numbers of the two raisin varieties in each set.
In Reference [20], Cinar et al. provided a benchmark by applying Logistic Regression (LR), a Multi-Layer Perceptron (MLP), and SVM with a polynomial kernel. In this paper, we perform the classification using SVM with the Gaussian kernel and the Bessel-class kernels. The parameters are c = 4096 and σ = 8.00 for the Gaussian kernel, c = 0.5 for the Bessel-class kernels with v = 0, 1/2, and c = 1000 for the Bessel-class kernel with
The performance measures of accuracy, sensitivity, specificity, and precision are listed in Table 2. The performances of the SVM utilizing Bessel-class kernels (especially v = 16) are fairly good on both the training set and the test set. The Gaussian kernel is the infinitely smooth limit of the Bessel-class kernels, as can also be seen in Fig. 1: the Gaussian kernel with σ = 8.00 corresponds to the Bessel kernel with v = 32 under the relationship σ = √(2v), which holds for v ≫ 1. Accordingly, Table 2 shows that the performance of SVM with the Bessel-class kernel (v = 32) is the same as that with the Gaussian kernel (σ = 8.00). Additionally, the Bessel-class kernel with v = 16 performs best for the specified training and testing sets in this numerical experiment. Some of the performance measures on the testing set are not as good as those on the training set. This can be explained by the fact that the results in Reference [20] are based on a training set of 900 instances, whereas in this paper the instances are divided into a training set and a testing set, so the number of training samples is much smaller than in Reference [20].
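The four measures in Table 2 follow from the binary confusion matrix. The sketch below assumes the usual definitions, accuracy = (TP + TN)/total, sensitivity = TP/(TP + FN), specificity = TN/(TN + FP) and precision = TP/(TP + FP), and that one raisin variety is encoded as the positive class (label 1); which variety the authors treated as positive is not stated here.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and precision from the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")   # undefined when the denominator is 0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }


if __name__ == "__main__":
    # Toy labels, not the paper's data: 1 = positive variety, -1 = the other variety.
    print(binary_metrics([1, 1, 1, 1, -1, -1, -1, -1], [1, 1, 1, -1, -1, -1, 1, 1]))
```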

B. IRIS CLASSIFICATION (3 CLASSES)
In this experiment, iris flower classification is used to verify the validity of the presented Bessel-class kernels in SVM. The dataset from the UCI machine learning repository [21] includes 150 instances with 4 attributes, which are used to decide among 3 classes: Setosa (50 samples), Versicolour (50 samples), and Virginica (50 samples). In the numerical experiment, 90 training samples (30 per class) and 60 testing samples (20 per class) are used to train and test the SVM model based on the ''one vs one'' approach.
We perform the classification using SVM with the Gaussian kernel and the Bessel-class kernels, and compare the results with the Bayesian-class classifiers [22]. The accuracies are shown in Table 3. The accuracies of the Bessel-class kernels are 100% on the training set and 93.33% or 96.67% on the testing set. Table 3 shows that the Bessel-class kernels achieve 100% accuracy on the training set for different values of v. Furthermore, the testing accuracy increases to 96.67% when the parameter v is larger than 16. Additionally, the parameters σ = 5.65 and v = 16 satisfy the relationship σ ≈ √(2v). Therefore, Table 3 shows that the performance of the Gaussian kernel (σ = 5.65) is the same as that of the Bessel-class kernel (v = 16), since the Gaussian kernel is the limiting case of the Bessel-class kernels. Overall, the SVM utilizing Bessel-class kernels performs excellently on both the training and testing sets.
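With the ''one vs one'' approach, a binary SVM is trained for each pair of classes (three classifiers for the three iris classes), and a test sample is assigned to the class that collects the most pairwise votes. The sketch below shows only the voting step; each entry of `pairwise` is a hypothetical stand-in for a trained binary SVM decision function, and breaking ties in favor of the earlier class is an assumed convention.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (class_a, class_b) -> binary decision function; a positive value is a vote for class_a.
Pairwise = Dict[Tuple[str, str], Callable[[Vector], float]]


def ovo_predict(x: Vector, classes: List[str], pairwise: Pairwise) -> str:
    """Majority vote over all class pairs (one vs one)."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    # max() keeps the first class with the top vote count, so ties go to the earlier class.
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    classes = ["Setosa", "Versicolour", "Virginica"]
    # Hypothetical one-feature decision functions standing in for trained binary SVMs.
    pairwise: Pairwise = {
        ("Setosa", "Versicolour"): lambda x: 2.5 - x[0],
        ("Setosa", "Virginica"): lambda x: 2.5 - x[0],
        ("Versicolour", "Virginica"): lambda x: 4.9 - x[0],
    }
    print([ovo_predict([v], classes, pairwise) for v in (1.4, 4.0, 5.5)])
```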

C. ONE-DIMENSIONAL FUNCTION REGRESSION
This numerical experiment demonstrates the validity of the Bessel-class kernels for one-dimensional function regression. Consider a polynomial function as follows:
The relative error for this function regression is defined by
where x_k is a testing point and ŷ(x_k) is the numerical result obtained by SVM.
In the numerical experiment, 20 training points and 50 testing points are placed uniformly in the interval x ∈ [−5, +5]. To compare the performances of different kernels, we solve the problem with the Gaussian kernel and the Bessel-class kernels with v = 0, 1/2, and 1. The parameters are c = 32768 and σ = 8.00 for the Gaussian kernel, and c = 2200 for the Bessel-class kernels. Fig. 2 shows that the Bessel-class kernels have considerably lower errors than the Gaussian kernel. Furthermore, the relative error decreases as the parameter v increases. This implies that the Bessel-class kernels have advantages over the Gaussian kernel in some function regression problems.
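The data layout described above can be sketched as follows. The polynomial and the exact relative-error formula are not reproduced in this text, so the pointwise definition |y(x_k) − ŷ(x_k)| / |y(x_k)| and the endpoint-inclusive uniform grids are assumptions; the paper's own definitions should be substituted where they differ.

```python
from typing import List


def uniform_grid(a: float, b: float, n: int) -> List[float]:
    """n equally spaced points on [a, b], endpoints included (assumed placement)."""
    return [a + (b - a) * i / (n - 1) for i in range(n)]


def relative_errors(y_true: List[float], y_pred: List[float]) -> List[float]:
    """Pointwise |y(x_k) - y_hat(x_k)| / |y(x_k)| (assumed definition; undefined where y(x_k) = 0)."""
    return [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]


if __name__ == "__main__":
    train_x = uniform_grid(-5.0, 5.0, 20)   # 20 training points
    test_x = uniform_grid(-5.0, 5.0, 50)    # 50 testing points
    print(len(train_x), len(test_x), train_x[1] - train_x[0])
```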

D. ALE REGRESSION IN SENSOR NODE LOCALIZATION PROCESS
The last numerical experiment, a regression of the average localization error (ALE), is used to show the performance of the presented Bessel-class kernels in SVM. The dataset from the repository on the website [23] includes 107 samples with 4 features. To compare the results with those in Reference [24], we use 75 samples for training and the remaining 32 for testing. In the numerical computation, the Gaussian kernel and the Bessel-class kernels are used to solve this problem. The parameters for this experiment are c = 32 and σ = 2.00 for the Gaussian kernel, c = 300 for the Bessel-class kernels with v = 0, 1/2, 1, and c = 330 for the Bessel-class kernels with v = 2, 4.
After training, the ALE values of the 32 testing samples are predicted with the Gaussian kernel and the different Bessel-class kernels. Fig. 3 shows that the predicted results agree well with the simulated results and cluster along the regression line with mild scatter. Most points lie within the 95% confidence interval (CI), which implies that the regression has a strong positive correlation (R) and a relatively small root mean square error (RMSE). The detailed values of R and RMSE are listed in Table 4. The predictions obtained by the Bessel-class kernels have a stronger positive correlation R than those of the polynomial kernel. Furthermore, the predictions obtained by the Bessel-class kernels are better than the first two results obtained with the polynomial kernel. Table 4 shows that the Bessel-class kernels with v = 2 and 4 are more suitable for ALE regression than the polynomial and Gaussian kernels. Since the Gaussian kernel is the limiting case of the Bessel-class kernels, as mentioned before, the predictions given by the Bessel-class kernels are close to those obtained by the Gaussian kernel. Accordingly, Fig. 3(a) and (e) show that the ALE predictions are similar for the Gaussian kernel (σ = 2) and the Bessel-class kernel (v = 2), which satisfy the relationship σ = √(2v). In these cases (σ = 2, v = 2), the RMSEs are not the same, since v is not large enough. Furthermore, Table 4 shows that R increases and RMSE decreases as v increases.
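The two measures reported in Table 4 can be computed as follows, assuming R is the Pearson correlation coefficient between the simulated and predicted ALE values and RMSE uses the usual 1/N normalization over the testing samples. The values in the usage lines are toy numbers, not the paper's data.

```python
import math
from typing import Sequence


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient R between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error with 1/N normalization."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    simulated = [1.0, 2.0, 3.0, 4.0]      # toy values
    predicted = [1.1, 1.9, 3.2, 3.9]      # toy values
    print(pearson_r(simulated, predicted), rmse(simulated, predicted))
```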

V. CONCLUSION
A new class of Bessel kernels is presented for SVM in this paper. These kernels are continuous and satisfy Mercer's condition. It is proved that the Gaussian kernel is the infinitely smooth limit of the Bessel-class kernels: for v ≫ 1, the Bessel-class kernel approaches the Gaussian kernel with σ = √(2v). This class of Bessel kernels can be used flexibly with different v. For a fixed v, only the tradeoff parameter c needs to be adjusted, which greatly simplifies the computation in the machine learning procedure. Four simulation experiments, covering classification and regression, have been performed to evaluate the validity of the Bessel-class kernels, and the kernels performed well in these experiments. Since no single kernel is best for all applications, the study of Bessel-class kernels remains an open subject, and the authors will pursue further work on this type of kernel in the future.

FIGURE 2. Relative errors for the regression based on different kernels.

FIGURE 3. Prediction results for ALE using different kernels.

TABLE 4. Benchmark of the results obtained by different kernels.