Using Hierarchical Likelihood Towards Support Vector Machine: Theory and Its Application

The H-likelihood method proposed by Lee and Nelder (1996) is used extensively across a wide range of data. Repeatedly measured data within classes can be analyzed with hierarchical generalized linear models (HGLMs), while correlated multiple endpoints can instead be handled with multivariate double hierarchical generalized linear models (DHGLMs). This article addresses the application of this principle to feature selection and support vector machines. In an analysis of the morphology of Sardinella lemuru (Bali sardinella), the best parameters were epsilon 0.7 and cost 4, with a best performance of 0.2327401; the predictive value for fish sex was 0.997319 and the area under the curve was 0.8967. At the same time, we extend to large-scale case studies for stress testing of the SVM method, using three datasets from the UCI machine learning repository: the bank marketing dataset, the car evaluation database, and the human activity recognition using smartphones dataset. In short, employing SVM-DHGLM increased accuracy, precision, and recall for feature selection and classification. The $H$-likelihood provides an excellent and usable framework for statistical inference about unobservable random variables, while preserving the advantages of the original likelihood structure for fixed parameters. We expect that new classes of models will be developed and that the $H$-likelihood will be widely used for their inference and for applications in big data and machine learning.


I. INTRODUCTION
Data mining, also referred to as knowledge discovery in databases (KDD), is a procedure that involves processing historical data to identify patterns, associations, or relationships in massive amounts of data [1]. Data mining can be used to support future decision-making [2]. Machine learning is a branch of artificial intelligence and data analytics concerned with developing algorithms that can learn from previous information; it is a computational method for data mining. (The associate editor coordinating the review of this manuscript and approving it for publication was Md Asaduzzaman.) Learning processes are frequently divided into two main types, supervised and unsupervised [3]. In the unsupervised approach, learning is performed without a training stage and without a target, in this case the data label [4].
There are already several established learning algorithms, including K-Nearest Neighbor [5], Artificial Neural Network [6], Naïve Bayes [7], Support Vector Machine [8], and many more [9]. Each algorithm has strengths and weaknesses [4]. However, all implementations share the same concept: learning, so that at the end of training the model can correctly map input data onto an output class label. In this method, labeled data are used to learn the decision function, separator function, or regression function. After training with sufficient data, a machine learning model such as SVM can be used to predict or identify a decision when new data whose outcome is unknown are entered [10]. If performance at testing time does not meet expectations, the parameters of the machine learning function can be modified to boost the model's accuracy. In the case of SVM, the parameters most commonly configured are the cost, epsilon, and gamma parameters [11], [12]. Predictive processes in data mining can be classified into two major groups, classification and regression [13]. In classification, the output for each observation is an integer or discrete value, whereas in regression the output is continuous [14]. The construction of a classification model in SVM is based on risk minimization, which contributes to its ability to describe problems well and to overcome overfitting [15]. Owing to its generalizability, SVM is capable of producing high precision with a comparatively modest error rate.
[VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/]
In its development, SVM has been successfully used to solve problems in various fields, including hyperspectral data [16], Alzheimer's disease [17], [18], and thyroid disease [19], as well as in ecology [20], including coral reef fish detection and recognition in underwater videos [21] and remote sensing [22]. The ability of SVM as a classification method can be compared with other classification methods [23].
The SVM approach has some limitations, particularly the difficulty of determining optimal parameter values. Setting the parameter values correctly will improve the classification accuracy of the SVM model [24]. To obtain the parameters that generate the best classification models, parameter optimization is performed in the SVM model [25]. Optimizing these parameters means determining the most efficient hyperparameters for the SVM model and implementing the SVM model with the best classification performance [26]-[28]. The grid search method is the most commonly used method for optimizing these parameters [29], [30].
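To make the grid search idea concrete, the following sketch (a hypothetical Python stand-in, not the authors' implementation) exhaustively scans a cost × epsilon grid and keeps the pair with the lowest cross-validation error; the toy error surface is an assumption chosen only so the minimum falls near the parameters reported later (cost 4, epsilon 0.7).

```python
from itertools import product

def grid_search(costs, epsilons, cv_error):
    """Return the (cost, epsilon) pair with the lowest cross-validation error."""
    best, best_err = None, float("inf")
    for c, eps in product(costs, epsilons):
        err = cv_error(c, eps)  # in practice: mean k-fold CV error of an SVM fit
        if err < best_err:
            best, best_err = (c, eps), err
    return best, best_err

# Toy stand-in for an SVM's CV error surface, minimized at cost=4, epsilon=0.7
toy_error = lambda c, eps: (c - 4) ** 2 + (eps - 0.7) ** 2 + 0.23
params, err = grid_search([1, 2, 4, 8], [0.1, 0.4, 0.7, 1.0], toy_error)
print(params, round(err, 2))  # (4, 0.7) 0.23
```

In a real application, `cv_error` would refit the SVM on each training fold and average the validation error, which is what makes grid search expensive but reliable.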
Choosing a good and sufficient statistical method for the needs of the study is a basic principle in the conduct of statistical analysis [31], [32]. A GLM that also has a random effect as a predictor can be extended to an HGLM [33]. This strategy has many advantages that the GLM does not provide [34]. The aim of inferential statistics is to draw conclusions about the population from the results of the analysis. HGLM is a statistical modeling framework that offers consistency in modeling and broad applicability, including penalized-likelihood scenarios [35], factor analysis of ordinal data for structural equation modeling [36], and modeling of longitudinal time-to-event outcomes [37].
Lee and Nelder [38] address inferential statistical models that involve unobserved random variables, such as missing data, latent variables, factors, and potential outcomes, in addition to the observed response. There are several benefits of using H-likelihood [39]. With it, we can develop fast and efficient computational techniques for model fitting. Furthermore, we can integrate statistical inference for unobserved quantities and then predict the model's output by including these unobserved variables. The same marginal likelihood function is used for inference about fixed effects in both the classical approach and the h-likelihood approach. Even so, for certain case studies the Laplace approximation is required [40]-[42].
Therefore, this article uses H-likelihood to establish a support vector machine through two case studies: first, feature selection for the morphology of Sardinella lemuru (Bali sardinella), and second, three datasets from the UCI machine learning repository, namely the bank marketing dataset, the car evaluation database, and the human activity recognition using smartphones dataset. Section 2 addresses the principle of H-likelihood; how to derive it is described explicitly and in depth in Section 3. In line with this, the real data set implementation is in Section 4. Finally, the conclusion and possible future studies are discussed in Section 5.

II. THE PRINCIPLE OF H-LIKELIHOOD
We aim at a paradigm shift from probability-based reasoning to extended likelihood-based reasoning. Each paradigm constitutes a world (normal science) characterized by basic propositions. Normal science is very fruitful for description by deductive reasoning until anomalies (paradoxes) arise.
In models with random effects, under a flat prior the H-likelihood is equivalent to a Bayesian posterior distribution and is not controversial for a Bayesian statistician. Figure 1 illustrates that both Bayesians and frequentists may readily embrace this notion of statistical probability [43], [44]. It permits most Bayesian credible interval interpretations and frequentist confidence interval interpretations. As a result, the H-likelihood method attempts to combine the two worlds of frequentist statistics and Bayesian statistics, which may conflict in most respects [45].
Hierarchical GLMs (HGLMs) were introduced by [46], [38]; these are GLMs in which the linear predictor includes both fixed and random effects: conditional on the random effects u, the responses y follow a GLM family of distributions [39], as represented in Eq. 1.
To begin with, we can write the double hierarchical GLM as Eq. 2 below, where Φ is a diagonal matrix with φ_ij = exp(G_ij^T ϒ + F_ij^T b_i). The HGLM can be extended by additionally specifying models for the random effects. Lee and Nelder set a model for the mean and dispersion of the random effects and thereby introduced the double HGLM, expanding the hierarchical generalized linear model into a double hierarchical generalized linear model. Conditional on the random effects (a, u), the response variable y satisfies E(y|a, u) = µ and var(y|a, u) = φV(µ). Conditional on the random effect u, the linear predictor for µ is given in the form of the following HGLM in Eq. 3.
The dispersion parameter λ for the random effect u has the following HGLM form.
Eq. 4 describes this, where h() is the link function and G and F are model matrices, g_D() is a strictly monotonic function with b = g_D(a) for the random effect a, and γ is a fixed effect. The dispersion parameter α for a has the form of the following GLM, and the dispersion model for φ and ϕ has the HGLM form in Eq. 5.
where h_D is the GLM link function and γ_D is a fixed effect. The likelihood function for double hierarchical generalized linear models (DHGLMs) can be defined as in Eq. 6.
Therefore, Eq. 7 represents the H-likelihood for inference from double hierarchical generalized linear models (DHGLMs), where f(y|v, b; β, φ) is the conditional distribution of y given (v, b), and f(v; λ) and f(b; α) are the distributions of the random effects v and b, respectively. Eq. 8 expresses the marginal likelihood L_{v,b}, obtained by integrating the random effects out of the H-likelihood. The marginal likelihood L_{v,b} provides a rational basis of inference for fixed parameters, but it carries no information about the random parameters. In line with this, Lee and Nelder provide the evaluation criteria for inference about them.

III. NEW FRAMEWORK
A. EMPLOYING IN SUPPORT VECTOR MACHINE
The basic objective of the support vector machine is to find a separating hyperplane that optimizes the margin between different classes. New data can then be included in the predictions of the classifier, a condition that may be called good generalization. The model created by the SVM depends on only a subset of the training data, since training points beyond the margin do not directly affect the cost function of developing the model. Eq. 9 gives the hyperplane as the separating boundary.
where β is the weight coefficient vector and β_0 is a bias, with ||β|| = 1. Classification criteria are centered on this hyperplane. The distance between the support vectors and the optimal hyperplane is C = 1/||β||. The smallest of the perpendicular distances between the training vectors X_i and the hyperplane is called the margin, and we seek the unique hyperplane that maximizes the margin C > 0 over the training data. In other words, solving the SVM becomes the following constrained optimization problem.
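The geometric margin described above can be sketched directly. The following Python snippet (an illustrative sketch on a toy dataset, not part of the original analysis) computes the smallest signed distance y_i(β·X_i + β_0)/||β|| over a training set; a positive result means the hyperplane separates the two classes.

```python
import math

def margin(X, y, beta, beta0):
    """Geometric margin: the smallest signed distance y_i*(beta.x_i + beta0)/||beta||
    over the training set; positive only if the hyperplane separates the classes."""
    norm = math.sqrt(sum(b * b for b in beta))
    return min(yi * (sum(b * xi for b, xi in zip(beta, x)) + beta0) / norm
               for x, yi in zip(X, y))

X = [(2, 2), (3, 3), (-1, -1), (-2, 0)]   # toy 2-D points
y = [1, 1, -1, -1]                        # class labels in {-1, +1}
print(round(margin(X, y, beta=(1.0, 1.0), beta0=0.0), 3))  # 1.414
```

The SVM optimization chooses β and β_0 so that this minimum distance is as large as possible.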
However, the above formulation is applicable only to linearly separable sets. In general, it is hard to find hyperplanes that perfectly separate all the vectors [12]. Therefore, to allow for vectors that fall inside the region of other groups, slack variables ξ_1, ξ_2, . . . , ξ_n are introduced, and the method places constraints of the form ξ_i ≥ 0. Re-expressing the constraints for the convenience of calculation gives the expression in its standard form.
Eq. 10 shows that the tuning parameter C plays an important role: it controls how many points are counted as support vectors. Using Lagrange multipliers, the problem in β_i, β_0, ξ_i is solved as in Eq. 11.
Substituting, the problem can be expressed in dual form. According to the Kuhn-Tucker theorem, the variables β_i, β_0, ξ_i to be estimated are obtained from the saddle point of the Lagrangian. Eq. 12 shows the relationship between the Lagrange multipliers and the conditional expressions below.
The values satisfying the above conditional expressions give the relationship between ξ_i and the support vectors seen earlier. They are the basis for a unique solution, since L_D is a regular convex function.

Similar to the SVM estimation equation, Eq. 13 is obtained from the fixed effect of the Bernoulli distribution and the double H-likelihood with random effects from the multivariate normal distribution and the chi-square distribution, with v = (v_1, v_2, . . . , v_n) and a weight w for each point.
In Eq. 13, h is the h-log-likelihood and p_v(h) is the adjusted profile log-likelihood. In addition, λ and α correspond to the tuning parameters of the SVM. In line with this, β_0 and v are chosen to maximize h, which estimates w. The estimation can be performed with iteratively weighted least squares and the Newton-Raphson method, repeating the following process until convergence: from h, estimate w^(t); from p_w(h), estimate v^(t) and β_0^(t); then update to w^(t+1) and v^(t+1), and estimate β_0^(t+1) with p_{w,v}(h).
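The Newton-Raphson step in the estimation loop can be sketched generically. The following hypothetical Python example (an assumption for illustration, not the authors' full IWLS routine) maximizes a one-dimensional log-likelihood from its score and Hessian, which is the basic move repeated for each parameter block above.

```python
def newton_maximize(score, hessian, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson for a 1-D (log-)likelihood: iterate x <- x - score(x)/hessian(x)
    until the update step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = score(x) / hessian(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Toy example: maximize l(m) = -(m - 2)**2, so score(m) = -2(m - 2), hessian = -2
mhat = newton_maximize(lambda m: -2 * (m - 2), lambda m: -2.0, x0=10.0)
print(round(mhat, 6))  # 2.0
```

In the DHGLM setting each block update (w, v, β_0) is of this form, but with vector scores and weighted least-squares solves in place of the scalar division.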

A. EMPLOYING IN SARDINELLA LEMURU (BALI SARDINELLA) DATASET
Sardinella lemuru (Bali sardinella) is a species of ray-finned fish in the genus Sardinella found in the Eastern Indian Ocean and the Western Pacific Ocean, in a region that reaches from southern Japan through the Malay Archipelago to Western Australia. To begin with, Table 1 and Figure 2 describe descriptive statistics regarding morphology. The standard deviation is a measure of the amount of variance that summarizes features of a collection of variables: a small standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the data are spread over a broader range. More clearly, total length, fork length, and standard length differ by sex type more than the other variables. The suitability of the model is confirmed through two indicators. First, the number of misclassified observations is counted. Second, the probability of the error region under the data distribution is calculated using a random sample with known labels. These two indicators are named misclassification and test error, respectively: Misclassification = Σ_{i=1}^{N} I(ŷ_i ≠ y_i), and Test error = ∫ P(ŷ = 1 | y = 0) P(y = 0) dy + ∫ P(ŷ = 0 | y = 1) P(y = 1) dy. The smaller the misclassification and test error, the better the model fit. The SVM model using hinge loss was fitted with the svmpath R package [47] and compared with the model from the DHGLM method. First, using the radial kernel and setting each d and tuning parameter, the model is estimated by varying the levels of λ and α_a. We now define the conditional Akaike information criterion (AICc) for the double hierarchical generalized linear model to be used in model selection. Assume the true conditional distribution of y is f_1(y | u, a) and that u and a are the true random effects vectors for the mean and dispersion, with distributions f_2(u) and f_3(a), respectively.
The prediction dataset is y* such that y* and y are independent conditional on u and a, and come from the same distribution f_1(· | u, a). In other words, y and y* share the same random effects u and a, but differ in their error terms, as in Eq. 14.

AICc
Assume the data y are generated from the DHGLM. Then the AICc is given as in Eq. 15.
In this article, prediction of the sex status of fish, namely male (1) and female (0), is carried out using 11 independent variables. In the first stage, feature selection can be performed by finding the mean value for each sex, as described in Table 2. Fish sex is divided into two, male and female. Primary and secondary sexual features are used to differentiate between male and female fish. Primary sexual characteristics are characterized by the presence of organs directly connected to the reproductive process, including the ovaries and their ducts in female fish and the testicles and their ducts in male fish. Meanwhile, secondary sexual characteristics are useful for distinguishing male and female fish from the outside, in terms of morphology. Secondary sexual characteristics can be divided into two parts: temporary ones, which appear only during the spawning season, and permanent ones, which are present before and after spawning. Sex comparisons can be used to predict the success of spawning by looking at the ratio of the number of male and female fish in the waters; this ratio also affects the reproduction, recruitment, and conservation of these fish resources.
Features are raw measurement operations executed on a group of objects so that one category can be separated from other categories in the same broad class. Although useful features may not be easily accessible in data sets given the high dimension of raw data, two priorities are commonly regarded for feature extraction in classification tasks. Reducing the dimension of the feature space can reduce the size of the original data by selecting features that still represent the data adequately and precisely.
Because the most effective features ensure classification performance, feature selection is necessary and has a profound effect on classifier implementation. If the features show significant differences from one class to another, the classifier can be designed quickly and efficiently with improved performance. In this context, transform-based dimensionality reduction uses a transformation function to extract features from raw data. Linear transformations are popular and widely used due to their convenience, whereas nonlinear transformations have increasingly become recommended and examined.
A linear feature extraction method stems from Y = AX, where X ∈ R^n and R^n stands for all vectors with n components, each of which is a real number. A is generally an m × n matrix taking a vector with n components and generating a vector with m components. Several such transformations are used very commonly; each transformation technique has a certain application domain, with its own benefits and drawbacks. Transformation functions A can usually be divided into two main categories.
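The linear map Y = AX can be sketched in a few lines. The matrix and vector below are hypothetical weights chosen only to illustrate projecting a 3-dimensional raw measurement onto 2 derived features.

```python
def linear_transform(A, x):
    """Apply an m x n transformation matrix A to an n-vector x, yielding an m-vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Hypothetical 2 x 3 feature-extraction matrix: feature 1 averages the first two
# raw measurements, feature 2 is the difference of the last two.
A = [[0.5, 0.5, 0.0],
     [0.0, 1.0, -1.0]]
x = [4.0, 2.0, 1.0]
print(linear_transform(A, x))  # [3.0, 1.0]
```

Choosing A by a principled criterion (e.g. maximizing retained variance or class separation) is what distinguishes the transformation techniques mentioned above.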
In some cases, we may describe the transformation on the basis of prior understanding. However, the theory of likelihood at one time did not attract sufficient enthusiasm in the statistical community. The likelihood principle is a consequence of the more widely accepted principle of sufficiency, that a sufficient statistic captures all of the factual information from a study, and of the circumstance that experiments never carried out are irrelevant to inference. Since then, the concept of likelihood has become one of the fundamental principles of inferential statistics, and methods have been suggested in areas including the elimination of nuisance parameters.

Based on Table 2, the variable Body depth can be used to classify the sex of fish because its value is below 0.1, while the variables Head length and Pre-orbital length are tentative because their values are below 0.50; the remaining variables can also be used for sex classification of Sardinella lemuru (Bali sardinella).
From the expert system, it was explained that what can visually distinguish the sex of fish is the total length, head length, eye diameter, and fork length, so Figure 3 and Figure 4 show the relationships among these variables. The size ranges shown in Table 2 describe measurement results that did not differ greatly between the sexes in the same habitat. This is because the environmental conditions there are relatively homogeneous, so they do not have a large influence on the size difference between male and female fish.
In addition, the proportion of food available in the fish cage is relatively stable and tends to be divided into equal shares, so the opportunity to compete is relatively small. It follows that the main internal factor affecting fish size is the availability of food.
Differences in morphometric characters are also seen for each sex in Table 2, due to differences in age or sex. Environmental conditions in nature tend to fluctuate, especially in terms of competition, so in order to defend themselves from predators, natural pressure, or capture, fish are forced to grow rapidly. Fish increase body size, especially limb parts such as fins.
This occurs because in this environment there is same-sex or different-sex competition for food, so the increase in body size makes fish more agile in moving, given the limited amount of food available. In addition, a larger size is also useful for surviving, such as by defending against or avoiding predators.
The presence of a fish population in a place and the distribution of these fish species are always related to their habitat and resources. When the proportion of food availability is sufficient for the population units that live there, growth is relatively good. This means that the increase in body shape is followed by increases in other morphometric characters, especially total length and body weight. Based on the DHGLM-SVM technique, a predictive value for fish sex of 0.997319 was then obtained, with 10-fold cross-validation as the sampling method, best parameters epsilon 0.7 and cost 4, and best performance 0.2327401. Figure 5 shows that the area under the curve was 0.8967.
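The 10-fold cross-validation used for this result can be sketched as follows. This Python example (an illustrative assumption, not the paper's code, which used R) splits n observations into k disjoint validation folds and averages an arbitrary per-fold metric such as an SVM's fold accuracy.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k roughly equal, disjoint validation folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k, eval_fold):
    """Average a per-fold metric (e.g. SVM accuracy on the held-out fold) over k folds."""
    folds = k_fold_indices(n, k)
    return sum(eval_fold(f) for f in folds) / k

folds = k_fold_indices(20, 10)
print(len(folds), [len(f) for f in folds])  # 10 folds of 2 indices each
print(cross_validate(20, 10, len))          # trivial metric: mean fold size = 2.0
```

In practice, `eval_fold` would train the DHGLM-SVM on the remaining folds and score it on the held-out one; shuffling the indices before splitting is also usual.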

B. EMPLOYING IN UCI MACHINE LEARNING REPOSITORY DATASET
For the next case study, we use three datasets from the UCI machine learning repository. First, the bank marketing dataset, published in 2012, with 45,211 instances and 17 features. Second, the car evaluation database, from 1997, with 1,728 instances and six features. Third, the human activity recognition using smartphones dataset. For more detail, see [4]. Table 3 and Table 4 use seven predictors and two classes (No and Yes) with 36,170 samples. The best model is SVM-DHGLM, with an accuracy of 0.927 and 16 features. Running the combination of random forest (RF) and SVM-DHGLM then gives a precision of 0.937, a recall of 0.982, and the 7 best features.
In short, the same instrument was applied to the car evaluation dataset. A previous study [4], using 10-fold resampling cross-validation with the tuning parameter ''sigma'' held constant at 0.07348688 and C = 0.5, reached an accuracy of 0.8346161 and a kappa of 0.6319634. For this case, we run a different procedure employing the double HGLM; the construction steps can be seen in the appendix. Table 5 shows that the best model classifier is SVM-DHGLM, with 6 important features and an accuracy of 0.966. In the previous research [4], the best model was RF+RF with an accuracy of 0.9336; after adding the DHGLM, the accuracy was 0.966, so this technique provides higher accuracy. It was found that employing DHGLM increases the accuracy, as represented in Figure 6; DHGLM also increases precision and recall, for feature selection and classification respectively.
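Precision and recall, the metrics reported above, can be computed from the confusion counts. This Python sketch (with toy labels, assumed for illustration only) shows the standard definitions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # toy labels (1 = "Yes")
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]  # toy predictions
prec, rec = precision_recall(y_true, y_pred)
print(round(prec, 3), round(rec, 3))  # 0.8 0.8
```

High recall with good precision, as in the RF + SVM-DHGLM combination, means few positive cases are missed without flooding the predictions with false positives.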

V. CONCLUSION
The classification problem is defined as follows: given a dataset of observations consisting of inputs (predictors) and outputs containing class labels, the classifier seeks to determine the relation between the predictors and the response that allows the classification of a new observation X ∈ R^d into one of A_z classes. The true class is represented by A = {1, 2, . . . , A_z}. Therefore, the objective of labeling is to mitigate classification errors, or the expected consequences of misclassification for input data where some categories of mistake are more significant than others. The cost matrix z(u, t), u, t = 1, 2, . . . , A_z, describes the cost of misclassifying a member of class u as class t. Without such safeguards, a support vector machine might essentially memorize the training set rather than learn a more general classification rule. Future research can use robust adjusted likelihood. For the likelihood ratio statistic, it is well known that, under certain regularity conditions, the classical Fisher information has the equivalent forms F(θ) = D(θ) = A(θ), where F(θ) is the Fisher information, D(θ) is the variance of the Fisher score, and A(θ) is the expected Fisher information. New information measures can then be obtained from the score function u(θ; X) and its outer product u(θ; X) u^T(θ; X). This article describes a new concept of h-likelihood for support vector machines. The classification system based on maximizing the strength of the conventional likelihood function is equivalent to the SVM rule in sample space. Performance measures as well as scale parameters can be determined analytically from the training results. Further research can apply the method to image processing [25]-[48], Wi-Fi sensing [49], wide areas including the environment [50], [51], and disaster analysis [52]. Moreover, after running the UCI machine learning datasets, we ensured that the comparison between the proposed SVM method and the comparative methods is fair.
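The cost-matrix idea in the conclusion can be sketched numerically. This hypothetical Python example (toy labels and costs, assumed for illustration) computes the mean decision cost when misclassifications are priced asymmetrically, as when a false negative is more serious than a false positive.

```python
def expected_cost(y_true, y_pred, cost):
    """Mean cost of the classifier's decisions under a cost matrix:
    cost[u][t] is the price of predicting class t when the true class is u."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 2-class cost matrix: a false negative (true 1, predicted 0) costs 5,
# a false positive costs 1, and correct decisions cost 0.
cost = [[0, 1],
        [5, 0]]
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 1]
print(expected_cost(y_true, y_pred, cost))  # (0 + 0 + 5 + 1 + 0) / 5 = 1.2
```

Minimizing this quantity, rather than the raw error count, is what cost-sensitive classification means.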

APPENDIX C
Besides, Eq. 17 represents the hierarchical generalized linear model, defined as follows: suppose we have a response y = (y_1, . . . , y_n)^T and unobserved random variables u = (u_1, . . . , u_q)^T, with E(y_i | u) = µ_i and Var(y_i | u) = φ_i V(µ_i), such that the conditional log-likelihood for y given u has the generalized linear model form.
where φ_i is the dispersion parameter and θ(µ_i) denotes the canonical parameter, which can be written in the form of Eq. 3, where g() is the link function, X is the n × p model matrix for the fixed effects β, and Z is the n × q model matrix for the random effects v = g_1(u) for some strictly monotonic function g_1(). In addition, in Eq. 18 the random effects u_i are independent with dispersion parameter λ_i. In line with this, for inference in hierarchical generalized linear models (HGLMs) we propose the h-likelihood.
In more detail, l(β, φ; y | v) is the likelihood for y | v and l(λ; v) is that for v. The critical issue is selecting the scale of the random effects in the h-likelihood. Following Lee and Nelder [38], we may extend their h-log-likelihood for double hierarchical generalized linear models (DHGLMs) to the double negative (DN) model with NIM in Eq. 19 and Eq. 20.
Therefore, let l be a likelihood with nuisance effects α. In Eq. 21 we can consider a function p_α(l) defined by p_α(l) = [l − (1/2) log det{D(l, α)/(2π)}]|_{α=α̂} (21), where D(l, α) = −∂²l/∂α∂α^T. For fixed effects β, the use of p_β(l) is equivalent to conditioning on β̂ to eliminate the fixed nuisance parameters β, while for random effects v, the use of p_v(h) is equivalent to integrating them out via the Laplace approximation. Therefore, this function can be used to eliminate fixed and random effects simultaneously. We showed that p_{v,β}(h) in linear mixed models corresponds to the restricted likelihood.

CODE DATA AVAILABILITY
The analysis code and datasets used in this article are available from the corresponding author upon reasonable request. The testing data come from three datasets publicly available from the UCI machine learning repository.