A Learning Objective Controllable Sphere-Based Method for Balanced and Imbalanced Data Classification

Imbalanced data classification is one of the most important tasks in machine learning because abnormality, which is usually of interest, appears less frequently than normality in real-world systems. Learning classifiers from imbalanced data is troublesome because there is no absolute standard for how much imbalance makes a dataset imbalanced rather than balanced. To address this issue, this research proposes a new sphere-based classification method named LOCS (learning objective controllable sphere-based classifier), which is designed to maximize the AUC (area under the ROC curve). The AUC learning objective was adopted because the AUC approximates the accuracy as the class distribution becomes balanced. Therefore, the proposed method properly performs classification for both imbalanced and balanced data. It constructs a classification model in a single training run, whereas existing cost-sensitive learning and resampling methods usually must try different parameter settings. In addition, the learning objective can be easily modified within LOCS for each application domain by setting different importance levels for the positive and negative classes. Numerical experiments on 25 real datasets under several investigational settings showed the effectiveness and the intended strengths of the proposed method.


I. INTRODUCTION
One of the most important challenges in the research field of machine learning and pattern recognition is the imbalanced data learning problem, which arises in various practical fields, such as software defect prediction [1], medical diagnosis [2], disaster information [3], industrial maintenance monitoring [4], financial trading orders [5], and customer churn prevention [6]. The difficulty of learning from imbalanced data is attributable to the skewed class distribution. Rare instances, represented by minority classes, are relatively difficult to detect owing to their infrequency [7]. However, these minority classes, such as cancer, fraud, and faults, are considered more important than the majority class in the practical problems listed above, and the risk of misclassifying a minority class instance is correspondingly higher than the risk of misclassifying a majority class instance. Traditional classification techniques such as decision trees, support vector machines, and neural networks have been developed under the assumption that the class distribution of the data is balanced [8]. As the majority class in imbalanced data comprises far more instances than the minority class, it has an overwhelming influence on the minority class. Therefore, the relatively few minority instances are underestimated during training, which may prevent classifiers from accurately learning the pattern or distribution of the minority class [9]. This problem becomes more severe as the degree of imbalance increases, resulting in failures to detect minority class instances [10]. These traditional classification methods, trained with learning objectives that maximize overall accuracy [11], can thus achieve high accuracy yet be useless for the actual classification task.
For example, a model that classifies all instances into the majority class achieves 99% accuracy on imbalanced data with ten minority class instances and 1,000 majority class instances, yet it cannot actually be used. Therefore, a classification method that maximizes accuracy or minimizes error rate can be irrelevant, as it results in low classification performance on imbalanced data [12]. A number of studies have been conducted to address this imbalanced data classification problem. These studies can be divided mainly into two approaches: data level approaches and algorithm level approaches [13]. The data level approach resizes the data through resampling [14]: the classifier is trained after the imbalanced distribution has been corrected, which can improve its performance. It can be divided mainly into over-sampling of minority class instances and under-sampling of majority class instances. Cost-sensitive learning (CSL), which sets misclassification costs differently for the majority and minority classes, is a representative algorithm level method.
Over-sampling reinforces the influence of minority class instances by randomly selecting and replicating instances from the minority set and adding the copies back to it. The minority set thereby grows by the number of duplicated instances until it shows a balanced distribution with the majority set. The synthetic minority over-sampling technique (SMOTE) is a more advanced over-sampling method that generates synthetic data [15]. Rather than replicating minority instances, it creates new synthetic minority instances by interpolating between minority instances and their k-nearest neighbors and combines them with the original minority instances to form the new minority set. Variants of SMOTE such as borderline-SMOTE [16], adaptive synthetic sampling (ADASYN) [17], safe-level SMOTE [18], and MWMOTE [19] have since been proposed.
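The SMOTE-style interpolation described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation of [15]; the function name, parameters, and toy data are chosen here for clarity.

```python
import random

def smote_sample(minority, k=3, n_new=5, seed=0):
    """Generate synthetic minority points by interpolating between a
    minority instance and one of its k nearest minority neighbours
    (a minimal SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority)
```

Each synthetic point lies on the segment between a minority instance and one of its neighbours, so the new points stay inside the convex region spanned by the minority class.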
The under-sampling method randomly selects instances from the majority class and removes them from the original majority set; the total data size is thus reduced by the number of removed majority instances. Several strategies have been introduced to rebalance the minority and majority distributions by reducing majority instances more effectively. One of them is cluster-based under-sampling, which forms clusters of the majority class [20]. A further study substituted the centers of clusters for representative majority class instances [21]. In general, however, resampling-based techniques for the imbalance problem have the weakness that the original data cannot be used as-is. Although resampling produces a balanced distribution for classifier training, it is difficult to avoid the information loss caused by altering the original distribution, whether through synthetic data or by excluding informative majority instances [22].
Algorithm level approaches attempt to solve the imbalance problem without changing the data distribution. While the resampling approach focuses on balancing the class ratio of the original data, the algorithm level approach is designed to leave the original data intact and to use the original distribution, without changes, for training, solving the imbalance problem in the learning process rather than in pre-processing. CSL is one of the most widely used algorithm level approaches to imbalance problems [10], [23], [24]. CSL minimizes the overall expected cost by allocating different misclassification costs to each class [10], [25], and the misclassification costs are usually determined by domain experts. In the past decades, CSL has received great attention as a problem-solving method for skewed class distributions [26], and many studies have shown that CSL is effective in addressing class imbalance problems [13], [27]-[29]. Representative studies that applied CSL to existing classification algorithms include cost-sensitive kNN [30]-[32], cost-sensitive SVM [33], [34], and cost-sensitive ANN [35], [36]. Although CSL addresses the imbalance problem by imposing different misclassification costs for each class using the domain knowledge of experts, it is usually difficult to determine the optimal costs for both the majority and minority classes [12], [14]. In addition, on highly imbalanced data, CSL may be biased towards a minority class given a high cost and, conversely, neglect the majority class, resulting in poor classification performance [37].
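The core idea of CSL, deciding under asymmetric misclassification costs, can be illustrated with the standard Bayes decision rule: predict positive when the expected cost of a false negative outweighs that of a false positive. This sketch illustrates the general principle only, not any of the specific cited methods; the function name and cost values are illustrative.

```python
def cost_sensitive_label(p_positive, cost_fn, cost_fp):
    """Bayes-optimal decision under asymmetric misclassification costs:
    predict positive when p_positive exceeds the cost-derived threshold."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return "+" if p_positive >= threshold else "-"

# With equal costs the threshold is the usual 0.5.
# A costly false negative (a missed minority instance) lowers the
# threshold, so borderline instances are pushed to the positive class.
```

For example, an instance with estimated positive probability 0.3 is labeled negative under equal costs, but positive when a false negative costs nine times as much as a false positive (threshold 1/10).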
On the other hand, the sphere covering method, which classifies instances using spheres, has been developed as a way of finding prototypes that represent the instances of each class and determining a radius that bounds the area each prototype can cover. The class cover problem was introduced in [38]: find a small number of sets covering, i.e. containing, points from one class without covering any points from the other class. Greedy sphere covering [39] used the class cover catch digraph to solve this problem and applied it to classification. In other words, following the nearest neighbor rule, the area a prototype can cover was bounded by the closest instance of the other class, whose distance was set as the radius of the sphere; in addition, the instance covering as many instances as possible was selected as a prototype. Interpretable prototype selection [40] translated the problem of selecting a small set of prototypes that covers all training data into a set cover optimization problem; to this end, a greedy algorithm was introduced that selects prototypes for each class independently. Randomized sphere cover (RSC) [41], [42] randomly selects the center (prototype) of a sphere from the training data and constructs a sphere whose radius is the shortest distance from the selected center to an instance of another class. A classification model consisting of a set of spheres constructed by repeating this process was introduced. This method constructs many spheres by repeatedly selecting centers at random and requires at least α instances of the same class in each sphere. It is intended to improve classification accuracy by classifying test instances using spheres that cover only instances of the same class.
However, in binary classification, the aforementioned sphere-based methods were developed under the premise that the class distribution is balanced and are geared toward increasing accuracy. Under an imbalanced distribution, minority class instances cannot be accurately classified, because the small number of minority instances reduces the number of minority spheres. In other words, like the traditional classification methods, they are not suitable for the classification of imbalanced data.
The aforementioned methods have two drawbacks: first, additional parameters need to be set in advance, and second, the learning objective and the evaluation measure are inconsistent. The second drawback, in particular, is a serious problem in machine learning. Sampling rates need to be determined in advance for resampling methods, and misclassification costs need to be determined in advance for CSL methods. Because the right values of these parameters are generally unknown, many previous studies tried different values and then selected the best-performing classifier based on G-mean, F1-score, or the area under the ROC curve (AUC), measures used to evaluate classifiers on imbalanced data. Here, a discrepancy arises between the learning objective and the evaluation measure. This study proposes a new method in which the evaluation measure for imbalanced data classification and the learning objective of the classifier are made consistent. The proposed method is similar to the conventional RSC in that the classification model is expressed as a set of spheres. However, there is a significant difference: the learning objective is set as the AUC, and the spheres are trained to maximize its value, which is the main idea of this study. The contributions of this study to the research field of imbalanced data classification are as follows. First, a novel sphere-based classifier for the classification of imbalanced data is proposed. Second, the learning objective and the evaluation measure for imbalanced data classification are matched as the AUC. Third, the proposed algorithm has the advantage that the user can train the classifier by controlling the weights of the true positive rate (TPR) and the false positive rate (FPR) according to the application problem.
The remainder of this paper is organized as follows. The RSC, which is the basis of the proposed classifier, and the classification performance evaluation indicator of imbalanced data will be reviewed in Section II. The algorithm of a learning objective controllable sphere-based classifier (LOCS) proposed in this study will be described with an illustrative example for ease of understanding in Section III.
The proposed method will be tested with 25 real data sets, and its performance will be compared with the conventional methods in Section IV. Finally, this study will be concluded in Section V.

II. BACKGROUND
This section briefly describes the RSC classifier and the basic performance evaluation measures used in imbalanced data classification as background for the proposed method. As in other literature, "majority" and "negative", and "minority" and "positive", are used interchangeably hereafter.

FIGURE 1.
A set of spheres constructed by RSC from a binary class data. In this example, α was set to three. A constructed sphere thus contains at least three instances.

A. RANDOMIZED SPHERE COVER
The RSC is one of the sphere covering methods introduced in [41], [42]. RSC constructs a sphere B_i from training data D = {(x_i, y_i)}, where x_i represents the observation vector of instance i and y_i indicates its class. The sphere B_i has a specific class C_{B_i} and consists of a center c_i and a radius r_i; the sphere is thus defined by the 4-tuple (c_i, r_i, C_{B_i}, D_i), where D_i is the set of instances covered by B_i [38]. The radius r_i of the sphere B_i is defined as the distance between the center c_i and the closest instance whose class differs from the class of c_i:

r_i = min { d(c_i, x_j) : (x_j, y_j) ∈ D, y_j ≠ C_{B_i} }.   (1)
The αRSC algorithm uses an input parameter α, the minimum number of instances within a sphere: a sphere is not constructed if it would contain fewer than α instances. Informally, the training process of αRSC is as follows. Repeat the steps below until all training data are covered or discarded.
1) Randomly select an instance to serve as a center and add it to the set of covered instances.
2) Find the closest instance whose class differs from that of the center.
3) Set the distance between that instance and the center as the radius of the sphere.
4) Construct a sphere with the center and the radius.
5) Find all instances within the sphere in the training data.
6) If the number of instances inside the sphere is at least α, add all of them to the set of covered instances and store the sphere details (center, class, and radius). Otherwise, discard the instances.
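The six steps above can be sketched in Python. This is a minimal illustration of the loop, not the pseudocode of [42]; the function name and toy dataset are chosen here for clarity.

```python
import math
import random

def alpha_rsc(data, alpha=2, seed=0):
    """Minimal sketch of the alpha-RSC training loop (steps 1-6 above).
    data: list of (point, label) pairs; returns spheres as
    (center, label, radius) triples."""
    rng = random.Random(seed)
    remaining = list(data)
    spheres = []
    while remaining:
        center, c_label = rng.choice(remaining)                # step 1
        # steps 2-3: radius = distance to the closest opposite-class instance
        radius = min(math.dist(center, x) for x, y in data if y != c_label)
        # step 5: same-class instances inside the sphere (includes the center)
        covered = [(x, y) for x, y in remaining
                   if y == c_label and math.dist(center, x) < radius]
        if len(covered) >= alpha:                              # step 6
            spheres.append((center, c_label, radius))          # store details
        for inst in covered or [(center, c_label)]:            # cover or discard
            if inst in remaining:
                remaining.remove(inst)
    return spheres

data = [((0.0, 0.0), '+'), ((0.1, 0.0), '+'), ((0.2, 0.0), '+'),
        ((5.0, 5.0), '-'), ((5.1, 5.0), '-'), ((5.2, 5.0), '-')]
spheres = alpha_rsc(data, alpha=2)
```

On this well-separated toy set, each class is covered by a single sphere regardless of which center is drawn first.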
The detailed pseudo code is described in [42]. Through the above training process, αRSC constructs a set of spheres; an example is shown in Fig. 1.
Through the set of spheres constructed during the training process, new instances are classified in the prediction stage according to the following rules.
• Classification rule 1: A test instance covered by a sphere takes the target class of that sphere. If more than one sphere of different target classes covers the instance, the instance takes the target class of the sphere with the closest center.
• Classification rule 2: If an instance is not covered by any sphere, the classifier assigns the class of the sphere whose edge is closest.
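The two rules can be sketched directly; spheres here are represented as (center, label, radius) triples, a representation chosen for this illustration.

```python
import math

def predict(x, spheres):
    """Sketch of the two RSC classification rules.
    spheres: list of (center, label, radius) triples."""
    covering = [(math.dist(x, c), lab) for c, lab, r in spheres
                if math.dist(x, c) <= r]
    if covering:
        # rule 1: among covering spheres, the closest center wins
        return min(covering)[1]
    # rule 2: otherwise take the sphere whose edge (surface) is closest
    return min((math.dist(x, c) - r, lab) for c, lab, r in spheres)[1]

spheres = [((0.0, 0.0), '+', 1.0), ((4.0, 0.0), '-', 1.0)]
```

A point inside the positive sphere is labeled '+'; a point in the uncovered gap between the spheres takes the class of whichever sphere's surface is nearer.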
Classification rule 2 is reasonable as it can classify test instances, mainly outliers, in areas not covered by spheres [41]. It is better for the spheres constructed by RSC to contain as many instances as possible, as this can increase accuracy in the prediction stage. However, in imbalanced data, where minority instances are fewer than majority instances, fewer minority class spheres will be constructed in the training stage than majority class spheres. The small number of minority class spheres in turn means that a new minority class instance is less likely to be covered by a sphere in the prediction stage than a majority class instance, thereby degrading classification performance. Therefore, to prevent the minority class from being overwhelmed by the majority class, this research proposes to extend the radii of the constructed spheres so as to maximize the AUC, an evaluation measure suitable for imbalanced data classification. In other words, the evaluation measure and the learning objective are matched by setting the evaluation measure, the AUC, as the learning objective for training the classifier. This induces the influence of the minority class spheres to increase. After training is complete, the two reasonable classification rules of RSC are applied as-is in the classification step.

B. EVALUATION MEASURES FOR BINARY CLASSIFICATION
In binary classification, the measures for evaluating the predictive performance of a classifier are generally computed from the confusion matrix in Table 1. In this matrix, true positives (TP) is the number of positive instances classified correctly, false negatives (FN) is the number of positive instances classified incorrectly, false positives (FP) is the number of negative instances classified incorrectly, and true negatives (TN) is the number of negative instances classified correctly. The accuracy, defined in (2) below using the confusion matrix, is the proportion of the total data that is correctly classified and is used as a general performance measure of a classifier.

Accuracy = (TP + TN) / (TP + FN + FP + TN)   (2)
However, in the case of imbalanced data, the accuracy cannot properly express the performance of the classifier, as the positive class is overwhelmed by the negative class. For example, the accuracy of a classifier is 95% even if all instances are predicted as the negative class for data composed of 5% positive and 95% negative instances; yet a classifier that cannot detect any positive instances is completely ineffective. Therefore, in the imbalanced data classification problem, performance needs to be evaluated by two measures rather than one, and the pair defined in (3) and (4) below is among the most widely used.

TPR = TP / (TP + FN)   (3)
FPR = FP / (FP + TN)   (4)
TPR, also referred to as sensitivity, is the proportion of positive instances that the classifier classifies correctly. FPR is the proportion of negative instances that the classifier incorrectly classifies as positive. When evaluating classification performance on imbalanced data, TPR and FPR generally need to be considered at the same time, and the receiver operating characteristic (ROC) graph makes it possible to organize and visualize the performance of the classifier accordingly [43]. The ROC graph plots TPR and FPR on the vertical and horizontal axes, respectively, and the area under the ROC curve is the AUC. Although the AUC calculation is involved for a soft classifier, for a hard classifier the AUC is defined by (5) below.

AUC = (1 + TPR − FPR) / 2   (5)

In the end, in imbalanced data classification, the AUC makes it possible to evaluate the classifier with a single indicator instead of considering both TPR and FPR at the same time.
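A worked example ties the three measures together, using the all-negative classifier from the 5%/95% scenario above (the function name is illustrative):

```python
def rates(tp, fn, fp, tn):
    """Compute TPR, FPR, and the hard-classifier AUC from a confusion matrix."""
    tpr = tp / (tp + fn)          # sensitivity, Equation (3)
    fpr = fp / (fp + tn)          # Equation (4)
    auc = (1 + tpr - fpr) / 2     # hard-classifier AUC, Equation (5)
    return tpr, fpr, auc

# 5% positive data, classifier predicts everything negative:
# accuracy is 95%, but TPR = 0, so the AUC is 0.5 (no better than chance).
tpr, fpr, auc = rates(tp=0, fn=5, fp=0, tn=95)
```

The AUC thus exposes the uselessness of the trivial classifier that accuracy hides.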
The AUC has been widely used as an indicator for evaluating the performance of classifiers on imbalanced data [12]. However, few studies have used this indicator as a learning objective for training a classifier [14]. As mentioned above, this study proposes a classifier training method that uses the AUC itself as the learning objective.

III. PROPOSED METHOD
In this section, the LOCS proposed in this study is described. Conventional classifiers were designed to increase classification accuracy and are therefore weak at classifying imbalanced data. The main idea of the proposed algorithm is thus to directly set the AUC, the standard evaluation indicator for imbalanced data classification, as the objective function, inducing the classifier to maximize it, and to design a sphere-based classifier that allows users to control the objective function according to their intentions by assigning different weights to TPR and FPR.
Algorithm 1 CreateSphere
Output: the radius of a sphere and the set of instances in the sphere

To construct a sphere in the training process of the classifier, a center and a radius are required, and a class representing the constructed sphere needs to be specified; the class of a sphere evidently follows the class of its center. New data can be classified according to the classification rules through this set of spheres. In the same context as the previous studies, the radius of a sphere is defined as the distance from the center to the closest instance whose class differs from that of the center, and the instances within the radius are defined as covered instances. More formally, the radius is r = min { d(c, x_i) : (x_i, y_i) ∈ D, y_i ≠ y_c }, where c denotes the center, y_c denotes the class of c, and D denotes the dataset. Therefore, all covered instances within the radius have the same class as the center. The process of constructing a sphere when an instance x is selected as the center is shown in Algorithm 1.
Spheres are constructed to cover as many instances as possible, ultimately to construct as few spheres as possible and to reduce the computational time of the subsequent Algorithm 4. When an instance is selected as a center candidate, the radius of the resulting sphere and the number of instances it would cover are reviewed to determine the center that can cover the most instances in the current situation, as described in lines 5-11 of Algorithm 2. Note that Algorithm 1 is invoked in Algorithm 2. A constructed sphere covering many instances indicates that it lies in a dense region of its class; the aim of constructing dense spheres is to improve the classification performance of the final trained classifier. As shown in lines 12-16 of Algorithm 2, even for the sphere covering the most instances, construction stops if the number of instances in it is smaller than the predetermined minimum sphere size α. Otherwise, the data of the constructed sphere, its center and radius, are stored, and the instances covered by the constructed sphere are excluded from the training set T. This process is repeated until the algorithm termination condition is satisfied.
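The greedy center-selection step described above can be sketched as follows. This is an illustrative fragment, not lines 5-11 of Algorithm 2 verbatim; the function name and data layout are assumptions made for the sketch.

```python
import math

def best_center(candidates, data, target):
    """Greedy step in the spirit of Algorithm 2: among candidate centers of
    class `target`, pick the one whose sphere covers the most instances.
    data: list of (point, label) pairs."""
    def coverage(c):
        # radius: distance to the closest opposite-class instance
        r = min(math.dist(c, x) for x, y in data if y != target)
        # number of same-class instances strictly inside the sphere
        return sum(1 for x, y in data if y == target and math.dist(c, x) < r)
    return max(candidates, key=coverage)

data = [((0.0, 0.0), '+'), ((0.1, 0.0), '+'), ((0.2, 0.0), '+'),
        ((3.0, 0.0), '+'), ((6.0, 0.0), '-')]
```

Here the candidate far from the negative instance wins, since its larger radius lets it cover all four positive points.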

Algorithm 2 PreCreateSpheres
7: if |D_i| > maxCardinality then

The process of constructing a set of spheres whose target class cl is positive is described through an illustrative example. The binary, imbalanced data, containing 17 positive instances and 60 negative instances with the two classes in an overlapping state, are shown in Fig. 2. All positive instances become candidates for the center of the first sphere. Instance A is the candidate whose sphere covers the largest number of instances. Therefore, the first positive sphere, with instance A as its center, can be constructed as shown by the dotted line in Fig. 2 (a). As the instances covered by this sphere are excluded from consideration, the center of the sphere containing the next largest number of instances becomes instance B, as shown in Fig. 2 (b). In this example, it is assumed that the minimum sphere size α is set to three. As spheres with a cardinality of three or more can no longer be constructed, no further spheres are constructed after these two. Fig. 2 (c) shows that the center of the third candidate sphere is instance C; as only two instances are covered, the sphere is not constructed.
The LOCS proposed in this study is described in Algorithm 3. As the spheres for cl = + are constructed with the PreCreateSpheres() function of Algorithm 2, the same process is repeated for cl = −, as shown in line 4 of Algorithm 3. In other words, spheres for the negative class are constructed. However, to prevent too many negative class spheres from being constructed, the minimum number of instances that a negative sphere must cover is computed as β = α × IR, the value obtained by multiplying α by the imbalance ratio (IR) (line 3 of Algorithm 3). In the example of Fig. 2, since α = 3 and IR = 60/17 ≈ 3.5, the minimum number of instances that must be covered when constructing negative class spheres is β = 11.
It should be noted in advance that the LOCS() function calls the PostExpandSpheres() function described in Algorithm 4, in which the EvaluateFitness() function described in Algorithm 5 is invoked. Likewise, the PredictClass() function described in Algorithm 6 is invoked in the EvaluateFitness() function. After constructing the initial spheres for both the positive and negative classes, the spheres are expanded so that the AUC of the sphere classifier is maximized when training is completed; this corresponds to line 5 of Algorithm 3. After line 4 of the LOCS() function has been completed, each sphere covers only instances of its own class. Consider sphere D in Fig. 2 (d). When there are instances of the same class as the sphere outside it, there is an opportunity to increase the TPR by expanding the sphere and incorporating those instances; this also means there is an opportunity to increase the AUC. In other words, for imbalanced data with severe overlap, if instances of a sphere's class lie outside the sphere but near its boundary, expanding the sphere to incorporate them can be better for increasing the AUC. Fig. 2 (d) shows two positive spheres, D and E. In the case of positive sphere D, five new instances around the sphere can be covered if the radius is extended to D′. In this case, although the FPR increases because three negative instances are misclassified, the increase in TPR from covering two more positive instances results in an increase in the AUC. On the other hand, in the case of positive sphere E, there is no reason to expand the radius because only negative instances surround it. Consider the case where sphere D is not extended to D′, but sphere E is extended to cover the closest positive instances.
In this case, the loss in FPR from the negative instances newly covered by expanding sphere E is greater than the gain in TPR. In other words, whether to expand each sphere, and if so how far, is determined using the AUC on the training set; as the AUC is, as mentioned above, a suitable measure for evaluating performance in imbalanced data classification, this is a reasonable way of setting the learning objective.
The above is implemented in Algorithm 4. The genetic algorithm (GA), one of the most widely used evolutionary algorithms, is employed to find the new radii of all spheres that maximize the AUC. In the process of finding the optimally expanded radii of the spheres in terms of AUC, lines 1-3 are intended to secure a feasible region. The lower bound of a solution is set to the radii of the given spheres before expansion, r = (r_1^+, ..., r_a^+, r_1^-, ..., r_b^-), and the upper bound is set to the distances from each sphere's center to the farthest instance of the same class as the center, m = (m_1^+, ..., m_a^+, m_1^-, ..., m_b^-). As shown in lines 4-12, Algorithm 4 follows the general processes of a GA, such as fitness evaluation, selection, crossover, and mutation. The updated set of spheres (S+*, S−*) returned by this algorithm has the same centers as the input set of spheres (S+, S−) but different radii.
Algorithm 4 PostExpandSpheres
4: Generate an initial population ranging from lower to upper
5: while the number of generations is less than the maximum number of generations do
6: Compute the fitness of each individual as radii of S+ and S− using EvaluateFitness with w1, w2
7: Order the population and perform the selection process
8: Perform crossover with crossover probability
9: Perform mutation with mutation probability and (lower, upper)
10: Update the population for the next generation
11: end while
12: Select the best individual corresponding to the best fitness
13: Update S+ and S− by replacing their radii with the best individual
14: Let S+* and S−* denote the updated sets of positive and negative spheres
15: return (S+*, S−*)

The process of evaluating the fitness of a solution in the GA is described in Algorithm 5. The modified AUC (mAUC) of Equation (7) is computed using the sets S+ and S−, the TPR importance w1, the FPR importance w2, and the training set T, which are the inputs of the EvaluateFitness() function. In the case of w1 = 1 and w2 = 1, it is identical to the AUC of Equation (5). w1 and w2 will be described in detail in Section III-C.
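The GA-based radius search has the following general shape. This is a generic bounded GA sketch, not the paper's Algorithm 4; selection here is simple truncation, and all parameter names and values are illustrative assumptions.

```python
import random

def ga_optimize(fitness, lower, upper, pop_size=20, generations=30,
                p_cross=0.9, p_mut=0.1, seed=0):
    """Minimal GA: evolves a real-valued vector (e.g. sphere radii) within
    [lower, upper] to maximise `fitness`. Illustrative sketch only."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[i], upper[i]) for i in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # order by fitness
        parents = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = a[:cut] + b[cut:] if rng.random() < p_cross else a[:]
            for i in range(dim):                   # bounded mutation
                if rng.random() < p_mut:
                    child[i] = rng.uniform(lower[i], upper[i])
            children.append(child)
        pop = parents + children                   # elitist update
    return max(pop, key=fitness)

# toy fitness: maximised when the two "radii" approach (2, 3)
fitness = lambda v: -((v[0] - 2.0) ** 2 + (v[1] - 3.0) ** 2)
best = ga_optimize(fitness, lower=[0.0, 0.0], upper=[10.0, 10.0])
```

In LOCS, the fitness of an individual would instead be the mAUC of the classifier whose spheres use that individual's radii.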
To compute the AUC, the predicted class ŷ needs to be computed from the given sphere sets. This is performed by the PredictClass() function described in Algorithm 6,

Algorithm 5 EvaluateFitness
Input: a set of positive spheres S+, a set of negative spheres S−, training set T = {(x_i, y_i)}, TPR importance w1, FPR importance w2
Output: evaluated fitness value (mAUC)
1: Let y = (y_i)
2: Let ŷ = (ŷ_i ← PredictClass(x_i, S+, S−))
3: Compute mAUC by Equation (7) using y, ŷ, w1, w2
4: return mAUC

The PredictClass() function implements Classification rule 1 and Classification rule 2 described in Section II-A. In brief, a test instance takes the class of the covering spheres if it is covered only by spheres of a single class; it takes the class of the sphere with the closest center if it is covered by spheres of different classes; and it takes the class of the sphere whose face is closest if it is not covered by any sphere.
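A fitness evaluation of this kind can be sketched as follows. The exact form of Equation (7) is not reproduced in this excerpt; the weighting below is one plausible form, assumed here because it reduces to the hard-classifier AUC of Equation (5) when w1 = w2 = 1, as the text requires.

```python
def m_auc(y_true, y_pred, w1=1.0, w2=1.0):
    """Weighted hard-classifier AUC in the spirit of EvaluateFitness.
    Assumed form: (w1*TPR + w2*(1 - FPR)) / (w1 + w2), which equals
    (1 + TPR - FPR) / 2 when w1 = w2 = 1."""
    tp = sum(t == '+' and p == '+' for t, p in zip(y_true, y_pred))
    fn = sum(t == '+' and p == '-' for t, p in zip(y_true, y_pred))
    fp = sum(t == '-' and p == '+' for t, p in zip(y_true, y_pred))
    tn = sum(t == '-' and p == '-' for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return (w1 * tpr + w2 * (1.0 - fpr)) / (w1 + w2)

y_true = ['+', '+', '-', '-']
y_pred = ['+', '-', '-', '-']
```

With equal weights this gives (0.5 + 1.0) / 2 = 0.75; raising w1 shifts the score toward the (here lower) TPR term, illustrating how the weights re-shape the objective.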

Algorithm 6 PredictClass
Input: test instance x, a set of positive spheres S+, a set of negative spheres S−
Output: predicted class label ĉl
1: if x is covered only by spheres in S+ then
2: ĉl ← +
3: else if x is covered only by spheres in S− then
4: ĉl ← −
5: else if x is covered by spheres in both S+ and S− then
6: ĉl ← the class of the sphere with the closest center
7: else
8: ĉl ← the class of the sphere with the closest face
9: end if
10: return ĉl

In this study, the proposed LOCS method sets the AUC as the learning objective of classification. In fact, this works not only for imbalanced data but also for cases where the class distribution is balanced, because the AUC measure itself is insensitive to the class distribution [14]. There are different degrees of class imbalance, and there is no absolute standard for how much imbalance can be said to be imbalanced or balanced. Setting the AUC as the objective function for training the classifier works for both balanced and imbalanced data because the AUC approximates the accuracy as the number of positive instances, TP + FN, becomes closer to the number of negative instances, FP + TN, as shown in Equation (6):

AUC = (1/2)(TP/(TP + FN) + TN/(FP + TN)) ≈ (TP + TN)/(TP + FN + FP + TN), when TP + FN ≈ FP + TN.   (6)

Therefore, if the classifier is trained to maximize the AUC, the degree of imbalance need not be considered when training, as the classifier can be used not only for imbalanced data but also for balanced data. Empirical evidence will be provided through an experiment in Section IV-B.
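The claim that the AUC coincides with the accuracy when TP + FN = FP + TN can be checked numerically (function names are illustrative):

```python
def accuracy(tp, fn, fp, tn):
    """Proportion of correctly classified instances."""
    return (tp + tn) / (tp + fn + fp + tn)

def hard_auc(tp, fn, fp, tn):
    """Hard-classifier AUC, (1 + TPR - FPR) / 2."""
    return (1 + tp / (tp + fn) - fp / (fp + tn)) / 2

# Balanced data (TP + FN == FP + TN == 50): AUC equals accuracy.
balanced = (40, 10, 20, 30)
# Imbalanced data: 95% accuracy but only chance-level AUC.
imbalanced = (0, 5, 0, 95)
```

On the balanced confusion matrix both measures are 0.7, while on the imbalanced one accuracy (0.95) and AUC (0.5) diverge sharply, which is exactly why the AUC objective is safe for both regimes.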

C. IMPORTANCE OF TPR AND FPR
In real-world problems, the importance of TPR and FPR may differ across domains. For example, precise classification of the positive class may matter more than misclassification of the negative class in the diagnosis of diseases in the medical field or in the detection of defects in the manufacturing industry. Another advantage of directly setting the AUC as the objective function is that, although the AUC weights TPR and FPR equally (1:1), their relative importance within the objective can be freely controlled by assigning different weights, such as 2:1 or 3:1, through a simple modification of the AUC. As shown in Equation (7), the mAUC, which assigns importances w1 and w2 to TPR and FPR, respectively, is proposed as the learning objective of LOCS. As mentioned above, in the case of w1 = 1 and w2 = 1, the mAUC is identical to the conventional AUC of Equation (5). Note that, with w2 = 1, although increasing w1 above one raises the importance of TPR and may raise the TPR correspondingly, the FPR may also increase, and the AUC value of Equation (5) may decrease. With the proposed method, users can appropriately control the decision boundary of the classifier to suit their domain.
The decision boundary of the model generated by the proposed classifier was observed through a graphical example to examine whether the proposed method works as designed. The decision boundaries of the proposed classifier and conventional classifiers are shown in Fig. 3. The example data were imbalanced, with 60 positive and 300 negative instances. The positive class consisted of three sub-clusters, and three datasets were prepared such that 0%, 25%, and 50% of the positive instances overlap with the negative class; 'a%' means that 60 × (a/100) positive instances are located in the negative class region. For comparison of decision boundaries, kNN (k=5) and decision tree (CART) were selected as conventional classifiers. In addition, in the objective function of the proposed LOCS, the importance of TPR and FPR was varied from w1:w2=1:1 to w1:w2=3:1 to observe how the decision boundary changes.
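Example data of the kind described above can be generated along the following lines; the cluster centers, spreads, and the way overlapping instances are relocated into the negative region are illustrative assumptions, not the authors' exact construction:

```python
import numpy as np

def make_overlap_data(overlap=0.25, n_pos=60, n_neg=300, seed=0):
    """Generate a 2-D dataset in the style of the graphical example:
    three positive sub-clusters, with a fraction `overlap` of the positive
    instances relocated into the negative-class region. All locations and
    spreads are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    neg = rng.normal(0.0, 1.0, size=(n_neg, 2))          # negative-class region
    centers = np.array([[4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])  # three sub-clusters
    pos = np.vstack([rng.normal(c, 0.3, size=(n_pos // 3, 2)) for c in centers])
    n_move = int(n_pos * overlap)                        # 60 * (a/100) instances
    pos[:n_move] = rng.normal(0.0, 1.0, size=(n_move, 2))  # place inside negative region
    return pos, neg
```
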
According to Fig. 3 (a) and (d), the conventional classifiers (kNN, CART) classified well, even on imbalanced data, in areas where there was no overlap between the two classes. However, comparing Fig. 3 (b)(e) and (c)(f), the conventional classifiers could not properly classify the positive class as the overlap between classes increased. In the case of LOCS, by contrast, the decision boundary widened even in the overlapped area to maximize the objective function mAUC, as shown in Fig. 3 (g)(h)(i), where the overlap gradually increases. From another point of view, the change in the decision boundary with the importance setting of LOCS is shown in Fig. 3 (g)(j)(m), (h)(k)(n), and (i)(l)(o). In particular, the decision boundary did not change significantly even when the TPR importance was increased where there was no overlap, as shown in Fig. 3 (g)(j)(m). On the other hand, in regions with severe overlap, LOCS moved the decision boundary toward the positive instances, bearing the loss in FPR to increase the TPR as the importance of TPR in the objective function increased, as shown in Fig. 3 (i)(l)(o). Therefore, when the importance of true positives varies with the situation, the decision boundary of the model can be precisely tuned by adjusting the TPR importance of LOCS.

IV. NUMERICAL EXPERIMENTS
In this section, the performance of the proposed classifier LOCS was verified through experiments. The main aim of these experiments is to examine the design intent of the proposed method on various distributions of real-world datasets. The experimental environments are described in Section IV-A. The effect of maximizing the AUC on the accuracy for balanced data is examined in Section IV-B. The performance of LOCS is verified by comparison with conventional methodologies for imbalanced data classification, such as resampling methods and CSL methods, in Section IV-C. The importance of TPR and FPR within the mAUC of LOCS is varied, as mentioned above, to examine whether the performance changes as intended on various real datasets in Section IV-D.

A. EXPERIMENTAL SETTINGS
The overall experimental environments are described in this section. The characteristics of the binary-class real datasets used in the experiment, that is, IR, class description, total number of instances, number of attributes, number of positive instances, and number of negative instances, are shown in Table 2. All datasets were obtained from the UCI data repository [44] and the KEEL data repository [45] and sorted in ascending order of IR. The experimental data were selected by considering various numbers of instances (214-5,820), attributes, and IRs (1.1-87.8). In this experiment, the five datasets with IR less than two (#1-#5) were defined as balanced data, and the 20 datasets with IR of two or more (#6-#25) were defined as imbalanced data. For datasets originally with more than two classes, we chose the class with fewer instances as the positive class and integrated the other classes as the negative class. Therefore, the classification task involved discriminating a specific class from the others. In addition, all data were normalized. The benchmarking classifiers for comparison were random sphere cover (RSC) [42], classification and regression tree (CART) [46], and support vector machine (SVM) [47]. Random over-sampling (ROS), random under-sampling (RUS), SMOTE [15], adaptive-SMOTE (AS) [48], under-sampling using sensitivity (USS) [49], and clustering and density-based hybrid (CDBH) [50] were used as resampling methods to address the imbalance problem, and cost-sensitive CART (CS-CART) and cost-sensitive SVM (CS-SVM) were used as CSL methods. SMOTE used k=5, and the kernel of the SVM was a radial basis function. Although LOCS was designed to maximize the AUC, the G-mean and F1-score, which are given below, were used together with the AUC for performance evaluation.
G-mean = √(TPR × TNR),  F1-score = 2TP / (2TP + FP + FN)

Fifty models were trained for CSL by assigning a misclassification cost of one to the majority class and costs from 1 to 50 to the minority class. Among them, the cost with the best performance according to the AUC, G-mean, and F1-score, respectively, was selected as the misclassification cost for the minority class. For the parameter α of the sphere-based classifier RSC and the proposed method LOCS, its value was selected from 1 to 10 according to the highest performance on the evaluation measures, in consideration of the various data sizes. The crossover probability, mutation probability, and population size, which are the GA parameters of LOCS, were set to 0.8, 0.1, and 200, respectively. Ten-fold cross validation was applied in all experiments; the accuracy of Equation (2) was used for the balanced cases, and the AUC of Equation (5), G-mean, and F1-score were used for the imbalanced cases to evaluate the experimental results.
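The two auxiliary evaluation measures can be computed directly from confusion-matrix counts; a minimal sketch:

```python
import math

def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)

def f1_score(tp, fp, fn):
    """F1-score from raw counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)
```

Note that the G-mean depends only on the rates TPR and TNR, while the F1-score depends on the raw counts TP, FP, and FN; this distinction matters for the discussion of the F1-score results in Section IV-C.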

B. BALANCED CASES
The performances of the classifiers on relatively balanced data were compared. The accuracies of CART, SVM, RSC, and LOCS, and the AUC of LOCS, for data #1 to #5 with IR less than two are summarized in Table 3; LOCS (Acc) is the accuracy of the classifier trained with the proposed algorithm, and LOCS (AUC) is the AUC performance of the same classifier. For the five datasets, although there were slight differences in accuracy among the four classifiers, there was no significant difference between the AUC value and the accuracy value of LOCS, which maximizes the AUC. This experimentally supports the claim that the proposed algorithm works well for balanced data, because AUC maximization approximates accuracy maximization as the number of positive instances approaches the number of negative instances, that is, as the IR approaches one, consistent with the aforementioned theoretical analysis. The robustness of the proposed method to changes in IR was also compared with that of the conventional sphere-based classifier RSC. A comparison of the accuracy and AUC of RSC and LOCS, obtained by repeatedly removing 20 and 10 positive instances from the phishing data (#1) and the pima data (#5), respectively, is shown in Fig. 4. As positive instances were repeatedly removed until no positive spheres were constructed, the IRs of the two datasets increased to 14.04 and 3.62. For both datasets, the accuracy of RSC increased as the IR increased, while the AUC of RSC decreased. This is because, as expected, the classifier was trained mainly on the negative class to increase accuracy as the IR increased; as the influence of the positive instances gradually decreased, the AUC value continuously decreased. In contrast, LOCS was similar to RSC in terms of both accuracy and AUC when the IR was low, and its AUC performance remained robust even as the IR increased.
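The sequence of IR values visited by this removal procedure can be sketched as follows; the counts passed in below are illustrative, while the step sizes 20 and 10 correspond to the phishing and pima experiments:

```python
def ir_schedule(n_pos, n_neg, step):
    """Imbalance ratios (negatives / positives) visited as `step` positive
    instances are removed repeatedly, mirroring the robustness experiment:
    each removal raises the IR until no positive instances remain."""
    irs = []
    while n_pos > 0:
        irs.append(n_neg / n_pos)
        n_pos -= step
    return irs
```
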
As a result, it was found that setting the objective function as AUC in LOCS was effective in classifying both imbalanced and balanced data.

C. IMBALANCED CASES
The performance of LOCS was compared with representative methods of the two main approaches to the imbalanced classification problem, resampling and CSL. To this end, the 20 imbalanced datasets ranging from glass0 (#6) to wine-quality-red-8 (#25) were used. The AUC, G-mean, and F1-score performances of the resampling methods and LOCS are summarized in Table 4, Table 5, and Table 6, respectively. ROS, RUS, SMOTE, AS, USS, and CDBH were used as comparison methods, and the RSC classifier was used as the base learner of each. The data number and IR, the rank of each method's performance in the column to its right, and the average rank in the last row are shown in the tables. In addition, the results of the method with the highest performance for each dataset are emphasized in bold.
The AUC results are shown in Table 4. Although one of the resampling methods showed better results than LOCS on some datasets, LOCS showed the best result on datasets #6, #8, #9, #10, #11, #13, #14, #16, #18, #22, #23, #24, and #25. In terms of average rank over all datasets, LOCS ranked 2.4, while ROS, RUS, SM, AS, USS, and CDBH ranked 5.45, 3.9, 3.8, 4.8, 4.15, and 3.5, respectively. For further comparison, we conducted statistical analyses of the experimental results using the Wilcoxon signed-rank test [51]. From the bottom of Table 4, we can see that LOCS performs significantly better than the others because the p-values are very small.
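The paired Wilcoxon signed-rank test used above can be run with SciPy over per-dataset scores; the helper name, the significance level, and the scores in the example are illustrative:

```python
from scipy.stats import wilcoxon

def compare_methods(scores_a, scores_b, alpha=0.05):
    """Paired Wilcoxon signed-rank test over per-dataset scores (e.g. AUC),
    pairing the two methods on the same datasets. Returns the p-value and
    whether the difference is significant at level `alpha`."""
    stat, p = wilcoxon(scores_a, scores_b)
    return p, p < alpha
```
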
Likewise, the G-mean results and the F1-score results are shown in Table 5 and Table 6, respectively. The G-mean results are very similar to the AUC results. According to the average rank, the best performing method was LOCS, followed by CDBH, RUS, USS, SM, AS, and ROS in sequence. The Wilcoxon signed-rank tests supported the significantly better performance of the proposed method. However, the F1-score results shown in Table 6 differed from the AUC and G-mean results. The best performance came with SM, whose average rank was 2.3. LOCS was comparable with ROS, AS, and CDBH, as can be seen from the insignificant p-values of the Wilcoxon signed-rank tests. This is because the F1-score evaluates the counts TP and FP, i.e., numbers of instances, while the AUC and G-mean evaluate the rates TPR, FPR, and TNR. Even if FP is small, the F1-score is small if TP is not large enough. This is why the F1-scores tend to decrease as the IR increases, as can be seen from Table 6.
The AUC values of ROS, RUS, SM, AS, USS, and CDBH subtracted from the AUC values of LOCS on each dataset, from the results of Table 4, are shown in Fig. 5 (a). If the difference is greater than zero, the performance of LOCS was better. Except for some datasets, the AUC difference was greater than zero on most of the data, and in particular, the difference was larger on datasets with a large IR. The same demonstrations for the G-mean and F1-score are shown in Fig. 6 (a) and Fig. 7 (a), respectively. The G-mean graph admits the same interpretation as the AUC case, whereas the F1-score graph shows that LOCS was worse than RUS, SM, and AS on several datasets. Considering that the proposed method was designed to maximize the AUC, we believe these results are reasonable.
The results of comparing LOCS with CSL on the same 20 imbalanced datasets are shown in Table 7, Table 8, and Table 9, corresponding to the AUC, G-mean, and F1-score, respectively. CART and SVM, which are widely used, were selected as the base learners of CSL. As mentioned above, the classifiers were trained by fixing the misclassification cost of the negative class to one and varying the misclassification cost of the positive class from 1 to 50; among the 50 classifiers, the one with the best performance according to each evaluation measure was selected.

When the TPR importance w1 in the objective function mAUC was increased, the TPR increased. However, as the FPR values also increased to 0.373, 0.521, and 0.558, the AUC values decreased to 0.711, 0.687, and 0.682. This means that FP increased as the TPR importance increased to classify positive instances more accurately. An increase in the TPR value with increasing TPR importance was also found for data #6, #7, #10, #11, #13, #14, #15, #16, #19, and #22. In some cases, a further increase in AUC was found because the increase in FPR was relatively small compared with the increase in TPR as w1 increased. The TPR value did not change even when w1 was increased for data #9, #12, #17, #18, #20, #21, and #24; instead, the FPR value gradually increased and the AUC value gradually decreased.
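The CSL cost-tuning loop described above can be sketched generically; `train_fn` and `eval_fn` are hypothetical placeholders standing in for the base learner (CS-CART or CS-SVM) and the chosen evaluation measure:

```python
def best_cost(train_fn, eval_fn, costs=range(1, 51)):
    """Cost-sensitive tuning loop: fix the negative-class cost at 1, sweep
    the positive-class misclassification cost over `costs` (1 to 50 in the
    experiments), and return the cost whose trained model scores highest
    under the chosen evaluation measure."""
    best = None
    for c in costs:
        model = train_fn(pos_cost=c, neg_cost=1)  # one model per cost setting
        score = eval_fn(model)
        if best is None or score > best[0]:
            best = (score, c)
    return best[1]
```

This sweep is exactly why CSL requires 50 training runs per dataset and measure, whereas LOCS constructs its model in a single training.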

V. CONCLUSION
In this study, a new classification method called LOCS was proposed to solve the imbalanced data learning problem using a sphere-based classifier. The proposed algorithm sets the AUC, a widely used evaluation measure in imbalanced data classification, as the learning objective and constructs a sphere classifier that maximizes this measure. The advantage of the proposed method is that it can be applied regardless of the degree of class imbalance, because the closer the two classes are to a balanced state, the closer the AUC is to the accuracy. In addition, it can be adapted to the application domain by setting different importance levels for TPR and FPR. The effectiveness of LOCS was verified in numerical experiments on 25 real datasets. LOCS showed the best performance on 13 of the 20 imbalanced datasets in comparison with conventional resampling approaches, and on 12 of the 20 datasets in comparison with conventional CSL methods. CSL took a long time to learn because different misclassification costs had to be tried, whereas the proposed algorithm achieved satisfactory performance with a single training run. Experiments that varied the IR showed that LOCS produces robust classification results even under changes in class distribution. In addition, experiments showed that the proposed algorithm can be effectively used in practical domains by controlling the importance of TPR and FPR. Further studies are required to extend the proposed method to multi-class problems, because this study was limited to binary classification. A promising future research direction is to employ, instead of the AUC, other metrics such as the G-mean and F-scores as learning objectives in the proposed learning framework.