Fuzzy and SVM Based Classification Model to Classify Spectral Objects in the Sloan Digital Sky Survey

The Sloan Digital Sky Survey (SDSS) comprises about one billion objects classified spectroscopically. Because astronomical datasets are so enormous, manually classifying them is nearly impossible, and such huge datasets also suffer from class imbalance and overfitting. In this research study, we propose a framework that overcomes these constraints. The framework first balances the dataset with a hybrid Synthetic Minority Oversampling Technique + Edited Nearest Neighbor (SMOTE + ENN) balancer. The balanced dataset is then used to extract features via a non-linear algorithm, Kernel Principal Component Analysis (KPCA). The features are then passed to the proposed Interval Type-2 Fuzzy Support Vector Machine (Int-T2-FSVM) classifier, which uses a modified type reducer and inference engine to achieve more precise categorization. The model's performance is measured on the SDSS dataset using a number of evaluation metrics, and the results show that it classifies the spectral objects accurately.

[...] astronomical data and objects. Astronomers begin the categorization process by carefully scanning the dataset and categorising objects into likely quasars, stars, and galaxies. Transients such as asteroids, gamma-ray bursts, and supernovae, which appear in space for only a brief period of time, can also be found in imaging data. Many challenges arise when processing these data with a large number of bands, including image-calibration noise, spatial distortion, and limited or unbalanced labelled training samples (the Hughes phenomenon), as well as dimensionality-reduction artefacts such as overfitting, redundancy, spectral variability, and loss of significant features between the channels.

Significant efforts are invested in applying Machine Learning (ML) techniques that automate the knowledge-discovery process and the extraction of astronomical information from these massive unprocessed datasets. [...] Methods for the automatic classification of stellar spectra, such as χ²-minimization and Artificial Neural Networks (ANN), have been proposed [10]. Singh et al. [11] describe a rapid and reliable method for classifying an optical stellar spectrum library ranging from O- to M-type stars. To automate the classification process, the technique uses two tools: (a) Principal Component Analysis (PCA) to reduce the data dimensionality; and (b) a Multilayer Back Propagation Network (MBPN), an ANN-based model that automates the classification process.

An ANN trained with a backpropagation-based supervised learning algorithm was used to categorise Calgary's Infrared Astronomical Satellite (IRAS) spectra in the 8 µm to 23 µm region, which contains 2000 bright sources [12]. Bora et al. [13] use an ANN for stellar classification; the training set consists of synthetic spectra in the ultraviolet (UV) range of 1250-3220 Å, and the low-resolution test set comes from the International Ultraviolet Explorer (IUE). Bazarghan and Gupta [14] proposed a Probabilistic Neural Network (PNN) that automatically classified approximately 5000 SDSS spectra into nearly 158 reference-library spectral types ranging from O- to M-type stars.

The Support Vector Machine (SVM) is the most common classification method used in ML and data mining, and scientists continue to propose new improvements to it. For multitask learning, the proximal SVM is used [15]. Datta and Das [16] present the Near-Bayesian Support Vector Machine (NBSVM) to handle the problem of unbalanced classification. Liu et al. [17] proposed the Ramp-loss Non-Parallel SVM (RNPSVM), a nonparallel-hyperplane classifier that is sparse and resilient [18]. The Nonparallel SVM (NPSVM), a new non-parallel classifier, has also been proposed. SVM is also widely used in astronomical research, particularly for automatic spectral categorization: SVM has been employed to categorise spectra after dimension reduction with PCA [19], and in another method ISOMAP was used to reduce the number of dimensions before an SVM classified the stellar spectra [20], [21].

ML is used extensively in cosmology and astrophysics [22]. A non-exhaustive list of applications includes (i) supernova photometric classification [...]

First, the total number of synthetic observations N to generate by oversampling is determined. In general, N is selected so that the distribution of the binary classes becomes 1:1, although it can be scaled back. Iteration starts by choosing a random instance of the positive (minority) class. Then its k nearest neighbours (k-Nearest Neighbor, KNN; by default k = 5) are obtained. Lastly, N of those k neighbours are selected to create synthetic samples through interpolation: the difference between the feature vector and each selected neighbour is calculated using a distance metric, multiplied by a random number in the range (0, 1), and added to the original feature vector. This interpolation step is sketched in code below.
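A minimal sketch of this interpolation step, using NumPy, followed by the hybrid SMOTE + ENN balancer the paper adopts, as provided by the imbalanced-learn package. Variable names such as `X_min`, `n_synthetic`, and `X_bal` are illustrative assumptions, not names from the original work.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from imblearn.combine import SMOTEENN  # hybrid SMOTE + ENN balancer

def smote_interpolate(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    X_min: (n_minority, n_features) array of minority-class vectors.
    """
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbour
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))   # random minority instance
        j = rng.choice(idx[i][1:])     # one of its k nearest neighbours
        gap = rng.random()             # random number in (0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# The full SMOTE + ENN balancing used in the paper's pipeline,
# where X, y are the raw features and labels of the imbalanced dataset:
# balancer = SMOTEENN(random_state=0)
# X_bal, y_bal = balancer.fit_resample(X, y)
```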

Although the above method is beneficial, it does have a few limitations.

a) The generated synthetic instances all point in similar directions, since they lie on artificial lines connecting existing minority samples. As a result, the decision surface generated by some classifier algorithms becomes more complicated. [...] Combining SMOTE with the Edited Nearest Neighbor (ENN) cleaning step removes such noisy and borderline examples, so the separation of the classes is more evident and crisper.

The following steps explain the SMOTE-ENN process:

Step 1: Select a random sample from the minority class.

Step 2: Find the distance between the randomly selected sample and its k nearest neighbours (KNN).

Step 3: Multiply the difference by a random value between 0 and 1, and add the result to the minority-class sample to create a synthetic sample.

Step 4: Repeat Steps 2-3 until the appropriate proportion of the minority class is reached.

Step 5: Determine K, the number of nearest neighbours used for the cleaning step; if K cannot be estimated, assume K = 3.

Step 6: Compute the K nearest neighbours of each observation among the remaining observations in the dataset, and obtain the majority class among those neighbours.

Step 7: If the class of an observation differs from the majority class of its K nearest neighbours, eliminate the observation and its K nearest neighbours from the dataset.

Step 8: Repeat Steps 6-7 until the necessary proportion of every class is reached.

PCA can only capture the structure of data represented by second-order correlations, i.e., data that vary linearly or come from a Gaussian distribution. The variations in real data, on the other hand, are widely known to be non-linear as well as highly non-Gaussian, and second-order correlations cannot represent the majority of such data; as a result, PCA applied directly gives poor performance. For our work we therefore propose ''KPCA'', a modified, non-linear PCA approach that depends on kernel functions: it implicitly constructs a non-linear mapping $\Phi(\cdot)$ from the input space to a feature space $F$ and performs linear PCA in $F$. For two input samples $x$ and $y$ in the original space, the non-linear mapping can be avoided by using the Kernel Function (KF) given below to calculate dot products in the feature space:

$K(x, y) = \Phi(x) \cdot \Phi(y)$

The conceptual structure of the KPCA approach is depicted schematically in Fig. 2. [...] Mercer's theorem is always satisfied by the polynomial and radial-based kernels, but the sigmoid kernel satisfies it only for specific values of $\beta_0$, $\beta_1$ (Equation 2). Because of its better performance, the radial basis function is frequently used as the KF in KPCA; hence, the radial-based kernel is used as the KPCA-KF in this research (Equation 3).

Given an input dataset with zero mean, $X = (x_1, \ldots, x_N) \in \mathbb{R}^m$, where $N$ is the number of samples and $m$ is the dimension of the measurement variables, the PCA and KPCA algorithms calculate covariance matrices: (i) the PCA covariance

$C = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{T} \quad (5)$

and (ii) the KPCA covariance, computed within the linear feature space $F$ rather than the non-linear input space, Equation 6:

$C^{F} = \frac{1}{N}\sum_{i=1}^{N} \Phi(x_i)\,\Phi(x_i)^{T} \quad (6)$

where it is assumed that the mapped data are centred, Equation 7:

$\sum_{i=1}^{N} \Phi(x_i) = 0 \quad (7)$

Here, every eigenvector $V$ of $C^{F}$ lies in the span of the mapped samples and can be expanded as $V = \sum_{i=1}^{N} \alpha_i\,\Phi(x_i)$.

Equation 10 can be rewritten as the kernel eigenvalue problem

$N\lambda\,\boldsymbol{\alpha} = K\boldsymbol{\alpha} \quad (10)$

where $K$ is the kernel matrix with entries $K_{ij} = K(x_i, x_j)$ and $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_N)^{T}$ is an eigenvector of the kernel matrix. When reconstructing input data from the feature space, we use Equation 11.

An SVM maps the input data into a high-dimensional (HD) feature space in which separating hyperplanes can be constructed; the location of classified items relative to the separating hyperplanes determines the output of the classifier.

FIGURE 3. Linearly non-separable hyperplane and margin.
We are provided a dataset $S$ of labelled training points, Equation 12:

$S = \{(x_i, y_i)\}_{i=1}^{N} \quad (12)$

where the training point is denoted by a vector $x_i$, the label by $y_i$, and the number of samples by $N$.

Each vector $x_i$ is assigned to one of two classes, denoted by the label $y_i \in \{-1, 1\}$. A hyperplane can be optimally positioned in the middle, separating the two classes; the data points closest to the margin serve as the foundation for this definition and are referred to as ''Support Vectors'' (SV). The misclassification penalty $\xi_i \ge 0$ is proportional to the distance between a misclassified point $x_i$ and the canonical hyperplane bounding its class. The objective function associated with margin maximization is given by Equations 13 and 14:

$\min_{\omega, b, \xi}\;\; \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\xi_i \quad (13)$

$\text{s.t. } y_i(\omega \cdot x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0, \;\; i = 1, \ldots, N \quad (14)$

where $C$ weights the classification errors. When classification errors are unavoidable due to the linearity of the separating hyperplane, minimizing the objective function (13) under constraint (14) offers the maximum possible margin. The optimal hyperplane is obtained by arranging the Lagrange function for the primal problem, Equation 15:

$L(\omega, b, \xi, \alpha, \mu) = \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\left[y_i(\omega \cdot x_i + b) - 1 + \xi_i\right] - \sum_{i=1}^{N}\mu_i \xi_i \quad (15)$

where $\alpha_i \ge 0$ and $\mu_i \ge 0$ are Lagrange multipliers. The primal problem is expressed as Equation 16:

$\min_{\omega, b, \xi}\;\max_{\alpha, \mu}\; L(\omega, b, \xi, \alpha, \mu) \quad (16)$

In this situation, the first-order conditions are

$\frac{\partial L}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{i=1}^{N}\alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0$

As the last term vanishes under the first-order conditions, the dual problem results in Equation 20:

$\max_{\alpha}\;\; \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\, x_i^{T} x_j, \quad \text{s.t. } \sum_{i=1}^{N}\alpha_i y_i = 0,\;\; 0 \le \alpha_i \le C \quad (20)$

The initial decision function is Equation 21:

$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{N}\alpha_i y_i\, x_i^{T} x + b\right) \quad (21)$

The bias is computed over the unbounded SVs, Equation 23:

$b = \frac{1}{|U|}\sum_{i \in U}\left(y_i - \sum_{j=1}^{N}\alpha_j y_j\, x_j^{T} x_i\right) \quad (23)$

where $U$ denotes the collection of unbounded SV indices.

SVM solves the classification problem by mapping the inputs $x$ into an HD space through a non-linear feature map $\varphi(x)$, so that decision boundaries that are complicated in the input space become linearly separable in the feature space.

Substituting the scalar product $x_i^{T} x_j$ with a KF makes this mapping implicit. The Gaussian kernel is the most common choice, and its most common definition is Equation 26:

$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \quad (26)$

The Karush-Kuhn-Tucker Theorem (KKTT) is significant to the SVM's development. According to the theorem, the solution must meet the following requirements:

$\alpha_i\left[y_i(\omega \cdot x_i + b) - 1 + \xi_i\right] = 0 \quad (27)$

$\mu_i\,\xi_i = (C - \alpha_i)\,\xi_i = 0 \quad (28)$

Equations 27 and 28 imply that only the non-zero values $\alpha_i$ meet the requirements; the SVs are the vectors $x_i$ corresponding to the non-zero solutions $\alpha_i$. When $x_i$ corresponds to $\alpha_i = 0$ and lies at a sufficient distance from the decision margin, the instance is correctly classified.

In order to build the best possible hyperplane $\omega \cdot z + b$, we would require Equation 29:

$\omega = \sum_{i=1}^{N}\alpha_i y_i\, z_i \quad (29)$

The scalar bias $b$ should be calculated using the KKTT conditions. The decision function can hence be obtained from Equations 30 and 31 as follows:

$f(x) = \mathrm{sgn}\left(\omega \cdot \varphi(x) + b\right) = \mathrm{sgn}\left(\sum_{i=1}^{N}\alpha_i y_i\, \varphi(x_i) \cdot \varphi(x) + b\right) \quad (30, 31)$

where $\mathrm{sgn}(\cdot)$ is the sign function that determines the sign ($+/-$) of a real value. Since we lack an explicit representation of the higher-dimensional feature map $\varphi(\cdot)$, the direct calculations in Equation (31) are impractical because of their complexity. A beneficial feature of the SVM is that it does not require determining $\varphi(\cdot)$: the complexity is resolved using a KF that computes the dot products of data points in the feature space $z$. Before such functions can be used to compute the dot products, they must satisfy Mercer's theorem.
Here, $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ is the KF used for mapping onto a feature space of higher dimension. KFs can be, for example, polynomial, radial-basis, or sigmoid functions. The decision function can finally be written as Equation 34:

$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{N}\alpha_i y_i\, K(x_i, x) + b\right) \quad (34)$
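A brief illustration of this kernel decision function using scikit-learn's `SVC`, which exposes the dual coefficients $\alpha_i y_i$, the support vectors, and the bias $b$ of Equation 34. The dataset here is a random stand-in, not the SDSS features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # stand-in feature matrix
y = np.where(X[:, 0] + X[:, 1] ** 2 > 1, 1, -1)  # non-linearly defined labels

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Reproduce Equation 34 manually for one test point x:
x = rng.normal(size=(1, 5))
gamma = 1.0 / (X.shape[1] * X.var())             # the documented 'scale' value
K = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
manual = np.sign(clf.dual_coef_ @ K + clf.intercept_)  # sgn(sum_i a_i y_i K(x_i, x) + b)
assert manual == clf.predict(x)                  # matches the library's prediction
```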

Nonetheless, there is a need to simplify and optimise the classification of ambiguous membership functions (MFs) in this space. Interval Type-2 Fuzzy Logic System (IT2FLS) practices are used in a wide range of science and engineering fields due to the increased practicability of the computations. If the position of an MF cannot be determined precisely, the degree of membership cannot be taken as a fixed value in the range (0, 1), and Type-2 fuzzy sets are the best option. When every element of a set $A$ is assigned a distribution of membership grades, the three-dimensional Type-2 MF specifies the formation of the Type-2 fuzzy set's features. The Footprint Of Uncertainty (FOU) is defined as the union of primary memberships bounded by the upper and lower Type-1 MFs, referred to as the upper MF $\bar{\mu}_{\tilde{A}}(x)$ and the lower MF $\underline{\mu}_{\tilde{A}}(x)$.

[...] uncertainties in a fuzzy system. Because a Type-2 FS's FOU adds a mathematical dimension, Type-2 FSs are likely to outdo their Type-1 counterparts. In contrast to the Type-1 case, in which the grade of membership is a single value, the membership grade in an Int-T2-FIS is an interval, limited at its two extremities to yield the LMF and UMF, both of which are Type-1 fuzzy sets.
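A small sketch of such an interval Type-2 MF, assuming (purely as an illustration, not taken from the paper) a Gaussian primary MF with an uncertain mean; the UMF and LMF bound the FOU.

```python
import numpy as np

def it2_gaussian_mf(x, m1, m2, sigma):
    """Interval Type-2 Gaussian MF with uncertain mean m in [m1, m2].

    Returns (lower_mf, upper_mf): the LMF and UMF bounding the FOU.
    """
    g = lambda m: np.exp(-0.5 * ((x - m) / sigma) ** 2)
    # UMF: maximum membership over the uncertain mean interval
    upper = np.where(x < m1, g(m1), np.where(x > m2, g(m2), 1.0))
    # LMF: minimum membership over the uncertain mean interval
    lower = np.minimum(g(m1), g(m2))
    return lower, upper

x = np.linspace(-4, 6, 200)
lmf, umf = it2_gaussian_mf(x, m1=0.5, m2=1.5, sigma=1.0)
# For every x the membership grade is the interval [lmf, umf];
# the region between the two curves is the footprint of uncertainty.
```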

The construction of the Int-T2-FIS defines the relationship between input and output. The Int-T2-FIS is made up of five primary modules: (1) a fuzzifier; (2) fuzzy rules; (3) an inference engine; (4) a type reducer; and (5) a defuzzifier. The output unit of an Int-T2-FIS comprises two blocks: (a) the type reducer and (b) the defuzzifier. Because the rule base is activated by fuzzy sets rather than numbers, crisp inputs are first converted to fuzzy sets in the fuzzifier block. When measurements are exact, the input is preserved as a crisp data set in the fuzzification step; when the measurements are noisy but stable, the input is represented as an interval Type-2 fuzzy set. After the input has been fuzzified, the fuzzy inference engine maps the set of fuzzy inputs onto fuzzy outputs. This is accomplished by quantifying every rule using fuzzy set theory and then applying the mathematics underlying the theory to produce an output for every rule. The result of the fuzzy inference block is one or more fuzzy output sets, which the output-processing units turn into crisp output.

Given an Int-T2-FIS with $n$ inputs $x_1 \in X_1, \ldots, x_n \in X_n$ producing a single output $y \in Y$, this Int-T2-FIS's rule base is made up of $K$ IT2 fuzzy rules, written as Equation 36:

$R^k: \text{IF } x_1 \text{ is } \tilde{F}_1^k \text{ and } \cdots \text{ and } x_n \text{ is } \tilde{F}_n^k \text{ THEN } y \text{ is } \tilde{G}^k \quad (36)$

where $k = 1, \ldots, K$, and $\tilde{F}_n^k$ and $\tilde{G}^k$ are Type-2 fuzzy sets.

The Karnik-Mendel (KM) iterative approach using the center of sets is a prominent type reducer. Unfortunately, such a type-reduction approach is mathematically demanding, especially when many MFs feed a considerable rule base (Fig. 8). [...]

The inference engine is responsible for applying the inference rules to the fuzzy input and producing the output. The inference rules, in particular, evaluate linguistic values and map them to fuzzy sets, which then require defuzzification to be transformed into crisp values. Inference rules that give the system its computational functionality are one of the primary principles of the Mamdani method [47].

During defuzzification, the inference engine's fuzzy output is mapped to a crisp value that gives the exact representation of the fuzzy set. In this proposed fuzzy methodology, the crisp output is generated by employing the centroid method, defined in Equation 40:

$z = \frac{\sum_{j} z_j\, u_c(z_j)}{\sum_{j} u_c(z_j)} \quad (40)$

The centroid approach determines a single scalar value, the centre of mass $z$ of the fuzzy output distribution, where $u_c$ represents the fuzzy set membership and $z_j$ the value at which it is evaluated.
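A minimal sketch of this centroid defuzzification over a sampled output universe; the clipped triangular output set below is purely illustrative.

```python
import numpy as np

def centroid_defuzzify(z, u):
    """Centroid (centre-of-mass) defuzzification, Equation 40.

    z: sampled points of the output universe of discourse.
    u: aggregated membership value u_c(z_j) at each point.
    """
    return np.sum(z * u) / np.sum(u)

# Illustrative aggregated output: a triangular fuzzy set clipped at 0.7
z = np.linspace(0.0, 10.0, 101)
u = np.clip(1.0 - np.abs(z - 6.0) / 3.0, 0.0, 0.7)
crisp = centroid_defuzzify(z, u)   # ~6.0, the centre of mass of the set
```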

The SDSS DR14 data collection is used in this study. The SDSS is one of the largest spectroscopic surveys, having begun observations in 1998 and completed three phases; SDSS-IV, the fourth phase, is already in progress [49]. [...] spectral lines, as described by [51]. For this purpose, we first resampled the spectra, resulting in 5748 points for each spectrum. After that, each spectrum was normalized by dividing it by its average value between 4250 and 5150 Å, as sketched below.
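A sketch of that normalization step, assuming each spectrum arrives as flux sampled on a common wavelength grid; the grid bounds and flux values below are illustrative stand-ins.

```python
import numpy as np

def normalize_spectrum(wavelength, flux, lo=4250.0, hi=5150.0):
    """Divide a spectrum by its mean flux in the [lo, hi] Angstrom window."""
    band = (wavelength >= lo) & (wavelength <= hi)
    return flux / np.mean(flux[band])

# Example with a resampled spectrum of 5748 points, as in the paper
wavelength = np.linspace(3800.0, 9200.0, 5748)
flux = np.random.default_rng(1).random(5748) + 1.0  # stand-in flux values
flux_norm = normalize_spectrum(wavelength, flux)
```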

To minimize the dimensionality of the data array, we used KPCA. [...] The goal of this study is to differentiate between astrophysical objects of three types (stars, galaxies, and QSOs). Since a hyperplane can only tell the difference between two classes, more SVMs are needed when there are more than two classes.

As shown in Fig. 10, the Int-T2-FSVM block can be reproduced and utilised to segregate the individual objects separately. We recommend three Int-T2-FSVM blocks for identifying the three classes [52].

1. Int-T2-FSVM1 distinguishes the Star and Galaxy classes. A label of ''−1'' means that the data is from the Star class, and a label of ''1'' means that it is from the Galaxy class.
2. Int-T2-FSVM2 distinguishes the Star and Quasar classes. An input data label of ''−1'' means that the data fits the Star class, and an input data label of ''1'' means that the data fits the Quasar class.
3. Int-T2-FSVM3 distinguishes the Galaxy and Quasar classes, with the labels assigned in the same way.

As represented in Fig. 11, [...] a defuzzification technique may then be used to obtain the output of each Int-T2-FSVM block $k$. A rule-based class determiner would make the final class selection, as sketched below.
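A minimal sketch of such a rule-based determiner for the three pairwise blocks, assuming each block returns −1/1 with the class pairings listed above. The decision rule shown, a majority vote over the pairwise outputs, is one straightforward choice and not necessarily the paper's exact rule.

```python
def determine_class(svm1_out, svm2_out, svm3_out):
    """Combine the three pairwise Int-T2-FSVM outputs into one label.

    svm1_out: -1 = Star,   1 = Galaxy   (Int-T2-FSVM1)
    svm2_out: -1 = Star,   1 = Quasar   (Int-T2-FSVM2)
    svm3_out: -1 = Galaxy, 1 = Quasar   (Int-T2-FSVM3)
    """
    votes = {"STAR": 0, "GALAXY": 0, "QSO": 0}
    votes["STAR" if svm1_out < 0 else "GALAXY"] += 1
    votes["STAR" if svm2_out < 0 else "QSO"] += 1
    votes["GALAXY" if svm3_out < 0 else "QSO"] += 1
    return max(votes, key=votes.get)  # majority class over the pairwise votes

print(determine_class(-1, -1, 1))  # STAR: two of the three blocks vote Star
```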

There are various features in the SDSS dataset (Tab. 2). The following features are required to perform the classification in our work [54].

• REDSHIFT: Redshift is the essential attribute that distinguishes quasars. A quasar's distance is calculated from its redshift, a measurement of how much the universe's expansion stretches the wavelength of its light before it reaches Earth. The greater the redshift, the greater the distance, and the further back in time astronomers view the object.

• RIGHT ASCENSION: The eastward angular distance of a particular location, measured along the celestial equator from the Sun at the March equinox to the hour circle of the point in question above the Earth. This attribute can be derived from the image table.

• DECLINATION: When combined with right ascension, declination indicates the location of a point on the celestial sphere in the equatorial coordinate system.

We now discuss the measures used to evaluate the performance of the classifiers. [...] ''Recall'' is represented as the True Positive Rate (TPR), indicating completeness. The AUC can be used to summarise the performance of a classifier; it takes a value between 0 and 1, where an ideal classifier achieves 1 and a random classifier takes the value 0.5.
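These quantities can be computed directly with scikit-learn; a short sketch, with `y_true`, `y_pred`, and `y_score` as stand-ins for the test labels, hard predictions, and classifier scores.

```python
from sklearn.metrics import recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # stand-in ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions, for recall
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]   # continuous scores, for the AUC

recall = recall_score(y_true, y_pred)   # TPR = TP / (TP + FN)
auc = roc_auc_score(y_true, y_score)    # 1.0 ideal, 0.5 random
print(f"Recall (TPR): {recall:.2f}, AUC: {auc:.2f}")
```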

We present the results of the unrefined proposed model in Tab. 3. The results with and without SMOTE + ENN are compared across all the metrics, and they show that the model's ability to correctly predict the class label improves when SMOTE + ENN is used to balance the data. The results are comparable with other existing models in terms of all the metrics. The adoption of KPCA as the feature-extraction scheme yields greater efficiency, as the adopted model proves its credibility by effectively reducing the dimension of the dataset. The SDSS dataset we chose proves to be a challenging platform for our proposed classification model [55]. The proposed model's training and validation accuracy is displayed in Tab. 4.

Many classification models generate poor representations of the labelled data when the training set is thinner than the generalisation task requires. The effective ''SMOTE + ENN'' balancing model proposed in this research work helps solve this problem, as shown by its ROC in Fig. 12.

After training and testing the proposed model and observing the training accuracy and loss, we can conclude that the model performed well: the training accuracy exceeds 97% after 30 epochs and the training loss is relatively low, as shown in Fig. 13. A model that generalises well prevents overfitting and gives useful results when dividing astronomical image data into real and fake objects [52], [53], [54], [55].

Because the two major classes in our data (real and non-real objects) are similar in size, we considered accuracy and recall to be the most important performance metrics for our solution and the benchmark model (Fig. 14). [...] will play a significant role in future astronomical surveys. Fuzzy-based approaches seem to be as good as, if not better than, human scanners in this sector; moreover, unlike astronomers, they can categorise thousands of transients in a single second. Unlike traditional ML algorithms, Int-T2-FSVM does not involve the creation of sophisticated, case-specific features: fuzzy SVMs use simple data augmentation during training to derive abstract features for categorisation on their own.

DL models, and particularly the proposed Int-T2-FSVM, are critical for future astronomical sky surveys like the SDSS. In contrast to human scanners, such models can produce continuous-valued classification certainty ratings that can be tuned for maximum recall and precision. Furthermore, they can handle the enormous data throughput generated by the different sky surveys.

Most previous research related to this paper uses standard supervised learning techniques to achieve the goal of automatic classification. The ML categorization of SDSS transient survey images serves as the baseline model for the proposed work. That study used the same dataset but applied several learning techniques, including (i) Random Forest (RF), (ii) k-Nearest Neighbors (k-NN), (iii) AdaBoost, (iv) Support Vector Machine (SVM), (v) Easy Ensemble, and (vi) Naïve Bayes (NB). Their performance was then matched against a DL-CNN using the same measures, and the proposed work is compared with this past work. For the very different image data in the (g, r, i, z, u) bands, the PCA algorithm was also used to extract features such as shape, location, FWHM, and objects near a local object.

Our proposed model uses KPCA as the feature-extraction model together with the recommended Int-T2-FSVM classifier. The benchmark models achieved the results shown in Fig. 15, and it is evident that none of them outperformed our proposed model.

In this section, we go over potential threats to our experiment and how we mitigated them. Validity assesses whether [...]

[...] the dataset is balanced using ''SMOTE + ENN''. The balanced dataset is subjected to ''K-PCA'' for feature extraction, and the extracted features are fed to the proposed classifier ''Int-T2-FSVM''. The model employs an enhanced type reducer and inference engine to achieve better classification accuracy. The experimental results show that the proposed model produces better accuracy and precision for the SDSS dataset when compared to other existing models.