Fusion of Orthogonal Moment Features for Mammographic Mass Detection and Diagnosis

Masses are nonpalpable mammographic signs of breast cancer that can be detected using screening mammography. This paper proposes a system that uses orthogonal moment invariants (OMIs) features for mammographic mass detection and diagnosis. Three sets of OMIs features were extracted: Gaussian-Hermite moments (GHMs), Gegenbauer moments (GeMs), and Legendre moments (LMs). The extracted features are fused and presented to the particle swarm optimization (PSO) algorithm for feature selection, and classification is performed using a support vector machine (SVM). The proposed system is evaluated using 400 regions extracted from the DDSM dataset. The obtained results reveal the promising application of OMIs features for mass detection and identification. Fusing the OMIs features produces an acceptable detection performance, with an area under the receiver operating characteristic (ROC) curve of $Az = 0.969 \pm 0.01$, and the best performance of OMIs features for characterizing the malignancy of masses is $Az = 0.856 \pm 0.053$.


I. INTRODUCTION
Breast cancer is the most frequent cancer among women worldwide, affecting 2.1 million women each year. In 2018, an estimated 627,000 women died from breast cancer, approximately 15% of all cancer deaths among women [1].
Early detection of breast cancer is the key to reducing mortality rates. Treatment is easier when breast cancer is detected at an early stage, whereas late detection limits the treatment options.
Mammography is an effective tool for the early detection of breast cancer because it can reveal abnormalities before physical symptoms appear. Regular screening may detect breast cancer at an early stage, before it develops into a systemic disease and possibly invades other parts of the body. The most common symptom of breast cancer is a lump or mass in the breast. Mammogram interpretation is difficult even for skilled radiologists, because subtle signs of breast abnormalities overlap with normal breast tissue. Computer-aided detection/diagnosis (CAD) systems were developed to assist radiologists in pinpointing suspicious regions in mammograms. There is therefore a pressing need for CAD systems that identify and classify mammographic findings in order to reduce the high miss-detection rate [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Yakoub Bazi.
A CAD system consists of four steps, namely, image segmentation, feature extraction, feature selection, and classification. The image segmentation step identifies the region of interest (ROI). Feature extraction computes features that can determine whether the ROI is normal or abnormal. Feature selection identifies the most significant features, those able to distinguish between the different classes. The classification step determines the ROI class from the presented feature vector. The overall performance of any CAD system depends on the performance achieved at each of these steps. Feature extraction is a particularly significant step in developing CAD systems; however, the complexity of breast tissue makes it difficult to find prominent features that can distinguish between the different classes [3], [4]. Several attempts have been made to overcome the challenge of extracting an efficient set of features for mammogram classification [5]-[8]. These studies indicated that the feature space of ROIs is vast and complicated because of the wide diversity of both healthy tissues and abnormalities. Using too many features may degrade the classifier's performance and increase its complexity. Therefore, extracting a compact and discriminative set of features is essential for classifying breast tumors. Moreover, with advances in information technology, the problem size of image processing problems keeps growing, and the search space grows exponentially with it: N candidate features alone yield 2^N possible feature subsets.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Dahabi et al. [8] applied the t-test to rank the curvelet and moment features according to their discriminative power [4]. Their results indicated that curvelet moments are efficient and useful for breast cancer diagnosis.
Gardezi et al. [9] presented a feature extraction method based on decomposing the mammogram image using the curvelet transform. They calculated the gray level co-occurrence matrix (GLCM) of the curvelet-decomposed image and reported an accuracy of 88.6%, a sensitivity of 76.53%, and a specificity of 91.3%.
Jiang et al. [10] proposed a CAD system to detect masses in digital mammography using scale-invariant feature transform (SIFT) features and reported an accuracy of 86.9%. Eltoukhy and Faye [6] applied a statistical method to maximize the distance between two different mammographic classes. Their method was applied to wavelet and curvelet coefficients, and the reported results showed a classification accuracy of 91.19%.
Metaheuristic algorithms are increasingly used for optimization problems. These algorithms are inspired by natural phenomena, whose behavior they simulate. They can extract information from a set of features and, in practice, often select well-performing feature subsets. Therefore, several such algorithms have been developed to address optimization problems. Among them, metaheuristic search algorithms, several of them population-based, such as genetic algorithms (GAs), simulated annealing (SA), ant colony optimization (ACO), and PSO [11], [12], have proven capable of handling high-dimensional optimization problems.
Ramos et al. [13] proposed a system to classify mammographic images as normal or containing a mass. They compared the performance of Haralick, wavelet, and ridgelet features, employed a genetic algorithm (GA) to select the prominent features, and used a random forest for classification. Their results demonstrated that wavelet-based features selected with the GA achieved an area under the ROC curve (AUC) of 0.90.
Rouhi et al. [14] proposed using Zernike moment features to distinguish benign from malignant masses. A GA was used as a feature reduction method to improve the accuracy and decrease the computational cost. They concluded that the multilayer perceptron (MLP) classifier was promising compared with existing methods.
Zyout et al. [7] proposed a false-positive reduction algorithm, including PSO-based model selection, to choose an appropriate set of features from wavelet and GLCM features. An SVM-based classification method is used to determine the class of each suspicious region. Their results supported using PSO to reduce the false-positive rate.
Chen et al. [15] developed a CAD scheme that builds an initial feature pool containing four different groups of features. Next, a particle swarm optimization (PSO) algorithm was applied to select optimal features so that redundant features are removed from the feature pool. Finally, an SVM was used to determine whether the case is benign or malignant.
Other metaheuristic algorithms have been used to solve different problems, including mammographic mass segmentation [16]. Khehra and Pharwaha [17] compared three metaheuristic methods, namely GA, PSO, and biogeography-based optimization (BBO), for selecting an optimal set of features from 50 features. They reported a 91% classification rate using PSO-SVM, achieved with half of the total features.
From the literature, it can be concluded that metaheuristic algorithms have great potential for selecting an optimal subset of the features extracted from mammograms. These biologically motivated metaheuristic algorithms have often been observed to perform better than classical optimization approaches. Their main advantage is their domain-independent nature; in addition, they can find optimal or near-optimal solutions in a large search space.
On the other hand, a group of studies focused on fusing texture features from the mediolateral oblique (MLO) and craniocaudal (CC) views of mammograms to improve the performance of mammography CADx. Several feature extraction techniques were used, including local binary patterns (LBP) [18], [19], binary Gabor patterns (BGP) [20], K-Gabor filters [21], and the gray-level co-occurrence matrix (GLCM), Laws' texture, GRLCM, and gray level difference method (GLDM) [22]. The feature vectors obtained using either a serial or a parallel approach were further engineered and improved by applying principal component analysis (PCA), genetic algorithm (GA), particle swarm optimization (PSO), binary bat algorithm (BBA), binary firefly algorithm (BFA), and canonical correlation analysis (CCA). The fused LBP features were further processed, and their performance improved after applying feature selection based on the firefly algorithm and the optimum-path forest classifier [19]. In [20], BGP texture features were found to be more efficient than LBP features. For classification, classical machine learning algorithms based on K-nearest neighbor, support vector machines, and optimum-path forest classifiers were applied. Reported results on the DDSM (screen-film) and INbreast (full-field digital) mammography datasets showed that fusing the textures of both mammographic views improved the performance compared with pertinent studies.
Wang et al. [23] fused 20 deep features from a CNN with five texture features, five morphological features, and seven density features of 400 mammographic images. Patil and Biradar [24] applied a hybrid deep classifier, called CRNN, that combines a convolutional neural network (CNN) and a recurrent neural network (RNN). They extracted gray level co-occurrence matrix (GLCM) and gray level run-length matrix (GLRLM) features, and their system produced an accuracy of 90.59%. Deep convolutional neural networks (DCNNs) have indeed solved many image classification problems remarkably well; their most attractive trait is arguably the seamless integration of feature engineering and pattern classification. However, the screen-film mammography data available to the research community are limited. We believe that a classical machine learning approach that focuses on feature extraction through cascade fusion and embedded feature selection remains adequate for successfully solving the mammography CADe and CADx tasks.
Eltoukhy et al. [25] used exact Gaussian-Hermite moments with K-NN, random forest, and AdaBoost classifiers. Their system was evaluated on the Mammographic Image Analysis Society (MIAS) [26] and Image Retrieval in Medical Applications (IRMA) [27] datasets, with accuracies of 90.56% for MIAS and 93.27% for IRMA. The successful use of orthogonal Gaussian-Hermite moments [25] motivated the authors to fuse three kinds of orthogonal moments, GHMs, LMs, and GeMs, to extract fine features from the input images. These orthogonal moments have the following attractive characteristics: 1) They are orthogonal, which enables them to represent digital images with minimum information redundancy. 2) Moment invariants derived from them are invariant to rotation, scaling, and translation, which enables computer-based systems to recognize similar patterns regardless of their orientation, location, and distance from the camera. 3) They can be computed with highly accurate methods, which carries over to the accuracy of the features extracted from the input images. 4) They are robust against common kinds of noise.
Thus, in this study, we hypothesized that it is possible to identify and fuse global and local features computed from the ROIs of mammographic masses, and that these moment-based features could yield a high-performance CADe/CADx system. Moreover, unlike conventional deep learning approaches, combining these features with PSO could reduce the need for an extensive training dataset. Thus, the objective of this study is to analyze the combination of three moment-based feature sets to find the best set of features capable of distinguishing between different mammographic regions, either normal or abnormal, and then of classifying the abnormal regions as benign or malignant.
Feature fusion is applied to combine the advantages of different moment features and to enable the extraction of both coarse and fine features from the input images. The idea is to combine the substantial information of several moment features so that no details are lost; extracting different texture descriptors from each image thus produces a good representation of the processed image. Because of the increasing need to integrate different moment features, we propose combining three moment feature sets and investigate the performance of their combination. The contributions of this work are summarized as follows: 1) We exploit the behavior of three distinct moment features, each with its own characteristics, for breast mass detection (CADe) and breast mass diagnosis (CADx); a further goal of the fusion is to increase the classification accuracy by combining different feature sets. 2) We produce a more comprehensive system based on the diverse characteristics of various moment features; the obtained results support the claim that combining distinct features gains the advantages of each type and mitigates their drawbacks. 3) We extract and identify an adequate set of features with a high capability to distinguish between the different types of mammographic images, first mass or normal, and then benign or malignant. 4) We apply embedded feature selection using PSO-SVM to accomplish hyper-parameter selection, feature dimensionality reduction, and SVM classifier parameter tuning and performance optimization.
Orthogonal moments in Cartesian coordinates, such as LMs, GeMs, and GHMs, are defined by multiplying two basis functions in the x- and y-directions. The x-direction basis functions of LMs, GeMs, and GHMs are computed for orders 0, 1, 2, 3, 4, and 5. For simplicity, the basis functions computed for order 5 are plotted in Figure (1.a), (1.b), and (1.c), respectively, while the fused basis functions are plotted in Figure (1.d). The plotted curves clearly show that the LMs and GeMs basis functions are very close to each other, oscillate at a lower frequency, and spread their oscillations uniformly over the interval −1 ≤ x ≤ 1, whereas the GHMs basis functions oscillate at higher frequencies and are non-uniformly distributed over the same interval.
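To make the oscillation comparison concrete, the following Python sketch evaluates order-5 basis functions with the standard three-term recurrences and locates their zero crossings on (−1, 1). It is illustrative only: the Gegenbauer parameter `ALPHA = 1.5` and the Gaussian-Hermite scale `SIGMA = 0.3` are assumptions (the values used for Figure 1 are not stated here), and the normalization factors are omitted because they do not move the zeros.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via (k+1)P_{k+1} = (2k+1)x P_k - k P_{k-1}."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def gegenbauer(n, alpha, x):
    """Gegenbauer polynomial C_n^alpha(x) via its three-term recurrence."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        c_prev, c = c, (2.0 * x * (k + alpha - 1) * c - (k + 2 * alpha - 2) * c_prev) / k
    return c

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

ALPHA = 1.5  # assumed Gegenbauer scaling parameter
SIGMA = 0.3  # assumed Gaussian-Hermite scale

# Order-5 basis functions, each with its (positive) weight function.
BASIS = {
    "LMs": lambda x: legendre(5, x),
    "GeMs": lambda x: gegenbauer(5, ALPHA, x) * (1.0 - x * x) ** (ALPHA - 0.5),
    "GHMs": lambda x: hermite(5, x / SIGMA) * math.exp(-x * x / (2.0 * SIGMA ** 2)),
}

def zero_crossings(func, samples=2000):
    """Approximate sign-change locations of func on a midpoint grid over (-1, 1)."""
    crossings, prev_x, prev_v = [], None, None
    for k in range(samples):
        x = -1.0 + (k + 0.5) * 2.0 / samples
        v = func(x)
        if v == 0.0:
            continue  # skip exact zeros; the sign change is caught at the next sample
        if prev_v is not None and (v > 0) != (prev_v > 0):
            crossings.append(0.5 * (x + prev_x))
        prev_x, prev_v = x, v
    return crossings

if __name__ == "__main__":
    for name, func in BASIS.items():
        zeros = zero_crossings(func)
        print(name, len(zeros), [round(z, 2) for z in zeros])
```

Each order-5 function has five zeros in the interval; with this assumed σ, the GHMs zeros cluster near the centre while the Legendre zeros spread out toward ±1, which is one way to read the "non-uniform" behaviour described above.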
This paper proposes a system that first distinguishes mammographic masses from normal regions and then distinguishes benign from malignant masses. In particular, the proposed approach combines the theory of orthogonal moments with the power of parameter selection based on the PSO-SVM algorithm. Three sets of OMIs features, GHMs, GeMs, and LMs, are extracted and presented to an SVM classifier. The proposed system is evaluated using 400 regions obtained from the DDSM dataset [28]. To the best of the authors' knowledge, no previous study has applied orthogonal moment features and PSO-SVM to the detection and diagnosis of mammographic masses.
The rest of this paper is organized as follows. Section II briefly presents the preliminaries of OMIs for feature extraction. Section III discusses the PSO-SVM algorithm. Section IV explains the proposed CAD system. The results are presented and discussed in Section V. Finally, the presented work is concluded in Section VI.

II. ORTHOGONAL MOMENT INVARIANTS
This section presents the preliminaries of the proposed orthogonal moment invariants (OMIs) features. Since the images of the DDSM dataset are captured and rasterized using Cartesian pixels, orthogonal moments in Cartesian coordinates are preferable, because they require neither coordinate conversion nor image mapping. Orthogonal moments in Cartesian coordinates of order (n + m) are:
$$M_{nm} = C_n C_m \int_{-1}^{1}\int_{-1}^{1} P_n(x)\, P_m(y)\, w(x)\, w(y)\, f(x, y)\, dx\, dy \tag{1}$$
where n ≥ 0 and m ≥ 0; f(x, y) is the image intensity function; C_n and C_m are the normalization factors; P_n(x) and P_m(y) are the orthogonal polynomial functions; and w(x) and w(y) are the weight functions in the x- and y-directions, respectively. Table 1 gives concise, explicit mathematical formulas for the three orthogonal moments used.
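where Γ(·) and σ are the gamma function and the standard deviation, respectively, and ⌊n/2⌋ is defined as:
$$\left\lfloor \frac{n}{2} \right\rfloor = \begin{cases} \dfrac{n}{2}, & \text{for even } n \\[4pt] \dfrac{n-1}{2}, & \text{for odd } n \end{cases} \tag{2}$$
For a maximum moment order Max, the total number of independent moments (all index pairs with n + m ≤ Max) is:
$$\text{Total} = \frac{(\text{Max}+1)(\text{Max}+2)}{2} \tag{3}$$
The Gegenbauer polynomials are generic polynomials with the scaling parameter α > −0.5. Legendre polynomials and Chebyshev polynomials of the second kind are special cases of Gegenbauer polynomials when α = 0.5 and α = 1, respectively. Pawlak [29] shows that the scaling parameter enables Gegenbauer moments to extract both local and global features of an image.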
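The special cases of the Gegenbauer polynomials and the count of independent moments can be checked numerically. The plain-Python sketch below (recurrences re-declared so the block runs on its own) compares Gegenbauer polynomials at α = 0.5 and α = 1 with the Legendre and Chebyshev second-kind polynomials, and counts the index pairs with n + m ≤ Max, (Max + 1)(Max + 2)/2. Max = 20 is used because it is the order that gives the 231 features per moment family reported in Section IV; that order is inferred from the count, not stated in the text.

```python
def legendre(n, x):
    """P_n(x) by the standard three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def chebyshev_u(n, x):
    """Chebyshev polynomial of the second kind, U_{k+1} = 2x U_k - U_{k-1}."""
    if n == 0:
        return 1.0
    u_prev, u = 1.0, 2.0 * x
    for _ in range(1, n):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u

def gegenbauer(n, alpha, x):
    """Gegenbauer polynomial C_n^alpha(x) by its three-term recurrence."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        c_prev, c = c, (2.0 * x * (k + alpha - 1) * c - (k + 2 * alpha - 2) * c_prev) / k
    return c

def total_moments(max_order):
    """Number of index pairs (n, m) with n + m <= max_order."""
    return (max_order + 1) * (max_order + 2) // 2

def max_abs_difference(f, g, orders=range(8), samples=101):
    """Largest |f(n, x) - g(n, x)| over the given orders and a grid on [-1, 1]."""
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(f(n, x) - g(n, x)) for n in orders for x in grid)

if __name__ == "__main__":
    print(max_abs_difference(lambda n, x: gegenbauer(n, 0.5, x), legendre))
    print(max_abs_difference(lambda n, x: gegenbauer(n, 1.0, x), chebyshev_u))
    print(total_moments(20))
```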

A. ACCURATE COMPUTATION OF ORTHOGONAL MOMENTS
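Orthogonal moments as defined in Table 1 are computed accurately by using the following form:
$$M_{nm} = C_n C_m \sum_{i=1}^{N_x}\sum_{j=1}^{N_y} I_{nm}(x_i, y_j)\, f(x_i, y_j) \tag{4}$$
where
$$I_{nm}(x_i, y_j) = \int_{U_i}^{U_{i+1}}\int_{V_j}^{V_{j+1}} P_n(x)\, P_m(y)\, w(x)\, w(y)\, dx\, dy \tag{5}$$
Here, N_x × N_y is the image size, and [U_i, U_{i+1}] and [V_j, V_{j+1}] are the intervals covered by pixel (x_i, y_j) in the x- and y-directions. The double integration represented by equation (5) is separable, so equation (4) can be rewritten as follows:
$$M_{nm} = C_n C_m \sum_{i=1}^{N_x}\sum_{j=1}^{N_y} IX_n(x_i)\, IY_m(y_j)\, f(x_i, y_j) \tag{6}$$
where the kernels, which combine the orthogonal polynomials P_n(x) and P_m(y) with the weight functions w(x) and w(y), are defined as follows:
$$IX_n(x_i) = \int_{U_i}^{U_{i+1}} P_n(x)\, w(x)\, dx, \qquad IY_m(y_j) = \int_{V_j}^{V_{j+1}} P_m(y)\, w(y)\, dy \tag{7}$$
Using Table 1 yields six kernels in the x- and y-directions for computing GHMs, GeMs, and LMs. The kernels IX_n(x_i) and IY_m(y_j) are computed exactly by analytic integration. Details of this computational approach for the three sets of orthogonal moments are presented for LMs in [30], GeMs in [31], and GHMs in [32].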
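For Legendre moments the weight is w(x) = 1, and the pixel-interval kernel IX_n(x_i) has a closed form because the integral of P_n(x) is [P_{n+1}(x) − P_{n−1}(x)]/(2n + 1) for n ≥ 1 (and x for n = 0). The plain-Python sketch below uses that identity to compute LMs of a small square image with C_n = (2n + 1)/2. It illustrates the exact-kernel idea for LMs only and is not the authors' MATLAB implementation; GeMs and GHMs need their own kernel integrals [31], [32], and mapping the pixel grid onto [−1, 1] × [−1, 1] with image[i][j] indexed as (x, y) is an assumption.

```python
def legendre_values(n_max, x):
    """[P_0(x), ..., P_{n_max+1}(x)] from the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max + 1):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p

def legendre_kernels(n_max, size):
    """IX[n][i]: exact integral of P_n over the i-th pixel interval of [-1, 1]."""
    delta = 2.0 / size
    edges = [-1.0 + k * delta for k in range(size + 1)]

    def antiderivative(x):
        # Integral of P_0 is x; integral of P_n is (P_{n+1} - P_{n-1}) / (2n + 1) for n >= 1.
        p = legendre_values(n_max, x)
        return [x] + [(p[n + 1] - p[n - 1]) / (2 * n + 1) for n in range(1, n_max + 1)]

    prim = [antiderivative(e) for e in edges]
    return [[prim[i + 1][n] - prim[i][n] for i in range(size)] for n in range(n_max + 1)]

def legendre_moments(image, max_order):
    """Legendre moments L_nm (n + m <= max_order) of a square image, exact kernels.

    image[i][j] is the intensity of the pixel whose x-interval is i and y-interval is j.
    """
    size = len(image)
    k = legendre_kernels(max_order, size)  # same kernels serve x and y on a square grid
    moments = {}
    for n in range(max_order + 1):
        for m in range(max_order + 1 - n):
            norm = (2 * n + 1) * (2 * m + 1) / 4.0  # C_n * C_m with C_n = (2n + 1) / 2
            total = sum(k[n][i] * k[m][j] * image[i][j]
                        for i in range(size) for j in range(size))
            moments[(n, m)] = norm * total
    return moments

if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]  # constant 8 x 8 test image
    lm = legendre_moments(flat, 4)
    print(round(lm[(0, 0)], 12), max(abs(v) for key, v in lm.items() if key != (0, 0)))
```

For a constant image, L_00 equals 1 and every higher-order moment vanishes up to rounding, because the exact kernels telescope to the full integral of P_n over [−1, 1].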
Because invariance to rotation, scaling, and translation (RST) is extremely important in pattern recognition applications, highly accurate methods were proposed for Legendre moment invariants (LMIs) [33], [34], Gegenbauer moment invariants (GeMIs) [35], and Gaussian-Hermite moment invariants (GHMIs) [36]. To make the OMIs easy to use, the 2D matrices of the three kinds of moment invariants are converted into 1D vectors using the pseudo-code in [37].
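The 2D-to-1D conversion can be sketched as follows. The ordering used here (by total order n + m, then by n) is an assumption for illustration; the pseudo-code in [37] may order the entries differently, but any ordering that keeps only the pairs with n + m ≤ Max yields the same vector length, (Max + 1)(Max + 2)/2, which is 231 for Max = 20.

```python
def moments_to_vector(moment_matrix, max_order):
    """Flatten a 2D moment (invariant) matrix into a 1D feature vector.

    moment_matrix[n][m] holds the value of order (n, m); only entries with
    n + m <= max_order are kept, grouped by total order and then by n.
    """
    vector = []
    for order in range(max_order + 1):
        for n in range(order + 1):
            vector.append(moment_matrix[n][order - n])
    return vector

if __name__ == "__main__":
    max_order = 20
    # Placeholder matrix whose entries record their own (n, m) index.
    matrix = [[(n, m) for m in range(max_order + 1)] for n in range(max_order + 1)]
    vec = moments_to_vector(matrix, max_order)
    print(len(vec), vec[:3])
```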
Each of the orthogonal moments, LMs, GeMs, and GHMs, is used individually to extract features from the input images. According to equation (3), each moment type yields the same number of features, so each feature vector has length Total. The features are fused by combining the feature vectors of the three moments for each ROI into one feature vector of length 3 × Total.

III. PSO-SVM ALGORITHM
Feature engineering aims at extracting a few but discriminative features, which is an essential component of any machine learning algorithm, including SVMs. To improve the generalization capacity of the non-linear SVM classifier, its hyper-parameters need to be optimized. The SVM hyper-parameters include the input features, the kernel type, and its control parameters. The straightforward solution is an exhaustive search, which is optimal, or a grid search for the best parameters. However, such search methods are impractical because their computational complexity is extreme. A suboptimal but efficient alternative to exhaustive and grid search is a metaheuristic algorithm such as PSO or GA. Among these algorithms, PSO has been shown to be very efficient and offers several attractive properties: a simple structure, ease of implementation, and the ability to escape local minima. Since the PSO algorithm was introduced by Eberhart and Kennedy in 1995, many studies have reported various PSO modifications and used it successfully [7], [38].
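A short calculation shows why exhaustive search over feature subsets is impractical here. With the 693 fused features of Section IV there are 2^693 candidate subsets, before the SVM parameters are even considered, whereas a PSO run with the settings of Section IV-A (100 particles, at most 50 iterations, plus the initial swarm) needs at most 5,100 fitness evaluations. The snippet below only performs this arithmetic.

```python
def subset_count(n_features):
    """Number of candidate feature subsets (including the empty one): 2**N."""
    return 2 ** n_features

def pso_evaluations(swarm_size, max_iterations):
    """Upper bound on fitness evaluations: the initial swarm plus one per particle per iteration."""
    return swarm_size * (max_iterations + 1)

if __name__ == "__main__":
    n = 693
    print("subsets of", n, "features: a number with", len(str(subset_count(n))), "digits")
    print("PSO fitness evaluations (upper bound):", pso_evaluations(100, 50))
```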
Particle swarm optimization (PSO) is a biology-inspired metaheuristic search approach. Its key concept is swarm intelligence, in which members of the swarm collaborate to solve the optimization problem. Each member (candidate solution, or particle) of the population is a point in the multidimensional parameter space of the problem's objective function. The PSO algorithm accomplishes global optimization by using both the local fitness of individual particles and the experience and fitness achieved so far by the entire population. The fitness of each particle is evaluated at each step of the search process, and its characteristics are updated using the best experience achieved so far by the other members of the swarm. Unlike genetic algorithms, PSO uses a velocity operator, rather than crossover and mutation, to control and update the location and search direction of each particle. Using a large population also helps PSO escape local minima. For better generalization and higher classification accuracy, both the classifier's parameters and its hyper-parameters (kernel function and feature space) need to be optimized. For this purpose, we adopted the PSO-SVM algorithm, an embedded feature selection and model selection approach, from [7]. The classifier type and the feature extraction methods determine the dimensionality of the PSO search space used to optimize the performance of the PSO-SVM algorithm. The proposed PSO-SVM procedure is presented in Algorithm 1.
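For reference, the velocity and position updates of the standard global-best PSO formulation are shown below, written with the symbols of Algorithm 1 (particle x, dimension y, iteration i, random numbers g_1 and g_2, inertia weight ω, and acceleration parameters c_1 and c_2). The exact update rule is not spelled out in this paper, so this standard form is an assumption:
$$V_{x,y}^{i+1} = \omega\, V_{x,y}^{i} + c_1 g_1 \left(\mathrm{Lbest}_{x,y}^{i} - P_{x,y}^{i}\right) + c_2 g_2 \left(\mathrm{Gbest}_{y}^{i} - P_{x,y}^{i}\right)$$
$$P_{x,y}^{i+1} = P_{x,y}^{i} + V_{x,y}^{i+1}$$
The first term carries the particle's momentum, the second pulls it toward its own best position (its local fitness), and the third pulls it toward the best position found by the whole swarm; this velocity operator is how PSO combines individual and collective experience without crossover or mutation.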

IV. THE PROPOSED CAD SYSTEM
This section describes the proposed CAD system. This work primarily applies the OMIs features, first, to classify a mammographic region as normal or abnormal by characterizing the presence of masses in the ROI, and second, to classify an abnormal region as benign or malignant. The CAD system consists of four steps: segmentation, feature extraction, feature selection, and classification.
In this work, image segmentation is performed manually using the center of the suspicious region as given in the dataset by the radiologists. Extracting and identifying adequate features is vital for achieving high classification performance. For each ROI, three sets of OMIs features were extracted: each of the orthogonal moments, LMs, GeMs, and GHMs, is used individually to extract features from the input ROI, yielding 231 features per moment type for each ROI. The features are fused by concatenating the feature vectors of the three moments for each ROI into a single feature vector of 693 features.
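Early fusion, as described above, is a plain concatenation of the per-ROI feature vectors. The sketch below (plain Python; the dictionary layout and the function names are hypothetical) builds every individual, pair-wise, and triple combination evaluated in Section V, giving vectors of length 231, 462, and 693, respectively.

```python
from itertools import combinations

def early_fusion(feature_sets, names):
    """Concatenate the per-ROI feature vectors of the named moment families, in order."""
    fused = []
    for name in names:
        fused.extend(feature_sets[name])
    return fused

def all_fusions(feature_sets):
    """Every individual, pair-wise, and triple combination, keyed like 'GeMs+LMs'."""
    names = list(feature_sets)
    return {"+".join(combo): early_fusion(feature_sets, combo)
            for r in range(1, len(names) + 1)
            for combo in combinations(names, r)}

if __name__ == "__main__":
    # Placeholder vectors standing in for the 231 moment features of one ROI.
    roi = {"GeMs": [0.0] * 231, "GHMs": [0.0] * 231, "LMs": [0.0] * 231}
    for key, vector in all_fusions(roi).items():
        print(key, len(vector))
```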
The extracted features are input to an SVM classifier. Standard data partitioning and cross-validation procedures (Section IV-B) are applied to the validation datasets to obtain the training and testing sets used to evaluate the discriminative power of the moment-based features.
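One way to realise this partitioning (two-thirds for training and one-third for testing, then 10-fold cross-validation inside the training part, as detailed in Section IV-B) is sketched below in plain Python. Stratifying the split by class and fixing the random seed are assumptions added for reproducibility; the text only states that the split is random.

```python
import random

def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly split sample indices into training and testing sets, class by class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

def k_fold(indices, k=10, seed=0):
    """Yield (fit_indices, validation_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield fit, folds[f]

if __name__ == "__main__":
    labels = [1] * 200 + [0] * 200  # e.g. 200 mass and 200 normal ROIs (CADe set)
    train, test = stratified_split(labels)
    print(len(train), len(test))
    for fit, val in k_fold(train):
        pass  # train the SVM on `fit` and score it on `val`
```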
To further improve the overall performance of the proposed CAD system, we used the PSO-SVM embedded feature selection approach both to determine the most important features and to tune the parameters of the SVM classifier. The settings of the PSO algorithm follow the work in [7]. The PSO-SVM parameter selection, OMIs feature extraction, and SVM-based classification methods were all implemented in MATLAB. Figure 2 illustrates the steps of the proposed CADe/CADx system.

A. PSO-SVM SETUP
In this work, the PSO-SVM settings are as follows: the fitness criterion is the Az-value of the ROC curve, the swarm size is 100 particles structured as described in Section III, and the maximum number of iterations is 50. The swarm population is initialized by assuming that each parameter is a random variable uniformly distributed in the corresponding search space. As for the PSO parameters controlling the search process, c1 and c2 were both set to 2, and the inertia weight ω decreased monotonically from 1.2 to 0.4 as the number of iterations increased. The PSO search process was terminated either when the maximum number of iterations (50) was reached or when a perfect Az-value was achieved.

Algorithm 1 PSO-SVM Algorithm
1: Initialize a population $P = [P_1, P_2, \ldots, P_M]^T$ of size $M$, where each particle has dimension $L = N + 2$: $N$ dimensions for the features to be optimized and two dimensions for tuning the classifier parameters $\gamma$ and $C$; $T$ denotes the transpose operator;
2: Set two random numbers $g_1, g_2$ drawn from $[0, 1]$, an inertia weight $\omega$, and the parameters $c_1, c_2$;
3: Initialize the position $P_x = [P_{x,1}, P_{x,2}, \ldots, P_{x,L}]^T$ of each particle $P_x$, where $x = 1, 2, \ldots, M$;
4: Initialize the velocity $V_x = [V_{x,1}, V_{x,2}, \ldots, V_{x,L}]$ of each particle $P_x$, where $x = 1, 2, \ldots, M$;
5: Evaluate the fitness of each particle, $F_x^i = f(P_x^i)$, $\forall x$, and get the best particle of the population up to iteration $i$;
6: Set the iteration count $i = 1$;
7: Compare the fitness values and select the local best particle $\mathrm{Lbest}_x^i$ of each particle and the global best particle $\mathrm{Gbest}^i$;
8: Compute the inertia weight $\omega$;
9: Update the position and velocity of each particle, where the index $x = 1, 2, \ldots, M$ and the index $y = 1, 2, \ldots, L$;
10: Evaluate the updated fitness $F_x^{i+1} = f(P_x^{i+1})$, $\forall x$ (of each particle), and get the best particle;
11: Update $\mathrm{Lbest}_x$, $\forall x$ (of each particle);
12: Update $\mathrm{Gbest}$ (of the population);
13: If $i <$ maximum number of iterations, then set $i = i + 1$ and go to step 7; else go to step 14;
14: Get the optimum solution: output $\mathrm{Gbest}^i$;
15: Retrain the SVM with the optimum parameters and features; then classify the unknown samples of the testing dataset.
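The search in Algorithm 1 can be sketched in plain Python as below. Several details are assumptions: the particle encoding (N feature dimensions thresholded at 0.5 to form a feature mask, plus two dimensions holding log2 C and log2 γ), the search ranges, drawing g1 and g2 per dimension, the velocity clamp, and a linear inertia schedule from 1.2 to 0.4. The settings c1 = c2 = 2, 100 particles, 50 iterations, and early termination on a perfect fitness follow the text. In the actual system the fitness would be the cross-validated Az-value of an RBF SVM trained on the selected features; here `fitness` is any callable, and a toy objective stands in for it.

```python
import random

def decode(particle, n_features):
    """Hypothetical encoding: feature mask from the first N dims, SVM C and gamma from the last two."""
    mask = [value > 0.5 for value in particle[:n_features]]
    c = 2.0 ** particle[n_features]
    gamma = 2.0 ** particle[n_features + 1]
    return mask, c, gamma

def pso(fitness, bounds, swarm_size=100, max_iter=50, c1=2.0, c2=2.0,
        w_start=1.2, w_end=0.4, target=1.0, seed=0):
    """Maximise fitness(particle) over the box `bounds` with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [hi - lo for lo, hi in bounds]  # velocity clamp (assumed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    lbest = [p[:] for p in pos]
    lbest_fit = [fitness(p) for p in pos]
    g = max(range(swarm_size), key=lambda k: lbest_fit[k])
    gbest, gbest_fit = lbest[g][:], lbest_fit[g]
    for it in range(max_iter):
        if gbest_fit >= target:  # stop early on a perfect score
            break
        # Inertia weight decreases linearly from w_start to w_end (assumed schedule).
        w = w_start - (w_start - w_end) * it / max(max_iter - 1, 1)
        for x in range(swarm_size):
            for y in range(dim):
                g1, g2 = rng.random(), rng.random()
                v = (w * vel[x][y]
                     + c1 * g1 * (lbest[x][y] - pos[x][y])
                     + c2 * g2 * (gbest[y] - pos[x][y]))
                vel[x][y] = max(-vmax[y], min(vmax[y], v))
                lo, hi = bounds[y]
                pos[x][y] = max(lo, min(hi, pos[x][y] + vel[x][y]))
            f = fitness(pos[x])
            if f > lbest_fit[x]:  # update the particle's local best
                lbest[x], lbest_fit[x] = pos[x][:], f
                if f > gbest_fit:  # update the swarm's global best
                    gbest, gbest_fit = pos[x][:], f
    return gbest, gbest_fit

if __name__ == "__main__":
    n_features = 5
    wanted = [True, False, True, False, False]  # toy target mask

    def toy_fitness(particle):
        mask, _, _ = decode(particle, n_features)
        return sum(a == b for a, b in zip(mask, wanted)) / n_features

    bounds = [(0.0, 1.0)] * n_features + [(-5.0, 15.0), (-15.0, 3.0)]
    best, best_fit = pso(toy_fitness, bounds, seed=1)
    print(best_fit, decode(best, n_features))
```

Keeping the feature mask and the SVM parameters in one particle is what makes the selection "embedded": every fitness evaluation scores a feature subset together with the classifier settings it is paired with.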

B. PERFORMANCE EVALUATION
To evaluate the discriminating ability of the OMIs features, each dataset was randomly partitioned into training and testing sets, with two-thirds of the dataset used for training and the remaining one-third held out for testing. To achieve better generalization of the C-SVM classifier, 10-fold cross-validation was further applied to the training set for SVM learning and parameter optimization. Namely, cross-validation and heuristic search were used to tune the classifier regularization constant C and the radial basis function control parameter γ. Because the SVM classifier solves a two-class classification problem, its decision value (or class membership) was used to produce the receiver operating characteristic (ROC) curve, and the area under the ROC curve (Az-value) was estimated. Additionally, for each ROC curve, we report the corresponding accuracy, false-positive fraction (FPF), and false-negative fraction (FNF). For the final evaluation, we used the average over all trials of the cross-validation and testing stages.
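These evaluation quantities can be computed from the SVM decision values as sketched below in plain Python. Az is obtained through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen positive ROI scores higher than a randomly chosen negative one, with ties counted as one half). The decision threshold of 0 for accuracy, FPF, and FNF and the label convention (1 = abnormal or malignant) are assumptions.

```python
def az_value(scores, labels):
    """Area under the ROC curve (Az) via the Mann-Whitney statistic; labels are 0/1."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(positives) * len(negatives))

def operating_point(scores, labels, threshold=0.0):
    """Accuracy, false-positive fraction (FPF), and false-negative fraction (FNF)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s > threshold else 0
        if y == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    accuracy = (tp + tn) / len(labels)
    return accuracy, fp / (fp + tn), fn / (fn + tp)

if __name__ == "__main__":
    decision_values = [0.9, 0.8, -0.2, 0.3, -0.7, -0.1]  # toy SVM outputs
    truth = [1, 1, 1, 0, 0, 0]                           # 1 = malignant (or abnormal)
    print(az_value(decision_values, truth), operating_point(decision_values, truth))
```

In this toy example 7 of the 9 positive-negative pairs are ranked correctly, so Az = 7/9, while one benign and one malignant ROI fall on the wrong side of the threshold.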

V. EXPERIMENTAL RESULTS AND DISCUSSION
To evaluate the proposed OMIs features in distinguishing between abnormal and normal classes (CADe, or detection) and in differentiating malignant from benign classes (CADx, or diagnosis), we used a set of mammographic regions of interest (ROIs) extracted from the Digital Database for Screening Mammography (DDSM) [28]. The following subsection briefly describes the validation datasets.

A. MAMMOGRAPHY DATASETS (DDSM)
The DDSM dataset is the largest public and free mammography dataset and is commonly used by the mammography image analysis research community. Two ROI datasets, named the detection and diagnosis sets, were formed as described in Table 2. The dataset used for validating the detection task contains 200 ROIs depicting annotated masses and 200 regions representing normal breast parenchyma. Each abnormal case in the DDSM dataset includes the radiologists' annotation (a chain code representing the radiologist's delineation of the abnormality), which we used to extract a rectangular patch with the mass at its center.
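Extracting a patch centred on an annotated mass can be sketched as follows. The helper names, the use of the bounding-box centre of the decoded boundary points as the patch centre, and shifting the window so that it stays inside the image are assumptions for illustration; decoding the DDSM chain code into (row, column) points is not shown.

```python
def bbox_center(points):
    """Centre (row, col) of the bounding box of annotated boundary points."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return (min(rows) + max(rows)) // 2, (min(cols) + max(cols)) // 2

def crop_centered(image, center, size):
    """Return a size x size patch centred on `center`, shifted to stay inside the image."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("patch larger than image")
    top = min(max(center[0] - size // 2, 0), height - size)
    left = min(max(center[1] - size // 2, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]

if __name__ == "__main__":
    image = [[10 * r + c for c in range(10)] for r in range(10)]  # toy 10 x 10 "mammogram"
    boundary = [(4, 4), (4, 6), (6, 4), (6, 6)]                   # hypothetical decoded outline
    patch = crop_centered(image, bbox_center(boundary), 4)
    print(patch[0])
```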
Normal regions, each of size 512 × 512 pixels and depicting normal breast parenchyma, were manually extracted from normal mammograms such that no selected ROI overlaps the pectoral muscle or the radiographic background. The normal regions were obtained from the Normal 11 and Normal 09 volumes of the DDSM dataset. Sample regions are shown in Figure 3. To classify benign and malignant masses with the proposed features, we excluded the normal ROIs and formed a second dataset, the CADx dataset, from the 200 abnormal ROIs (100 benign and 100 malignant). Table 2 summarizes the regions in the detection and diagnosis datasets.

B. RESULTS OF THE PROPOSED CADe SYSTEM
Three different OMIs feature sets were first used to implement the CADe system (i.e., classification of abnormal and normal regions). In other words, we applied the different feature sets, namely GeMs-features, GHMs-features, and LMs-features, to distinguish between ROIs of normal breast parenchyma and mass-depicting (abnormal) ROIs. Table 3 presents the average cross-validation and test results of applying the different OMIs features to the CADe dataset (i.e., the dataset with 200 abnormal and 200 normal ROIs). The ROC analysis (Az-value) of the SVM classification results shows excellent performance for all moment features in characterizing the presence of masses. GeMs-features produced an average test classification result of 0.929 ± 0.021 (0.935 ± 0.014), slightly higher than the 0.926 ± 0.021 (0.935 ± 0.014) and 0.918 ± 0.021 (0.926 ± 0.011) produced by GHMs-features and LMs-features, respectively.
To further examine the adequacy of moment features for implementing both CADe and CADx systems, we examined whether early fusion (combination) of different feature subsets can boost the classification performance. The results of the different feature combinations are also presented in Table 3. They show that combining the three feature sets (GeMs + GHMs + LMs) produced an average classification performance of 0.935 ± 0.022 (0.948 ± 0.013), which is higher than the best performance produced by an individual feature set, namely GeMs-features. However, the cost of this small improvement is a higher-dimensional feature space (231 × 3 = 693 features) compared with the 231 features of an individual feature subset. We also combined features pair-wise, which gives a feature space with a dimensionality of 462 for each combination. The combination of GeMs-features and LMs-features, called GeMs/LMs-features, produced the highest classification results, 0.943 ± 0.027 (0.95 ± 0.013), slightly higher than the performance achieved using GHMs/LMs-features. Consequently, using either individual or combined feature subsets is promising for the detection of abnormalities.

C. RESULTS OF THE PROPOSED CADx SYSTEM
We also applied OMIs features to characterize the malignancy of masses in mammographic regions. Table 4 presents the results of classifying the ROIs in the CADx dataset, which consists of regions depicting 200 masses, 100 benign and 100 malignant. The overall average Az-values obtained with the various feature sets show that the proposed features are not as effective for CADx systems (i.e., distinguishing ROIs depicting malignant masses from those depicting benign masses) as for CADe systems (i.e., characterizing the presence of masses). Our explanation for this performance of the CADx algorithms is that characterizing the malignancy of an abnormality, even when done by expert radiologists, is much more difficult than detection, mainly because of the obscured and widely varying appearance of malignant masses on mammograms.
From the results presented in Table 4, the highest classification performance, in terms of average Az-value, of the CADx algorithms using the different OMIs features is 0.765 ± 0.065 (0.798 ± 0.052), produced by GHMs-features, which slightly outperformed GeMs-features and LMs-features. Combining different feature subsets, including pair-wise combinations, was not useful and even provided inferior classification performance. For instance, the results of the combined features, also included in Table 4, show that the ABC-features (the fusion of GeMs-, GHMs-, and LMs-features, 693 features) provided an average classification performance of 0.764 ± 0.054 (0.803 ± 0.040). This test performance was almost equal to that produced by the B-features (GHMs) and A-features (GeMs) applied individually.
Moreover, the classification results demonstrated that the different feature sets (including the combined features) produced false-positive fraction (FPF) values that are significantly higher than the false-negative fraction (FNF) values, which implies that, across the various CADx systems, benign masses were misclassified at a higher rate than malignant masses.
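To make the FPF/FNF comparison concrete, the sketch below computes both fractions from binary labels, assuming malignant is the positive class (1) and benign the negative class (0). The helper `fpf_fnf` and the example predictions are hypothetical and only illustrate the pattern described above, where FPF exceeds FNF.

```python
import numpy as np

def fpf_fnf(y_true, y_pred):
    """Return (FPF, FNF) with 1 = malignant (positive) and 0 = benign.

    FPF = FP / (FP + TN): share of benign masses called malignant.
    FNF = FN / (FN + TP): share of malignant masses called benign.
    Assumes both classes occur in y_true.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return float(fp / (fp + tn)), float(fn / (fn + tp))

# Hypothetical predictions for 4 malignant and 4 benign ROIs.
print(fpf_fnf([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 1, 0, 0]))  # (0.5, 0.25)
```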

D. RESULTS OF APPLYING PSO-SVM FEATURE SELECTION
To examine whether feature selection can improve the classification performance of the proposed features, we adopted the embedded feature selection and classification system based on the PSO-SVM algorithm from [7]. This approach provides a simultaneous framework for parameter and feature selection, so that both a reduction in the dimensionality of the feature space and a higher classification performance can be attained. The classification results obtained from the individual and combined OMIs feature sets after applying PSO-SVM for parameter and feature selection are presented in Tables 5 and 6.
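The PSO-SVM approach of [7] searches jointly over feature subsets and SVM parameters. The sketch below shows only the feature-subset half, as a common binary PSO variant (inertia weight plus Kennedy and Eberhart's sigmoid transfer rule), and it does not encode the SVM parameters. It is written against a generic `fitness(mask)` callable so that it needs only NumPy; in a PSO-SVM wrapper that callable would train and validate an SVM on the selected columns (for example with scikit-learn's `SVC` and `cross_val_score`). The function name, the toy fitness, and all parameter values (`w`, `c1`, `c2`, `v_max`, swarm size, iterations) are illustrative assumptions, not settings from [7] or from this paper.

```python
import numpy as np

def binary_pso_select(fitness, n_features, n_particles=20, n_iter=50,
                      w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO over boolean feature masks; returns (best_mask, best_fit).

    fitness(mask) -> float, higher is better (e.g. SVM validation accuracy).
    """
    rng = np.random.default_rng(seed)
    # Positions are boolean feature masks; velocities are real-valued.
    x = rng.random((n_particles, n_features)) < 0.5
    v = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    pbest = x.copy()
    pbest_fit = np.array([fitness(m) for m in x], dtype=float)
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        xf = x.astype(float)
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # Inertia plus attraction toward each particle's best and the swarm's best.
        v = (w * v
             + c1 * r1 * (pbest.astype(float) - xf)
             + c2 * r2 * (gbest.astype(float) - xf))
        v = np.clip(v, -v_max, v_max)
        # Sigmoid transfer: bit d is set with probability 1 / (1 + exp(-v_d)).
        x = rng.random(x.shape) < 1.0 / (1.0 + np.exp(-v))
        fit = np.array([fitness(m) for m in x], dtype=float)
        improved = fit > pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)

# Hypothetical toy problem: 30 features, of which the first 5 are "relevant".
relevant = np.zeros(30, dtype=bool)
relevant[:5] = True

def toy_fitness(mask):
    # Stand-in for SVM validation accuracy: reward relevant features,
    # penalize the subset size.
    return float(np.sum(mask & relevant)) - 0.1 * float(mask.sum())

best_mask, best_fit = binary_pso_select(toy_fitness, n_features=30)
print(np.flatnonzero(best_mask), best_fit)
```

Because each fitness call in a real wrapper is a full SVM training and validation, the swarm size and iteration count directly set the cost of the search.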
For distinguishing between abnormal and normal regions, the results in Table 5 show that PSO-SVM feature selection not only significantly reduced the dimensionality of the feature space but also improved the classification performance. For instance, among the individual feature sets, the highest classification result, 0.933 ± 0.022, was produced by the GeMs-features, with an average optimized feature-space size of 88.35 ± 10.69 features (the original feature space has 231 features). When combined feature sets were used for classification, the fused feature set provided the best classification performance, 0.969 ± 0.01, with the feature space reduced from 693 to an average of 177.72 ± 30.55 features, which is about 25.6% of the original feature set.
For classifying regions with benign or malignant masses, with PSO-SVM used to select the most relevant OMI features and to optimize performance, the results in Table 6 show that PSO-SVM optimization indeed reduced the dimensionality but, unexpectedly, degraded the classification performance of most feature sets. The highest classification performance, 0.817 ± 0.04, was, however, produced by the ABC-features, slightly outperforming the performance previously attained without the feature selection step. When all features were used, the dimensionality of the feature space was reduced from 693 to an average of 275.2 ± 45.473 features, which is about 39.7% of the total feature set.
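The retained fractions quoted for Tables 5 and 6 are simple ratios of the average selected-feature count to the original dimensionality. A small check using only the reported averages (the helper `retained_pct` is hypothetical):

```python
def retained_pct(kept: float, total: int) -> float:
    """Percentage of the original feature space kept after selection."""
    return 100.0 * kept / total

# Average selected-feature counts reported in Tables 5 and 6.
for label, kept, total in [
    ("GeMs-features, CADe", 88.35, 231),
    ("GeMs + GHMs + LMs, CADe", 177.72, 693),
    ("GeMs + GHMs + LMs, CADx", 275.2, 693),
]:
    print(f"{label}: {retained_pct(kept, total):.1f}%")
# GeMs-features, CADe: 38.2%
# GeMs + GHMs + LMs, CADe: 25.6%
# GeMs + GHMs + LMs, CADx: 39.7%
```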
The high dimensionality of the feature space, particularly the fused space, posed the main practical computational challenge, as it is expected to require a larger swarm population or a higher number of iterations to enable PSO-SVM to escape from local or suboptimal solutions. The next subsection compares the proposed system for the detection (CADe) and diagnosis (CADx) of masses in mammographic images against the state of the art.
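A rough count makes this computational challenge concrete. Assuming that each particle encodes a binary mask over $D$ features and that each fitness evaluation is a $k$-fold cross-validated SVM (both assumptions; the exact fitness of [7] is not restated here), a swarm of $N$ particles run for $T$ iterations (plus the initial evaluation) trains $n_{\text{SVM}}$ SVMs, while the number of candidate subsets grows as $2^{D}$:

```latex
\begin{aligned}
n_{\text{SVM}} &= N\,(T+1)\,k, \\
|\mathcal{S}_D| &= 2^{D}, \qquad
|\mathcal{S}_{231}| \approx 3.5\times 10^{69}, \qquad
|\mathcal{S}_{693}| \approx 4.1\times 10^{208}.
\end{aligned}
```

For illustration only, with hypothetical values $N = 20$, $T = 100$ and $k = 10$, the search trains $20 \times 101 \times 10 = 20{,}200$ SVMs. Moving from an individual 231-feature set to the fused 693-feature space multiplies the number of candidate subsets by $2^{462}$ without changing that budget, which is why a larger swarm or more iterations would be expected to be needed.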

E. COMPARISON STUDY
This work proposed combining the advantages of orthogonal moment features with PSO-SVM to achieve a high-performance CAD system. The results of the proposed detection and diagnosis system, as shown in Table 7, are promising and competitive compared with the state of the art. Moreover, the PSO-based SVM classifier was not only able to select the appropriate input features but also optimized the SVM performance. However, the results of the proposed system cannot be compared directly with those of existing systems, because the systems differ in the datasets used for training and validating the algorithms. The mammography CAD research community mostly uses two common public datasets of screening mammography: the DDSM [28] and MIAS [26] datasets. Another difference is the performance metric used for evaluation; for this reason, we reported our results using both the Az-value of the ROC curve and accuracy. Beyond the mammography data, other differences, as shown in Table 7, include the machine learning algorithms used for classification and the feature selection or reduction approaches, such as PCA and genetic algorithms.
VOLUME 8, 2020
Given the key difference among mammographic mass CAD systems in the literature, namely, the validation datasets, and to provide a meaningful and fair comparison, we focus on the studies [13], [14], [40] that used the DDSM dataset to evaluate their CAD systems. For the mass detection task, the proposed work, by combining PSO-SVM model selection and orthogonal moments, outperformed the existing systems [13], [40] in terms of classification accuracy or Az-value. For instance, the proposed CADe system provided an average accuracy of 93.5 ± 0.01, which is higher than the 90 achieved by combining a GA and a random forest classifier [13]. For the CADx task, the proposed system competes well with the FLDA approach from [40] but is inferior to the GA and MLP system [14].

VI. CONCLUSION
This paper proposed a CAD system combining orthogonal moments for feature extraction with PSO-SVM for hyperparameter and parameter optimization to accomplish two CAD tasks. The first task classified regions of interest (ROIs) of digitized mammograms into mass and normal breast tissue regions. The second task characterized the malignancy of mammographic masses. This paper focused on examining whether OMIs features are adequate for these tasks. Three sets of OMIs features were extracted from a mammographic ROI dataset with 200 masses. On the CADe (detection) dataset, the combined OMIs features achieved an average Az-value of 0.969 ± 0.01. For the CADx task, the classification performance of OMIs features was good but not as high as for the detection task. Nevertheless, the obtained results show the potential of OMIs features for the feature extraction step of CADe and CADx systems.