Fuzzy Logic System Application for Detecting SNP-SNP Interaction

The identification of interactions between single-nucleotide polymorphisms (SNP–SNP interactions) is crucial for determining human genetic disease susceptibility. With rapid technological advancements, multiobjective multifactor dimensionality reduction (MOMDR) measurements have achieved high detection success rates. However, the classification of high- or low-risk groups is central to MOMDR and has yet to be extensively studied. To address limitations in binary classification, we propose an improved fuzzy sigmoid (FS) approach that uses membership degrees in MOMDR, thus denoting it as FSMOMDR. For determining the interval of membership, our improved FS approach assesses the distance between the $i^{\mathrm {th}}$ multifactor class and outcome (cases and controls). Thus, the improved FS approach enables MOMDR algorithms to determine the membership degrees of high- and low-risk groups in each multifactor class because the two-element set is extended to a specified membership interval. Moreover, the improved FS approach can handle uncertain information, which thus enables the effective detection of the $m$ -locus combinations with similar distributions. FSMOMDR measurements can also distinguish similar frequencies among genotype combinations, thus enabling the detection of more significant SNP–SNP interactions. On the basis of the classification accuracy rate of MOMDR and results obtained from the analysis of several test data sets, we determined FSMOMDR to be superior to other MDR-based methods with respect to detection success rate. The results indicate that binary and fuzzy classifications involving MOMDR can provide insight into uncertainty in risk classification. Thus, FSMOMDR could successfully detect SNP–SNP interactions in coronary artery disease in a large data set obtained from the Wellcome Trust Case Control Consortium. 
We could successfully reduce uncertain information in MDR and thus suggest that membership based on the improved sigmoid function can be used to identify SNP–SNP interactions as well as obtain content knowledge.


I. INTRODUCTION
The genome-wide association study (GWAS) has been extensively used to detect associations between complex genes [1]; the approach aims to reveal factors associated with a particular disease. These factors include single nucleotide polymorphisms (SNPs) and other DNA-related factors. However, if researchers use only a single factor for disease identification in a GWAS, they may fail to identify other factors that are significantly associated with the disease [2]. With respect to variability in associations between complex genes, SNP-SNP interactions may explain the missing heritability [3]. SNP-SNP interaction is a major factor in many genetic diseases [4], [5]. Consequently, SNP-SNP interaction detection has become important in multifactorial disease analysis [6]. Efficient computation is a powerful tool for improving the recognition of SNP-SNP interactions in genetic association research [7]-[9]. Many methods have been proposed for the detection of SNP-SNP interactions. One such method is Bayesian epistasis association mapping (BEAM) [10]. BEAM introduces a Bayesian marker partitioning model and a Markov chain Monte Carlo sampling approach to maximize the model posterior probability. Another method is AntEpiSeeker, which introduces a two-stage ant colony optimization to detect SNP-SNP interactions [11]. Similarly, Wan et al. introduced a Boolean Operation-based Screening and Testing (BOOST) approach for examining all pairwise interactions in genome-wide case-control studies [12]. Another method is SNPRuler, which introduces a branch and bound algorithm to determine the chi-square-based maximum rule utility metric for detecting SNP-SNP interactions [13]. Finally, Ritchie et al. introduced a multifactor dimensionality reduction (MDR) method, based on statistical evaluation, for detecting SNP-SNP interactions. MDR can characterize nonadditive interactions between discrete factors in case-control studies [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Nadeem Iqbal.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Unlike traditional statistical approaches, such as logistic regression, MDR is nonparametric and does not assume a genetic model in case-control studies. MDR reduces the dimensionality of multifactor information by dividing genotype combinations into high- and low-risk groups. This process detects nonlinear or nonadditive interactions between the original variables. MDR uses the k-fold cross-validation (CV) approach to avoid overfitting in MDR-based predictions of disease status. Several MDR-based extension methods have been proposed [15], including MDR-ER [16], particle swarm optimization-based MDR (PBMDR) [17], class-based MDR (CMDR) [18], MOMDR [19], IMDR [20], and the empirical fuzzy MDR (EFMDR) [21]. Recently, MDR has been applied successfully to SNP-SNP interaction detection in cardiovascular diseases [22], breast cancer [23], and facial emotion perception [24].
Let A be a classical (crisp) set; its membership function yields outputs of only 1 or 0, depending on whether x belongs to A. Zadeh [25] proposed fuzzy set theory, in which a fuzzy set represents a class of objects with a continuum of membership grades. The membership function $\mu_A(x) \in [0, 1]$ represents the membership degree of x in the fuzzy set A. When $\mu_A(x)$ takes only the values 0 and 1, it reduces to the indicator function $I_A(x)$ of set A. A classical set is thus considered a special fuzzy set where the indicator function is a membership function. This fuzzy logic extension of classical set theory has been used in various fields, including bioinformatics and medicine [21], [26]. In the detection of SNP-SNP interactions, classification into high- and low-risk groups is a problem of uncertainty. Fuzzy logic is an approximation approach based on the representation of linguistic knowledge; it entails the use of fuzzy rules to address uncertainties [27]-[29]. When fuzzy logic is used, MDR can better distinguish high- and low-risk groups and thus increase detection success rates in SNP-SNP interaction detection. Both fuzzy set-based generalized MDR (FGMDR) [30] and EFMDR [21] are fuzzy-based MDR approaches. FGMDR detects SNP-SNP interactions using fuzzy set-based generalized linear models for improved covariate adjustment. When using FGMDR, selecting suitable parameters is difficult. EFMDR instead uses an MDR-based empirical fuzzy set that does not require the selection of suitable parameters. EFMDR is effective for identifying intergene SNP-SNP interactions in particular diseases. Moreover, a quicker version of EFMDR has been proposed [31]. Despite the recently increasing focus and investment of resources in MDR classification, studies on this topic have been limited.
This paper proposes an improved fuzzy logic system that is based on MOMDR to estimate the membership degree for epistasis detection data sets. The fuzzy sigmoid (FS) is favorable for SNP-SNP interaction detection because it allows local features to belong to multiple groups. Many studies have successfully applied the FS to improve algorithm performance [21], [32]-[34]. In this study, FSMOMDR was evaluated on simulated data sets and applied to a large real coronary artery disease data set from the Wellcome Trust Case Control Consortium (WTCCC). The results demonstrated the superiority of FSMOMDR relative to other algorithms with respect to detection success rate.
The remainder of this paper is organized as follows. The relevant approach is summarized in Section II, where we define an MO function based on fuzzy membership degrees and present FSMOMDR. Experimental evaluations and result analyses are provided in Section III. In Section IV, we discuss the advantages of the FSMOMDR algorithm. Finally, Section V concludes this paper.

A. MDR PROCESS
MDR detects SNP-SNP interactions by evaluating each m-locus combination using the distribution of cases and controls [14]. Specifically, an m-locus combination that represents an SNP-SNP interaction is a set $\{s_1, \ldots, s_m \mid s_i \in \mathrm{SNPs},\ s_i \neq s_j \text{ for } i \neq j\}$. Because each SNP has three genotypes, an m-locus combination has $3^m$ genotype combinations. In MDR, each genotype combination is called a multifactor class. A dimension reduction approach is introduced in MDR for converting the high-dimensional data into a 2 × 2 confusion matrix in which the actual class contains cases and controls and the predicted class contains high- and low-risk groups. Subsequently, a k-fold CV operation generates k CV subsets. In each CV operation, one CV subset is used as a testing data set and the other k − 1 CV subsets are combined to form a training data set. The testing data set is used to evaluate the model trained on the corresponding training data set. An optimal trained model (denoted as an i-fold CV model, where i = 1, 2, . . . , k) is selected according to the highest correct classification rate in each CV operation. Thus, k CV models are obtained, and the CV consistency (CVC) operation counts how often each model occurs among the k CV models. The model with the highest CVC is regarded as the best model in an MDR implementation.
MDR comprises the following steps: 1) perform the k-fold CV operation, 2.1) generate the training and testing data sets according to the k-fold CV operation, 2.2) generate all m-locus combinations, 3.1) assign cases and controls to multifactor classes, 3.2) calculate the ratio between cases and controls within each multifactor class of the m-locus combination, 3.3) classify all multifactor classes into a high-risk group and a low-risk group, 3.4) evaluate the m-locus combination using the correct classification rate (CCR), 3.5) select the best model with the highest CCR in each CV operation, and 4) perform the CVC operation.
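Steps 3.1 to 3.4 can be sketched for a single m-locus combination as follows. This is a minimal illustration rather than the reference MDR implementation: the function name `mdr_ccr`, the genotype coding 0/1/2, and the default threshold (the overall case-to-control ratio) are assumptions made here, and the CCR is taken as the plain proportion of correctly classified samples.

```python
from collections import Counter

def mdr_ccr(genotypes, labels, threshold=None):
    """Label each multifactor class high/low risk and return (risk_map, CCR).

    genotypes: list of tuples, one tuple of genotype codes (0/1/2) per sample.
    labels:    list of 1 (case) or 0 (control), same length as genotypes.
    """
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    if threshold is None:
        # Assumed threshold: overall case/control ratio (1.0 for balanced data).
        threshold = sum(cases.values()) / sum(controls.values())
    risk = {}
    for cls in set(cases) | set(controls):
        n1, n0 = cases[cls], controls[cls]
        # High risk when the case/control ratio reaches the threshold.
        risk[cls] = n1 >= threshold * n0
    tp = sum(n for c, n in cases.items() if risk[c])         # cases in high-risk classes
    tn = sum(n for c, n in controls.items() if not risk[c])  # controls in low-risk classes
    return risk, (tp + tn) / len(labels)
```

Each class receives a hard label here, which is exactly the binary step that the fuzzy variants replace with membership degrees.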

B. EFMDR PROCESS
MDR entails the application of binary classification to determine membership in the high- or low-risk group using the frequencies of multiple genotypes in cases and controls. Binary classification methods cannot address uncertainty, which results in the loss of key information [26]. Empirical fuzzy MDR (EFMDR) is an extension of MDR that uses the empirical fuzzy (EF) approach to address the limitations of binary classification [21]. MDR differs from EFMDR in Steps 3.2, 3.3, and 3.4 of the aforementioned MDR process. In EFMDR, the EF approach is used in Step 3.2 to evaluate the membership degrees of the high- and low-risk groups within each multifactor class [denoted as $H(w_H)$ and $L(w_L)$, respectively]; Step 3.3 of the MDR process is omitted from EFMDR. In Step 3.4, the CCR is evaluated on the basis of $H(w_H)$ and $L(w_L)$. EFMDR comprises the following steps: 1) perform a k-fold CV operation, 2.1) generate training and testing data sets according to the k-fold CV operation, 2.2) generate all m-locus combinations, 3.1) assign cases and controls to multifactor classes, 3.2) calculate the membership degrees of the high-risk $H(w_H)$ and low-risk $L(w_L)$ groups within each multifactor class of the m-locus combination, 3.3) evaluate the m-locus combination using the CCR based on $H(w_H)$ and $L(w_L)$ (denoted as $\mathrm{CCR}_{\mathrm{fuzzy}}$), 3.4) select the best model with the highest $\mathrm{CCR}_{\mathrm{fuzzy}}$ in each CV operation, and 4) perform a CVC operation.

C. MOMDR PROCESS
MOMDR, which is based on the Pareto set operation, was introduced by Yang et al. in 2018 [19]. MDR-based methods use a single classification measure (usually the CCR) as an objective function to detect SNP-SNP interactions, whereas MOMDR uses multiple objective functions. MOMDR introduces the maximized multiobjective (MO) function as follows:

$$\max F(x) = \bigl(f_1(x), f_2(x)\bigr) \tag{1}$$

where functions $f_1$ and $f_2$ are the likelihood ratio (LR) [35] and CCR [14] measures, respectively. In the Pareto set operation, if $f_i(x_1) \geq f_i(x_2)$ for all objective functions, then $x_1$ dominates another solution $x_2$. In the Pareto set $X^*$, no $x^* \in X^*$ is dominated by any other member. The Pareto set and Pareto set filter operators record and determine nondominated SNP-SNP interactions. For a k-fold CV, k Pareto sets ($X^*$) are generated in the evaluations of all m-locus combinations. Finally, optimal SNP-SNP interactions can be determined through the CVC operation. MOMDR comprises the following steps: 1) perform the k-fold CV operation, 2.1) generate training and testing data sets according to the k-fold CV operation, 2.2) generate all m-locus combinations, 3.1) assign cases and controls to multifactor classes, 3.2) calculate the ratio between cases and controls within each multifactor class of the m-locus combination, 3.3) classify all multifactor classes into a high-risk group and a low-risk group, 3.4) evaluate the m-locus combination using the MO function, 3.5) perform the Pareto set operation in each CV operation, and 4) perform the CVC operation.

FSMOMDR extends MOMDR by using an improved FS approach to extend binary classification into a fuzzy classification; it is based on the membership degree in the MO measure. FSMOMDR is similar to EFMDR in that both calculate the membership degrees of the high-risk and low-risk groups within each multifactor class, $H(w_H)$ and $L(w_L)$. Thus, an MO function based on the $H(w_H)$ and $L(w_L)$ membership degrees can be formulated as follows:

$$\max F_{\mathrm{fuzzy}}(x) = \bigl(f_1(x), f_2(x)\bigr) \tag{2}$$

where the functions $f_1$ and $f_2$ are the fuzzy LR and fuzzy CCR measures, respectively, both of which are based on the membership degree. FSMOMDR is illustrated in Fig. 1; it comprises four steps (Algorithm 1) as follows.
Step 1: Perform the k-fold CV operation
1-1: Randomly sort the data set. All cases (samples for a given disease) and controls (samples for the normal population) are randomly shuffled.
1-2: Stratified random k-fold. The ratio between cases and controls is calculated, and k CV subsets comprising cases and controls are generated according to the ratio between cases and controls.
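Step 1 can be sketched as below. The function name `stratified_kfold`, the fixed seed, and the round-robin dealing of shuffled indices are illustrative assumptions; the paper does not specify how the stratified subsets are built beyond preserving the case-to-control ratio.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Return k lists of sample indices that preserve the case/control ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in (1, 0):  # cases first, then controls
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)  # step 1-1: random shuffle within each group
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # step 1-2: deal indices round-robin
    return folds
```

In CV operation j, `folds[j]` would serve as the testing data and the remaining folds would be merged into the training data.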
Step 2: Generate the training and testing data and all m-locus combinations
2-1: In each CV operation, one CV subset is used as testing data to evaluate the best model, and the other CV subsets are combined to form the training data used to determine the best model (defined as the m-locus combination with the highest value of the measure).

2-2: Generate all m-locus combinations. Among all SNPs, all m-locus combinations are generated and assigned to a set.
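Exhaustive generation of the m-locus combinations can be illustrated with the Python standard library. The helper name `m_locus_combinations` is hypothetical; the number of combinations it returns is the binomial coefficient C(n, m) for n SNPs.

```python
from itertools import combinations
from math import comb

def m_locus_combinations(snp_ids, m):
    """Return every unordered set of m distinct SNPs as a tuple."""
    return list(combinations(snp_ids, m))

pairs = m_locus_combinations(["rs1", "rs2", "rs3", "rs4"], 2)
# The count matches C(4, 2); the SNP identifiers are placeholders.
assert len(pairs) == comb(4, 2)
```

For 1000 SNPs and m = 2 this already yields 499500 combinations, which is why the search space dominates the runtime.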
Step 3: Evaluation of the m-locus combination
3-1: The m-locus combination generates $3^m$ multifactor classes according to all combinations of genotypes. Each multifactor class contains a case group and a control group. Each sample in the training data is assigned to a particular multifactor class. When a sample matches a given multifactor class, it is assigned to the case group if it is a case and to the control group otherwise.
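The assignment in step 3-1 amounts to building a table of case and control counts for every multifactor class. The sketch below assumes genotypes are coded 0/1/2 and that samples are stored as rows of a list; the name `class_counts` is illustrative.

```python
from itertools import product

def class_counts(data, labels, combo):
    """Count cases and controls in each of the 3**m multifactor classes.

    data:  list of per-sample genotype lists coded 0/1/2 (assumed coding).
    combo: tuple of SNP column indices forming the m-locus combination.
    Returns a dict mapping class -> [n_cases, n_controls].
    """
    counts = {cls: [0, 0] for cls in product((0, 1, 2), repeat=len(combo))}
    for row, y in zip(data, labels):
        cls = tuple(row[j] for j in combo)
        counts[cls][0 if y == 1 else 1] += 1  # index 0: cases, index 1: controls
    return counts
```

Classes that no sample matches keep the count [0, 0], so the table always has exactly 3^m entries.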

3-2: The membership degree of the multifactor class is measured using the improved FS approach. In FSMOMDR, each sample can have a partial membership degree for both the $H(w_H)$ and $L(w_L)$ groups. The ratio of the case group to the control group within the $i^{\mathrm{th}}$ multifactor class is transformed into the interval [−1, 1] by using (3).
where $n_{i1}$ and $n_{i0}$ are the sample frequencies matching the $i^{\mathrm{th}}$ multifactor class in the case group and the control group, respectively. An improved FS approach is introduced to evaluate the $H(w_H)$ group and $L(w_L)$ group within the $i^{\mathrm{th}}$ multifactor class and is formulated as follows:
In all multifactor classes, the $TP_f$ value is the sum of $H(w_H)$ with frequency $n_1$, and the $FN_f$ value is the sum of $L(w_L)$ with frequency $n_1$. Similarly, the sum of $H(w_H)$ with frequency $n_0$ and the sum of $L(w_L)$ with frequency $n_0$ are the values of $FP_f$ and $TN_f$, respectively. The formulae for $TP_f$, $FP_f$, $FN_f$, and $TN_f$ are as follows:
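One plausible reading of the verbal definitions of the four fuzzy cells is that each is a membership-weighted sum of the class frequencies. The sketch below assumes that reading and takes the membership degrees H and L as given inputs rather than computing them with the improved FS formula; the function name and the tuple layout are illustrative.

```python
def fuzzy_confusion(classes):
    """Return (TP_f, FP_f, FN_f, TN_f) as membership-weighted sums (assumed reading).

    classes: iterable of (H, L, n1, n0) per multifactor class, where H and L
    are the high- and low-risk membership degrees and n1, n0 are the case and
    control counts in that class.
    """
    classes = list(classes)  # allow generators to be traversed four times
    tp = sum(h * n1 for h, l, n1, n0 in classes)  # cases weighted by high-risk degree
    fn = sum(l * n1 for h, l, n1, n0 in classes)  # cases weighted by low-risk degree
    fp = sum(h * n0 for h, l, n1, n0 in classes)  # controls weighted by high-risk degree
    tn = sum(l * n0 for h, l, n1, n0 in classes)  # controls weighted by low-risk degree
    return tp, fp, fn, tn
```

When H + L = 1 in every class, the four cells add up to the total number of samples, as in an ordinary confusion matrix.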

3-3: Consequently, the $3^m$ multifactor classes are reduced to a 2 × 2 table by considering the membership degrees for the high-risk and low-risk groups.

3-4: The m-SNP combination is evaluated using the MO function, which is based on the $H(w_H)$ and $L(w_L)$ membership degrees. $\mathrm{LR}_{\mathrm{fuzzy}}$ and $\mathrm{CCR}_{\mathrm{fuzzy}}$ are calculated on the basis of the two-way contingency table from step 3-3.
1) Objective function 1: $\mathrm{LR}_{\mathrm{fuzzy}}$ is computed from the observed frequencies in the 2 × 2 contingency table and the expected frequencies under the null hypothesis of no association [35]. $\mathrm{LR}_{\mathrm{fuzzy}}$ is formulated according to (7).
2) Objective function 2: $\mathrm{CCR}_{\mathrm{fuzzy}}$ assesses the proportion of individuals correctly classified by an m-locus combination. $\mathrm{CCR}_{\mathrm{fuzzy}}$ is formulated as per (8).
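The two objectives can be sketched from a fuzzy 2 × 2 table as follows. This is a hedged illustration, not the paper's equations (7) and (8): it assumes that the LR measure is the standard likelihood-ratio (G) statistic, 2 Σ O ln(O/E), and that the CCR is the proportion of the table lying on the diagonal. The function names are hypothetical.

```python
from math import log

def ccr_fuzzy(tp, fp, fn, tn):
    """Proportion of (fuzzy) correctly classified samples."""
    return (tp + tn) / (tp + fp + fn + tn)

def lr_fuzzy(tp, fp, fn, tn):
    """Likelihood-ratio statistic of a 2 x 2 table (assumed form: 2 * sum O * ln(O / E))."""
    n = tp + fp + fn + tn
    observed = [[tp, fp], [fn, tn]]   # rows: high / low risk; columns: case / control
    rows = [tp + fp, fn + tn]
    cols = [tp + fn, fp + tn]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n  # expected count under no association
            if o > 0:                  # a zero cell contributes nothing to the sum
                g += o * log(o / e)
    return 2.0 * g
```

A table with no association (all four cells equal) gives a statistic of zero, and the value grows as cases concentrate in the high-risk row.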

3-5: Pareto set operation. The Pareto set operation determines candidates ($X^*_j = (x^*_1, \ldots, x^*_i)$) and adds each candidate to Pareto set j, where j ∈ {1, . . . , k} and k is the number of CVs. No candidate in the Pareto set dominates any other. Suppose an m-locus combination $x_q$ is currently evaluated in the $j^{\mathrm{th}}$ CV and $x_q$ is compared with all $x^*$ in $X^*_j$; if $x_q$ is not dominated by any $x^*$, then it is added to $X^*_j$. When $x_p$ in $X^*_j$ is dominated by $x_q$ such that $f_1(x_q) \geq f_1(x_p)$ and $f_2(x_q) \geq f_2(x_p)$, then $x_p$ is omitted from $X^*_j$.
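The Pareto set update can be sketched as below. The names `dominates` and `update_pareto` are illustrative. One assumption differs slightly from the wording above: the sketch uses the common convention that domination also requires a strict improvement on at least one objective, so that two candidates with identical scores do not eliminate each other.

```python
def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_pareto(pareto, candidate):
    """Return the Pareto set after offering one candidate.

    pareto:    dict mapping combination -> (f1, f2), both objectives maximized.
    candidate: (combination, (f1, f2)).
    """
    name, score = candidate
    if any(dominates(s, score) for s in pareto.values()):
        return pareto  # candidate is dominated; the set is unchanged
    # Drop members that the candidate dominates, then add the candidate.
    kept = {c: s for c, s in pareto.items() if not dominates(score, s)}
    kept[name] = score
    return kept
```

Offering every m-locus combination of one CV fold to `update_pareto` in turn leaves only the nondominated combinations for that fold.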
Step 4: Perform the CVC operation. All candidates in Pareto set j, where j ∈ {1, . . . , k}, are evaluated using the $j^{\mathrm{th}}$ testing data in the CV. In each CV operation, all m-SNP combinations are evaluated using Step 3 to generate the Pareto set. Ultimately, k Pareto sets are obtained, and each candidate is counted according to its number of occurrences (denoted as CVC) in the k Pareto sets. The candidates with the highest CVC represent the optimal SNP-SNP interactions, and the medians of their objective values on the testing data are taken as the SNP-SNP interaction measures.
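The CVC count in Step 4 can be illustrated with a simple tally over the k Pareto sets. The function name `best_by_cvc` and the representation of each Pareto set as a list of tuples are assumptions for illustration; evaluation on the testing data and the median of the objective values are omitted.

```python
from collections import Counter

def best_by_cvc(pareto_sets):
    """Return (candidates with the highest CVC, that CVC value).

    pareto_sets: list of k iterables of m-locus combinations, one per CV fold.
    """
    # Count each combination at most once per fold.
    counts = Counter(c for ps in pareto_sets for c in set(ps))
    top = max(counts.values())
    return [c for c, n in counts.items() if n == top], top
```

Because a Pareto set can hold several nondominated combinations, more than one candidate may share the highest CVC.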

A. EPISTATIC MODELS WITHOUT MARGINAL EFFECTS
The data sets were simulated using 40 epistatic models without marginal effects, according to the multilocus penetrance [13]. In the 40 epistatic models, heritability ($h^2$) was set between 0.025 and 0.2 to control the phenotypic variation of the epistatic model. In each data set, the specific target (optimal SNP-SNP interaction) was generated with a minor allele frequency (MAF) of either 0.2 or 0.4 [36], and other SNPs were generated with an MAF uniformly selected from [0.05, 0.5]. GAMETES software was used to generate data sets according to the aforementioned settings [37]. In each data set, only one specific target exists. We randomly simulated 100 data sets in each epistatic model. The detection success rate of an epistatic model was calculated by counting the number of specific targets detected by the algorithm in 100 data sets.
We compared FSMOMDR with AntEpiSeeker [11], BOOST [12], BEAM [10], SNPRuler [13], MDR [16], PBMDR [17], CMDR [18], MOMDR [19], IMDR [20], and EFMDR [21] across epistatic models without marginal effects (Fig. 2). For epistatic models 1 to 10 ($h^2$ ≥ 0.2), all methods had a strong ability to accurately detect the specific targets in each data set. For epistatic models 11 to 40 ($h^2$ ≤ 0.1), FSMOMDR exhibited a superior detection success rate compared with those of MDR and EFMDR. However, the performance of FSMOMDR was inferior to that of SNPRuler (for model 34), BOOST (for models 25, 33, 34, and 40), and CMDR (for model 25). The performance of FSMOMDR in the 40 epistatic models was evaluated against each of its ten counterparts using the Wilcoxon signed-rank test, with a p value of <0.05 taken to indicate significantly superior performance of FSMOMDR. As indicated in Table 1, FSMOMDR offers superior performance to that of its counterparts; although p > 0.05 for the comparison with BOOST, a trend of superiority was evident. As for computation time, FSMOMDR took an average of 32.7 s to run a complete process in the 40 epistatic models, each including 1000 SNPs with 400 samples.

B. EPISTATIC MODELS WITH MARGINAL EFFECTS
Six multilocus penetrance functions were used to simulate epistatic models with marginal effects (models 1-6) [38]. We used GAMETES software [37] to simulate 100 data sets in each epistatic model, with the MAF uniformly selected from [0.05, 0.5]. For the 100 data sets, the detection success rate was calculated by counting the number of specific targets detected by the algorithm. Fig. 3 illustrates the detection success rates of AntEpiSeeker, BEAM, SNPRuler, BOOST, MDR, PBMDR, CMDR, MOMDR, IMDR, EFMDR, and FSMOMDR in the six epistatic models. FSMOMDR was superior to the other algorithms in the six epistatic models with marginal effects. For these models, a Wilcoxon signed-rank test indicated significant superiority in the detection success rate of FSMOMDR relative to that of nine of the other algorithms (Table 2), and a trend of superiority was evident in the comparison of FSMOMDR with CMDR. Our results suggest that the improved FS approach effectively enhanced MOMDR by accounting for the uncertainty in the high-/low-risk (H/L) classification of disease sites. As for computation time, FSMOMDR spent an average of 41.1 s running a complete process for each of the six epistatic models, including 1000 SNPs with 400 samples.

C. EXPERIMENT WITH REAL DATA
A real coronary artery disease (CAD) data set from the WTCCC database was used to evaluate the ability of FSMOMDR to detect SNP-SNP interactions. The WTCCC database was constructed in 2005 by 50 British research teams [39]. The CAD data set comprises 23 sub-datasets (chromosomes 1 to 22 and X), with 500569 SNPs in total. The samples comprised 1988 patients with coronary heart disease and 1500 healthy people in the United Kingdom. All SNPs were genotyped using the Affymetrix gene chip 500 K mapping array. Table 3 displays the FSMOMDR-detected SNP-SNP interactions. SNP information was obtained from dbSNP at the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/snp/). Any SNP not on a gene was labeled "UNKNOWN." Because of the multiobjective nature of the method, a chromosome could have more than one detected SNP-SNP interaction. We used the raw data sets to evaluate the significance level of each SNP-SNP interaction, with the p values obtained from a chi-square test ($\chi^2$). Among the 23 chromosomes, all SNP-SNP interactions detected by FSMOMDR yielded a p value of <0.0001, indicating a significant SNP-SNP interaction. The CVC indicated the degree to which the optimal SNP-SNP interaction was detected across a 5-fold CV, with CVC = 5 reflecting the highest degree [40]. In MDR, a high CCR value (>0.5) reduces the likelihood that a result arose by chance [41]. A high LR value can reduce uncertainty in the disease model. High CCR and LR values indicated a strong contrast between cases and controls. Table 3 also lists the duration of operation of FSMOMDR for the WTCCC data set. We noted that the duration of operation increased proportionally with the number of SNPs.

IV. DISCUSSION
MDR has been demonstrated to be a nonparametric approach for detecting nonlinear interactions between SNPs. MDR transforms the multifactor nonlinear combination from a high dimension to a low dimension. This may explain why $3^m$ multifactor genotypes could be transformed by binary classification into 2 × 2 contingency tables to improve the evaluation of SNP-SNP interaction [14]. Binary classification determines membership in the high- or low-risk group using the frequencies of multiple genotypes in cases and controls. However, binary classification may result in the loss of key information due to uncertainty [26]. Assuming that the balanced data set (i.e., where the classification threshold is 1) in the 2-SNP combination consists of nine multifactor genotypes, it is divided into a high-risk group (dominance ≥ 5.5) and low-risk group (dominance < 5.5). For a low multifactorial genotype, the membership degree is 2.5. MDR cannot distinguish between the two multifactorial genotypes. Although other studies have explored the shortcomings of MDR [21], [26], research in this field remains limited.
We determined the effectiveness of FSMOMDR by evaluating its performance in several epistatic models compared with the performance of other algorithms. For 40 epistatic models without marginal effects, FSMOMDR outperformed AntEpiSeeker in 39 models, BEAM in 39 models, SNPRuler in 36 models, BOOST in 19 models, MDR in 28 models, PBMDR in 39 models, CMDR in 21 models, MOMDR in 37 models, IMDR in 26 models, and EFMDR in 28 models. Furthermore, for six epistatic models with marginal effects, FSMOMDR outperformed AntEpiSeeker, BEAM, SNPRuler, MDR, PBMDR, MOMDR, IMDR, and EFMDR in all six models and outperformed BOOST and CMDR in five models. The results indicate that FSMOMDR has superior detection ability to AntEpiSeeker, BEAM, SNPRuler, BOOST, MDR, PBMDR, CMDR, MOMDR, IMDR, and EFMDR. Regarding the core principles and theoretical advantages of FSMOMDR, our algorithm handles uncertain information through the FS approach, and fuzzy logic enables MOMDR to assign two membership degrees in each multifactor class because the two-element set {0, 1} is extended to the membership interval [0, 1]. For the interval [0, 1], our improved FS approach assesses the distance between the $i^{\mathrm{th}}$ multifactor class and the outcome (cases and controls). This effectively improves the membership degrees of the high- and low-risk groups in each multifactor class. It also enables the effective detection of m-locus combinations with similar distributions, and thus the detection of more significant SNP-SNP interactions. Jung et al. [26] introduced the original fuzzy-based MDR. The limitation of fuzzy-based MDR lies in its selection of the sigmoid function's parameters. Leem et al. introduced an EF function without such parameter selection to overcome the limitations of fuzzy-based MDR [21]. FSMOMDR uses the ratio of cases to controls to map any region to the interval [−1, 1]. This strategy can reduce imbalance between the cases and controls [16], [42].
Thus, FSMOMDR does not select the parameter value of the fuzzy set. Moreover, in the 2 × 2 contingency table, our improved FS approach was superior to EFMDR with respect to the difference in the four cells (i.e., TP f , FP f , FN f , and TN f ). Simulation experiments demonstrated that FSMOMDR has a higher detection success rate than EFMDR does. Moreover, the FSMOMDR can be extended by the neutrosophic set [43].
In addition to its retention of MDR's advantages, FSMOMDR has three other characteristics. First, it applies multiobjective measurement, which is based on the improved FS approach, to increase the distinction between multifactor classes for improved detection of potential SNP-SNP interactions. Second, to understand the distribution of multifactor classes associated with a particular disease, FSMOMDR can use the membership degree to graphically represent SNP-SNP interactions. Third, FSMOMDR has no need for selecting the parameters of the fuzzy sets.
The computation time of FSMOMDR is proportional to $k \times \binom{n}{m} \times s \times 3^m$ for the evaluation of the m-locus combinations across the k-fold CV subsets in n SNPs with s samples. Specifically, for 100 data sets containing 1000 SNPs and 400 samples, the average computation times for MDR, EFMDR, and FSMOMDR are approximately 26, 28, and 31 s, respectively. For large data sets used in a GWAS, FSMOMDR has an approximate computation time of 29.14 h for 23 chromosomes. We recommend choosing from a large existing suite of computational methods, including parallel operation [44], graphics processing unit-based MDR [45], the greedy search strategy [46], and differential evolution-based MDR [18], to improve FSMOMDR runtime.
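The growth of this cost can be made concrete with a small calculation. The sketch assumes that the term written with n and m denotes the binomial coefficient C(n, m), that is, the number of m-locus combinations among n SNPs; the function name `evaluation_count` is illustrative, and the result is a unit count, not a time in seconds.

```python
from math import comb

def evaluation_count(k, n, m, s):
    """Number of elementary evaluations: k folds x C(n, m) combinations x s samples x 3**m classes."""
    return k * comb(n, m) * s * 3 ** m

# Simulation setting from the text: 1000 SNPs, 400 samples; 5-fold CV and m = 2 assumed.
print(evaluation_count(5, 1000, 2, 400))  # 8991000000
```

Raising m from 2 to 3 with the same n multiplies the number of combinations by roughly n/3 and triples the classes per combination, which motivates the parallel and heuristic search strategies cited above.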

V. CONCLUSION
We designed a powerful FS approach, FSMOMDR, for detecting SNP-SNP interactions. FSMOMDR is based on the improved sigmoid function, which allows it to yield better information on MOMDR uncertainty. Each multifactor class can be evaluated with respect to its membership degree in the high-risk and low-risk groups, thus enabling FSMOMDR to detect more potential SNP-SNP interactions. FSMOMDR was demonstrated to have satisfactory power on real GWAS data sets; it can be used for SNP-SNP interaction detection. The findings of the present study suggest that the membership, based on the improved sigmoid function, can be used to identify SNP-SNP interactions in addition to obtaining content knowledge, thus reducing the information uncertainty in MDR.