NNDSVD-GRMF: A Graph Dual Regularization Matrix Factorization Method Using Non-Negative Initialization for Predicting Drug-Target Interactions

Accurate drug-target interactions (DTIs) prediction can significantly speed up the process of new drug design and development. Recently, many matrix factorization methods have been used to predict DTIs. However, most of them use heuristic and iterative strategies, and their convergence and performance can not be guaranteed. In order to accurately predict DTIs, we propose a new algorithm, NNDSVD-GRMF, our method is based on graph dual regularization non-negative matrix factorization (GDNMF) and non-negative double singular value decomposition (NNDSVD), which considers both the initialization stage of the non-negative matrix factorization and the structural information of the data and features. At the same time, in order to improve the adaptability of the algorithm, the extension of the NNDSVD-GRMF (NNDSVD-WGRMF) is also proposed. Extensive experimental results show that our methods have better performance than other state-of-the-art methods. In the case studies, among the 10 highest-scoring drugs predicted to interact with androgen receptor, 9 drugs have been validated, and among the 10 highest-scoring target proteins predicted to be targeted by the drug nicotine bitartrate, 9 targets have been validated.


I. INTRODUCTION
Drug-target interaction (DTI) identification plays important role in the process of new drug discovery and development. However, all wet experiments to identify DTIs require expensive equipments and costly chemical reagents, and their process is time consuming [1]. With the rapid accumulation of related information of drugs, target proteins and validated DTIs, more and more computational methods have been proposed to predict new DTIs. Based on 3D structures of proteins (targets), Cheng et al. [2] used docking simulations to predict compound-protein interactions. Campillos et al. [3] introduced a method using drug side-effect similarity to infer the possibility that a drug interacts with a target. The above The associate editor coordinating the review of this manuscript and approving it for publication was Shubhajit Roy Chowdhury . two methods only apply to drugs with known side-effects or proteins with known 3D structure, and could not perform large scale screening of potential DTIs.
Recently a variety of approaches have been proposed based on the known DTI network, drug (or compound) chemical structures, and target protein sequences. For example, Yamanishi et al. [4] used a DTI bipartite graph to represent the known DTI network, and proposed a bipartite graph learning method to predict DTIs. They constructed a similarity matrix according to the shortest distances between all drugs and targets in the the DTI bipartite graph, and used the eigenvalue decomposition of the matrix to embed all drugs and targets into a unified feature space. Two kernel regression models were learned to map drugs and targets to the unified space from their chemical structure space and their sequence space, respectively. The inner product of the VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ feature vectors in the unified space of a drug and a target was used to measure their interaction possibility. Based on the same data, Bleakley and Yamanishi [5] used bipartite local models (BLM) to predict target proteins for a given drug, and the drugs targeting a given protein. Therefore they obtained two predictions for each drug-protein pair, which were then combined into a prediction score.
Using the interaction profiles of targets (and drugs) from the known DTI network, Laarhoven et al. [6] introduced a method called the Gaussian Interaction Profile (GIP) kernel to calculate the similarities between targets (and drugs), and used Regularized Least Squares (RLS) to predict DTIs. Perlman et al. [7] integrated multiple types of information including the chemical structures, side effects, ATC codes and related gene expression profiles of drugs, and the sequences and PPI network of targets, then used logistic regression to score DTIs. Mei et al. [8] integrated neighborbased interaction-profile inferring into BLM and proposed a method BLM-NII to make DTI predictions. Based on GIP, Laarhoven and Marchiori [9] proposed a weighted nearest neighbor method to predict DTIs. With the rapid development of machine learning, machine learning based DTI prediction methods have also been proposed. For example, Wang and Zeng [10] used restricted Boltzmann machine to model multiple types of DTIs. Lan et al. [11] set unknown interactions as unlabeled samples, and used Random walk with restarts, K nearest neighbor and heat kernel diffusion to divide them into reliable negative samples(RN) and likely negative samples(LN). Based on the positive samples (known DTIs), RN and LN, a weighted support vector machine was chosen as classifier to predict DTIs. Rifaioglu et al. proposed DEEPScreen [12], which used a deep convolutional neural network to learn drug features from their 2-D structural images and to predict large-scale DTIs.
In bioinformatics, matrix factorization method has a wide range of applications, such as clustering and feature selection [13], lncRNA-disease associations prediction [14], tumor classification [15] and marker extraction [16]. In DTI prediction, the known DTI network is usually represented by a matrix. In most matrix factorization based methods, the interaction matrix is decomposed into two matrices of low ranks which represent the interaction between a drug and a target as the inner product of their feature vectors. For example, Gönen [17] proposed a kernelized Bayesian matrix factorization method to predict DTIs. Zheng et al. [18] proposed a DTI prediction model named Multiple Similarities Collaborative Matrix Factorization (MSCMF), which used an alternating least squares algorithm to decompose the interaction matrix into two low-rank feature factor matrices that are consistent with the similarity matrices of drugs and targets. Liu et al. [19] combined logistic matrix decomposition with neighborhood regularization to predict DTIs. Bolgár and Antal [20] proposed a variational Bayesian multiple kernel logistic matrix factorization method using Laplacian regularization, multiple kernel learning, and a variational Bayesian inference process to infer drug-target interaction possibilties.
Ezzat et al. [21] used graph regularization into matrix factorization to learn drug-target interaction models. Based on L 2,1 norm graph regularization, Cui et al. [22] used matrix factorization to predict DTIs. Liu et al. [23] used a graph convolutional network (GCN) followed by a random walk with restart (RWR) to obtain features of the drugs and targets from the related heterozygous data, and a matrix factorization model (DistMult) to predict DTIs. Gao et al. [24] proposed a collaborative matrix factorization method with soft regularization to predict DTIs. Although many methods have been proposed for DTIs prediction, the prediction performances are far from satisfactory. The key issue of DTIs prediction is how to efficiently use the existing validated DTIs and exploit the useful information hidden among drugs or targets.
Being simple and practical, the singular value decomposition (SVD) algorithm is usually used to provide an initial solution to matrix factorization, however the negative values in the component matrices make the results hard to explain. To provide better and explainable initial component matrices for matrix factorization, Boutsidis and Gallopoulos [25] proposed an algorithm non-negative double singular value decomposition (NNDSVD), which can enhance the initialization stage of nonnegative matrix factorization. In DTI prediction, the high-dimensional data are in fact sampled from a nonlinear low-dimensional manifold embedded in the high-dimensional space, and according to [26], the model learning performance can be greatly improved if the intrinsic geometrical structure of the manifold have been taken into account. To capture the structural information from both data and the features, Shang et al. [27] proposed a graph dual regularization non-negative matrix factorization method (GDMF) for clustering, and their experimental results showed GDMF had better clustering performance and more discriminating power than the general non-negative matrix factorization and graph regularization non-negative matrix factorization. In the paper, we propose a DTI prediction method called NNDSVD-GRMF based on the graph dual regularized non-negative matrix factorization and the non-negative double singular value decomposition. At first a DTI matrix, a drug similarity matrix and a target similarity matrix are constructed based on known DTIs, drug chemical structures and target sequences. Then DTI prediction is transformed into non-negative factorization of the DTI matrix with graph dual regularization terms. The graph dual regularization terms are used to integrate the information from the drug similarity matrix and the target similarity matrix, in order to take the intrinsic geometrical structures of the related manifolds into account. NNDSVD-GRMF can be used to predict novel drug-target interactions for drugs without any known targeted proteins and for proteins without any known drugs targeting them. To improve the adaptability, a weighted extension (i.e. NNDSVD-WGRMF) of the NNDSVD-GRMF is also proposed. Experimental results show that our methods have better performance than other state-of-the-art methods on four datasets under two different experimental scenarios. In case studies involving the target androgen receptor and the drug nicotine bitartrate, among the 10 highest-scoring drugs predicted to interact with androgen receptor, 9 drugs have been validated by wet experiments, and among the 10 highest-scoring target proteins predicted to be targeted by the drug nicotine bitartrate, 9 targets have been validated by wet experiments.

II. MATERIALS
We used the same data as in [4]. The known DTI data, the amino acid sequences of protein targets and the chemical structure data of drugs were downloaded from KEGG [28]. Protein targets include the following four classes: nuclear receptor (NR), G protein-coupled receptor (GPCR), ion channel (IC) and enzyme (E). The validated DTIs were considered the gold standard data. We used the same known DTI data as in [4], which have 4 datasets whose names are NR, GPCR, IC and E according to their target classes. The information of the 4 datasets are shown in Table 1. In the NR dataset, there are 90 known interactions between 54 drugs and 26 nuclear receptors; in the GPCR dataset, there are 635 known interactions between 223 drugs and 95 G proteincoupled receptors; in the IC dataset, there are 1476 known interactions between 210 drugs and 204 ion channels; and in the E dataset, there are 2926 known interactions between 445 drugs and 664 enzymes. The known DTIs were denoted as a n × m matrix X , where n and m are the numbers of drugs and targets, respectively. If the ith drug is validated to interact with the jth target, X ij = 1; otherwise, X ij = 0.
The structural similarities between drugs were calculated from the chemical structure data using SIMCOMP [29] according to the size of the common substructures between two drugs. The similarities between drugs were represented by a n × n matrix S d . The sequence similarities between two target proteins were computed from the sequence data using the normalized Smith-Waterman alignment score [30]. Let SW (p 1 , p 2 ) be the original Smith-Waterman alignment score of p 1 and p 2 . The similarity between p 1 and p 2 s(p 1 , . The similarities between targets were represented by a m × m matrix S t .

A. NON-NEGATIVE MATRIX FACTORIZATION
For a n × m non-negative matrix X ∈ R n×m + , the general form of the non-negative matrix factorization (NMF) of X is to decompose X into two low rank nonnegative component matrices, say A ∈ R n×k + and B ∈ R m×k + , such that their product well approximates X , i.e. X ≈ AB T . The component matrices A and B are also called the latent feature matrices of X . When the square of the Frobenius norm of the difference between X and AB T is used as the cost function, NMF of X is to solve the following optimization problem: (1) The most commonly used method to obtain a local optimal solution to the problem is the iterative process proposed by Lee and Seung [31] which uses the following update formulas: To avoid overfitting, the L2 regular term (||A|| 2 F + ||B|| 2 F ) is added into the cost function of NMF, and we get the L2 regularized NMF: (3)

B. GRAPH REGULARIZED NON-NEGATIVE MATRIX FACTORIZATION
Each column of X can be considered as a data point in a n dimension space. In real applications, k is usually much smaller than n and m, and the NMF of X tries to find a low dimension space such that the data points in X could be well represented as linear combinations of k base vectors. Based on the assumption that the two data points close in the latent geometry space will be close in the low dimension spaces, Cai et. al. [32] proposed a graph regularized non-negative matrix factorization (GRMF) by adding a graph regularized term. The cost function of GRMF is as follows: where Tr(. ) is the trace of a matrix, W is the weight matrix representing a neighbor graph on the data points, and D is a diagonal matrix that D jj = l W lj . The matrix D−W is called graph Laplacian and denoted by L in the following. Furthermore, by adding another graph regularizer of the feature space, Shang et. al. [27] proposed a graph dual regularization non-negative matrix factorization (GDNMF), whose cost function is: matrix S t . We construct a p-nearest neighbor drug graph whose adjacent matrix N d is defined as follows.
N d is used to make the similarity matrix S d sparse by using Equation (7).
The graph Laplacian ofŜ d is defined as Similarly, we construct a p-nearest neighbor target graph whose adjacent matrix is N t according to Equation (8).
Using N t , we make a sparse target similarity matrixŜ t from S t according to Equation (9).
The graph Laplacian ofŜ By adding the L2 regular term (||A|| 2 F + ||B|| 2 F ) and two graph regular terms Tr(A T L d A) and Tr(B T L t B), the non-negative matrix factorization of the drug-target interaction matrix X is formulated as the optimization problem in Equation (10).
Since the normalized graph Laplacian performs better in many actual applications, we furthermore normalize L d and L t as Equations (11) and (12), respectively.
Finally, the drug-target interaction prediction problem is transformed into the non-negative matrix factorization of X with the L2 regular term and the dual normalized graph regularization terms, which is formulated as Equation (13). The inner product of the ith row of A and the jth row of B is used to predict the interaction between the ith drug and the jth target.

D. ALGORITHM
To solve the problem in Equation (13), we design an algorithm NNDSVD-GRMF. NNDSVD-GRMF uses NNDSVD [25] to get the initial values of A and B, and uses an iterative process to update A and B until they converge. The iterative process uses Equation (14) and Equation (15) to update A and B.
Since the iterative process can only reach a local optimal solution, choosing good initial values for A and B will enhance the possibility that the process reach a global optimal solution. We use NNDSVD to calculate the initial values of A and B, which could lead to rapid convergence of many NMF algorithms [25]. NNDSVD is based on the SVD of X : X = U V T , where is a diagonal matrix whose diagonal entry at the ith row and the ith column is the ith largest singular value σ i of X , and the ith columns u i and v i of U and V are the left and right singular vectors corresponding to σ i , respectively. For a n × m matrix X with rank k, according to the property of SVD, X equals to the sum of k leading singular factors, i.e. Table 2 for the details.

E. WEIGHTED GRAPH-REGULARIZED MATRIX FACTORIZATION
In order to get some latent features of A and B, weight matrix W is added to the graph regularization matrix factorization. The form is where represents Hadamard product. Weight matrix W ∈ R n×m + , if X has a known drug-target interaction W ij = 1, otherwise, W ij = 0, (i = 1, · · · , n, j = 1, · · · , m). Using alternating least squares as with NNDSVD-GRMF, the solution is obtained as where I is identity matrix and diag is diagonal matrix. A schematic of NNDSVD-GRMF is shown in Fig. 1.

IV. RESULTS
In this section, we analyze the performance of NNDSVD-GRMF and NNDSVD-WGRMF in the following three aspects. First, we analyze the performance of NNDSVD-GRMF and NNDSVD-WGRMF under the two important evaluation indicators through 10-fold cross-validation. Second, we compare the performance of NNDSVD-GRMF and NNDSVD-WGRMF in the case of parameter changes. Third, we evaluate the reliability of NNDSVD-GRMF using case studies.

A. CROSS VALIDATION EXPERIMENTS
To evaluate the performance of DTI prediction methods, 5 repetitions of 10-fold cross-validation is performed in the experiment. In each repetition of 10-fold cross-validation, the dataset is divided into ten parts, 1 part is used for validation, and the remaining 9 parts are used for training, and this procedure is repeated 10 times. The final result is obtained by averaging the 5 repetitions of 10-fold cross-validation results. In order to evaluate the quality of the prediction results, the area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPR) are used as the main evaluation indicators. In the table, the best AUC and AUPR under each data set are bolded and the standard deviation derived from the experiment is shown in parentheses. At the same time, receiver operating characteristic (ROC) curve and precision recall (PR) curve for each method are plotted.
To fully test the validity of the method, the cross-validation experiments are conducted under the following two scenarios [33]. 1) CVd: the complete rows in the DTI matrix X are left out for the testing set. It tests the ability to predict interactions for new drugs, 2) CVt: the complete columns in the DTI matrix X are left out for the testing set. It tests the ability to predict interactions for new targets.

3) PREDICTION RESULTS UNDER CVt
The AUC and AUPR values for each algorithm in the CVt scenario are given in Table 5 and Table 6, respectively. From Table 5, it can be found that the AUC of our method has better performance than other state-of-the-art methods on NR, GPCR, IC datasets. From Table 6, we can find that the AUPR values of NNDSVD-WGRMF outperform the other methods on NR and IC datasets. Fig. 6 and Fig. 8 show the histograms of AUC and AUPR with error bars for each algorithm under the CVt scenario, respectively. Fig. 7 and Fig. 9 show ROC curves and PR curves of different algorithms on the four datasets under CVt, respectively.

C. SENSITIVITY ANALYSIS
The values of the parameters λ d , λ t and λ l are obtained using grid search. In cross-validation, the training set for each fold is different, which complicates the sensitivity analysis of the parameters.
To investigate the effect of changes in parameters λ d , λ t and λ l on the prediction results, we set λ d = 0, λ t = 0   and λ l = 0 under CVd and CVt, respectively. The results are shown in Table 7-10. The results show that NNDSVD-GRMF performs best when λ d , λ t , and λ l are not 0. In the CVd scenario, compared to the change of λ t = 0, the change of λ d = 0 and λ l = 0 has a greater impact on    the AUC and AUPR values. In the CVt scenario, compared to the change of λ d = 0, the change of λ t = 0 and λ l = 0 has a greater impact on the AUC and AUPR values. When λ d , λ t , λ l is other parameters, the performance of NNDSVD-GRMF are shown in Fig. 10 and Fig. 11 under CVd and CVt, respectively. Therefore, we can conclude that graph dual regularization and L2 regularization can improve the learning ability of the model, capture more structural information of the graph, and improve the prediction effect.

D. CASE STUDIES
To further demonstrate the effectiveness of NNDSVD-GRMF for predicting DTI, we perform two different types of case studies. In the first case study, we select the target VOLUME 10, 2022     (Androgen receptor) in the NR dataset as the object of the case study. The androgen receptor, as a steroid receptor, not only achieves hormonal regulation of target cells, but it has also been shown to play a role in the proliferation of androgen-refractory cells and in androgen-refractory prostate cancer [35]. For androgen receptors in the NR dataset, androgen receptor interactions with drugs on the NR dataset are  used as the training dataset for NNDSVD-GRMF, and candidate drugs are ranked according to the prediction scores of NNDSVD-GRMF. Then, we select drugs with the top 10 scores and validate them. It can be found that 9 drugs are accurately predicted. The detailed results of the predictions are shown in Table 11.
In the second type of case study, we aim to assess whether NNDSVD-GRMF could be applied to drugs without known interaction targets. We select drug (Nicotine bitartrate) from the IC dataset as the subject of the case study. Nicotine bitartrate, a drug that acts on nicotinic receptors, has been shown to reduce falls in Parkinson's patients [36]. In the case study, the interactions of nicotine bitartrate with the target   obtained from the IC dataset are used as a training dataset for NNDSVD-GRMF, and candidate targets are ranked according to prediction scores. Then, we select targets with the top 10 scores and validate them. It can be found that 9 targets are accurately predicted. The detailed results of the predictions are shown in Table 12.

V. CONCLUSION
In order to accurately predict DTIs, we proposed new methods NNDSVD-GRMF based on graph dual regularized matrix factorization. This method can accurately obtain the structural information on the drug and target manifold. In the process of solving the objective function, we use nonnegative double singular value decomposition to enhance the initial stage of non-negative matrix factorization. In order to improve adaptability, the weighted form of NNDSVD-GRMF (i.e. NNDSVD-WGRMF) has also been proposed. In the experiment, our method has better performance compared with other state-of-the-art methods. At the same time, our method also has a better performance for certain specific targets and drug predictions.
The better performance of our method may be attributed to the following factors. First of all, NNDSVD-GRMF not only adds graph dual regularization, but also considers the initial stage of matrix factorization. The graph dual regularization enables the model to obtain more information about the data structure. The use of non-negative double singular value decomposition in the initial stage of matrix factorization not only affects the convergence of matrix factorization, but also improves the quality of the solution. Second, as a weighted extension of NNDSVD-GRMF, NNDSVD-WGRMF has better adaptability to datasets. Third, our method may be more suitable for solving the problem of DTIs prediction.
As future work, other multi-manifold regularization methods and network embedding technologies may be used to further improve the prediction performance. For example, in multi-manifold regularization [37], the manifold and coefficient matrix can be combined with multi-manifold regularization to maintain the local geometry of drug and target. When we reduce the dimensionality, the network embedding can preserve the structure of the network [38].