Convolution-Bidirectional Temporal Convolutional Network for Protein Secondary Structure Prediction

As a basic feature extraction method, the convolutional neural network suffers from information loss when applied to sequence problems, and a temporal convolutional network can compensate for this shortcoming. However, an ordinary temporal convolutional network cannot handle protein secondary structure prediction well because of its one-way analysis. Therefore, we propose an integrated deep learning model called the Convolutional-Bidirectional Temporal Convolutional Network for 3-state and 8-state protein secondary structure prediction, based on a convolutional neural network and bidirectional temporal convolutional networks. Because the model combines the advantages of the convolutional neural network and the bidirectional temporal convolutional network, it can capture not only the local correlations in an amino acid sequence but also the long-distance interactions within it. This model can therefore effectively improve the accuracy of protein secondary structure prediction. The experimental results show that the combination of a convolutional neural network and a bidirectional temporal convolutional network is effective for predicting protein secondary structure.


I. INTRODUCTION
Proteins are essential for life activities, and their structures determine their functions. Protein structures include primary, secondary, tertiary, and quaternary structures. The primary structure is the basis of the protein structure and is known as the natural structure. The primary structure determines the stable spatial conformation, and these structures enable the biological functions of proteins. The sequence space of proteins is extremely complex: there are 20 possible residues at each position, so proteins are highly diverse in sequence, structure, and function. Protein secondary structure (PSS) is formed by the folding of the protein primary structure; the protein tertiary structure is in turn formed by the bending and folding of the secondary structure and refers to a specific spatial structure maintained by multiple secondary bonds. The protein quaternary structure refers to a polymeric structure formed by multiple independent tertiary-structure polypeptide chains linked by noncovalent bonds. Predicting the tertiary structure of proteins is a basic problem in computational biology, and predicting the secondary structure is a springboard to predicting the tertiary structure. Therefore, research on protein secondary structure prediction (PSSP) is particularly important.
Many methods have been used to predict PSSs, particularly since machine learning and statistical methods began to be applied to the task in recent years. The position-specific scoring matrix (PSSM) generated by PSI-BLAST reflects information regarding sequence evolution, amino acid conservation and mutations [1]. Combining PSSM data with machine learning has led to major breakthroughs in PSSP. Support vector machines (SVMs) [2], [3], [4], neural networks (NNs) [5], [6], [7] and k-nearest neighbours [8] can improve the prediction accuracy to over 70%. DeepCNF [9] used deep neural networks and conditional neural fields to predict 3-state and 8-state secondary structures. PSRM [10] uses big data to train support vector machines [11]. In contrast to classical machine learning methods, CNNs can automatically extract the local features of amino acid residues, and significant results have been achieved using CNNs and RNNs [12]. For example, Guo et al. fused an asymmetric convolutional neural network with long short-term memory models and applied them to predict the 8-state secondary structure of proteins [13]. Heffernan used a long short-term memory (LSTM) bidirectional recurrent neural network (BRNN) that captured long-term interactions without using sliding windows [14]. Sønderby et al. used bidirectional LSTM networks to capture long-range correlations between amino acid residues for secondary structure prediction [15]. Wang et al. applied a CNN with conditional random fields for secondary structure prediction on the benchmark CB513 dataset.
In recent years, the temporal convolutional network (TCN) has been proposed as a novel network for sequence modelling [16]. Traditional convolution cannot capture the dependency information of long sequences owing to the limited size of the convolution kernel. The TCN builds on traditional convolution with causal and dilated convolution and stacks residual blocks instead of single convolutional layers. A residual block contains two convolutional layers and a nonlinear mapping, with WeightNorm and Dropout incorporated in each layer to regularise the network. The TCN has been shown to outperform long short-term memory networks on many sequence modelling problems.
These models either use a single information-processing method to predict PSSs or are deficient in other respects. In this paper, we propose the Convolutional-Bidirectional Temporal Convolutional Network (C-BITCN), which combines the advantages of the CNN and the bidirectional TCN in predicting PSS: the CNN effectively extracts short-distance features near each amino acid, and the bidirectional TCN effectively analyses the interactions between amino acids. The experiments showed that C-BITCN outperformed other advanced models.

II. C-BITCN
A. CONVOLUTIONAL NEURAL NETWORK
The CNN can effectively extract local features of amino acid chains to improve the accuracy of PSSP. The input of the CNN was a set of sliding windows created from the PSSM, with each window of size 20 × 19. The convolution layer extracts local features from the input data through local convolution and weight sharing. The neurons above the convolution layer transfer the features to the convolution layer, which is realised by ''sliding'' the convolution kernel over the input PSSM matrix. In the experiment, after the PSSM matrix was input into the CNN, the dimension of the fully connected layer was set to 20 and the output features of the fully connected layer were extracted. Fig 1 shows the process of extracting features after inputting the PSSM matrix into the CNN.
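The sliding-window preparation described above can be sketched as follows. This is a minimal NumPy illustration, assuming an L × 20 PSSM zero-padded at both ends so that every residue receives a full window centred on it; the function name and padding choice are our own assumptions, not the paper's code.

```python
import numpy as np

def pssm_windows(pssm, window=19):
    """Slice an L x 20 PSSM into one 20 x window matrix per residue.

    Hypothetical sketch: zero-pad the sequence at both ends so every
    residue gets a full window centred on it, matching the 20 x 19
    input size described in the text.
    """
    L, n_feat = pssm.shape          # L residues, 20 PSSM columns
    half = window // 2
    padded = np.vstack([np.zeros((half, n_feat)),
                        pssm,
                        np.zeros((half, n_feat))])
    # One window per residue, transposed to 20 x window.
    return [padded[i:i + window].T for i in range(L)]

# Example: a random PSSM for a 30-residue chain.
windows = pssm_windows(np.random.rand(30, 20))
print(len(windows), windows[0].shape)   # 30 windows of shape (20, 19)
```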
The convolution feature extraction process is completed by filtering the PSSM matrices one by one with a feature-extraction filter. The matrix of each region is multiplied by the corresponding weight, plus an offset, to obtain the feature map. The set of convolutional kernels is defined as follows:

C_i = {C_i^1, C_i^2, ..., C_i^N},   (1)

where C_i is the set of convolutional kernels in layer i, C_i^k is the k-th convolutional kernel in layer i, and N is the number of convolutional kernels in layer i. W_{i−1}^n is the region map generated from the input amino acid PSSM matrix by the previous layer of convolution kernels. After the convolution of layer i, the feature map J_i^k is obtained, defined as follows:

J_i^k = f(W_{i−1}^n ⊗ C_i^k + b_k),   (2)

where b_k is the offset and f is the activation function. In this experiment, the activation function is ReLU, which provides a nonlinear element for the output of the convolution layer. The pooling layer in Fig 1 does not undergo any learning; its function is to reduce the feature dimensions and prevent overfitting. The neurons of the fully connected layer are connected to the previous layer, which is convenient for extracting the characteristics of amino acid residues. The input dimension of the softmax layer can be controlled by resizing the fully connected layer. The softmax layer is connected to the fully connected layer and consists of three or eight neurons. The output value for neuron j of the softmax layer satisfies the following equation:

softmax(z)_j = e^{z_j} / Σ_m e^{z_m}.   (3)

B. TEMPORAL CONVOLUTIONAL NETWORK
The TCN was recently proposed for sequence processing and has been proven superior to LSTM in language modelling and natural language processing. The TCN combines causal and dilated convolution. As shown in Fig 2, causal convolution means that the value of the next layer at time k depends only on the values of the previous layer at time k and earlier; therefore, causal convolution is unidirectional and, unlike traditional convolution, cannot observe future data.
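The convolution, ReLU and softmax steps above can be sketched in plain NumPy. This is a hypothetical illustration, not the paper's implementation: the kernel size and values are arbitrary, and the helper names are our own.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def conv_feature_map(window, kernel, bias):
    """Valid 2-D convolution of one PSSM window with one kernel,
    followed by ReLU: J = f(W (*) C + b). A sketch, not the paper's code."""
    wh, ww = window.shape
    kh, kw = kernel.shape
    out = np.empty((wh - kh + 1, ww - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(window[r:r + kh, c:c + kw] * kernel) + bias
    return relu(out)

window = np.random.rand(20, 19)        # one PSSM sliding window
kernel = np.random.rand(3, 3)          # hypothetical 3 x 3 kernel
fmap = conv_feature_map(window, kernel, bias=0.1)
probs = softmax(np.random.rand(3))     # 3 output neurons for 3-state prediction
print(fmap.shape, probs.sum())         # (18, 17), probabilities sum to 1
```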
However, causal convolution alone cannot solve the problem of the traditional convolutional neural network: because the modelling length is limited by the size of the convolution kernel, a sufficiently long dependency can only be built through multi-layer linear stacking. Therefore, causal convolution is combined with dilated convolution to address this problem. In dilated convolution, d = 1 in the first layer means sampling the input at every point, and d = 2 in the second layer means sampling every second point. Dilated convolution causes the receptive field to grow exponentially with the number of layers.

The sequence modelling task has the following form: given an input sequence x_0, ..., x_T, we wish to predict corresponding outputs y_0, ..., y_T. To predict the output y_t at time t, we may use only the inputs that have already been observed: x_0, ..., x_t. Formally, a sequence modelling network is any function f that produces the mapping

f : X^{T+1} → Y^{T+1},   ŷ_0, ..., ŷ_T = f(x_0, ..., x_T).   (5)

The TCN therefore follows two principles: the network cannot leak information from the future to the past, and the output length of the network equals the input length. The first principle requires causal convolution, whereas the second is achieved through the one-to-one correspondence obtained when information is extracted from the sequence by the dilated convolutions and then output, as illustrated in Fig 3. Causal convolution can only extract past information linearly, and its efficiency is low when processing long sequences; therefore, dilated convolution is needed to improve processing efficiency. For a 1-D sequence input x ∈ R^n and a filter f : {0, ..., k − 1} → R, the dilated convolution operation F on a sequence element s is defined as

F(s) = (x *_d f)(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i},   (6)

where d is the dilation factor, k is the filter size, and s − d · i points in the direction of the past. Dilation is therefore equivalent to introducing a fixed step between every two neighbouring filter taps. When d = 1, dilated convolution reduces to regular convolution. Using a larger dilation allows the highest output level to represent a wider range of inputs, effectively expanding the receptive field of the network.
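The dilated causal convolution F(s) = Σ f(i) · x_{s−d·i} can be written out directly. A minimal sketch, assuming zero padding on the left so the output keeps the input length, as the TCN requires:

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i], with x treated as zero
    before position 0 (causal zero padding). Output length equals input
    length."""
    k = len(f)
    out = np.zeros_like(x, dtype=float)
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i           # only past positions are touched
            if j >= 0:
                out[s] += f[i] * x[j]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f = np.array([1.0, 1.0])               # filter size k = 2
print(dilated_causal_conv(x, f, d=1))  # [1. 3. 5. 7. 9.]
print(dilated_causal_conv(x, f, d=2))  # [1. 2. 4. 6. 8.]
```

With d = 1 each output sums two adjacent inputs (regular causal convolution); with d = 2 a gap of one position is skipped, widening the receptive field without extra parameters.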
The residual block contains a branch leading to a series of transformations F, whose output is added to the input x of the block:

o = Activation(x + F(x)).   (7)

This effectively changes the mapping of the layers and yields better results for deeper networks. Although the receptive field of the TCN depends on the network depth n, filter size k, and dilation factor d, the network depth of a TCN may grow large; at that point, using residual modules instead of plain convolution layers makes the model more stable.
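The residual connection o = Activation(x + F(x)) can be sketched in a few lines. Here `transform` is a hypothetical stand-in for the block's two dilated-convolution layers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, transform):
    """o = Activation(x + F(x)): the branch output F(x) is added back to
    the input, which stabilises deep TCN stacks. `transform` stands in
    for the block's two dilated-convolution layers."""
    return relu(x + transform(x))

x = np.array([-1.0, 0.5, 2.0])
out = residual_block(x, lambda v: 0.1 * v)   # toy transform F
print(out)   # relu(1.1 * x): negatives clipped to zero
```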

C. C-BITCN
1) BIDIRECTIONAL TEMPORAL CONVOLUTIONAL NETWORK
Although the TCN has many advantages, its unidirectional prediction structure cannot fully capture the information in a protein sequence. Therefore, we modified the TCN by feeding the forward and reversed protein sequences into the model simultaneously during training, obtaining a bidirectional TCN. The process for obtaining the reversed input data is shown in Fig 4. With this modification, the protein sequence X1 and its inverted sequence X2 can be input into the training and prediction modules simultaneously, which effectively improves the prediction accuracy.
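The reversal trick above can be sketched as follows. A hypothetical illustration: `tcn` is any function mapping a sequence to an equal-length sequence, and a cumulative sum stands in for a real causal TCN (each output depends only on the past).

```python
import numpy as np

def bidirectional_tcn(x, tcn):
    """Run the same causal TCN over the forward sequence and over its
    reversal, then flip the backward output so positions line up.
    Stacks the two context streams into a 2 x L feature matrix."""
    forward = tcn(x)
    backward = tcn(x[::-1])[::-1]   # reversed pass, re-aligned
    return np.stack([forward, backward])

# Toy causal "TCN": cumulative sum (each output sees only the past).
x = np.array([1.0, 2.0, 3.0])
feats = bidirectional_tcn(x, np.cumsum)
print(feats)   # row 0: forward context [1. 3. 6.]
               # row 1: backward context [6. 5. 3.]
```

The forward row accumulates left-to-right context while the backward row accumulates right-to-left context, so every position sees both sides of the sequence.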

2) INTEGRATE CNN AND BIDIRECTIONAL TCN
In this study, we propose an integrated deep learning model, C-BITCN, whose structure is illustrated in Fig 5. First, the PSSM is processed into 20 × 19 matrices, which are input into the 4-D CNN, and the features processed by the CNN are extracted at the fully connected layer. The extracted features are then combined with the PSSM sequence and input into the temporal convolutional network, and the final classification results are obtained.
The details of the integration of the CNN and TCN are shown in Fig 6. The amino acid sequence X to be predicted is processed and input into the CNN; the features extracted by the CNN are then fed simultaneously into the forward TCN and the reverse TCN, and the resulting outputs T1 and T2, together with the original information X, are input into the final classifier.
Equations (8)-(10) describe this process, where BITCN denotes the reversed TCN branch of the bidirectional TCN and FinalC denotes the final classification module:

T_1 = TCN(CNN(X)),   (8)
T_2 = BITCN(CNN(X)),   (9)
Y = FinalC(T_1, T_2, X).   (10)

The CNN and bidirectional TCN are aggregated for prediction because, in recent years, many PSSP methods have been built on combinations of CNN and LSTM [17]. The advantage of such models is that they not only extract the local features near the target amino acids with the CNN but also capture the interactions between amino acid residues with the LSTM, effectively improving the accuracy of PSSP. The TCN can take over the role of the LSTM, and our bidirectional modification resolves the TCN's inability to exploit both sequence directions. Therefore, C-BITCN can obtain better results than other methods.
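The data flow of Equations (8)-(10) can be sketched end to end. All four callables below are hypothetical stand-ins (an element-wise scaling for the CNN, a cumulative sum for the causal TCN, a plain sum for the classifier), chosen only to make the wiring concrete:

```python
import numpy as np

def c_bitcn_forward(X, cnn, tcn, final_c):
    """Sketch of Equations (8)-(10): CNN features feed a forward TCN and
    a reversed TCN, and the classifier sees T1, T2 and the raw input X."""
    F = cnn(X)                       # local features near each residue
    T1 = tcn(F)                      # forward pass, Eq. (8)
    T2 = tcn(F[::-1])[::-1]          # reversed pass, Eq. (9)
    return final_c(np.concatenate([T1, T2, X]))   # Eq. (10)

X = np.arange(4, dtype=float)
Y = c_bitcn_forward(X,
                    cnn=lambda v: v * 2.0,
                    tcn=np.cumsum,
                    final_c=lambda v: v.sum())
print(Y)   # -> 66.0
```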

B. EXPERIMENTAL ENVIRONMENT
The experimental environment for this study was as follows. The cluster hardware consisted of four NF5280M5 servers, one NF5288M5 server, and two gigabit switches. Each host used an Intel(R) Xeon(R) Gold 5118 CPU @ 2.30 GHz with 24 CPUs and a maximum of 176 GB of memory. The NF5280M5 machines were equipped with Tesla V100 GPU (16 GB memory) computing cards, and the NF5288M5 was equipped with a Tesla V100 GPU (32 GB memory) computing card. The operating system was CentOS 7.4 x64, running MATLAB R2019b.

C. EVALUATION INDICATORS
The DSSP subdivides the three classes of PSS into eight classes: H (alpha helix), G (3-10 helix), I (pi helix), E (beta strand), B (beta bridge), T (beta turn), S (bend) and L (loop or irregular). For the purposes of this study, G, H and I were uniformly classified as H; B and E as E; and the rest as C.
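The 8-to-3 state reduction described above amounts to a simple lookup table; a minimal sketch:

```python
# The 8-to-3 state reduction used in the text: G, H, I -> H (helix);
# B, E -> E (strand); the remaining states -> C (coil).
EIGHT_TO_THREE = {
    'H': 'H', 'G': 'H', 'I': 'H',   # helices
    'E': 'E', 'B': 'E',             # strands
    'T': 'C', 'S': 'C', 'L': 'C',   # everything else becomes coil
}

def reduce_to_3_state(ss8):
    """Map an 8-state DSSP string to its 3-state equivalent."""
    return ''.join(EIGHT_TO_THREE[s] for s in ss8)

print(reduce_to_3_state('HGIEBTSL'))   # -> HHHEECCC
```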
Q_3 and Q_8 are used as the evaluation criteria in this study. Q_8 is obtained using the following equation:

Q_8 = (Σ_{i=1}^{8} TP_i) / (Σ_{i=1}^{8} (TP_i + FP_i)) × 100%,   (11)

where TP_i is the number of residues correctly predicted in class i and FP_i is the number of residues incorrectly predicted in class i. Q_3 can be obtained from the following equation:

Q_3 = (TP_H + TP_E + TP_C) / TP × 100%,   (12)

where TP is the total number of amino acid residues. The segment overlap score (SOV) evaluates PSS predictions based on protein structure fragments. Unlike Q_3, which targets only single residues, the SOV evaluates whether the predicted protein sequence fragments are correct. The formula for the SOV is

SOV = 100 × (1/N) Σ_{(S_1, S_2)} [ (min(S_1, S_2) + δ(S_1, S_2)) / max(S_1, S_2) ] × length(S_1),   (13)

N = Σ_{S_1} length(S_1),   (14)

where length(S_1) denotes the length of S_1, max(S_1, S_2) denotes the union of the observed segment S_1 and the overlapping predicted segment S_2, and min(S_1, S_2) denotes their intersection. The term δ in the above equation allows for variation of the fragments at the edges of the protein structure and is defined as

δ(S_1, S_2) = min{ max(S_1, S_2) − min(S_1, S_2); min(S_1, S_2); ⌊length(S_1)/2⌋; ⌊length(S_2)/2⌋ }.   (15)
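The per-residue Q_3 accuracy follows directly from its definition; this sketch uses hypothetical toy sequences:

```python
def q3(predicted, observed):
    """Q3: percentage of residues whose 3-state label is predicted
    correctly, i.e. (TP_H + TP_E + TP_C) / TP * 100."""
    assert len(predicted) == len(observed)
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

obs  = 'HHHHEEEECCCC'
pred = 'HHHHEEECCCCC'     # one strand residue mispredicted as coil
print(q3(pred, obs))      # 11 of 12 residues correct -> ~91.67
```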

D. EFFECT OF DIFFERENT PARAMETERS ON RESULTS
In the experiments, we found that many parameters of the model have a significant impact on the final prediction results, such as the size of the CNN sliding window, the dimension of the features extracted from the CNN fully connected layer, the number of residual blocks stacked in series in the bidirectional TCN, and the filter size. The tuning process for the sliding-window size is presented in Table 2.
Different network structures also affected the final results. Of the two common types of pooling layer, maximum pooling and average pooling, maximum pooling is used in this study.
In this model, the input data of the bidirectional TCN module contain the features extracted by the CNN at its fully connected layer. Therefore, the size of the fully connected layer affects the final results of the experiment. The process of adjusting the fully connected layer is presented in Table 3, and the tuning of the remaining parameters is presented in Tables 5 and 6, respectively.

E. CROSS-VALIDATION
To better evaluate the model, we also used three-fold cross-validation. In this experiment, the training set of 11650 proteins contained a total of 2778501 samples; in each fold of the three-fold cross-validation, 1852334 of these were used for training and 926167 for testing, with a sliding window of size 20 × 19. The three folds were run in turn so that all data were used as test samples once. The cross-validation results of the CNN and C-BITCN are shown in Tables 7 and 8.
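The fold rotation described above, where every sample serves as a test sample exactly once, can be sketched generically (this is not the paper's exact partitioning, and the sample count here is a toy value):

```python
import numpy as np

def three_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 3-fold cross-validation so
    that every sample appears in exactly one test fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 3)
    for k in range(3):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(3) if j != k])
        yield train, test

splits = list(three_fold_splits(9))
for train, test in splits:
    print(len(train), len(test))   # 6 3 on each of the three folds
```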

F. EXPERIMENTAL RESULTS
To evaluate the accuracy and stability of the model in predicting PSS, we used CullPDB, containing 11650 proteins, as the training set and CASP10, CASP11, CASP12, CASP13, CASP14 and CB513 as the test sets. There were no proteins duplicated between the training and test sets.

1) RESULTS FOR C-BITCN
The Q3 and Q8 accuracies of C-BITCN for each dataset are listed in Table 9 and Table 10, respectively.

2) COMPARISON OF C-BITCN WITH CNN AND BIDIRECTIONAL TCN
A comparison of the experimental results of the CNN, bidirectional TCN, and C-BITCN is shown in Figs 7 and 8. These experiments indicate that C-BITCN can indeed combine the advantages of the CNN and the bidirectional TCN and achieve higher PSSP accuracy.

3) COMPARISON OF C-BITCN WITH OTHER METHODS
We used Q_3 and Q_8 as the evaluation criteria and compared the model with PSIPRED [25], DeepCNF [26], OCLSTM [27], and MUFOLD-SS [28]. The comparison results for Q_3 and Q_8 are presented in Figs 9 and 10, respectively. In Fig 9, the results of PSIPRED, DeepCNF and OCLSTM on the test sets are taken from [24] and [25].

IV. CONCLUSION
To analyse the local and global interactions between amino acids in protein sequences, we propose C-BITCN, an artificial neural network based on a CNN and a bidirectional TCN, for the first time and apply it to protein secondary structure prediction. The experimental results show that the accuracy of C-BITCN is significantly improved compared with the CNN and the bidirectional TCN alone, indicating that C-BITCN is indeed effective. The comparisons also show that C-BITCN outperformed the other methods for predicting PSSs.
In our experiments, we used CullPDB, after removing duplicate proteins, as the training set and CASP10, CASP11, CASP12, CASP13, CASP14 and CB513 as the test sets, and achieved better results than other methods for PSSP. Although there are many ways to predict the secondary structure of proteins by combining a CNN and an LSTM, few methods have used a CNN together with a bidirectional TCN. C-BITCN achieves good results because it extracts local features with the CNN and, through the bidirectional TCN, analyses the interactions and connections within long amino acid sequences, drawing on the residues on both sides of each position. The two factors with the greatest influence on secondary structure prediction are extracting the local characteristics of the amino acid chain and observing the interactions of amino acids in the protein sequence. C-BITCN has both characteristics simultaneously; therefore, it achieves better results than the other methods.
Because this model adopts maximum pooling, amino acid mutations or errors in the input information will affect the prediction results. We will continue to optimise the model and test it on more datasets in the future.
YUNQING ZHANG is currently pursuing the M.Eng. degree in computer application technology with the Shandong Academy of Sciences, Qilu University of Technology. His research interests include deep learning, intelligent information processing, and biomedical information processing.
YUMING MA received the master's degree in automatic control from Harbin Engineering University. She is currently an Associate Professor with the Shandong Academy of Sciences, Qilu University of Technology. Her research interests include bioinformatics and machine learning.
YIHUI LIU received the Ph.D. degree in computer science from the University of Nottingham, U.K., in 2004. She is currently a Professor with the Shandong Academy of Sciences, Qilu University of Technology, China. She has published more than 40 articles. Her research interests include biomedical information and medical image analysis, including protein structure prediction, and microarray data.