Protein-DNA Binding Residue Prediction via Bagging Strategy and Sequence-Based Cube-Format Feature


Abstract:

Protein-DNA interactions play an important role in diverse biological processes. Accurately identifying protein-DNA binding residues is a critical but challenging task for protein function annotation and drug design. Although wet-lab experimental methods are the most accurate way to identify protein-DNA binding residues, they are time-consuming and labor-intensive. There is an urgent need to develop computational methods to rapidly and accurately predict protein-DNA binding residues. In this study, we propose a novel sequence-based method, named PredDBR, for predicting DNA-binding residues. In PredDBR, for each query protein, its position-specific frequency matrix (PSFM), predicted secondary structure (PSS), and predicted probabilities of ligand-binding residues (PPLBR) are first generated as three feature sources. Second, for each feature source, the sliding window technique is employed to extract the matrix-format feature of each residue. Then, we design two strategies, i.e., square root (SR) and average (AVE), to separately transform the PSFM-based matrix-format feature and the two predicted feature source-based (i.e., PSS-based and PPLBR-based) matrix-format features of each residue into three corresponding cube-format features. Finally, after serially combining the three cube-format features, the ensemble classifier is generated by applying a bagging strategy to multiple base classifiers built on a 2D convolutional neural network framework. The computational experimental results demonstrate that the proposed PredDBR achieves an average overall accuracy of 93.7% and a Matthews correlation coefficient of 0.405 on two independent validation datasets and outperforms several state-of-the-art sequence-based protein-DNA binding residue predictors. The PredDBR web-server is available at https://jun-csbio.github.io/PredDBR/.
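
The sliding-window step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window size or how sequence ends are handled, so the odd window size, the zero padding at the termini, and the names window_features and toy_profile below are all assumptions made for illustration, not the authors' implementation. The SR and AVE cube transformations are not sketched, because the abstract does not define them.

```python
from typing import List

Matrix = List[List[float]]


def window_features(profile: Matrix, window_size: int) -> List[Matrix]:
    """Return one (window_size x n_cols) matrix per residue.

    profile holds one row per residue, e.g. a PSFM with 20 columns.
    Assumption: positions beyond the sequence ends are zero-padded.
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    if not profile:
        return []

    half = window_size // 2
    n_cols = len(profile[0])
    features: List[Matrix] = []

    for center in range(len(profile)):
        window: Matrix = []
        for pos in range(center - half, center + half + 1):
            if 0 <= pos < len(profile):
                window.append(list(profile[pos]))  # copy the residue's row
            else:
                window.append([0.0] * n_cols)      # assumed zero padding
        features.append(window)
    return features


# Toy profile: 3 residues x 2 columns (a real PSFM would have 20 columns).
toy_profile = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
toy_windows = window_features(toy_profile, window_size=3)
```

Each residue thus gets a matrix built from its own row and the rows of its sequence neighbors; under these assumptions, the first and last residues receive one padded row each.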
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 19, Issue: 6, 01 Nov.-Dec. 2022)
Page(s): 3635 - 3645
Date of Publication: 29 October 2021

PubMed ID: 34714748

1 Introduction

Interactions between proteins and DNA play a crucial role in a wide variety of biological processes, such as DNA replication, recombination, repair, and gene transcription and expression [1], [2], [3]. Hence, the accurate prediction of protein-DNA binding residues helps to elucidate the mechanisms of these interactions and facilitates our understanding of these biological processes. Traditionally, protein-DNA binding residues can be identified by experimental techniques, such as electrophoretic mobility shift assays (EMSAs) [4], [5], Fast ChIP [6], and X-ray crystallography [7]. However, these techniques are time-consuming and laborious. With the rapid advance of protein sequencing technology, a large number of unannotated protein-DNA complexes have been sequenced and deposited. Therefore, there is an urgent need to develop computational methods that can rapidly and reliably identify DNA-binding residues from protein sequences.
