
Multimodal feature learning framework for disease biomarker discovery


Abstract:

Most in silico biomarker discovery methods are based on a single disease data modality. However, they often fail to identify and understand the relationships between different biomarkers emerging from various data types. Heterogeneous multimodal analysis methods facilitate the integration of multiple disease-related data resources, allowing us to gain a more accurate understanding of the mechanisms involved in a disease. Multimodal biomarkers resulting from such integrated analyses are complementary and robust and enable a holistic understanding of the disease. In this study, we proposed a novel neural network model that integrates and analyzes data from tissue transcriptomics, protein-protein interactions, and gene functional annotation resources. We employed a network-based approach by constructing a heterogeneous knowledge graph consisting of different biomedical entities and their inherent relationships. In addition, we utilized contextualized gene features mined from the biomedical literature. Our multimodal framework accurately predicts the differential activity of genes in idiopathic pulmonary fibrosis (IPF). Evaluation experiments were performed to determine the benefits of implementing the multimodal approaches. We also employed transcription factor enrichment analysis (TFEA) to validate the learned features. Finally, the gene clusters identified using multimodal gene features in IPF were found to be biologically relevant and meaningful.
Date of Conference: 06-08 December 2022
Date Added to IEEE Xplore: 02 January 2023
Conference Location: Las Vegas, NV, USA

Introduction

In silico gene expression analysis has become an established tool for gene discovery and the identification of molecular mechanisms in a disease [1]. These methods often rely on expression profiling to identify a single gene or clusters of relevant genes whose expression is altered in diseased tissues or cells. Additional analyses are required to identify the distinct biological mechanisms represented by these candidate genes and to explore their therapeutic potential. These in silico analysis methods have been successfully used to identify novel gene biomarkers for several diseases [2-7]. Recently, machine-learning approaches have also been utilized in biomarker discovery [8-12]. However, most of these algorithms consider one disease modality at a time (e.g., gene expression profiles) and often ignore the relationships or interactions among the different components. Consequently, the utility of the identified genes (or gene sets) is limited and one-dimensional. For instance, they can help uncover the biological processes involved in a disease while offering no therapeutic benefit. Therefore, computational frameworks that can utilize different data verticals are warranted, particularly for complex and multifactorial diseases.
