Early Recurrence Detection of Invasive Ductal Carcinoma Utilizing High-Dimensional Genomic Data | IEEE Conference Publication | IEEE Xplore


Abstract:

Invasive ductal carcinoma (IDC) is one of the most prevalent and aggressive forms of breast cancer, constituting approximately 80 % of breast cancer cases in the United States. After treatment, there is a 3–15 % chance that IDC will recur. Early detection of recurrent IDC is crucial for long-term patient survival because it can inform treatment decisions sooner. This study explores the application of machine learning to predict recurrence of IDC using mRNA expression data. We compared the performance of XGBoost and Random Forest, two decision-tree ensemble models that are efficient when working with a moderate number of features. We aggregated mRNA expression and recurrence status data from TCGA's longitudinal BRCA study for 1084 IDC patients, of whom 91 % had no IDC recurrence. The mRNA expression data were collected, on average, about 11 years before IDC recurred. We employed Principal Component Analysis to reduce the dimensionality of the genomic dataset from 20,652 to 50 features while retaining 90 % of the initial information. The Random Forest model with hyperparameters max_depth = 25 and n_estimators = 500 performed best, with 95 % accuracy, 83 % precision, and 56 % recall on the testing set. Our model identified over half of recurrent IDC cases approximately 11 years before recurrence while maintaining very high accuracy and precision. We believe model performance can be further improved by reducing skew in the data through the introduction of more recurrent IDC samples, the inclusion of more features, and the inclusion of patient tissue slide images for a multimodal approach.
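Because 91 % of patients had no recurrence, accuracy alone says little about how well recurrent cases are found, which is why the abstract also reports precision and recall. The sketch below shows how the three metrics relate under class imbalance. The confusion-matrix counts are hypothetical: the abstract does not report the test-set size or the confusion matrix, so the numbers were chosen only to land near the reported percentages, and the names `ConfusionCounts` and `majority_baseline_accuracy` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # recurrent cases correctly flagged
    fp: int  # non-recurrent cases wrongly flagged
    fn: int  # recurrent cases missed
    tn: int  # non-recurrent cases correctly cleared

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / c.total


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged cases that truly recurred."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of truly recurrent cases that were flagged."""
    recurrent = c.tp + c.fn
    return c.tp / recurrent if recurrent else 0.0


def majority_baseline_accuracy(c: ConfusionCounts) -> float:
    """Accuracy of always predicting the more common class."""
    positives = c.tp + c.fn
    negatives = c.fp + c.tn
    return max(positives, negatives) / c.total


# HYPOTHETICAL counts (not from the paper): 217 test patients, 18 recurrent.
example = ConfusionCounts(tp=10, fp=2, fn=8, tn=197)

print(f"accuracy  = {accuracy(example):.3f}")
print(f"precision = {precision(example):.3f}")
print(f"recall    = {recall(example):.3f}")
print(f"baseline  = {majority_baseline_accuracy(example):.3f}")
```

With these assumed counts, a classifier that always predicts "no recurrence" would already reach roughly 92 % accuracy, so recall on the recurrent class is the more informative figure when judging early detection.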
Date of Conference: 02-04 July 2024
Date Added to IEEE Xplore: 26 August 2024
Conference Location: Osaka, Japan
