Comparison of Machine Learning Classifiers on Integrated Transcriptomic Data

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

Omics data are being generated for different conditions, and can be a valuable resource for building novel predictive models for medical diagnosis. Given the reduced numb...Show More

Metadata

Abstract:

Omics data are being generated for different conditions and can be a valuable resource for building novel predictive models for medical diagnosis. Because each dataset contains only a small number of samples, applying Machine Learning (ML) models requires data integration. At the same time, many ML models are available, and the best option for data integration is not known. These challenges have typically been addressed in restricted settings, i.e., for a single disease at a time. However, a thorough comparison of models on integrated data across different conditions is still missing. In this paper, we compare 7 classifiers on integrated data for 6 diseases, across 14 datasets. We compare the models on single and integrated datasets, employing different pre-processing techniques. We also evaluate the effect of feature selection, analyzing the robustness and relevance of the extracted features. We observe that, although integration slightly reduces predictive power, the models are still able to produce good classifications. When generalization is tested on new datasets, performance sometimes decreases drastically, depending on the disease studied.
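As an illustration only, the sketch below shows one common way to integrate several expression datasets and then test generalization to a dataset the model has never seen. It assumes that each dataset is z-scored gene by gene on its own (one possible pre-processing choice; the abstract does not say which techniques the authors used) and that generalization is measured with a leave-one-dataset-out loop. The nearest-centroid classifier, the function names, and the toy data are hypothetical stand-ins, not any of the seven classifiers or fourteen datasets from the paper.

```python
from statistics import mean, pstdev


def zscore_per_dataset(samples):
    """Standardize each feature (gene) within one dataset.

    samples: list of feature vectors (lists of numbers) from a single dataset.
    Features with zero variance in that dataset are mapped to 0.0.
    """
    n_features = len(samples[0])
    cols = [[s[j] for s in samples] for j in range(n_features)]
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(s[j] - mus[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(n_features)]
        for s in samples
    ]


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: one mean vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def leave_one_dataset_out(datasets):
    """datasets: dict name -> (X, y). Returns accuracy on each held-out dataset."""
    # Normalize every dataset separately before pooling (the integration step).
    normalized = {name: (zscore_per_dataset(X), y) for name, (X, y) in datasets.items()}
    scores = {}
    for held_out in normalized:
        X_train, y_train = [], []
        for name, (X, y) in normalized.items():
            if name != held_out:
                X_train.extend(X)
                y_train.extend(y)
        model = fit_centroids(X_train, y_train)
        X_test, y_test = normalized[held_out]
        correct = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
        scores[held_out] = correct / len(y_test)
    return scores


# Toy data: three "platforms" on very different scales (a batch effect),
# with gene 1 higher in "case" samples in every dataset.
toy_datasets = {
    "A": ([[10, 1], [12, 2], [20, 1], [22, 2]], ["ctrl", "ctrl", "case", "case"]),
    "B": ([[1000, 50], [1010, 60], [1100, 50], [1110, 60]], ["ctrl", "ctrl", "case", "case"]),
    "C": ([[100, 3], [102, 4], [200, 3], [205, 4]], ["ctrl", "ctrl", "case", "case"]),
}

if __name__ == "__main__":
    print(leave_one_dataset_out(toy_datasets))
```

On this toy data, every held-out dataset is classified correctly because per-dataset standardization removes the scale differences between "platforms". Real transcriptomic data rarely behave this cleanly, which is consistent with the abstract's observation that performance on new datasets can drop sharply depending on the disease.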

Published in: 2023 IEEE International Conference on Big Data (BigData)

Date of Conference: 15-18 December 2023

Date Added to IEEE Xplore: 22 January 2024


DOI: 10.1109/BigData59044.2023.10386445

Conference Location: Sorrento, Italy

