Understanding Flaky Tests Through Linguistic Diversity: A Cross-Language and Comparative Machine Learning Study


Graphical Abstract: A concise visual summary of the research on flaky test detection using machine learning classifiers and cross-language evaluation.

Abstract:

Flaky tests, which intermittently pass or fail without any code modifications, significantly impede software development and erode confidence in automated testing frameworks. Code smells, in both test cases and production code, are a primary cause of test flakiness. Researchers and practitioners have examined the prevalence of test smells across numerous programming languages, but prior experiments were typically conducted in isolation, each focusing on a single language. This study examines the predictive accuracy of several machine learning classifiers in identifying flaky tests across a variety of programming languages, including Java, Python, C++, Go, and JavaScript. We compare the performance of Random Forest, Decision Tree, Naive Bayes, Support Vector Machine, and Logistic Regression classifiers in both single-language and cross-language settings: to assess the impact of linguistic diversity on test-case flakiness, models were trained on a single language and subsequently tested on the others. The key findings indicate that Random Forest and Logistic Regression consistently outperform the other classifiers in accuracy, adaptability, and generalizability, particularly in cross-language environments. We also contrast our findings with those of previous research, showing improved precision and accuracy in flaky-test identification as a result of careful classifier selection. A thorough statistical analysis, including t-tests, was conducted to assess the significance of differences in classifier accuracy and F1-score across programming languages, highlighting substantial differences between classifiers in their effectiveness at detecting flaky tests. The datasets and experiment code used in this study are accessible through an open source GitHub repository to facilitate reproducibility.
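As a rough illustration of the experimental setup described in the abstract, the sketch below trains each of the five classifiers on one language's flaky-test dataset, evaluates it on every other language, and applies a paired t-test to compare classifier accuracies over the same train/test language pairs. This is a minimal sketch using scikit-learn and SciPy, assuming a shared, language-independent numeric feature set and hypothetical file paths and column names (e.g., data/java_tests.csv with a binary flaky label); it is not the authors' released code.

```python
# Sketch of the cross-language evaluation: train each classifier on one
# language, test on every other language, then compare classifiers with a
# paired t-test. Paths, columns, and hyperparameters are assumptions.
import pandas as pd
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

LANGUAGES = ["java", "python", "cpp", "go", "javascript"]

CLASSIFIERS = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "NaiveBayes": GaussianNB(),
    "SVM": SVC(random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}


def load_language(lang):
    """Load one language's feature matrix and labels (hypothetical CSV layout)."""
    df = pd.read_csv(f"data/{lang}_tests.csv")
    return df.drop(columns=["flaky"]), df["flaky"]


def cross_language_scores():
    """Train on each single language, evaluate on every other language."""
    data = {lang: load_language(lang) for lang in LANGUAGES}
    rows = []
    for name, model in CLASSIFIERS.items():
        for train_lang in LANGUAGES:
            X_train, y_train = data[train_lang]
            model.fit(X_train, y_train)
            for test_lang in LANGUAGES:
                if test_lang == train_lang:
                    continue  # single-language results are handled separately
                X_test, y_test = data[test_lang]
                pred = model.predict(X_test)
                rows.append({
                    "classifier": name,
                    "train": train_lang,
                    "test": test_lang,
                    "accuracy": accuracy_score(y_test, pred),
                    "f1": f1_score(y_test, pred),
                })
    return pd.DataFrame(rows)


if __name__ == "__main__":
    scores = cross_language_scores()
    print(scores.groupby("classifier")[["accuracy", "f1"]].mean())

    # Paired t-test over the same (train, test) language pairs, e.g. Random
    # Forest vs. Decision Tree; repeat analogously for other classifier pairs
    # and for F1-score.
    rf = scores[scores.classifier == "RandomForest"].sort_values(["train", "test"])
    dt = scores[scores.classifier == "DecisionTree"].sort_values(["train", "test"])
    t, p = ttest_rel(rf["accuracy"].values, dt["accuracy"].values)
    print(f"RF vs DT paired t-test on accuracy: t={t:.3f}, p={p:.4f}")
```

Training on a single language and testing on the remaining four mirrors the cross-language setting described above: the effect of linguistic diversity can then be read off as the drop in accuracy or F1-score relative to a single-language baseline.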
Published in: IEEE Access (Volume: 13)
Page(s): 54561 - 54584
Date of Publication: 24 March 2025
Electronic ISSN: 2169-3536
