I. Introduction
Malicious software, commonly referred to as malware, is purposefully developed to achieve the harmful objectives of attackers. It is typically created by cybercriminals who exploit vulnerabilities in operating systems, applications, or human behavior to gain unauthorized access or carry out other harmful activities. Throughout the digital age, numerous computing devices have fallen victim to malware infections. Malware infiltrates networks, disrupts critical infrastructure, compromises computers and smart devices, and steals sensitive data [1]. Mobile devices are not immune either: a single erroneous click can install harmful software and turn the device into an unwitting participant in a botnet [2]. The injection of malicious code into benign software stands as the most prevalent and potent cyberattack method for breaching targeted computer systems and carrying out malicious activities such as the theft of confidential data or the deployment of ransomware; compromised systems can subsequently be used to launch attacks on other computers.
Although all operating systems (OS), including Windows, Linux, and macOS, are susceptible to similar attacks and can be targeted through various file formats such as Portable Executable (PE), Executable and Linkable Format (ELF), and Portable Document Format (PDF) [3], malware typically exhibits characteristic symbolic signatures. Consequently, commercial antivirus software relies predominantly on signature-based algorithms to identify malware infections. However, this approach falls short in detecting newly developed malware, as attackers employ tactics such as bundling and obfuscation to undermine the reliability of such analysis. Despite the creative strategies employed by attackers, malicious programs still require a host operating system in order to execute and carry out their destructive acts [6].
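The limitation of signature-based detection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of the whole file and that obfuscation is a single-byte XOR encoding; commercial products use richer signatures, and the payload bytes, function names, and key below are hypothetical and harmless.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def xor_obfuscate(payload: bytes, key: int) -> bytes:
    """Toy obfuscation (assumption): XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in payload)


def is_known_malware(sample: bytes, signatures: set[str]) -> bool:
    """Signature check (assumption): exact match of the sample's digest."""
    return sha256_hex(sample) in signatures


# Hypothetical stand-in for a previously analyzed sample (harmless bytes).
known_sample = b"EXAMPLE-PAYLOAD: harmless placeholder bytes"
signature_db = {sha256_hex(known_sample)}

# The same content after a trivial, reversible transformation.
variant = xor_obfuscate(known_sample, key=0x5A)

detected_original = is_known_malware(known_sample, signature_db)
detected_variant = is_known_malware(variant, signature_db)

# XOR with the same key restores the original bytes.
recovered = xor_obfuscate(variant, key=0x5A)
```

In this sketch the original sample matches the signature database, while the transformed variant does not, even though the original bytes can be recovered exactly. This is one reason learned, behavior-oriented features are attractive as a complement to exact signatures.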
Constructing robust security systems to counter both known and previously unseen malware attacks is imperative. Nevertheless, the sheer number of malware samples and families makes it difficult to provide rapid, real-time automated responses to new threats. Integrating modern artificial intelligence (AI) methods such as deep learning (DL), which have proven effective in various applications, is therefore essential. Furthermore, behavioral techniques incorporating probabilistic reasoning offer diverse solutions to intricate problems [7]. Optimizing such models involves approaches such as early stopping of classifier training, exploring alternative topologies and activation functions, and selecting different cohorts and epochs [9]. This study presents a novel approach to malware detection and classification based on machine learning. By examining the efforts made in malware detection between 2022 and 2023, it investigates the pertinent issues through a comprehensive literature review of machine learning in malware detection.
An estimated 560,000 malware samples are detected daily in 2023, a significant increase over the previous year that reflects the escalating sophistication and complexity of malware threats. Moreover, a substantial share of devices is vulnerable to malware attacks, with estimates suggesting that up to 90% of devices are at risk. This vulnerability is driven by various factors, including the widespread use of mobile devices, the proliferation of cloud computing, and the increasing interconnectedness of devices [43]. By 2025, the number of daily detected malware samples is projected to reach around 1 million.
Furthermore, the percentage of devices vulnerable to malware attacks is expected to rise, with estimates indicating that up to 95% of devices could be at risk by 2025 and some estimates suggesting up to 97% by 2030 [45]. The financial impact is similarly severe. The global cost of cybercrime is projected to reach USD 8 trillion in 2023, while approximately 560,000 malware attacks are reported worldwide each day, a considerable increase over the previous year that underscores the escalating sophistication and complexity of malware threats [44]. According to Cybersecurity Ventures, the annual cost of cybercrime is expected to rise to USD 10.5 trillion by 2025, up from an estimated USD 6 trillion for 2021, and to USD 25 trillion by 2030. Cybersecurity Ventures also estimates that the number of malware attacks will reach 1 million per day in 2025 and 2 million per day in 2030 [45].
The paper’s primary contributions can be summarized as follows: an introduction that explains how machine learning operates, its applications, and the impact of security considerations on its deployment; the PRISMA method for selecting papers, literature, and data sources; and a thorough literature review. Through our analysis, we demonstrate the volatility of malware by examining the technical literature that identifies vulnerabilities in malware and the risks these vulnerabilities pose to users.