A SLR on Customer Dropout Prediction

Dropout prediction is increasingly addressed with machine learning algorithms; thus, appropriate approaches to address the dropout rate are needed. The selection of an algorithm to predict the dropout rate is only one problem to be addressed. Other aspects should also be considered, such as which features should be selected and how to measure accuracy, while considering whether the features are appropriate for the business context in which they are employed. To address these questions, the goal of this paper is to develop a systematic literature review that evaluates existing studies on predicting the dropout rate in contractual settings using machine learning and identifies current trends and research opportunities. The results of this study identify trends in the use of machine learning algorithms in different business areas, including which metrics are being adopted and what features are being applied. Finally, some research opportunities and gaps that could be explored in future research are presented.


I. INTRODUCTION
Customer analysis is fundamental to developing business and marketing intelligence [1], thereby supporting the understanding of historical data and the identification of trends and patterns [2]. This process is also known as data mining, i.e., the extraction of knowledge from data [3], and fitting models are required to determine patterns in observed data [4] to provide a means to answer business questions that have been traditionally time-consuming to solve [5]. Usually, techniques such as statistics, machine learning, pattern recognition, database and data warehouse systems, information retrieval, visualization, algorithms, and high-performance computing [6] are employed to extract meaningful and useful information or patterns through automated or semiautomated methods [2], [3], [7]. The information that many organizations accumulate can support managers in making decisions by providing more insights about customers. The developmental analysis of existing data in organizations allows accurate targeting of customers [8] and a better understanding of how the loyalty mechanisms work to predict the customer intention to drop out [9]. These perspectives indicate that companies are realizing that their customer database is the most valuable asset that they possess [10], [11]. This realization provides opportunities for organizations to explore their existing data to gain a competitive advantage.
The associate editor coordinating the review of this manuscript and approving it for publication was Gianluigi Ciocca.
The understanding of customer data allows customers to be retained through the exploration of existing information. According to Glady et al. [12], churn is a marketing-related term that represents a consumer who is switching from one company to a competitor in the near future, and, according to Verbeke et al. [13], it is a management science problem that adopts a data mining approach to address the fact that retaining customers costs less than attracting new customers [14]. This approach requires determining which customers have a higher propensity to attrite [15]. However, the reasons for dropping out can be attributed to different events; Berry and Linoff [2] thus divide customer dropout into voluntary and involuntary categories: voluntary dropout represents a decision by the customer to end the relationship, while involuntary dropout occurs when the company ends the customer relationship because the customer does not fulfil their obligations, via breaches such as lack of payment or abuse of service. This division creates two scenarios leading to an end event that is referred to as dropout. Another factor referred to by Neslin et al. [16] is that the end of the relationship occurs in a given time period. The identification of dropout can be developed in different contexts. There are two main scenarios, as follows [17], [18]: (1) contractual settings, where customers pay a fee, such as a subscription, and the customer informs the company that they are ending the relationship, and (2) noncontractual settings, for example, a customer buying a book, where a firm has to infer whether the customer is still active (did not drop out).
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The contractual business concept implies that usage and retention are interconnected processes; i.e., customers need to renew their contracts/memberships/subscriptions to continue that usage [19]. The main characteristic of a contractual setting is the interaction in which the customer ends the relationship [20]. Dropout prediction requires proper approaches that consider the environment where it is developed. In contractual setting scenarios, such as insurance, telecommunications, and magazine subscriptions, firms can accurately understand the cash flow generated by their customers, as customers usually sign long-term contracts with firms [21]. The customer must choose whether to opt-in or to opt-out [22], i.e., (1) customers will choose to opt-in if they want to enter into a contract with a particular form (e.g., renewal form) or (2) customers will choose to opt-out if they prefer not to renew.
Because they depend on contractual relationships and on knowledge of the exact time of churn, such approaches are not fully applicable in the context of catalog retailers or other non-membership businesses [23]; moreover, contractual settings provide subscription records and usage logs [24].
By anticipating possible churners, companies can develop countermeasures, such as concessions, to retain as many customers as possible [25] and can predict future intentions to segment customers and support actions according to their likelihood of dropping out. The prevention of customer dropout is a problem that, according to [12], requires a retention action if the company wants to avoid dropout. According to Amin et al. [26], this problem impacts organizational performance, causes reduced sales, allows competitors to gain new customers, increases the cost of attracting new customers in relation to the lower cost of retaining them, and puts the company image at risk through the loss of market and customer base. Dropout can be quantified as a percentage rate that represents the customers ending a subscription or terminating the business relation [27], which counts the customers who have already churned, or as a churn probability calculated for each customer using historical data to predict their future behaviors [28], which determines the churn risk.
In a contractual setting scenario, customer churn is more damaging than just selling less product, because there is a well-defined termination of a relationship [29] that should be properly addressed. Companies are more profitable when they retain more customers due to the lower marketing costs and improved sales [26]. If an organization can predict a possible dropout and develop countermeasures to avoid desertion, it can avoid customer defections that lead to a loss of money. Reichheld [30] showed that reducing dropout rates by 5 percentage points (e.g., from 15% per year to 10% per year) could as much as double profits, as acquiring new customers costs 5 to 6 times more than retaining existing customers [31]. Organizations are addressing this problem by shifting their target from capturing new customers to preserving existing customers [9], as investments in retention strategies have higher returns than acquisitions [32]. The importance of customer retention to maintain organizational profitability [33] raises the problem of how to quantify the financial impact of customer retention actions, under the assumption that the organization's goal is to extend the lifetime of the customer to increase its profits. The customer lifetime value (CLV) allows us to measure this impact, as it represents a monetary value resulting from the duration of a relationship with a customer [16]. Retention actions that target existing customers according to their risk of dropout should consider the likelihood of churn [32] and should not waste incentives on customers who are not profitable [16]. CLV supports the determination of a budget for marketing decisions and strategic and tactical decisions in companies [34]. Using this assumption, Verbeke et al. [13] suggested calculating the maximum profit that can be generated by including the optimal fraction of customers with the highest predicted probabilities to attrite in retention campaigns to optimize the process. The problem is addressed in different industries, such as the publishing, financial services, insurance, electric utilities, health care, banking, internet, telephone, and cable service industries [16]. The objective is to maintain a relationship with customers. Valuable relationships must be retained and reinforced by building strong customer defection-avoiding schemes [9].
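The idea of targeting only the fraction of customers with the highest predicted dropout probabilities can be illustrated with a minimal sketch. It is not the maximum profit criterion of [13]; it assumes a much simpler expected-profit model in which every contacted customer costs a fixed incentive and a would-be churner accepts the offer with a fixed probability. The function name and the parameters `clv`, `incentive_cost` and `accept_rate` are hypothetical and chosen for illustration only.

```python
def best_target_fraction(churn_probs, clv, incentive_cost, accept_rate):
    """Return (fraction, profit) for the most profitable top-k targeting.

    Simplified assumption: contacting a customer with churn probability p
    yields an expected gain of p * accept_rate * clv - incentive_cost.
    """
    if not churn_probs:
        return 0.0, 0.0
    ranked = sorted(churn_probs, reverse=True)   # highest risk first
    best_k, best_profit, running = 0, 0.0, 0.0
    for k, p in enumerate(ranked, start=1):
        running += p * accept_rate * clv - incentive_cost
        if running > best_profit:                # keep the best cumulative profit
            best_k, best_profit = k, running
    return best_k / len(ranked), best_profit


# Hypothetical scores from any churn model, and hypothetical campaign economics.
scores = [0.9, 0.7, 0.4, 0.2, 0.05]
fraction, profit = best_target_fraction(scores, clv=200, incentive_cost=50,
                                        accept_rate=0.5)
print(fraction, profit)
```

Under these assumed numbers, contacting the third customer onward lowers the cumulative profit, so the sketch stops at the two customers with the highest risk.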
Predictive learning could be adopted to anticipate customer dropout. Since the emergence of the field of artificial intelligence (machine learning) [35], machine learning has been considered a modern extension of predictive analytics [36], understood as an automated process that extracts patterns from the data [37] by generalizing from the examples in the training set [38]. More recently, deep learning has emerged; it is considered a part of machine learning [39], mimics the behavior of the structure of the human brain [40] and relies on machine learning algorithms that model nonlinear high-level abstractions [39]. Machine learning encompasses deep learning and is understood to be a consequence of predictive analytics.
Machine learning extracts patterns from the data [37] by generalizing from the examples in the training set [38]. Machine learning could be utilized to extract knowledge to understand dropout for the development of effective retention strategies [41] and to allow the discovery of patterns that support hypotheses for addressing existing problems. Machine learning is applied to develop churn prediction models that generalize the relationship between churn behavior and historical data to produce predictions about the future behaviors of a company's customers; these predictions are influenced by the input data and the algorithm selected to build the model [42]. Using historical data, a model could be trained to classify a customer as a future dropout or nondropout. Predicting these two possible events allows us to identify ways to induce customers to stay [16] by estimating the probability of a customer churning in a given period of time [33]. Machine learning can be used to develop retention strategies based on existing data [41] by extracting patterns from data [37] that support the development of counteractions before an event occurs.
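As a minimal sketch of training a model on historical data to estimate a dropout probability, the following example fits a one-feature logistic regression by batch gradient descent. The feature (a scaled inactivity score), the data, the learning rate and the number of epochs are all assumptions made for illustration; none of them comes from the reviewed studies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(dropout | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current model on every historical example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical history: x is a scaled inactivity score, y is 1 for dropout.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

p_inactive = sigmoid(w * 0.9 + b)   # estimated dropout probability, inactive customer
p_active = sigmoid(w * 0.1 + b)     # estimated dropout probability, active customer
print(round(p_inactive, 3), round(p_active, 3))
```

The output is a probability per customer, which can then be thresholded into the dropout or nondropout classes or used to rank customers by risk.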
Wai-Ho Au et al. [43] developed an approach to identify several rules to reduce customer dropout, such as adding bonus schemes and incentives for service renewal, or supporting the exploration of existing situations to understand behavioral patterns. The idea is to prolong the life expectancy of customer relationships and to develop strategies that create value, supported by the implementation of commercial actions throughout the customer life cycle [9]. Ng and Liu [44] suggested that deviations from normal patterns of behaviors are strong signs of likely defection, where deviation analysis is performed to detect customers who are at risk of dropout. There are several algorithms that can be adopted [42]: logistic regression, artificial neural networks, survival analysis, Markov chains, support vector machines, generalized additive models, decision trees, naive Bayes classifiers, K-nearest neighbor classifiers, random forests, cost-sensitive classifiers and evolutionary algorithms. In telecommunications, algorithms such as decision trees, neural networks, support vector machines, Bayesian belief networks, and regression have been commonly employed [45].
Classification algorithms should fulfill the following two requirements: classification performance and higher interpretability [46]. Some algorithms, such as neural network black box models [47] or support vector machines [48], lack interpretability. Higher interpretability helps to address the existing problem of a lack of causality [15], in line with the principle of Occam's razor, which is usually interpreted as favoring simpler models. Decision trees are very popular because of their interpretability [48], [49] and conceptual transparency [45]. Interpretability is an important aspect for the selection of an algorithm to classify dropout, as it allows the marketing department to extract valuable information from a model to design effective retention campaigns and strategies [50].
Risselada et al. [29] analyzed the performance of predictive models in several periods of time after the estimation period and identified a substantial decrease in the predictive quality. The following questions arise: what is the accuracy when considering a temporal factor, and do churn patterns exist when considering time-changing parameters? Survival analysis, or more generally, time-to-event analysis, allows us to adopt a dynamic perspective that considers the duration of the relationship with the customers. Survival analysis consists of a set of methods to describe the probability of surviving past a specified time point, or more generally, the probability that the event of interest has not yet occurred by this time point [51], and attempts to address the timings related to customer dropout. Survival analysis is a class of statistical methods that model the occurrence and timing of events, e.g., customer attrition, aiming to establish descriptive or predictive models in which the risk of an event depends on covariates [52]. More recent machine learning algorithms for survival analysis provide additional advantages that surpass the existing limitations, such as Cox proportional hazard assumptions [53], and allow the use of time-dependent variables.
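The probability that the dropout event has not yet occurred by a given time point can be estimated nonparametrically. The sketch below uses the Kaplan-Meier product-limit estimator, a standard survival analysis technique chosen here for illustration and not necessarily the method used in the cited studies. The membership durations and the censoring flags are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as a list of (time, S(time)) pairs.

    durations: time each customer was followed (e.g., months of membership).
    observed:  True if dropout was observed at that time, False if the
               customer was still active when observation ended (censored).
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        dropouts = sum(1 for d, e in events if d == t and e)
        leaving = sum(1 for d, e in events if d == t)   # dropouts plus censored
        if dropouts:
            survival *= 1 - dropouts / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical data: six customers, three of them censored.
durations = [3, 5, 5, 8, 12, 12]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 4))
```

Censored customers contribute to the number at risk until they leave the observation window, which is what distinguishes this estimate from a simple proportion of churned customers.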
Increasing retention contributes to improving the lifetime of a customer, which requires a set of techniques that can be applied to learn how to model the relationship between a set of descriptive variables and a target variable using a set of historical examples [37]. Existing studies mainly address non-time-varying variables [54] and propose different time periods to predict churn. The approaches employed to develop dropout prediction should simultaneously consider interpretability [49], classification performance [46], and financial objectives related to business profitability to optimize the customer lifetime value [34].
To address these perspectives and to increase the understanding related to predicting dropout in the context of contractual settings, the prediction should simultaneously consider the dropout prediction accuracy, the timing of the dropout, and its interpretability to better address customer churn. However, at this point, we lack an overview of the research on the use of machine learning algorithms to target dropouts and address these issues.
Our main research question is given as follows: what is the current state of machine learning research studies on predicting dropout in contractual settings? This research analyzes the state-of-the-art techniques and identifies machine learning studies to predict customer dropout. This study was developed under a systematic literature review methodology following the guidelines of Kitchenham and Charters [55].
Our paper is organized as follows: In the next section, we address our research methodology, considering the research questions and the search strategy applied to pursue our research objectives. The results section presents the research results according to the previously identified questions, as well as the validity threats and how such research threats are mitigated in software engineering. Finally, some open questions, conclusions and future work are presented.

II. RESEARCH METHODOLOGY
According to Fink [56], a systematic literature review (SLR) is a systematic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners. The current research presents an SLR to provide guidance to researchers planning future studies and to summarize the literature on a particular issue [57].
The importance of understanding customer dropout and the diversity of employed algorithms require an understanding of trends and existing problems to create a ground base of knowledge. For the development of the systematic literature review, the methodology applied by Kitchenham & Charters [55] was adopted and developed in three stages, namely, planning, conducting and reporting, as described in Figure 1. The planning stage involves defining the research need, identifying the research questions and developing the review protocol. The conducting stage involves research identification, study selection, quality assessment, data extraction, and data synthesis. The last stage comprises one activity, i.e., the report review.

A. RESEARCH QUESTIONS
The first question addressed in this study, which is presented in the introduction, is as follows: what is the current state of machine learning research studies on predicting customer dropout in contractual settings? Based on this research question, the following questions were identified to determine the main aspects related to customer dropout in contractual settings:
• RQ1: What is the current state of the research being developed?
• RQ2: What algorithms have been employed to predict dropout?
• RQ3: Which features are used to predict dropout?
• RQ4: When does dropout occur?
• RQ5: How is machine learning algorithm dropout prediction accuracy measured?
RQ1 aims to identify what studies have been published thus far in the area. This will allow the business areas being researched to be understood. RQ2 aims to identify which different types of machine learning algorithms are applied to predict dropout, to deepen the knowledge of the approaches that are being utilized to address dropout. RQ3 explores the features that are selected to predict dropout. RQ4 intends to understand whether the timing related to customer dropout is considered; this question explores, for example, whether any study exploits the advantage of survival analysis, which allows us to examine not only if an event occurs but also how long it took to occur [58]. This perspective entails understanding both the dropout event and its timing. Finally, RQ5 aims to identify how the accuracy of machine learning algorithms in predicting dropout is measured. The goal of dropout prediction is to develop a classification task to identify an optimal functional mapping between the input data X (describing the input pattern) and the class label Y to accurately predict the label of unseen data, such that Y = f(X) [59]. The prediction should consider characteristics of the data such as class imbalance [60].
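The reason class imbalance matters when measuring prediction accuracy can be shown with a short sketch. It computes accuracy, precision, recall and F1 from true and predicted labels; the dataset of 100 customers with 5 churners and the trivial classifier that never predicts dropout are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (dropout = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 5 churners among 100 customers,
# scored by a classifier that always predicts "no dropout".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The classifier reaches an accuracy of 0.95 while identifying none of the churners, which is why recall, F1 or similar measures are usually reported alongside accuracy for imbalanced dropout data.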

B. SEARCH STRATEGY
This phase requires the identification of the search strategy to identify advances in the research of machine learning algorithms to predict dropout. The authors used the Petticrew and Roberts [57] population, intervention, comparison, outcomes and context (PICOC) method, recommended by Kitchenham and Charters [55], to define the search string, as illustrated in Table 1. The keywords identified were machine learning, customer dropout, and contractual. Other alternative keywords, such as churn and membership, were also adopted considering that Verbeke et al. [50] used the term churn to define the end of a relationship with an organization and Ascarza [18] used membership to represent a relationship with a professional group. The adopted search criteria were (('customer dropout') OR ('customer churn')) AND 'machine learning' AND ('contractual' OR 'membership'), which were applied to the title, abstract, and keywords in the search period between January 2000 and June 2020. The search, which identified 448 articles, was performed on the following databases: SpringerLink, Scopus, ScienceDirect, ISI Web of Science, IEEE Digital Library, and ACM Digital Library (Table 2). The results from each database were exported to a BibTeX file and merged into a single BibTeX file, and the bib2df package [61] was used to parse the BibTeX fields and collect the metadata.
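The logic of the search criteria above can be expressed as a small filter over bibliographic records. This is only a sketch of the Boolean expression: each database has its own query syntax, and the record structure (a dictionary with `title`, `abstract` and `keywords` fields) and the two example records are hypothetical.

```python
def matches_query(record):
    """Apply the review's Boolean search string to title, abstract and keywords."""
    text = " ".join(
        record.get(field, "") for field in ("title", "abstract", "keywords")
    ).lower()
    return (
        ("customer dropout" in text or "customer churn" in text)
        and "machine learning" in text
        and ("contractual" in text or "membership" in text)
    )


# Hypothetical records for illustration.
records = [
    {"title": "Customer churn prediction in contractual settings",
     "abstract": "We compare machine learning classifiers.",
     "keywords": "churn; telecom"},
    {"title": "Customer churn in online retail",
     "abstract": "A machine learning study of one-off purchases.",
     "keywords": "retail"},
]
selected = [r["title"] for r in records if matches_query(r)]
print(selected)
```

The second record is rejected because it mentions neither 'contractual' nor 'membership', which mirrors how the final conjunction restricts the results to contractual settings.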
The following inclusion and exclusion criteria were applied for the study selection:
1) Inclusion:
a) From 2000 to 2020
b) Peer-reviewed studies
2) Exclusion:
a) Books
b) Non-English articles
c) Patents
d) Theses
The article filtering process is summarized in Fig. 2. The articles from the different sources were combined, and 28 duplicate articles were removed, which produced a total of 420 articles. Additionally, 5 entries were removed due to the exclusion criterion for books, and 13 entries were removed because of incomplete information (title and DOI). After this process was completed, 402 papers were identified.
Considering the time involved in the protocol development, searching and retrieval of references, paper screening, data extraction and quality assessment, data entry, and tabulation, which according to Allen & Olkin [62] corresponds to approximately 51% of the time involved, we adopted a machine learning system that can estimate the probability that a document should be included and automatically rank the documents from most relevant to least relevant. This process allowed the human reviewer to identify the studies to include earlier in the screening process [63]. Following this assumption, we employed ASReview [64], which allows 95% of the eligible studies to be identified after screening between 8% and 33% of the studies [64]. The naive Bayes classifier was adopted using the following steps in ASReview:
1) Five relevant and five irrelevant articles were selected;
2) ASReview ordered the publications in such a way that the most relevant publications were listed first, simplifying the analysis of the abstracts;
3) All articles were screened, although none of the last 178 articles was determined to be relevant.
After the screening process, the final dataset included 87 articles (Figure 3), excluding 311 articles that, after analysis of the abstract, did not fit the eligibility criteria and did not address machine learning for customer dropout in contractual settings, and 4 articles for which we did not have full text access. The analysis was carried out with two approaches. The first focused on the bibliometric analysis and overall trends of the articles using natural language processing applied to the data extracted from the PDF [65]. The second approach developed the qualitative analysis of the articles, providing greater transparency [66] and systematization and synthesis of the knowledge [57].
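The principle behind ranking unscreened abstracts by estimated relevance can be sketched with a multinomial naive Bayes classifier with Laplace smoothing. This is not ASReview's implementation or API; the function names, the whitespace tokenization and the example texts are assumptions for illustration, and the sketch assumes that at least one relevant and one irrelevant example are provided.

```python
import math
from collections import Counter


def train_nb(labelled):
    """labelled: list of (text, is_relevant). Returns word counts per class."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labelled:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def relevance_log_odds(model, text):
    """log P(relevant | text) - log P(irrelevant | text), up to a shared constant."""
    counts, docs, vocab = model
    score = math.log(docs[True] / docs[False])          # class prior
    for label, sign in ((True, 1), (False, -1)):
        total = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:                            # ignore unseen words
                score += sign * math.log((counts[label][word] + 1) / total)
    return score


def rank(model, unlabelled):
    """Order abstracts from most to least likely relevant."""
    return sorted(unlabelled, key=lambda t: relevance_log_odds(model, t),
                  reverse=True)


# Hypothetical prior knowledge and unscreened abstracts.
labelled = [
    ("churn prediction machine learning telecom", True),
    ("customer churn contractual subscription model", True),
    ("protein folding simulation", False),
    ("image segmentation medical scans", False),
]
unlabelled = [
    "deep learning for medical image analysis",
    "predicting subscription churn with machine learning",
]
model = train_nb(labelled)
ranked = rank(model, unlabelled)
print(ranked[0])
```

In an active learning tool the model would be retrained after each screening decision, so the ordering improves as the reviewer labels more abstracts.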
The data extraction process was developed at the same time that the papers were reviewed to identify the dropout domain, the type of organization (e.g., airline company, insurance company or telecommunication) and dropout prediction techniques (e.g., decision trees, logistic regression, or support vector machine). The domain identification was based on the article business area description and the context of the dataset (i.e., business area) used to predict the dropout.

III. RESULTS
This section presents the analysis that emerged from the systematic literature review to address the research questions, which provides the results from the bibliometric analysis and the quality scores and a qualitative analysis of the content of the articles.

A. RQ1 - WHAT IS THE CURRENT STATE OF THE RESEARCH BEING DEVELOPED?
The final list of the selected articles by source included 55 articles from Scopus, 1 article from IEEE, 78 articles from SpringerLink, 14 articles from Science Direct, and 1 article from ACM. To obtain an overall understanding of the existing studies, we developed a classification to identify the area of the study, which is represented in Table 3. In this subsection, we present the answer to the question regarding the research being developed. Figure 4 shows the evolution of the research using machine learning to predict dropout since 2000. Between 2000 and 2008, only four studies were published: three studies were published in the telecommunications sector [5], [43], [47] and one study was published in the media business sector [89]. The number of publications started to increase in 2008, initially targeting the media, financial and telecom areas [86], [95], [102], [118]. The research growth in the prediction of dropout, namely, with contractual settings, focused on the telecom, financial and media business areas.
The dropout research mainly addresses telecommunications, which corresponds to approximately 44% of the publications, followed by the financial area (representing 18%); the remaining research focuses on the areas of software (3%), retail (2%) and games (2%), while energy, gambling, hospitality, logistics and security represent only 1% each (Figure 4). Some studies target several business areas, such as the financial and telecom industries (6%) [9], [29], [46], [90], [122], the financial and retail industries (3%) [68], [74], [97], the financial, telecom and retail industries (1%) [42], and the telecom and media industries (1%) [18]. These studies analyzed more than one business area as a proof of concept of the approach being tested to predict dropout. Certain areas only addressed the subject in 2020, such as logistics [99], hospitality [96] and education [101]; however, in these areas the problem had been addressed using traditional statistical methods without machine learning approaches [128]. Research expansion is therefore underway. Figure 5 shows that the majority of these studies are being published in the Journal of Expert Systems with Applications, followed by the European Journal of Operational Research, Knowledge and Information Systems, and Decision Support Systems.

B. RQ2 - WHAT ALGORITHMS HAVE BEEN EMPLOYED TO PREDICT DROPOUT?
After investigating the current state of the research being developed, we address the algorithms utilized to target customer dropout. Sixty different types of algorithms were identified. To simplify their presentation, they were organized into the following categories: (1) nature-inspired [129], which encompasses artificial neural networks, fuzzy systems, evolutionary computing, and swarm intelligence; (2) ensemble, which uses multiple classifiers that are built independently and that participate in a voting procedure to obtain a final class prediction, such as random forests, bagging and boosting [50]; (3) decision trees, which are one of the most popular classification techniques [45]; (5) statistical, such as logistic regression, naive Bayes, and Bayesian networks [50]; (6) rule based, which allows us to reduce the amount of information to understandable statistically supported statements [45]; (7) clustering, used to classify customers [69]; (8) survival analysis methods, which model the occurrence and timing of events [52]; (9) Markov chain [89]; (10) SVM, which classifies the data using the maximal margin hyperplane [115]; and (11) other approaches not included in the previous categories.
Other studies adopted different approaches, such as ensemble methods, to address the problem or created new algorithms based on this idea. Ensemble methods use multiple base classifiers to improve the predictive performance, which, as stated by [50], is the idea that underlies their adoption by other studies [93], [120]. The authors of [119] considered ensemble methods to be one of the best approaches to model customer churn. The most commonly employed ensemble method was random forest, which overcomes the instability of a single decision tree by growing trees on bootstrap samples of the data using random subsets of n predictors [95], and which provides good predictive performance. The random forest was adopted for the following five main reasons [86], [95]: (1) predictive performance; (2) robustness to outliers and noise; (3) reasonable computing time and (5) ease of implementation. Another ensemble method was bagging (bootstrap + aggregating), which adopts a model averaging approach; boosting, in turn, uses weak learners to iteratively learn and create a strong classifier. The analyzed studies explored several boosting approaches, such as (1) AdaBoost [77], [97], [99], [106], [116], [121]; (2) gradient boosting [28]; and (3) extreme gradient boosting [90].
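The bagging idea described above, in which independently built classifiers are trained on bootstrap samples and combined by voting, can be sketched as follows. The base learner is a deliberately simple one-feature decision stump whose threshold lies halfway between the class means, which is an assumption made to keep the example short; the data, the number of models and the seed are hypothetical as well.

```python
import random


def fit_stump(sample):
    """One-feature stump: threshold halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:                 # bootstrap sample with a single class
        label = 1 if pos else 0
        return lambda x: label
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    higher_is_dropout = mean_pos > mean_neg
    return lambda x: int((x > threshold) == higher_is_dropout)


def bagging_fit(data, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate the independent predictions by majority vote."""
    votes = sum(model(x) for model in models)
    return int(votes * 2 > len(models))


# Hypothetical data: (scaled inactivity score, dropout label).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 0.95), bagging_predict(models, 0.05))
```

A random forest extends this scheme by using decision trees as base learners and by also sampling a random subset of predictors when growing each tree.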
The statistical category includes logistic regression. The logistic regression technique is a well-known and simple technique that was referenced by 35 articles. The simplicity and availability of logistic regression and its use as a benchmark against other techniques were identified as advantages in several studies [32], [93]. Some researchers address the problem of tuning the data preparation and analyzing the performance of standard methods, such as logistic regression, against more advanced classifiers, such as bagging, boosting and random forests [28]. Other algorithms, such as the Bayesian classifier, were also included through certain variants, such as (1) the naive Bayesian classifier [92], [110], [118], (2) the naive Bayes tree [108], [109], [117], and (3) the Bayesian belief network [101]. The naive Bayes tree combines decision trees and naive Bayes classifiers. The generalized additive model, which is considered a nonparametric technique and allows the model assumptions to be relaxed, was also explored [91], as was the general linear model [18].
The rule-based category encompasses several algorithms adopted under the assumption that certain approaches, such as support vector machines, are black box models [108]. These approaches allow rules, such as association rules, to be extracted to reduce large amounts of information to small and understandable amounts of information [45].
The cluster category reflects approaches that use unsupervised algorithms to segment customers and then apply machine learning algorithms to predict churn within each segment [5], [25], [112]. Jafari-Marandi et al. [121] explored a similar approach that combines clustering methods in parallel with classification methods, with the aim of creating more control in the decision-making process of churn management and, at the least, increasing accuracy by exploiting the individuality of each customer to optimize the classification decision process. Ullah et al. [107] also employed clustering, using fuzzy c-means, possibilistic c-means and possibilistic fuzzy c-means. Other studies applied k-means [106], [107], local outlier factors [107], and cluster-based local outlier factors [107] for classification.
Support vector machines are algorithms that attempt to obtain an optimal hyperplane that maximizes the margin between positive examples and negative examples [32]. When linear separation is not possible, support vector machines map the input space into a higher dimensional space via a nonlinear mapping defined by a kernel function [95]. Verbeke et al. [41] suggested their use considering their higher performance due to the ability to capture nonlinearities. Several studies employed this technique by selecting different kernel functions, such as the radial basis function [95].
In certain cases, the survival approach was adopted to model the occurrence and timing of dropout events. The idea was clarified by Burez and Vandenpoel [86], who distinguished a static approach, where churn behavior was predicted at a moment in time using random forest, from dynamic approaches, where churn is predicted over a period of time. The dynamic approach was explored with survival analysis in several studies identified in Table 4. Routh et al. [96] utilized a random survival forest to explore the advantage of an algorithm based on random forests.
The Markov concept describes a random process in which the next state depends only on the present state and is independent of earlier states; predictions are therefore established based on the present state. The algorithms selected include the Markov chain and the Markov logic network.
Some articles proposed different approaches, such as multivariate time series, transfer learning, random walk, active learning, social networks and fuzzy classifiers. The exploration of these approaches started in 2011 with Verbeke et al. [41], who employed active learning, followed by transfer learning in 2015 [120]. More recently, in 2017, fuzzy classifiers [77] were applied, and random walk [75] and multivariate time series [15], [99] were utilized in studies published in 2020.
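The Markov idea mentioned above, in which a prediction depends only on the present state, can be illustrated with a minimal sketch. The state names and transition probabilities below are hypothetical values chosen for illustration; they are not taken from any of the reviewed studies.

```python
# Hypothetical customer states and one-step transition probabilities.
# The numbers are illustrative assumptions, not estimates from any study.
STATES = ["active", "at_risk", "churned"]
TRANSITIONS = {
    "active":  {"active": 0.85, "at_risk": 0.10, "churned": 0.05},
    "at_risk": {"active": 0.30, "at_risk": 0.40, "churned": 0.30},
    "churned": {"active": 0.00, "at_risk": 0.00, "churned": 1.00},
}


def step(distribution):
    """Advance a state distribution by one period using only the present state."""
    nxt = {state: 0.0 for state in STATES}
    for current, p_current in distribution.items():
        for target, p_move in TRANSITIONS[current].items():
            nxt[target] += p_current * p_move
    return nxt


def churn_probability(start_state, periods):
    """Probability of having churned within `periods` steps from `start_state`."""
    dist = {state: 1.0 if state == start_state else 0.0 for state in STATES}
    for _ in range(periods):
        dist = step(dist)
    return dist["churned"]
```

Because "churned" is modeled here as an absorbing state, the returned probability can only grow with the number of periods, which is one simple way of expressing churn risk over a horizon.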
The variety of approaches explored to improve dropout prediction accuracy has led to the integration of several algorithms to increase performance. Vijaya and Sivasankar [112] suggested that hybrid models combining more than one classifier can achieve increased performance compared with single classifiers. This idea is not new, and some studies have explored the combination of clustering with churn prediction [5], [25], [111], where grouping the customers into clusters can improve the prediction accuracy within each cluster. The hybrid approach using clustering and classification, which segments the customers before developing a classification, could be effective [121]. The hybrid approach differs from ensemble methods, in which several models are built independently before a final step combines them to create a final class prediction; the hybrid approach instead applies a pipeline concept, where the output of one algorithm serves as the input for another algorithm [108].
Generally, the majority of the studies employed decision trees, logistic regression, random forests, support vector machines, and neural networks. The interpretability and simplicity of the first three approaches justify their wider adoption. The variety of the adopted approaches also denotes an increasing use of hybrid models, where several approaches are combined to improve the prediction accuracy.

C. RQ3 -WHAT ARE THE FEATURES USED TO PREDICT DROPOUT?
The aim of this question is to explore which features are being employed to predict dropout. The main approach in all the addressed studies is the use of demographic data and behavioral data related to the use of the service that is being purchased (Figure 6). Some researchers extended this perspective. Risselada et al. [29] utilized customer characteristics, such as demographic and socioeconomic information (income), commitment, and relationship characteristics expressed as length (time in the relationship with the company), breadth (usage of other services), and depth (type of service purchased). Gür Ali and Arıtürk [54] proposed the use of new types of data, such as customer interactions with the organization, a survey addressing service quality, and external economic indicators that could influence dropout. The usage of external economic factors is unique in the research analyzed. Ballings et al. [81] explored the use of pictorial data to represent feelings and mindset, i.e., variables that cannot be extracted from internal processes within the organization, such as purchase intentions, interests, satisfaction, and emotions. Benedek et al. [83] employed network topological properties to represent whether calls were made to customers within the same telecommunication service or to customers of another telecommunication service. Kaya et al. [126] adopted spatio-temporal data to explore time and location as behavioral data. Moeyersoms and Martens [82] is the only study that uses high cardinality data, such as bank account numbers, family names and ZIP codes, to predict customer dropout. Other studies addressed the problem using variables generated by the Recency, Frequency, and Monetary (RFM) model [32], [80], [93]. Al-Molhem et al. [68] conducted social network analysis to enhance the results of churn prediction models in the telecom domain, using call detail records to construct a weighted graph representing the distance between two subscribers to calculate the centrality. Marketing-related variables, such as promotions offered to a customer, calls made as part of a retention strategy, and helpdesk interactions, were applied by Verbeke et al. [50]. However, certain studies did not identify the features employed [9], [12], [18], [22], [25], [42], [48], [67], [70], [73], [85], [87], [88], [97]-[99], [107], [110], [112], [113], [116], [119], [122], [127], which in some cases is related to the usage of more than one database. Idris et al. [119] used two databases (Orange and Cell2Cell). Burez and Van den Poel [88] used data from two banks with 117808 and 102279 customers, a mobile dataset (n=100205), a newspaper dataset (n=122189) and a PayTV dataset (n=143198). De Bock and Van den Poel [97] explored two datasets (bank1=23562 and bank2=42783). The number and size of these datasets made the features difficult to describe in a research paper.
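As an illustration of the RFM-generated variables mentioned above, the following minimal sketch derives recency, frequency and monetary values from a transaction log. The tuple layout, the function name and the sample data are assumptions made for this example and do not reproduce the feature engineering of any cited study.

```python
from datetime import date


def rfm_features(transactions, reference_date):
    """Compute recency (days), frequency (count) and monetary (sum) per customer.

    `transactions` is an iterable of (customer_id, purchase_date, amount) tuples.
    """
    features = {}
    for customer_id, purchase_date, amount in transactions:
        entry = features.setdefault(
            customer_id,
            {"last_purchase": purchase_date, "frequency": 0, "monetary": 0.0},
        )
        entry["last_purchase"] = max(entry["last_purchase"], purchase_date)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        cid: {
            "recency": (reference_date - e["last_purchase"]).days,
            "frequency": e["frequency"],
            "monetary": e["monetary"],
        }
        for cid, e in features.items()
    }


# Hypothetical transaction log used only for illustration.
example = [
    ("c1", date(2020, 1, 10), 30.0),
    ("c1", date(2020, 3, 5), 45.0),
    ("c2", date(2019, 11, 20), 20.0),
]
rfm = rfm_features(example, reference_date=date(2020, 3, 31))
```

In this sketch, a customer with a large recency and a low frequency would typically be treated as a higher dropout risk, although how the three values are combined depends on the model that consumes them.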
The selection of the features and how they are used are fundamental to dropout prediction. Feature selection is considered an important and critical step [77], and established processes exist for extracting knowledge from data, such as the Cross Industry Standard Process for Data Mining (CRISP-DM), which comprises six distinct stages, i.e., business understanding, data understanding, data preprocessing, modeling, evaluation, and deployment [28]. This underlies the assumption of initially addressing aspects related to the preparation of the data. The existing data analysis techniques use the extracted patterns to make churn decisions for all customers depending on the quality of the learning using the training set [121]. Coussement et al. [28] suggested that optimized data preparation provides competitive results against classifiers such as neural networks or ensemble methods (e.g., random forests), where optimization was achieved using decision-tree-based remapping for categorical variables, equal frequency binning for continuous variables, and weight-of-evidence conversion as the representation method. They suggested the following techniques for the preparation of the data: 1) transformation of categorical and continuous variables into values; 2) remapping of categorical variables; 3) discretization (binning) of continuous variables; 4) dummy encoding; 5) weight of evidence, which calculates the strength of a category to separate churners from non-churners based on the ratio of the proportion of churners to the proportion of non-churners in that category; and 6) variable selection using heuristics, such as subsetting variables with a higher correlation with the dependent variable and a low intercorrelation among independent variables. Different types of features and approaches were adopted to predict customer dropout, such as demographic information, behavioral data and relationship length. Certain researchers have applied external variables to improve the prediction accuracy, such as economic indicators or high cardinality data. 
However, some studies did not identify which features were utilized, which in most cases was related to the use of more than one dataset in the study. The process related to the preparation of the dataset to be applied in the model prediction is fundamental, and certain studies employed several machine learning algorithms to prepare the data before the prediction model was actually developed.
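The weight-of-evidence conversion listed among the data preparation techniques can be sketched as follows. This is a minimal illustration that assumes the conventional logarithmic form, with the share of churners in the numerator; the sign convention, the additive smoothing constant and the function name are assumptions of this example rather than details confirmed by the reviewed studies.

```python
import math


def weight_of_evidence(categories, churned, smoothing=0.5):
    """Return {category: WoE}, with WoE = ln(share of churners / share of non-churners).

    `smoothing` is an assumed additive constant that avoids division by zero
    for categories containing only churners or only non-churners.
    """
    counts = {}
    for category, is_churner in zip(categories, churned):
        pos, neg = counts.get(category, (0, 0))
        counts[category] = (pos + 1, neg) if is_churner else (pos, neg + 1)
    total_pos = sum(pos for pos, _ in counts.values())
    total_neg = sum(neg for _, neg in counts.values())
    k = len(counts)
    woe = {}
    for category, (pos, neg) in counts.items():
        share_pos = (pos + smoothing) / (total_pos + smoothing * k)
        share_neg = (neg + smoothing) / (total_neg + smoothing * k)
        woe[category] = math.log(share_pos / share_neg)
    return woe
```

Under this convention, a positive value indicates a category in which churners are overrepresented, and a value near zero indicates a category that does little to separate the two groups.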

D. RQ4 -WHEN DOES DROPOUT OCCUR?
Traditional dropout analysis is focused on a binary dependent variable (churn or not-churn), and different approaches are needed to understand when dropout occurs. Generally, a static perspective of the dropout problem is considered, discarding the temporal perspective of the problem. Few studies address the timing of dropout or treat dropout prediction as anything other than a static prediction, although the temporal dimension should be considered. The models should be adapted regularly [29], or ''time'' should be treated as an important feature. The problem with the static assumption lies in the dynamic behavior of the customer: decisions change over time, and a customer who intends to drop out at the end of the month may not actually do so [15]. To account for time to churn, survival models have been developed [96]. These models address important limitations of traditional methods, such as regressions, which would be appropriate only if all customers ended the relationship, do not allow us to predict when churn will occur, and do not consider censored data (observations with incomplete information about churn time); survival models thereby capture the temporal dimension of the churn prediction challenge [92].
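To make the role of censored observations concrete, the following minimal sketch implements the Kaplan-Meier estimator, a standard non-parametric survival estimator. It is offered as an illustration under the assumption that tenure is measured in discrete time units; it is not necessarily the survival model used in any of the cited studies, and the function name is hypothetical.

```python
def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time where a churn was observed.

    durations[i] is the observed tenure of customer i; churned[i] is True when the
    customer left at that time and False when the observation is censored.
    """
    events = {}
    for t, has_churned in zip(durations, churned):
        churns, leaving = events.get(t, (0, 0))
        events[t] = (churns + (1 if has_churned else 0), leaving + 1)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(events):
        churns, leaving = events[t]
        if churns:
            # Only observed churns reduce the survival probability.
            survival *= 1.0 - churns / at_risk
            curve.append((t, survival))
        # Censored customers leave the risk set without counting as churners.
        at_risk -= leaving
    return curve
```

Customers who are still active at the end of the observation window contribute to the risk set up to their last observed time, which is the information that a regression fitted only on customers who already left would discard.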
Perianez et al. [92] approached this problem using survival analysis, with the goal of determining not only whether a customer will drop out but also when dropout will occur. Burez and Vandenpoel [86] addressed this problem using a survival distribution to understand when customers stop a subscription, which provides an additional perspective regarding behavioral data. The duration of the usage of a service could be measured in the days (or another time unit) between the contract starting date and the contract termination date [47]. This idea is also explored by Liu et al. [75], who analyzed the duration of a player's engagement with a game at a specific timestamp, determined by the player's interactions within a subscription model for a gaming platform, and identified the retention rate of mobile games as a function of time. Other studies attempted to address this issue by using multivariate time series [15], convolutional neural networks and recurrent neural networks to learn the representative features of customers, especially focusing on daily level dynamic changes, but they lacked the ability to target customers according to their churning causes [15]. The availability of behavioral data (such as those represented in the time series) allows us to improve the model performance [82] and complement the information existing in the form of structured data (e.g., sociodemographic data). Gök et al. [111] did not explore the timings of the dropouts but used the available information to create clusters from time series, on the assumption that customer behavior takes the form of a time series, and fed these clusters, combined with the available features, to an algorithm that predicts churn in a second phase.
The timings involved in dropout prediction should consider the estimation period; as stated by Risselada et al. [29], the predictive performance deteriorates considerably within a few periods after the estimation period because the parameter estimates change over time and the significant variables differ between two periods. Xiao et al. [120] noted that machine learning models assume that the training dataset and the test dataset are subject to the same distribution, which is not satisfied in many situations, considering that changes occur as time progresses. The data representing the behaviors of the customers should also be considered, as increasing the amount of such data allows the model performance to improve [131].
The duration of the relationship until dropout allows for the evaluation of the tenure relationship between the customer and the organization, indicating lower dropout risks in longer relationships and higher risks in the first 18 months of the relationship [40]. This pattern was also identified by Burez and Vandenpoel [86] regarding churning after a year of subscription, where one out of three customers leave the company within one year, and half of the customers leave the company within two years. The timestamp gives an additional perspective of the dropout risk, allowing us to identify when it occurs and when to consider retention strategies considering the timing of the events. This approach enhances the understanding of the importance of the duration of the relationship between the customer and the organization as an important feature for predicting dropout, which was considered in fewer studies using survival analysis [54], [86], [92], [96].
However, as previously mentioned, few studies have addressed dropout from a temporal perspective. The timing of the dropout is an important feature: it reflects that customers' decisions to drop out change over time, and survival models overcome the limitation of traditional approaches, which predict only dropout or nondropout. The temporal perspective also allows us to identify when retention actions should be enacted to reduce dropout, providing indicators that complement existing ones and that represent a research opportunity.

E. RQ5 -HOW IS MACHINE LEARNING ALGORITHM DROPOUT PREDICTION ACCURACY MEASURED?
One important aim of this systematic review is to identify trends that allow researchers to understand the types of metrics that are being employed to analyze the performance of machine learning algorithms. In this research, several metrics were identified. The literature shows the application of different cost-sensitive performance metrics, such as the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, recall, precision, and F-score [121], considering that in the case of churn prediction, false negatives are five times as undesirable as false positives. The most commonly employed indicator to measure performance was accuracy, which was utilized in 39 studies, followed by ROC AUC in 36, sensitivity in 36, lift in 26, precision in 16, F-measure in 13, and specificity in 9.
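The threshold-based metrics listed above can all be derived from the four cells of a confusion matrix, as the following minimal sketch shows. Churn is treated as the positive class, and the default cost weights encode the five-to-one ratio between false negatives and false positives mentioned in the text; the function name and the simple additive cost are assumptions of this example.

```python
def classification_metrics(tp, fp, tn, fn, fn_cost=5.0, fp_cost=1.0):
    """Compute common churn metrics from confusion-matrix counts (churn = positive)."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        # Assumed cost weights: a missed churner costs five times a false alarm.
        "misclassification_cost": fn_cost * fn + fp_cost * fp,
    }
```

With a churn rate of 4%, a model that never predicts churn (tp=0, fp=0, tn=960, fn=40) would reach an accuracy of 0.96 while its sensitivity is zero, which illustrates why accuracy alone can be misleading on imbalanced churn data.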
The lift compares the churn rate within a group with the overall churn rate, and the top-decile lift focuses on the 10% of cases that are most likely to churn [8], [95], [123], [125], [127], while the profit approach in the uplift expands this approach to maximize profit [16], [33], [50], thus providing a cost-benefit perspective. The top-decile lift supports the development of more proactive actions to retain customers at risk of churn [95], [123].
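A minimal sketch of the top-decile lift follows, assuming the common definition as the churn rate among the 10% of customers with the highest predicted scores divided by the overall churn rate. The function name and the handling of very small samples are assumptions of this example.

```python
def top_decile_lift(scores, churned, fraction=0.10):
    """Churn rate among the top `fraction` of scored customers over the overall churn rate."""
    if not scores or len(scores) != len(churned):
        raise ValueError("scores and churned must be non-empty and of equal length")
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))  # at least one customer
    top_rate = sum(1 for _, c in ranked[:top_n] if c) / top_n
    base_rate = sum(1 for c in churned if c) / len(churned)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no churners")
    return top_rate / base_rate
```

A lift of 1 means that the ranking does no better than contacting customers at random, whereas a lift of 3, for example, means that churners are three times as concentrated in the targeted group as in the customer base overall.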
The problem was formalized by Devriendt et al. [33] using the following four categories regarding churn: (1) customers who would never churn; (2) customers who churn independently of the retention actions; (3) customers who do not churn because of a retention campaign; and (4) customers who will churn if exposed to a retention campaign.
Developing approaches to retain customers is considered positive, but the following issues should be considered: (1) customers who have a greater risk of dropout should be targeted to provide a base for a better ROI in the retention strategies [95], [123] and (2) retention strategies should be developed focusing on customers with higher satisfaction; otherwise, inclusion in a campaign could act as a reminder that the contractual agreement is nearing its end and lead to churn [33]. Although uplift models seem to be a good strategy, they should also consider factors other than risk and customer satisfaction, as not considering them could be counterproductive, in which case the model should be removed from the retention strategy. Devriendt et al. [33] suggested that uplift models could outperform predictive models and contribute to greater profitability in the development of retention campaigns to reduce dropout. These approaches provide important benchmarking metrics by considering not only the model accuracy but also the financial performance to maximize the retention strategies. Better approaches should be provided to increase the return on investment in marketing campaigns [33], where the business objective is to reduce customer churn; customers who are about to churn but cannot be retained should be excluded from the campaign, as targeting them would be a waste of scarce resources. However, customers with a higher risk of churning may not be the best targets, as suggested by Ascarza [18]. Investments in retention strategies should distinguish churners susceptible to marketing actions from those who will leave anyway [28].
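The distinction between customers who respond to a campaign and those who do not can be illustrated with a simple two-group estimate of uplift. The sketch below assumes data from a campaign in which treatment was assigned at random and in which customers are already grouped into segments; both the segment labels and the function name are hypothetical, and the reviewed uplift models are considerably more elaborate.

```python
def estimated_uplift(records):
    """Estimate per-segment uplift as churn rate (control) minus churn rate (treated).

    `records` is an iterable of (segment, treated, churned) tuples, assumed to come
    from a campaign where treatment was assigned at random.
    """
    counts = {}
    for segment, treated, churned in records:
        # Each group stores [number of churners, number of customers].
        seg = counts.setdefault(segment, {True: [0, 0], False: [0, 0]})
        seg[bool(treated)][0] += 1 if churned else 0
        seg[bool(treated)][1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (t_churn, t_n), (c_churn, c_n) = groups[True], groups[False]
        if t_n and c_n:  # both groups are needed for a comparison
            uplift[segment] = c_churn / c_n - t_churn / t_n
    return uplift
```

A clearly positive value suggests a segment in which the campaign reduces churn, a value near zero suggests customers who behave the same with or without the campaign, and a negative value corresponds to customers who are more likely to churn when exposed to the campaign.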

IV. OPEN QUESTIONS
The systematic literature review identified open questions about customer dropout. From RQ1, some underresearched business areas were identified, such as the energy sector, education, logistics and hospitality. Compared with other business areas, such as telecom or the financial sector, research on the energy or water supply sectors is lacking, given the contractual settings under which such services are typically provided. Considering the software as a service (SaaS) business model of many software companies, the number of studies in this area is also surprisingly low.
RQ2 provided an overall perspective related to the algorithms that are being utilized to predict customer dropout. A first observation is the importance and wider adoption of decision trees and random forests [49], [74], [89] and logistic regression [91], which could be attributed to their higher interpretability and flexibility [45]. Interpretability is an important aspect for the marketing department in the extraction of valuable information from the model to develop effective retention strategies [50]. The problem arises in balancing interpretability against the higher performance of algorithms inspired by nature (such as neural networks).
From a business perspective, dropout prediction should also be considered as a business objective, which requires more than predicting whether the customer will churn [33], and higher interpretability provides better support in the development of retention strategies. The developed SLR also raises the possibility of integrating different algorithms using ensemble methods or integrating several models using a hybrid approach. None of the studies integrated the survival approach into the prediction of customer dropout, for example, through a hybrid approach.
Developing actions to retain customers is considered positive, but the following issues should be considered: (1) customers who have a greater risk of dropout should be targeted to gain a better ROI from the retention strategies [95], [123] and (2) the retention strategies should be developed by focusing on customers with higher satisfaction; otherwise, inclusion in a campaign could act as a reminder that the contractual agreement is nearing its end and could lead to churn [33].
From RQ3, several types of features were identified, such as demographic, behavioral, and economic indicators, pictorial data, network relationships and high cardinality features. The problem that arises is that certain studies used data and features that were not described, which creates a major issue. How can a study be reproduced without the availability of data or the identification of the features used? Considering that science is driven by data, with the development of new technologies, the increasing complexity of research and the amount of data collected, the challenge is to ensure that research is available to all [132]. This inclusivity requires the availability of both the data and algorithms so that they can be explored by other researchers. The features are selected mainly to verify the performance of the models and are essential to prediction performance and accuracy, as are the steps for processing the data, which are fundamental to improving the model accuracy [77].
There are several challenges pertaining to the timing of dropout and the dynamic behavior of a customer with the intent to drop out [15]. Understanding when dropout will occur, and the risk incurred when the temporal perspective of the problem is discarded, are elements that should be addressed. Few studies have considered this issue [86], [92]. This gap in the research represents an opportunity to address the importance of the timeframe and its influence on the efficiency of the model.
According to each business model, the timeframe could be addressed considering the survival probability according to the customer relationship age. Dropout predictions could be developed according to these survival probabilities, as suggested by Esteves and Mendes-Moreira [106], to investigate which data timeframe produces the best results and how the efficiency of the models is influenced by this timeframe. Exploring the duration of the relation and the understanding of the features that increase or decrease this duration seems to be an important approach that could complement the existing approaches to predicting dropout.
From RQ5, the literature analysis showed that different types of questions arise. Which approaches are best for analysing the performance in predicting dropout? Several metrics, such as AUC, sensitivity, specificity, recall, precision, and F-score, are employed. However, the goal of customer dropout prediction is to improve the performance of organizations in retaining customers, which is a management problem for which data mining is adopted [50]. The goals of the model should be formulated considering the context of the problem that is being addressed; in marketing retention strategies, the uplift supports the development of proactive actions to minimize the investment in retention strategies [95]. Some assumptions that underlie the adoption of uplift metrics consider that customers with a higher risk of churning may not be the best targets, as suggested by Ascarza [18]. Other researchers have addressed the problem using the top-decile lift to develop more proactive actions to retain customers at risk of churning [95], [123]. This approach considers the top 10% of customers with the greatest risk, and investments in retention strategies that distinguish churners susceptible to marketing actions from those who will leave anyway should be developed [28]. Although uplift models seem to be good strategies, they should also consider factors other than risk and customer satisfaction, as not considering other factors could be counterproductive, in which case the model should be removed from the retention strategy.
The true business objective is to reduce customer churn. Customers who are about to churn but cannot be retained should be excluded from the campaign, as targeting them would be a waste of scarce resources [33]. Using these models seems to be a good strategy, as they can outperform predictive models that consider only accuracy from a profitability business perspective. It should be considered that customers with a higher risk of churning may not be the best targets to develop retention strategies toward. These perspectives entail that a business context, or the clarification of a business objective underlying the prediction of customer dropout, should be developed to clarify which objectives should be achieved before employing machine learning algorithms. Surprisingly, the analyzed studies did not address the customer lifetime value as an objective to optimize considering the profitability of reducing dropout.

V. VALIDITY THREATS
This section describes the validity threats of this study. The common validity threats are listed as follows [133]: (1) construct validity; (2) internal validity; (3) external validity and (4) conclusion validity.
Construct validity: This threat evaluates whether the measures being adopted correctly support the concepts that are being employed. To reduce the threat to the search process, we selected several databases (SpringerLink, Scopus, Science@Direct, ISI Web of Science, IEEE Digital Library and ACM Digital Library) and employed the PICOC method to define the search string. The extracted measures were developed using bibliometric analysis and overall trends using language processing applied to the extracted article PDFs; additionally, the data extraction was developed to support our research questions. The sample size of our retrieved studies (420 articles) lowers this threat, supporting the generalization of the results.
Internal validity: This threat concerns whether a causal relationship can be identified, i.e., whether existing conditions could lead to other conditions, obtaining valid outcomes and evidence using a rigorous process. This category evaluates the implementation of the SLR process. To improve the internal validity, we followed the guidelines of Kitchenham and Charters [55] to develop a rigorous process. To reduce errors, our process integrated the results from multiple databases (SpringerLink, Scopus, Science@Direct, ISI Web of Science, IEEE Digital Library and ACM Digital Library), and article screening was performed using machine learning and validated by the researchers, which allowed us to automatically rank the documents and reduce human error in identifying relevant articles [39]. The restriction to contractual or membership settings reflects the contractual relationship between a customer and an organization, where the moment of dropout is known.
External validity: External validity addresses whether the findings of the study can be generalized and applied to another study. We selected 87 articles between 2000 and 2020 that addressed the prediction of customer dropout using machine learning with contractual settings in multiple databases. Scopus and ISI Web of Knowledge allowed us to reduce the risk of bias during the search, considering that they are the two main bibliographic databases with more than 77.8 and 79 million records [134]. Furthermore, the sample size gives some confidence in the generalization of the results.
Conclusion validity: The conclusion validity addresses issues that affect the ability to draw correct conclusions. The analysis of the selected articles was carried out using two approaches, a bibliometric analysis and a qualitative analysis. The data extraction was developed considering each research question to increase the study reproducibility, supported by the adoption of ASReview in the selection of eligible studies. The conclusions in our study could have been threatened by two main factors: the first is related to the set of identified articles that address the subject of interest, and the second is related to how the analysis developed provides new knowledge. Nevertheless, several strategies have been employed to mitigate this threat.

VI. CONCLUSION AND FUTURE WORK
This study proposed a systematic literature review to identify different critical areas of research for customer dropout using machine learning. We developed a systematic literature review of relevant articles in the last 20 years. The studies explored different business areas, and it seems that there are research gaps that need to be filled and improved. Some business areas are underresearched, such as the energy sector, where studies targeting the dropout problem in a contractual setting are lacking. However, in recent years, we have seen studies in new areas such as the logistics, hospitality, and education sectors.
The variety of algorithms is impressive, and approximately 60 different types of algorithms were identified. Developing a strategy for data preparation is a key aspect that is probably more important than the choice of algorithm, and interpretability is fundamental, considering the possibility of rule extraction.
The usage of algorithms depends on not only their performance but also their interpretability and simplicity, which could be related to the higher adoption of decision trees and logistic regression. The adoption of hybrid approaches to improve the performance of the predictions, where the output of one model is the input of another model, has also been noted.
The features used to predict dropout are dependent on the availability of the data and are related to a business area. The studies applied demographic and behavioral data, complemented with other features, such as social relationships with other customers. The preparation of the dataset is fundamental, and machine learning algorithms were also adopted to prepare the data before developing the prediction model.
The performance of the algorithms should be measured in the business context where dropout prediction is addressed, whether the goal is to maximize profit through customer retention or to maximize the number of customers. The metrics for dropout prediction should include performance indicators such as accuracy but also metrics such as uplift, allowing us to support a profitability perspective in the employed approaches. Investment in counteractions to reduce customer dropout also needs to consider customer responsiveness, distinguishing customers who are more responsive to marketing actions from those who will churn independently of organizational effort. The studies addressed the problem either from a static perspective, considering the risk of dropout at a specific moment, or as a dynamic problem from a temporal perspective, where the risk of dropout varies according to the customer relationship age. Several authors present approaches that are more refined than predicting only dropout or nondropout. The performance of the models should consider a business objective that aims to increase future earnings from the customer, which should be articulated with the interpretability of the algorithms to support the development of retention actions. Dropout timing is an important element to complement existing approaches, providing a temporal perspective for when retention actions should be developed.