Machine Learning and Marketing: A Systematic Literature Review

Even though machine learning (ML) applications are not novel, they have gained popularity partly due to advances in computing power. This study explores the adoption of ML methods in marketing applications through a bibliographic review of the period 2008–2022. In this period, the adoption of ML in marketing has grown significantly. This growth has been quite heterogeneous, ranging from the use of classical methods such as artificial neural networks to hybrid methods that combine different techniques to improve results. In general, we observed growing maturity in the use of ML in marketing and increasing specialization in the types of problems solved. Strikingly, the types of ML methods used to solve marketing problems vary widely, including deep learning, supervised learning, reinforcement learning, unsupervised learning, and hybrid methods. Finally, we found that the main marketing problems solved with machine learning were related to consumer behavior, recommender systems, forecasting, marketing segmentation, and text analysis (content analysis).

that require a bigger analysis; in this regard, ML allows solving problems faster and better than conventional techniques. Therefore, ML-based techniques are used to predict the results of new data, predictions, and classifications or to help people in the process of decision-making. Companies need to learn more and more about their consumers, their products, how to present them in the media, and how to plan future activities efficiently, making use of their historical data [8]. As we will see, ML has been widely used to discover the most relevant needs of consumers and the relationship they have with products and their attributes [9], [10], segment satisfaction, recognition or recommendation, the selection of a new product or reaction to advertising [11], [12], [13], [14].

Data can come from different sources, both structured and unstructured [15], [16], [17]. Sources include websites, social media, and blogs (YouTube, Twitter Tweets, Google Trends, visits to Wikipedia, reviews on IMDB, restaurants, tourism, hotels, and Huffington Post news) to predict the consumers' demand, among others. Other sources include data from business transactions such as e-commerce [18], [19], retail scanners, as well as intentional sources of data creation through user and internet usage (e.g., web cookies),

In step 2, we searched for individual articles published by these journals in the Web of Science (WoS) database between January 2000 and March 2022, which is a range of almost 22 years. We tried using different keywords regarding real applications of ML techniques. Our first search yielded 689 published articles, of which only a small number included what we were looking for. Finally, in step 3, we decided to search for a list of specific ML methods. The final search is shown in Figure 1.
In the WoS database, we used ''TS'', meaning search for topic terms in the following fields: Title; Abstract; Author Keywords; Keywords Plus. With this, we obtained 320 journal articles.

In step 4, we reviewed each of the 320 articles one by one and then applied a filter that would allow us to reach a final number of articles. The criteria were the following: the use of ML should be the main technique of the article, the ML techniques should have been declared by the authors (papers not showing a learning process were excluded), and the article must provide enough information concerning the technique used. Furthermore, the articles must define the technique as ML and show the application of a case with real data from verified sources (not experimental or simulated examples [27], [28]). Some articles that used semi-supervised learning but did not show a real application were rejected [29], [30], [31]. Our database ignores articles that are not in the area of marketing [32] (e.g., financial credit scores). Ordinary regression (OLS), hierarchical regression, and classic grouping (clustering) were not included if they did not exhibit any learning technique. Articles that only exhibited the use of software but did not provide intermediate results were not included either [33], [34]. Articles that presented software but did not exhibit the parameters employed and the reasoning behind its use were not included [35]. Finally, a total of 125 articles were included in this review. In step 5, we searched within these 125 articles for the main groups of

Regarding the main ML techniques used in our review of literature on marketing, it is difficult to achieve a systematic, organized, and widely accepted classification. Figure 5 shows a simplified organization of ML types of learning and ML techniques, indicating that ML can be broadly categorized into two classes: supervised and unsupervised learning [36]. Supervised learning is used with labeled data in training and learning. Unsupervised learning is a technique to find

due to the higher speed of computational capability. It is also noteworthy that supervised and unsupervised learning have been widely used in recent years, but not as much as other techniques.

Regarding the specific techniques used in the marketing articles, within deep learning, as mentioned earlier, Figure 7a shows that the most used technique is the artificial neural network (ANN), followed by the convolutional neural network (CNN) and other techniques that change the way neurons are interconnected or adapt the original model. For classification techniques (Figure 7b), the decision tree (DT) is the most used technique, which could be due to its simplicity of implementation and the ease of comprehension of its results. Additionally, other variations of DT are used, including gradient boosting (GB) and random forest (RF).

Support vector machine (SVM) and naïve Bayes (NB) are almost equally used. Furthermore, this review ascertained that regression models (Figure 7c) were rarely used; among them, eXtreme Gradient Boosting (XGB) is the most used. In the case of unsupervised learning techniques, the ones found are clustering techniques (Figure 7d), in which K-means is frequently used for market segmentation and latent Dirichlet allocation (LDA) for text processing. In hybrid techniques (Figure 7e), the techniques most often combined with others to improve prediction are deep learning techniques such as ANN and CNN, followed by SVM. Some published articles sought to compare techniques based on their performance (Figure 7f); in this case, the most compared techniques are DT and RF, followed by SVM. Numerous techniques are used in ML; however, we provide only a brief overview of the most common ones implemented in marketing applications, as detailed in this review and Figure 7. The algorithms used by the papers include individual algorithms or a combination of them [37].

In the following sections, we will comment on the most used ML techniques in marketing, organizing them by type of learning according to Figure 5. We will reference the articles that have used each technique as we go along.

interest search [39], [40]. Other applications are prominent in content analysis to solve natural language processing (NLP) tasks and identify information from words in a review [41], [42], [43], [44], [45], news events [46], or hierarchical attention networks (HAN) [47]. CNN is also applied as a content generator [48], and it is also used for text and image detection in social networks, brands, and retail [

[65], churn models [66], [67], classification of online articles and reviews [68], [69], demand prediction, or measurement of influencer index. In ML there are some algorithms that can be used for marketing purposes, including the following:

SVM is a classification method that maps the input vector onto a high-dimensional feature space and then constructs a linear model that implements nonlinear class boundaries in the original space [70]. The data is classified through a special kind of linear model, namely, the optimal hyperplane, that maximizes the distance between observations that belong to each category [71]. Support vector regression (SVR) is also a popular SVM variant where the hyperplane is the actual nonlinear function that should be estimated, and the sign of the residuals represents the two classes [72]. The capacity of the system is controlled by parameters that do not depend on the dimensionality of feature space [73], [74]. Mostly, this technique is used to forecast customer retention [75], online customer reviews [68], and prices in supply chains [76].
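To make the idea of a margin-maximizing hyperplane concrete, the following is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is an illustration only, not the procedure of any reviewed article: the customer features, the labels, and the hyperparameters (`lam`, `lr`, `epochs`) are hypothetical, and no kernel mapping to a higher-dimensional space is included.

```python
# Minimal linear SVM sketch: sub-gradient descent on the regularized hinge loss.
# All data and hyperparameters below are hypothetical, chosen only for illustration.

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Return weights w and bias b of a separating hyperplane w.x + b = 0."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # step along the hinge-loss sub-gradient plus the L2 penalty.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the L2 penalty shrinks w,
                # which is what pushes the solution toward a wide margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on (+1 or -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical customers: (share of months active, share of orders with a complaint).
# Label +1 = retained, -1 = churned.
X = [(0.8, 0.1), (0.9, 0.0), (0.7, 0.2), (0.1, 0.6), (0.2, 0.7), (0.0, 0.5)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
new_customer = predict(w, b, (0.85, 0.05))
```

On this toy data the two groups are linearly separable, so the learned hyperplane assigns every training customer to its own class. Real marketing data would typically need feature scaling, a held-out test set, and possibly a nonlinear kernel.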

DT is a method that generates rules for data classification using a representation of a tree-like structure [79], and regression trees respond to their predictors by recursive binary splits [80]. The output is made by a decision node with two or

imbalance in data, especially in churn [79]. This technique has been used to predict the value of reviews [69], [82], the choice of a brand based on social networks [81], and sales [83], [84].

Ensemble methods are algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions [89]. The fundamental principle of ensemble learning is to divide a large dataset into small data chunks [68] (bagging models) or combine multiple learning algorithms to obtain better performance (boosting models) [80]. Random forest is the most commonly used algorithm within bagging models, and it uses a multitude of decision trees or statistical data structures during training to best divide and average the labels to create a more balanced prediction [90]; additionally, it combines several base classifiers into a robust classifier by increasing the overall accuracy of the aggregated model [79].
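The voting principle behind bagging can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: each base classifier is a one-split decision tree (a "stump") that assumes class 1 lies above its threshold, each stump is trained on a bootstrap resample of the data (a common bagging formulation), and the data set is hypothetical. A random forest would additionally grow full trees and sample features at each split, which is omitted here.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-split decision tree on (x, label) pairs with labels 0/1.

    Assumption for this sketch: class 1 lies above the threshold.
    """
    labels = {label for _, label in sample}
    if len(labels) == 1:
        # Degenerate resample containing a single class: always predict it.
        only = labels.pop()
        return lambda x: only
    values = sorted({x for x, _ in sample})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != bool(label) for x, label in sample)

    best = min(candidates, key=errors)
    return lambda x: int(x > best)


def fit_bagging(data, n_estimators=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_estimators)]


def vote(models, x):
    """Classify x by unweighted majority vote over the base classifiers."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Hypothetical data: (number of purchases last year, 1 = responded to a campaign).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1), (10, 1)]
models = fit_bagging(data)
low_activity = vote(models, 2)
high_activity = vote(models, 9)
```

Each stump sees a slightly different resample and may place its threshold differently, and the vote averages out those differences. This is the sense in which the aggregated model is more robust than any single base classifier.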

The training set is randomly generated. RF is implemented to reduce the correlation between the random distributions of the input set and improve the bagging [91], [92]. Some techniques can improve the RF algorithm; for instance, non-parametric RF is more robust to outliers compared to other bagging or boosting algorithms [92]. Boosting models improve accuracy based on the idea that it is easier to find an average of many approximate rules of thumb than to find a single highly accurate prediction rule [80]. Boosting models include gradient boosting (GB), which builds a set of weak learners (commonly DTs) to produce a suitable learner by correcting prior learners [74]. GB fits each new predictor to the residual errors made by the previous learners, i.e., it uses the gradient to allow optimization of an arbitrary loss function

Figure 8 shows the types of learning used to solve different marketing problems. In this figure, we notice that deep learning is the most prevalent method across the published articles and is also the most employed to solve marketing problems.

This may be because deep learning techniques are versatile and employ different ways of solving complex problems. Unsupervised techniques are primarily used to solve market segmentation problems and are not used for forecasting. Reinforcement learning is only used in recommender systems. Hybrid techniques are more prevalent in forecasting, which requires more accurate predictions. Supervised techniques can be used to solve any marketing problem. Overall, the important thing to know is the nature of the data available.

In terms of the specific marketing problems solved in the published articles, Figure 9 shows that the issue of recommender systems is the most prevalent. Consumer behavior, forecasting, and market segmentation are also important. Notably, text and video analysis has become more popular recently, and this is also related to improvements in the speed and simplicity of use of the ML techniques that allow this process to be carried out, including deep learning and unsupervised learning. The following sections will detail each one of these applications.

Consumer behavior refers to the study of how clients, both individuals and organizations, meet their needs and desires to choose, purchase, use and get rid of goods, ideas, and services. In other words, it refers to the decision taken by clients during the purchasing process and the factors that can influence this decision [112], [113]. These factors can be cultural, social, and psychological, among others. Within the applications of consumer behavior, we can highlight characterization of clients, customer retention, trend prediction, competition, etc.

based on emotions and loyalty [118], the credibility in the classification of most popular users on consumer reviews platforms such as Yelp [119], the prediction of customer responses to campaigns [120], and the future behaviors of a panel of customers [55]. Furthermore, the price sensitivities and the importance of consumer behavior in supermarkets [121] or purchases of ecological buildings [122] have been studied. Studies also investigated the adoption of payment according to the attributes of products coincidences in shopping baskets [123] and the use of peer-to-peer mobile payment systems and key backgrounds of clients' intention [124]. It is possible to extract a hierarchy of product attributes based on contextual information of how attributes are expressed in consumer reviews [12]. In the healthcare/health-related products industry, consumer satisfaction was studied through posts on a review website [125]. Additionally, some studies analyzed the impact of film contracts in movie production and profitability of members of the channel [126], the consumer's perception of the attributes of certain brands in online posts through visual listening [49], and the customers' repurchasing behavior of same-brand smartphones [127].
Some studies estimated the possibility that a consumer will perform certain actions, such as using airline services again [128] or being willing to share personal information according to their interests or social interactions [84]. Some studies focused on limited player telemetry data to observe churn from a gamified app [67].

in these techniques [150].

Most of the predictions presented in the articles are related to the prediction of market prices [76], forecasting of demand or purchase patterns in business segments [40], [60], [74], [151], classification or prices of products [65], [152], or difference

[77]. Other studies focused on predicting satisfaction and brand recommendation as well as purchase intention [156] and how natural-look campaigns are associated with the increase in artificial beauty practices [157].

Marketing segmentation is one of the main strategies in the field of marketing [158]. Its objective is to identify and delimit market segments or ''groups of buyers'' that will then become the targets of the company's marketing plans [158], [159]. The advantage for marketing management is that this technique divides the total demand into relatively homogeneous segments, which are identified by some common characteristics. These features are relevant to explaining and predicting the consumers' responses, in a given segment, to the marketing stimulus [158], [159], [160]. The segmentation can be made according to geographic, behavioral, psychological, and demographic criteria [161].

In this field, the articles analyzed presented the market preferences according to reviews of hotels, tourism, and hospitality sectors [

[163], and how to carry out promotions and the effect on sales [80]. Other authors segmented customers according to retailer-brand and channel usage [164], brand equity and engagement in brand-related social media behavior [81], or consumer sentiments on social media [165]. Furthermore, some studies looked for patterns of interest in trade based on clicks [101], including segments that vary in their donation intentions, political attitudes, and preferred types of charities [166]. Other works predicted the characteristics and segments of companies that adopt the use of workforce-based robotics [167] or determined the differences in business model attributes of FinTech [168]. Additionally, some works segmented consumers based on their psychological profiles [169].
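Since K-means is reported in this review as a frequent choice for market segmentation, a minimal sketch of the algorithm (Lloyd's iterations) may help show how homogeneous segments emerge from the data. The customer features and the starting centroids below are hypothetical. Passing the initial centroids explicitly is a simplification of this sketch that keeps the result reproducible, whereas practical implementations usually choose them randomly or with a seeding heuristic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each customer joins the nearest centroid's segment.
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its segment.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty segment's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical customers: (purchases per month, average basket value in tens of dollars).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption: start from one low-activity and one high-activity customer.
labels, centroids = kmeans(customers, [customers[0], customers[3]])
```

Here the two segments correspond to occasional low-spend and frequent high-spend customers. The resulting centroids can be read as the "typical" member of each segment, which is what makes the output usable for targeting decisions.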

The analysis of text in marketing, according to the bibliographic review, is based on the identification of the influence of word-of-mouth reviewers [173], chain operation in the hospitality industry [42], or movie spoilers [106]; additionally, TA is also based on the usefulness and classification of the attributes of reviews, opinions, or comments of online consumers or Twitter users their consumption trends, satisfaction, or patterns [41], [43] [44]. Additionally, the analysis investigated how negative feelings influence the market for firms [105] and the use and adoption of digital technologies and platforms for teleworking in the post-pandemic era [107]. In other social networks, the social influence and facilitating conditions directly impact the users' sentiments toward intel-

In this section, in Table 1, we highlight the marketing problems that are solved using ML and the most used techniques for each. Moreover, Table 1

In the articles that we reviewed, the input data came from social networks [78]

daily sales [114], and online purchases [108], among other types of data entries, are also used. The results depended on the approach that the authors adopted in sentiment anal-

We identified the best algorithms used based on the articles that compared techniques (multi-technique). For classification, DT, NB, RF, KNN, SVM, and XGB were compared in [128], and the findings demonstrated that XGB offered the best accuracy, followed by RF, in all the tests performed.
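Comparisons of this kind rest on simple evaluation metrics. As a hedged illustration, the sketch below computes accuracy (for classifiers) and the root mean squared error, RMSE (for regression models), and picks the model with the highest accuracy. The labels and the predictions of `model_a` and `model_b` are invented placeholders and do not reproduce results from any reviewed article.

```python
import math


def accuracy(y_true, y_pred):
    """Share of observations whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted numeric values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical held-out labels and the predictions of two placeholder classifiers.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 1, 1, 0, 1, 0, 1],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0],
}

scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best_model = max(scores, key=scores.get)

# RMSE on a hypothetical two-observation regression task.
example_rmse = rmse([2.0, 4.0], [2.0, 8.0])
```

For a fair comparison, every technique should be scored on the same held-out observations. Differences in accuracy on a small test set may not be meaningful without repeated tests, which is presumably why the compared studies report several runs.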

Another study compared the performance of ANN and RF [67] with input data from video games, and in the three tests carried out for both classification and regression, the results were similar with a similar time window, but when deciding

text analysis [125] with data from healthcare/health-product e-commerce firms, linear regression (LR), XGB, RF, and DT were used. The study concluded that in terms of the RMSE, XGB and RF displayed the best predictive power, although LR was almost the same. With input data from reviews to measure the help they provide [82], DT, RF, GB, bagging

In brief, our results highlight that a significant and diverse number of ML techniques are employed in extensive marketing-based applications. In the field of scientific production, the number of publications increased over time, but this growth was mostly observed in journals with a ranking of Q1 and Q2. In the period studied, the most used technique was deep learning. We also realized that until 2008, certain techniques were used generically in all marketing applications; however, their use has expanded to a larger level by the year 2022. It is noteworthy that several techniques are widely used in a specific year, displaying a kind of boom in popularity, followed by a period of decline and stability in their use. For example, text mining analysis exhibited disproportionate use during some periods (2009-2010), then no use for seven years, and finally was used again in 2016.

In the foregoing, we can see that in general, ML techniques have experienced a degree of maturity in the field of marketing. This is reflected in a larger diversity of applications and a larger diversity of ML techniques. This has also been accompanied by advances in ML applications, dispensing with the need for advanced knowledge of programming.
Accordingly, this allows researchers who are not specialized in computer science to use the aforementioned techniques in their areas of expertise (e.g., marketing). Last, digital marketing has promoted the need for better handling of more data with a more complex analysis, which is provided by ML applications.

Regarding the limitations of this study, we can highlight the lack of marketing categories in WoS and the use of JCR as a proxy for this category. However, the excluded articles did not affect the main results of this research.

Regarding future work, research must focus on the applications of other recent ML techniques in areas of marketing that have not been presented before. Despite the incipient maturity, there is also a belief that some ML techniques, given their simplicity, are not being applied correctly due to the lack of knowledge surrounding them. Hence, a study must be conducted to determine whether these conditions could produce unexpected results when using techniques that allow greater simplicity or visualization of the results (DT and CA) compared with more complex techniques such as ANN or SVM.