Data-Driven Remaining Useful Life Estimation for Milling Process: Sensors, Algorithms, Datasets, and Future Directions

An increase in unplanned machine downtime disrupts industrial operations and results in substantial credibility damage and monetary loss. The cutting tool is a critical asset of the milling machine; its failure reduces industrial productivity through unplanned downtime. A proper predictive maintenance strategy based on real-time health monitoring of cutting tools therefore becomes essential. Accurately predicting the useful life of equipment plays a vital role in predictive maintenance under Industry 4.0. Many research efforts have addressed tool life estimation from varied directions. However, a consolidated study of the implemented techniques and future pathways is still missing. The purpose of this paper is to provide a systematic and comprehensive literature survey of data-driven approaches to Remaining Useful Life (RUL) estimation of cutting tools during the milling process. The authors summarize the monitoring techniques, feature extraction methods, decision-making models, and sensors currently used in data-driven models. The authors also present publicly available milling datasets recorded under various operating conditions, which can be used to compare the accuracy of prediction models for tool wear estimation. Finally, the article concludes with the challenges, limitations, recent advancements in RUL prognostics techniques using Artificial Intelligence (AI), and the future research scope in this area.


I. INTRODUCTION
In the manufacturing industry, the milling process plays a crucial role because of its flexibility in production [1]. The productivity, quality, and cost of the final product depend directly or indirectly on the lifespan of the cutting tool during machining [2]. Failure of the cutting tool causes productivity and monetary loss, a higher rejection rate, and increased unscheduled downtime of the machine. According to recent statistical data, cutting tools accounted for US $5 billion in value ($1.9 billion for milling cutting tools), around 1.5% of the annual Gross Domestic Product (GDP) of the US market [3]. In manufacturing, as in other industries, a plant incurs fixed costs (equipment, land, wages, etc.) and variable costs (power, raw material, electricity, etc.) to manufacture a product that is intended to generate a high profit (gross income) for the organization once it is sold in the market [4]. Figure 1(a) shows the relation between fixed cost, variable cost, and profit for a plant under standard working conditions (without equipment failure or downtime). Once equipment or a component fails, it no longer contributes to profit, and additional unplanned maintenance costs come into the picture.
The associate editor coordinating the review of this manuscript and approving it for publication was Xianzhi Wang.
As shown in Figure 1(b), equipment fails at time T1 and returns to normal working conditions at time T2. While the equipment is down, fixed costs keep accumulating, but they are wasted because no production is carried out. At the same time, the overall variable cost also increases (the cost of consumables decreases, but the cost of maintenance increases). These losses continue until the plant is back in working condition.
In such a case, the cost of a severe outage caused by unplanned downtime can be much higher than the profit made over the same duration. In many cases, equipment is replaced too early, well before its end of life, so its useful life is not utilized effectively.
In other cases, equipment fails before it is replaced and causes unplanned downtime. Proper estimation of useful life is necessary to use equipment cost-effectively. As shown in Figure 2, the potential failure and functional failure points need to be identified from the degradation symptoms to understand the useful life of the equipment.
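The cost picture of Figure 1(b) can be made concrete with a small worked sketch. All monetary figures below are hypothetical and chosen only for illustration, and the function name and its parameters are assumptions introduced here, not quantities taken from the surveyed literature. The sketch compares downtime with normal operation: fixed costs keep accruing, the profit that would have been earned is lost, and a one-off repair cost is added.

```python
def downtime_loss(fixed_cost_per_hour, profit_per_hour, repair_cost, t1, t2):
    """Estimate the loss of an outage lasting from t1 to t2 (in hours).

    Assumption: relative to normal operation, the plant loses the profit it
    would have earned, still pays its fixed costs, and pays for the repair.
    """
    if t2 < t1:
        raise ValueError("t2 must not be earlier than t1")
    hours_down = t2 - t1
    wasted_fixed = fixed_cost_per_hour * hours_down
    lost_profit = profit_per_hour * hours_down
    return wasted_fixed + lost_profit + repair_cost


# Hypothetical plant: failure at T1 = 10 h, back in operation at T2 = 14 h.
loss = downtime_loss(fixed_cost_per_hour=2_000, profit_per_hour=5_000,
                     repair_cost=12_000, t1=10, t2=14)
print(loss)  # 40000
```

With these assumed numbers, a four-hour outage costs twice the profit that four hours of normal production would have generated, which is the situation described above.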

A. SIGNIFICANCE OF STUDY
Progress in the manufacturing domain is rapid. Milling, in particular, has seen growing use of high-speed machining tools and hard workpiece materials (>45 HRC) [5]. Premature tool failures are often costly to repair, and they can result in workpiece damage and possible harm to the machine and its operators [6]. There is a need to implement research-based solutions that estimate the RUL of the milling tool. RUL estimation is considered a core and challenging aspect of the Prognostics and Health Management (PHM) of machines or processes [7]. It helps to assess the current health status of a degrading system by indicating performance degradation, and it helps to prevent sudden failure [8]. RUL estimation enables cost-effective maintenance and improves system reliability [9]. According to ISO 13381, a prognostic approach allows industry to determine the risk and time of system failure [10]. The main objective of prognostics is to estimate the RUL of the system from the machine's past operating history and current condition, so that the useful life is predicted before failure occurs. RUL estimation has become essential in today's economic climate [11]. It is valuable in many critical applications, such as essential machine components, aircraft, and nuclear power plants. The conventional approach can calculate the useful life, but it considers only the static condition of the machine. As industries move towards Industry 4.0, the RUL of dynamic systems can be estimated with real-time monitoring. RUL plays a vital role in condition-based maintenance [12], [13]. In simple terms, RUL is the period from the current time to the end of the functional life of the product [9]. RUL prediction is also helpful for checking the operational performance of equipment, inventory management, and maintenance activity planning.
According to a forecast for 2021-2026, predictive and prescriptive maintenance will capture around $22.72 billion by 2026, with a Compound Annual Growth Rate (CAGR) of 19.68% [14]. According to a survey, the average cost of machine downtime is around $260,000 per hour across all business types [15]. In the automotive industry, downtime costs around $50,000 per minute, or approximately $3 million per hour [16]. About 70% of industrial sectors are not aware of when equipment needs maintenance or replacement because they lack knowledge of RUL estimation [17]. In manufacturing industries, on average, up to 20% of machine downtime is due to failure of the cutting tool. To minimize this unplanned downtime, it is necessary to select a proper maintenance strategy and estimate the useful life of the tool. Accurate system monitoring improves productivity by 10-40%, with cost savings of up to 40% [18], [19].

B. MOTIVATION
In milling operations, accurate tool life estimation is essential to maximize the functional life of the cutting tool. Continuous real-time monitoring of the cutting tool, combined with appropriate maintenance strategies, is needed to avoid unplanned downtime. Advanced sensor technology and emerging AI techniques provide more insightful information about milling machine health. As shown in Figure 3, Scopus database publications over the last ten years show a rising trend in milling RUL estimation, indicating that the importance of RUL estimation has been increasing in recent years. To the best of our knowledge, very little exhaustive research covering sensors, monitoring methods, algorithms, and datasets for data-driven RUL estimation has been published so far. This study also presents advancements in RUL estimation and future directions, which will motivate PHM researchers to explore data-driven strategies for RUL prediction of critical machinery.

C. TERMS AND TERMINOLOGY
The following terms are frequently used in RUL estimation research on milling tools:
RUL: RUL is defined as "the length from the current time to the end of the useful life" [20]. RUL helps to estimate the inspection or maintenance period and to minimize excessive inventory by reducing unplanned failure [21].
Milling process: Milling is the machining process in which rotary cutters remove material from the workpiece as the cutter is advanced into the workpiece.
Tool Wear: During machining, the workpiece and cutting tool are in contact with each other, which changes the tool morphology; this change is known as tool wear [22].
Flank wear: Flank wear is a type of tool wear at the flank face (the tool surface that comes in contact with the workpiece) of the cutting tool due to interaction between tool and workpiece.
Tool life: Tool life is the duration of actual cutting time after which the tool is no longer able to perform its required function. In general, tool life is the cutting time until the maximum acceptable wear is reached.
Predictive maintenance: It is a condition-based maintenance process that uses data analytics to indicate the possible equipment failure time for scheduling maintenance. Proper maintenance scheduling helps to avoid unplanned or sudden equipment failure.
Machine unplanned downtime: Unplanned downtime occurs when a machine stops its working or production due to failure or unexpected shutdown.
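The definitions of tool life and RUL above can be illustrated with a minimal sketch. It assumes that flank wear has been measured at a few cutting times and that the tool is considered worn out at a chosen wear limit; the data values, the 0.30 mm limit, and the function names are hypothetical and introduced here only for illustration.

```python
def tool_life_from_wear(times, wear, wear_limit):
    """Return the cutting time at which wear first reaches wear_limit.

    Linear interpolation is used between the two samples that bracket the
    limit. Returns None if the limit is never reached in the record.
    """
    for i in range(len(wear)):
        if wear[i] >= wear_limit:
            if i == 0:
                return float(times[0])
            t0, t1 = times[i - 1], times[i]
            w0, w1 = wear[i - 1], wear[i]
            return t0 + (wear_limit - w0) * (t1 - t0) / (w1 - w0)
    return None


def remaining_useful_life(current_time, end_of_life):
    """RUL = end-of-life time minus current time, floored at zero."""
    return max(end_of_life - current_time, 0.0)


# Hypothetical flank wear record: cutting time in minutes, wear in mm.
times = [0, 10, 20, 30, 40]
wear = [0.00, 0.08, 0.14, 0.22, 0.34]
life = tool_life_from_wear(times, wear, wear_limit=0.30)
print(life)                             # tool life in minutes
print(remaining_useful_life(20, life))  # RUL after 20 minutes of cutting
```

In this sketch the end of life is known only in hindsight from a complete wear record; the prognostic task discussed in the rest of the paper is to estimate it before the limit is reached.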

D. EVOLUTION OF RUL ESTIMATION
RUL estimation has evolved significantly over the past four decades along with the progression of inspection and monitoring methods, as shown in Figure 4. Recent advances in analytical software and remote sensing methods have enabled accurate RUL estimation of machinery and greater decision support for carrying out sustainable maintenance activities. Table 1 depicts the phases of inspection/monitoring methods [23]. In the initial phase, supervision was visual: each component was inspected physically by domain expert supervisors [24], [25], and data was stored in software such as MS Office. In the next phase, instrument-based periodic inspection was carried out using embedded software with the help of trained supervisors. In the phase after that, real-time condition monitoring was performed through continuous remote assessment using sensors and condition monitoring software. Now, industries estimate the RUL to support predictive maintenance through continuous remote monitoring based on sensor data. Automated inspection, verification, digital pattern analysis using simulation, and advanced AI decision support are important sources of performance measurement. In this phase, AI-based decision models, Big Data, and cloud services are used with the help of data scientists, reliability engineers, and domain experts. Real-time condition monitoring mainly performs diagnosis through uninterrupted software-based monitoring with different sensors. RUL estimation for predictive maintenance, on the other hand, focuses on prognosis rather than diagnosis alone. Prognostics predicts the future behavior of the equipment or component in order to estimate its useful functional life.

E. RESEARCH GOAL
The purpose of this paper is to provide a systematic and comprehensive literature survey on data-driven RUL estimation of cutting tools during the milling process. Table 2 lists the research questions that guide the detailed survey of data-driven RUL estimation and help to achieve the research goal.

F. CONTRIBUTION OF THE WORK
In this survey, the authors highlight the adverse effect of unplanned machine downtime due to tool failure in the milling process. The paper lists the maintenance strategies used in industry to maintain equipment health and explains the significance of RUL estimation during milling. The authors also describe the existing monitoring techniques for equipment health and give brief details of the sensors used for data collection. Furthermore, the paper details the decision-making algorithms used in the data-driven approach. The authors survey a few papers that use publicly available milling datasets recorded under various operating conditions to compare the tool wear estimation accuracy of various prediction models. Finally, the authors discuss challenges, limitations, recent AI advancements, and future scope in the area of RUL estimation. Figure 5 shows the organization of the paper along with tools and techniques used in RUL estimation, which is divided into a total of eleven sections. Section I addresses the significance of the study, motivation, terms and terminology used in milling, the evolution of RUL estimation, research goals, the contribution of the work, and the paper organization. Section II explains the research methodology with selection criteria, selection results, and quality assessment. Section III presents the background study related to the milling process and tool wear, along with maintenance strategies and proposed PdM models. Section IV explains direct and indirect monitoring (sensing) techniques for signal or data collection in a data-driven model. Section V describes the data-driven model for RUL. Section VI explains the popular sensors used in the indirect monitoring technique and the need for multi-sensor over single-sensor techniques. Section VII gives details of the different feature extraction and selection techniques. Section VIII presents the different data-driven algorithms used for monitoring and prediction. Section IX surveys a few papers that use publicly available milling datasets recorded under various operating conditions to compare the accuracy of prediction models for tool wear estimation. Section X is the discussion section, which presents the survey outcome, challenges, and limitations. Section XI is about RUL advancement related to AI. Section XII provides recommendations for future work. Finally, section XII gives the conclusion of this review paper.

II. RESEARCH METHODOLOGY
As RUL estimation is a broad area, the authors performed the literature survey using a systematic review process to address the research questions. The methodology is divided into three parts: selection criteria, selection results, and quality assessment.

A. SELECTION CRITERIA
The authors mainly used the Scopus, Web of Science, and IEEE databases to retrieve related documents. A search string (query) was formulated to retrieve research articles from the multiple databases. Table 3 shows the search string (query executed) for finding documents, built by joining master, primary, and secondary keywords using the AND Boolean operator. Table 4 shows the records found (n = 91) after searching the different databases (Scopus-46, Web of Science-32, and IEEE-13) for the period 2011 to 2021. Duplicate articles across the databases were excluded (n = 39).

B. SELECTION RESULTS
Further documents, such as non-English documents, book chapters, and conference papers, were excluded (n = 15). Finally, as shown in Figure 6, a total of 37 core documents related to milling RUL estimation were considered for study after these exclusions. Figure 7 shows the network visualization diagram based on author keyword analysis. The size of a circle indicates how often that keyword occurs. A small distance between keywords in the network indicates a strong correlation between them, and vice versa. The network shows that, in the extracted documents, the "remaining useful life" keyword has a strong link with other keywords such as "tool wear," "condition monitoring," "machine learning," and "predictive maintenance." Keywords with the same color form a cluster; a total of 6 major clusters are formed in the network visualization diagram.
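The document counts reported in the selection criteria and selection results can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are introduced here for illustration.

```python
# Records retrieved per database, as reported in Table 4.
records_per_database = {"Scopus": 46, "Web of Science": 32, "IEEE": 13}

identified = sum(records_per_database.values())
duplicates_removed = 39
other_exclusions = 15  # non-English documents, book chapters, conferences

included = identified - duplicates_removed - other_exclusions
print(identified, included)  # 91 37
```

The result agrees with the 91 records found and the 37 core documents retained for the study.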

C. QUALITY ASSESSMENT
After applying the selection criteria (shown in Figure 6), a few papers were short-listed. Based on the research questions, the maintenance strategy best suited to achieving the research goal needs to be studied. From the short-listed papers, the following points are considered for quality assessment in this paper.
• Maintenance strategies: The research emphasizes the different types of maintenance strategies used in industry.
• Predictive maintenance: The research also emphasizes the different predictive maintenance models used in industry.
• Data-driven RUL model: The paper mainly focuses on the data-driven technique for RUL estimation.
• Sensors: The research also concentrates on the different sensors used in the milling machine for data collection.
• Decision-making algorithms: The paper also focuses on the different decision-making algorithms used for RUL estimation.
• Advancement in RUL Prediction using AI: The research also presents recent advancements in AI that can be applied for accurate and robust RUL estimation.

III. BACKGROUND STUDY
To perform a systematic literature review, a background survey was first conducted on the tool wear issue of milling cutters. In the next stages, literature related to maintenance strategies, predictive maintenance models, monitoring (sensing) techniques, and the data-driven process is covered. Starting from the tool wear issue of milling cutters, the authors first studied the different maintenance strategies used in industry to understand the pros and cons of each approach when selecting a proper maintenance strategy. The subsequent step focuses on literature related to predictive maintenance. The final part lists the different models proposed for predictive maintenance. Based on this primary literature survey, the paper mainly focuses on the data-driven predictive model used for RUL estimation.
To fully comprehend the phenomenon of tool life estimation, one must first understand the milling process concept and the tool-wear that takes place during the machining.

A. MILLING PROCESS AND TOOL WEAR
In the milling process, machining is performed with multipoint rotating cutters or tools that are moved against the stationary workpiece. Figure 8 shows the milling machine and the arrangement of the milling cutting tool and workpiece during machining. During the machining operation, appropriate parameters such as feed, speed, and cutting depth are chosen based on experience or parameter optimization techniques. The cutting tool is a crucial part of the machine, as it determines the surface finish and machining accuracy of the product [35]. Tool wear is caused by relative motion between the cutting tool and the workpiece [19]. A worn-out tool produces an inferior surface and dimensional inaccuracy, which shortens the life of the finished parts. Because tool wear changes the shape of the cutting tool, it affects the finish of the final workpiece, the dimensional accuracy of the final product, and tool failure. Generally, tool wear during machining takes two forms: flank wear (VB) and crater wear (KB). Figure 9 shows the changes in the geometry of the cutting tool due to flank and crater wear. Flank wear occurs due to contact between the tool and workpiece, whereas crater wear occurs due to relative motion between the tool and cutting chips. Figure 10 shows (a) a fresh, unworn cutting insert and (b) a used, worn-out cutting insert showing flank wear. Many researchers concentrate on flank wear monitoring for tool life estimation. Flank wear is mainly responsible for the machining quality, reliability, and dimensional accuracy of the workpiece [2], [36].
VOLUME 9, 2021

B. MAINTENANCE STRATEGIES
With the advancement of manufacturing technologies in the Industry 4.0 scenario, industries are moving from conventional to intelligent manufacturing approaches [37]. This intelligent manufacturing approach improves the quality, performance, and service of the product, and it reduces resource consumption by decreasing the rejection rate [38]. As a result, the maintenance strategies of the manufacturing industry have drawn more attention in recent years, and various prediction and diagnostic methods are used for maintenance purposes [39]. Figure 11 shows the different maintenance strategies used in industry, such as reactive, preventive, and predictive maintenance [40]-[42]. Table 5 shows the maintenance strategies with suitable cases, unsuitable cases, benefits, and limitations.

1) REACTIVE MAINTENANCE
Reactive maintenance does not limit unplanned downtime, because the component is replaced only after it fails. The failure may cause further damage to the equipment or process [43]-[46].

2) PREVENTIVE MAINTENANCE
In preventive maintenance, maintenance activity is scheduled at fixed intervals of time, and the part is replaced at those intervals based on experience. With this strategy, however, the full life of the component is not utilized effectively. At the same time, it increases the inventory handling cost and planned downtime. Industries are therefore trying to shift towards the predictive maintenance (PdM) approach because of its remarkable benefits [44], [45], [47], [48].

3) PREDICTIVE MAINTENANCE
Predictive maintenance (PdM) gives holistic insights into the health of the equipment and predicts the component failure time. This smart manufacturing approach provides interaction between physical and cyber environments, predicting and improving the real-time behavior of the system. Figure 12 shows the maintenance approach used in reactive, preventive, and predictive maintenance [31]-[34], [49]. Figure 13 shows that the annual average unplanned downtime of the PdM strategy is lower than that of other maintenance strategies [53]. The predictive maintenance approach has been widely used in recent years to reduce unplanned downtime during machining. The goals of PdM are to boost the quality and productivity of the industry by reducing the unplanned downtime and maintenance cost of the equipment.
One of the important aspects of predictive maintenance is the estimation of RUL [32]-[34]. According to the authors in [31], the prognostic approach is defined as "An estimation of time to failure and risk for one or more existing and future failure modes." Figure 14 shows the combined diagnostics and prognostics framework to determine the components' RUL. The authors in [54] divide RUL prediction into four parts: fault detection (to detect the abnormal condition), fault isolation (to identify which component is failing), fault identification (to estimate the nature of the fault), and RUL prediction (lead time to failure). Predictive maintenance uses analytics to estimate the health of the system or equipment. Figure 15 shows the principles, goals, and leading application areas of predictive maintenance in the scope of Industry 4.0 [42]. The basic principle of PdM is to perform diagnosis and prognosis by analyzing the signals captured from sensors. The goal of PdM is to improve productivity and quality by reducing unplanned downtime and maintenance costs. The major application domains are smart manufacturing, security, robotics, health, etc.

C. PROPOSED MODELS FOR PDM
Commonly used PdM methods are the knowledge-based model (reliability statistics model), the physics-based model, and the data-driven modeling approach, as shown in Figure 16 [35], [42]. Proper selection among these models is based on their applications and characteristics.

1) STATISTICAL KNOWLEDGE-BASED MODEL
The statistical knowledge-based model mainly uses past equipment failure or breakdown data for statistical characterization and fault prediction [55]. It uses the Bayesian method, fuzzy logic, the Weibull distribution, etc., for fault prediction. This method does not consider system degradation or environmental effects, so its prediction accuracy is lower than that of other methods. It is not suitable for complex systems like CNC machines.
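As an illustration of the statistical approach, the following minimal sketch evaluates a two-parameter Weibull life model. The shape and scale values are hypothetical, standing in for parameters that would be fitted to past tool-failure records, and the function names are assumptions introduced here. The sketch uses the standard Weibull reliability function R(t) = exp(-(t/scale)^shape) and its mean, scale * Gamma(1 + 1/shape).

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a tool survives beyond cutting time t."""
    return math.exp(-((t / scale) ** shape))


def conditional_reliability(t, extra, shape, scale):
    """Probability of surviving a further `extra` time, given survival to t."""
    return (weibull_reliability(t + extra, shape, scale)
            / weibull_reliability(t, shape, scale))


def weibull_mean_life(shape, scale):
    """Mean time to failure of the Weibull distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Hypothetical parameters (scale in minutes of cutting time).
shape, scale = 2.0, 60.0
print(weibull_reliability(30, shape, scale))          # survival beyond 30 min
print(conditional_reliability(30, 10, shape, scale))  # 10 more min, given 30
print(weibull_mean_life(shape, scale))                # mean tool life
```

Because the model depends only on elapsed time and historical statistics, two tools of the same age receive the same prediction regardless of their actual wear, which reflects the limitation noted above.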

2) PHYSICS-BASED MODEL
In the physics-based model, mathematical models are built to reflect physical degradation behavior [56], [57]. Physics-based models include the Gaussian mixture model [58], the Markov process model [59], etc. Such a model requires real-time machinery information as well as expert knowledge to build a high-fidelity model. It is challenging to develop a precise fault prediction model of a complicated system spanning different domains, because the degradation mechanisms are either unknown or complex [56].
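To give a feel for the Markov process model mentioned above, the sketch below treats tool wear as a left-to-right Markov chain in which the tool either stays in its current wear state or advances to the next one after each cut. This is a deliberately simplified illustration, not the model of [59]; the three wear states, the stay probabilities, and the function name are all hypothetical. For such a chain, the expected number of cuts spent in state i is 1 / (1 - p_i), where p_i is the probability of staying in state i.

```python
def expected_cuts_to_failure(stay_probabilities, start_state=0):
    """Expected number of cuts until failure in a left-to-right Markov chain.

    stay_probabilities[i] is the chance that the tool remains in wear state i
    after one cut; otherwise it moves to state i + 1. Leaving the last listed
    state means failure.
    """
    total = 0.0
    for p in stay_probabilities[start_state:]:
        if not 0.0 <= p < 1.0:
            raise ValueError("stay probabilities must lie in [0, 1)")
        total += 1.0 / (1.0 - p)
    return total


# Hypothetical wear states: initial wear, steady wear, severe wear.
stay = [0.90, 0.98, 0.80]
print(expected_cuts_to_failure(stay))     # fresh tool: about 65 cuts
print(expected_cuts_to_failure(stay, 2))  # tool in severe wear: about 5 cuts
```

The second call shows how the estimate changes once the current wear state is known, which is the information a monitoring system would supply.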

3) DATA-DRIVEN MODEL
In the data-driven model, data is collected using sensors from the running devices to derive a predictive maintenance model [60]. Essential features are extracted from the raw data (signals) to get useful information.
Different algorithms, such as SVM, Gaussian Regression, and ANN [70], are generally used to analyze the collected data. Sensor positioning plays a crucial part in a data-driven system. If the sensors are not installed at the proper location, data acquisition becomes difficult, which leads to prediction errors [71]. This review mainly focuses on the data-driven predictive maintenance approach for estimating the RUL of the milling tool. Table 6 shows the predictive maintenance models with suitable cases, unsuitable cases, tools used, benefits, and limitations.

IV. MONITORING (SENSING) TECHNIQUES
The tool condition monitoring techniques commonly used for data-driven predictive maintenance are direct monitoring and indirect monitoring. Direct sensing techniques mainly include microscopes, lasers, cameras, Charge-Coupled Device (CCD) cameras, and ultrasonic sensors. Direct monitoring provides direct information about machine conditions. In indirect methods, sensors measure cutting forces (dynamometer), vibration (accelerometer), temperature, sound (microphone), current/power, and acoustic emissions, which provide indirect information about the system's health. Figure 17 shows the different sensing techniques of direct and indirect monitoring methods with their benefits and limitations.

A. DIRECT MONITORING
Direct monitoring uses optical microscopes, direct vision, lasers, ultrasonic sensors, radioactive sensors, etc. This method measures the actual size of the worn area on the tool. Direct sensors provide a more accurate tool state and measure any type of wear, such as crater, flank, or notch wear, using image processing algorithms. Tool conditions are obtained using optical images and machine vision techniques [28], [72]. Figure 18 shows the generalized flow of the direct tool condition monitoring method. The disadvantage of direct sensors is that they are not appropriate for online monitoring, because the machining environment, with its chips and coolant, easily disturbs their accuracy [73]. Direct measurement also increases the downtime of the machine and reduces production time. The monitoring is not real-time, as measurements are taken in tool holders only and the measured data is processed separately [74].
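The idea of measuring the worn area from an image can be sketched in a few lines. The example below is a toy illustration under strong assumptions: the worn land is assumed to appear brighter than the unworn surface, a fixed gray-level threshold separates the two, each image column corresponds to one position along the cutting edge, and the pixel size is known from calibration. The image values, threshold, pixel size, and function names are all hypothetical; practical machine vision systems involve considerably more processing.

```python
def threshold(gray_image, level):
    """Mark pixels brighter than `level` as worn (1), all others as 0."""
    return [[1 if px > level else 0 for px in row] for row in gray_image]


def flank_wear_width(binary_image, mm_per_pixel):
    """Estimate the maximum flank wear width from a thresholded tool image.

    The wear width at each position along the edge is taken as the number
    of worn pixels in that column, converted to millimetres.
    """
    if not binary_image or not binary_image[0]:
        return 0.0
    n_cols = len(binary_image[0])
    column_counts = [sum(row[c] for row in binary_image) for c in range(n_cols)]
    return max(column_counts) * mm_per_pixel


# Hypothetical 4x5 grayscale patch of the flank face.
gray = [
    [200, 210, 190, 205, 198],
    [180, 220, 185,  60, 199],
    [ 50, 215,  55,  40,  45],
    [ 30,  35,  20,  25,  30],
]
mask = threshold(gray, level=128)
print(flank_wear_width(mask, mm_per_pixel=0.05))  # roughly 0.15 mm
```

Chips or coolant droplets on the tool would change the pixel intensities and corrupt such a measurement, which is one reason direct methods are hard to apply during cutting.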

B. INDIRECT MONITORING
Indirect condition monitoring methods are used to monitor the tool condition in real time without interfering with the machining process. Indirect monitoring is suitable for diagnostic as well as prognostic purposes. Figure 19 shows the generalized indirect data-driven tool condition monitoring (TCM) process.

V. DATA-DRIVEN RUL MODEL
In the data-driven model, data is collected from the running devices with the help of sensors to predict the run-time behavior of the system by monitoring its parameters [35]. The authors in [75] divided the complete RUL estimation process into four parts, as shown in Figure 20: data acquisition, health indicator construction, health stage division, and RUL prediction. Factories have increasingly integrated cyber-physical systems and intelligent sensors to control complex machining environments and tooling, and research is conducted on the tracked data to automatically identify system and machining anomalies [76]. Data-driven algorithms have been suggested in recent years to improve the efficiency and precision of diagnosis by combining the rapid growth in smart sensors, data processing, and Deep Learning methods. The authors in [28] divide the data-driven model into two parts: an online monitoring model and model training. Online monitoring involves acquiring sensor data and making decisions, while model training mainly consists of sensor configuration, feature extraction, and the monitoring model. Figure 21 shows the generalized flow of the data-driven model for RUL prediction. First, sensor data is collected from the milling machine using different sensors. The collected raw signals need to be de-noised by removing noise that originates from the environment or other factors. The de-noised signals are pre-processed through signal conditioning, amplification, filtration, etc. In the subsequent stage, the processed signals are used to extract and select important features related to the health of the machine tool. The selected features are then used for diagnosis or prognosis with suitable decision-making algorithms to predict the RUL of the machine tool.
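The generalized flow described above (de-noising, feature extraction, and RUL prediction) can be sketched end to end in a few lines. The example below is a minimal illustration under stated assumptions, not a method taken from the surveyed papers: a moving average stands in for de-noising, the root-mean-square (RMS) value of each cut serves as the health indicator, and a straight line fitted by least squares is extrapolated to a failure level. The signal values, window length, failure level, and function names are all hypothetical.

```python
import math


def moving_average(signal, window):
    """Simple de-noising: mean over a sliding window of `window` samples."""
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]


def rms(signal):
    """Root-mean-square value, a common time-domain health feature."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_rul(cut_numbers, health_indicator, failure_level):
    """Cuts remaining until the fitted trend reaches failure_level."""
    slope, intercept = fit_line(cut_numbers, health_indicator)
    if slope <= 0:
        return None  # no degradation trend to extrapolate
    end_of_life = (failure_level - intercept) / slope
    return max(end_of_life - cut_numbers[-1], 0.0)


# Hypothetical cutting-force magnitudes (N); the level rises with each cut.
base = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4]
raw_cuts = [[x + 2.0 * k for x in base] for k in (1, 2, 3)]

features = [rms(moving_average(sig, window=3)) for sig in raw_cuts]
print(features)
print(predict_rul([1, 2, 3], features, failure_level=20.0))
```

In practice each stage would be replaced by a more capable technique, such as the feature extraction methods and decision-making algorithms surveyed in later sections; the sketch only shows how the stages connect.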

VI. POPULAR SENSORS USED IN DATA-DRIVEN MODELS
The sensor configuration provides the sensor signals for feature extraction; the extracted features relate to monitored tool conditions such as tip fracture and tool wear. Sensor monitoring can be performed using a single sensor or using the multi-sensor fusion technique.

A. SINGLE SENSOR MONITORING
In this method, signals captured from a sensor are analyzed to estimate tool conditions. Sensor monitoring is the indirect monitoring technique of a data-driven model. Dynamometers, accelerometers, acoustic emission sensors, and current sensors are generally used in indirect monitoring methods.

1) DYNAMOMETER
The dynamometer measures cutting forces, which describe the state of the cutting process during machining [77]. It responds very well to cutting forces because of its high reliability and sensitivity. As tool wear progresses, cutting forces increase correspondingly. Cutting force is therefore a sensitive indicator of tool condition and allows the tool state to be estimated accurately. Two types of dynamometer are used in milling machines: the table-based dynamometer and the rotating dynamometer [78]. A table-based dynamometer is generally placed at the interface between the workpiece and the workbench; it responds very well to slight changes in cutting forces during machining [78]. The rotating dynamometer, in comparison, is connected to the tool holder or spindle [79]. The dynamometer is selected according to the kilogram-force (kgf) generated during machining. A dynamometer can track tool breakage, which appears as a peak in the force signal. A neural network combined with a dynamometer offers a simple decision-making process for tool wear estimation [80].
Drawbacks: Along with the above advantages, the dynamometer also has some limitations. It is unsuitable for large and medium-size workpieces in milling due to its physical properties [81]. A dynamometer mounted on the worktable limits the size of the workpiece [82]. Installing the dynamometer is a challenging task, as it is placed at the interface between the workpiece and worktable. Using and maintaining a commercial dynamometer significantly increases cost. The rotating type of dynamometer restricts frequent tool changes in automated Computer Numerical Control (CNC) milling machines [83].
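The observation that tool breakage appears as a peak in the force signal suggests a simple detection rule, sketched below. The rule flags any sample that exceeds a multiple of the mean of the preceding samples; the window length, the multiplication factor, the force values, and the function names are hypothetical choices made for illustration. The conversion from kilogram-force to newtons uses the standard value 1 kgf = 9.80665 N.

```python
KGF_TO_NEWTON = 9.80665  # standard conversion factor


def kgf_to_newton(force_kgf):
    """Convert a force from kilogram-force to newtons."""
    return force_kgf * KGF_TO_NEWTON


def detect_force_peaks(force, window, factor):
    """Return the indices where the force exceeds `factor` times the mean
    of the preceding `window` samples, a crude indicator of breakage."""
    peaks = []
    for i in range(window, len(force)):
        baseline = sum(force[i - window:i]) / window
        if baseline > 0 and force[i] > factor * baseline:
            peaks.append(i)
    return peaks


# Hypothetical resultant cutting force in newtons; sample 6 mimics a spike.
force = [100, 102, 101, 103, 104, 102, 260, 105, 103]
print(detect_force_peaks(force, window=4, factor=2.0))  # [6]
```

A gradual rise in the baseline itself, rather than an isolated spike, would instead point to progressive wear, in line with the relation between tool wear and cutting force described above.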

2) ACCELEROMETER
Vibrations arise in the machine due to the friction force between the tool and workpiece, or due to fractured inserts, during machining. Growth in tool wear is responsible for increased cutting force and vibration amplitude. The selection of the vibration sensor depends on the speed of the spindle, the operating frequency bandwidth (Hz), and the operating range of the sensor in "g" (1 g = 9.81 m/s²). Vibration signal measurement follows ISO 10816 [22]. The accelerometer provides periodic signals similar to those of the cutting force. As the cutting tool starts to deteriorate, the vibration signal amplitude increases accordingly.
Drawbacks: Accelerometer also shows some limitations like mounting position causes changes in signals. Machining speed should be within a specific range for better results. The harsh working environment like fluid lubrication, chip strike causes changes in generated signals.

3) ACOUSTIC EMISSION (AE)
AE signals arise from the transient elastic energy released by the mechanical deformation of the material [84]. Tool wear and stresses between the tool and workpiece occur due to chip fracture or friction between chips. The AE sensors detect such signals (noise coming from the machine) during machining. AE is essentially the energy released at the micro level of the material due to deformation during machining [85]. Appropriate values of sensitivity (dB) and operating frequency (kHz) need to be considered when selecting the AE sensor. Machining processes with a dynamic bandwidth from 100 kHz to 1 MHz can be monitored using AE sensors [84], [86]. Compared to vibration and cutting force signals, AE sensor signals are less easily disturbed by mechanical disturbances and have a higher frequency range than the environmental frequencies. The signals are easily recognized and respond quickly to the changing condition of the tool and the work material. AE sensors are particularly beneficial in micro-milling operations [87].
Drawbacks: However, AE signals are easily disturbed in a noisy environment, which makes it difficult to extract valid signals by denoising the raw signal from the sensor [88].

4) CURRENT SENSORS
As the cutting force increases with tool wear, the current drawn by the spindle motor increases accordingly [89]. Motor current sensors are found to be somewhat more acceptable for manufacturing environments than cutting force sensors, owing to their comparatively straightforward design [90], [91]. As the cutting tool becomes blunt through gradual wear, the current drawn by the spindle motor increases compared to the normal working condition [92]. Hall effect sensors collect the current signals in end milling operations to monitor the tool condition [82], [92].
Drawbacks: The motor current is highly sensitive to noise and is significantly affected by friction during machining and by the damping of the feed drive system. It was also found that at higher spindle speeds, current signals are not very sensitive to change. Table 7 shows the benefits and limitations of the individual sensors.

B. MULTI-SENSOR TECHNOLOGY
In machining, tool life prediction is a critical issue because the cutting process is dynamic and nonlinear [109]. Each sensor collects data only from the location where it is placed and delivers that information as a signal. As the machining and tool wear conditions change, the signal behavior changes as well, so judging the tool condition from a single sensor becomes unreliable. Hence, the multi-sensor technique is preferable for gaining confidence in predicting the actual tool behavior [110]. It also avoids the drawbacks of the individual sensors discussed above (Section VI.A), which is why the multi-sensor concept for TCM has become more popular. Different sensors correlate strongly with the tool wear condition, and the loss of sensitivity of one sensor can be compensated by the others. The multi-sensor approach increases robustness and improves performance by reducing the uncertainty in tool wear estimation that arises from a single sensor. Table 8 shows a few papers related to RUL estimation and condition monitoring using a data-driven approach.

VII. FEATURE EXTRACTION AND SELECTION
The raw data collected from the sensors have a large number of dimensions, and processing such high-dimensional data may require considerable computing resources and time. Hence, to gain more insight into the data and process it efficiently, the dimensions of the raw data need to be reduced such that the reduced data still represents the original dataset completely and accurately. For data analysis purposes, relevant features are extracted from the signal. Further, feature selection is a process that helps to identify the important features of the equipment and eliminates the features that contribute less to the output or target variables of the model. A proper feature selection process significantly improves the prediction accuracy and performance of the model.

A. FEATURE EXTRACTION
Feature extraction is performed to convert raw machinery data into more meaningful data that can be fed to the model. It aids in reducing the dimensions of the original signal information obtained across various signal processing domains. Signals captured by the sensors need to be converted from analog to digital form. To denoise the data, the signals need to be passed through low-pass and high-pass filters. Features that correlate well with the target variables are selected, which enhances the learning rate during model training and thereby improves the predictive performance of the model. Collected signal data is classified into the time, frequency, and time-frequency domains; the time domain captures the information in the time series [111]. A graphical time-domain representation plots the change in the signal over time, while the frequency domain shows how much of the signal lies within a given frequency band over a range of frequencies. The time-frequency domain provides the frequency band of the signal over the time interval.

1) TIME-DOMAIN
It extracts the features of the tool state from the acquired sensor signals using time series and different statistical parameters to reduce the dimension of the signal information.
The time domain uses both dimensional and non-dimensional statistical parameters. Dimensional parameters such as the average, standard deviation, and Root-Mean-Square (RMS), and non-dimensional parameters such as kurtosis, skewness, waveform factor, crest factor, etc., are extracted from the signals [112].
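The statistical parameters listed above can be illustrated with a minimal sketch. It uses only the Python standard library so that it stays self-contained; the function name `time_domain_features`, the population (not sample) normalization, and the example segment are assumptions made for illustration and are not taken from the surveyed papers.

```python
import math

def time_domain_features(signal):
    """Return common time-domain statistics for one signal segment."""
    n = len(signal)
    mean = sum(signal) / n
    # Population variance and standard deviation (dimensional parameters)
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    # Standardized third and fourth moments (non-dimensional parameters)
    skewness = sum((x - mean) ** 3 for x in signal) / (n * std ** 3)
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * std ** 4)
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "crest_factor": peak / rms,  # peak amplitude relative to RMS
    }

# Hypothetical vibration segment (illustrative values only)
segment = [1.0, -1.0, 1.0, -1.0]
features = time_domain_features(segment)
```

In practice such features would be computed per machining pass or per sliding window, giving one low-dimensional feature vector per segment instead of the raw samples.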

2) FREQUENCY-DOMAIN
It extracts features in the frequency domain from the pre-processed signals to relate them to the tool state. Before the feature parameters are extracted, the Fast Fourier Transform (FFT) is used to convert the time-domain signal into the frequency domain. The frequency-domain signals are then used to extract parameters such as tooth frequency, peak-to-peak amplitude, spectral skewness, spectral entropy, power spectrum, etc. [97].
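As a rough illustration of moving from the time domain to the frequency domain, the sketch below computes an amplitude spectrum and picks the dominant frequency. To stay dependency-free it uses a naive O(n²) discrete Fourier transform rather than an FFT library (the FFT computes the same coefficients faster); the function names, the 400 Hz sampling rate, and the 50 Hz test tone are hypothetical.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Single-sided magnitude spectrum via a naive O(n^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        mags.append(abs(coeff) / n)
    return mags

def dominant_frequency(signal, sampling_rate_hz):
    """Frequency (Hz) of the largest non-DC spectral component."""
    mags = dft_magnitudes(signal)
    # Skip bin 0 (the DC component) when looking for the peak
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return k_peak * sampling_rate_hz / len(signal)

# Hypothetical example: a 50 Hz tone sampled at 400 Hz for 64 samples
fs = 400.0
tone = [math.sin(2 * math.pi * 50.0 * i / fs) for i in range(64)]
peak_hz = dominant_frequency(tone, fs)
```

For a milling signal, the same spectrum could be inspected around the tooth-passing frequency, which is one of the frequency-domain parameters mentioned above.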

3) TIME-FREQUENCY DOMAIN
As the machining process is dynamic, it generates non-stationary signals during machining. Therefore, time-frequency domain features are more suitable for such non-stationary signals [28]. Generally, a wavelet transform is used to extract features in the time-frequency domain. The authors of [113] use the wavelet packet transform method for richer signal analysis in the high-speed milling process to predict tool wear.
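The idea behind a wavelet packet decomposition can be sketched with the Haar wavelet, the simplest orthonormal wavelet. This is only an illustrative assumption: [113] is not stated here to use Haar, and a production analysis would typically rely on a dedicated wavelet library. The function names and the 8-sample segment are hypothetical. The energy of each terminal node is a commonly used time-frequency feature.

```python
import math

def haar_step(signal):
    """One Haar analysis step: returns (approximation, detail) halves."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_packet_energies(signal, levels):
    """Energy of every terminal node of a Haar wavelet packet tree."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, the packet transform
            # splits the detail branch as well as the approximation branch.
            next_nodes.extend(haar_step(node))
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]

# Hypothetical segment: a steady part followed by a fast oscillation.
# The length must be divisible by 2**levels.
segment = [1.0, 1.0, 1.0, 1.0, 4.0, -4.0, 4.0, -4.0]
energies = wavelet_packet_energies(segment, levels=2)
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the original segment, so they describe how the signal energy is distributed across sub-bands.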

B. FEATURE SELECTION
Once the features are extracted in the different domains, they are correlated with the machine health condition. For proper feature selection, systematic feature ranking methods [114] are used, such as regression models (random forest regressor, decision tree regressor, linear regression, etc.), classification models (random forest classifier, decision tree classifier, etc.), and a few other methods such as Pearson's correlation coefficient and Principal Component Analysis (PCA). These methods help to rank the features that are most strongly related to the machine health condition.
Pearson's correlation coefficient (Pearson's r coefficient) is generally used to select the extracted features in a milling operation. Pearson's r coefficient gives the correlation between the tool wear and the extracted features [35].
Equation (1) shows Pearson's r coefficient, where x and y represent the extracted feature and the tool wear condition, respectively. The value of r varies from −1 to 1. Zero indicates no correlation, while 1 and −1 indicate a strong positive and a strong negative correlation, respectively [52]. The correlation can be classified into three groups based on the value of r: weak correlation (0 < r < 0.3), moderate correlation (0.3 < r < 0.7), and strong correlation (0.7 < r < 1) [115]. Generally, features with an ''r'' value greater than 0.7 (r > 0.7) are selected.
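A minimal sketch of correlation-based feature selection follows. It uses the standard definition of Pearson's coefficient, r = Σ(x_i − x̄)(y_i − ȳ) / sqrt(Σ(x_i − x̄)² · Σ(y_i − ȳ)²). The feature names and values are hypothetical, and comparing the absolute value |r| against the threshold (so that strongly negative correlations are also kept) is an assumption of this sketch; the text above states the criterion as r > 0.7.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, wear, threshold=0.7):
    """Keep features whose |r| with tool wear exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, wear)) > threshold]

# Hypothetical flank wear (mm) and two candidate features per cut
wear = [0.10, 0.15, 0.22, 0.30, 0.41]
features = {
    "rms": [1.0, 1.4, 2.1, 2.9, 4.2],        # grows with wear
    "noise": [0.5, -0.2, 0.4, -0.3, 0.1],    # unrelated to wear
}
selected = select_features(features, wear)
```

Here only the RMS feature passes the threshold, so it alone would be passed on to the prediction model.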

VIII. DATA-DRIVEN DECISION-MAKING ALGORITHMS
Different monitoring and prediction Machine Learning (ML) models are available to analyze the sensor data used in data-driven models. The author of [28] has reviewed different tool condition monitoring models for milling. These models are used to monitor and decide the tool condition in various machining processes. Models such as SVM, ANN, CNN, AE, LSTM, etc., are used to track the performance of the tool.

A. SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised classification algorithm based on statistical learning theory [28]. The main advantage of SVM is that it shows better performance even with a large volume of data. In this method, a hyperplane is used to separate the data points. Support vectors determine the position and orientation of the hyperplane, and a kernel function allows a linear algorithm to be used as a solution for a nonlinear problem: SVM maps the nonlinear input data to a high-dimensional feature space [116]. Figure 23 shows the working principle of SVM. Many researchers use SVM for tool condition monitoring [6], [117]-[121]. According to [122], SVM is a suitable ML technique for predicting the RUL of equipment with time-series techniques. The authors of [103] use SVM to classify milling tool conditions; the discrete wavelet transform extracts the features from sound sensor signals, and SVM is found to be an efficient classifier compared to the other classifiers used in face milling operation [103]. According to [123], nonlinear feature reduction combined with SVM estimates the tool wear and calculates the RUL of the tool. The authors of [107] use a multi-sensor fusion technique to gather signals during machining and apply an SVM monitoring model with cutting parameters and signal features as the input vector; they monitor the tool and workpiece deformation and find that SVM shows good results when the penalty coefficient is taken into account.
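The role of the kernel function can be made concrete with a small numerical check. The sketch below, given only as an illustration, uses a homogeneous degree-2 polynomial kernel on two-dimensional inputs and shows that evaluating the kernel in the input space gives the same value as an inner product in an explicit three-dimensional feature space. The kernel choice and the sample points are assumptions; the surveyed works may use other kernels (for example, a radial basis function kernel).

```python
import math

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def explicit_map(x):
    """Explicit feature map for 2-D inputs that corresponds to poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical feature vectors (illustrative values only)
x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly_kernel(x, z)                      # computed in input space
k_explicit = dot(explicit_map(x), explicit_map(z))  # computed in feature space
```

Because both values agree, a linear separator in the higher-dimensional space can be learned without ever constructing the mapped vectors, which is what makes kernel SVMs practical for nonlinear tool condition data.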

B. ARTIFICIAL NEURAL NETWORK (ANN)
An ANN consists of nodes or units that are connected in a hierarchical network. This model is inspired by the working of the human brain. An ANN contains input and output layers and one or more hidden layers of connected nodes (neurons). Figure 24 shows the working principle of ANN. Determining the number of nodes and hidden layers is challenging and depends on the individual's knowledge and experience. Each connection between neurons in adjacent layers has a value called a weight. These weights are adjusted during training on samples so as to minimize the error in the output and obtain the best possible solution. Many researchers have applied the ANN model for monitoring tools in a milling machine, where it shows better performance in the estimation of tool wear [100], [124], [125].
The author of [105] considered the publicly available PHM 2010 dataset [126] for estimating the wear in high-speed milling operation (10400 rpm) by using the ANN algorithm. ANN is also used for tool wear prediction in turning operation using the multi-sensor fusion technique [115].
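How training adjusts the weights to reduce the output error can be sketched with the smallest possible network: a single linear neuron trained by gradient descent on a mean-squared-error loss. This is a deliberately reduced, hypothetical example (one input, no hidden layer, made-up data following y = 2x + 1); real tool wear models use many inputs and hidden layers.

```python
def train_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current weights on every sample
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Move the weights against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training samples generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_neuron(xs, ys)
```

The learning rate and the number of epochs are illustrative; in a multi-layer network the same update is applied to every weight, with the gradients obtained by backpropagation.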

C. AUTO-ENCODER (AE)
An AE mainly consists of two parts, an encoder and a decoder, which together reconstruct the input data. Figure 25 shows the auto-encoder architecture. The encoder compresses the input into a Latent Space Representation (LSR), and the decoder aims to reconstruct the input from the LSR by using the decoding function. In practice, AEs are applied to denoise the raw data and to perform dimensionality reduction, which provides more insight into the raw data. AE models are generally used for fault diagnostics. In RUL prediction, AE models are typically used to extract degradation features. The author of [127] uses a neural network and a sparse AE to classify very similar bearing vibration signals. A stacked sparse AE is used along with logistic regression to predict the RUL of aircraft engines [128]. A combination of AE and Deep Neural Network (DNN) predicts the RUL of bearings [129].

D. CONVOLUTIONAL NEURAL NETWORK (CNN)
CNN is a feedforward multilayer Artificial Neural Network. CNN shows better outcomes in machine fault diagnosis and surface integration nitration [130]. Figure 26 shows the basic CNN architecture [131], [132]. A double-CNN framework is used for intelligent RUL prediction; it exploits the robust feature extraction ability of CNN by extracting features from the vibration signals [133]. A new DL architecture for prognostics has been developed for RUL estimation by using a deep CNN [134]. CNN has also been used for multi-scale feature extraction in the time-frequency domain to develop intelligent RUL prediction of bearings [135].
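The feature extraction performed by a convolutional layer can be illustrated with a one-dimensional sliding dot product followed by a ReLU activation. This is a minimal sketch under stated assumptions: the kernel is hand-picked rather than learned, the signal values are hypothetical, and a real CNN stacks many learned kernels with pooling layers.

```python
def conv1d_valid(signal, kernel):
    """Valid-mode sliding dot product (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

# Hypothetical vibration amplitude with a sudden jump (illustrative values)
signal = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
# Hand-picked kernel that responds to a rising step in the signal
edge_kernel = [-1.0, 1.0]
feature_map = relu(conv1d_valid(signal, edge_kernel))
```

The feature map is non-zero only where the amplitude jumps, which shows how a kernel localizes a pattern in the raw signal; during training, the kernel values themselves are learned from data.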

E. RECURRENT NEURAL NETWORK (RNN)
RNN is a Deep Learning architecture that processes dynamic information from preceding layers by using feedback connections from the hidden or output layers to the next layer [136]. Figure 27 shows the RNN loop and the unrolled RNN architecture [137], [138]. Long Short-Term Memory (LSTM) is used along with RNN to overcome the dependency limitation of the plain RNN. RNN and LSTM networks have gained great attention in many applications related to RUL prediction. LSTM-RNN is used for calculating the RUL of lithium-ion batteries [139]. An RNN-based health indicator has been used to enhance the accuracy of bearing RUL prediction [140].
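The feedback connection that distinguishes an RNN can be sketched with a scalar recurrence in which the hidden state at each step depends on the current input and on the previous hidden state. The form h_t = tanh(w_x·x_t + w_h·h_{t-1} + b) is the common textbook (Elman-style) cell and is an assumption here, as are the weights and inputs.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Unrolled scalar recurrence: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        # The previous hidden state h is fed back into the current step
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights; the two sequences differ only in the first input
states_a = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
states_b = rnn_forward([0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

Comparing the two runs shows that the first input still influences the later states of `states_a`, but that this influence shrinks at every step, which is the kind of fading memory that gated cells such as LSTM are designed to counter.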

F. LONG SHORT-TERM MEMORY (LSTM)
LSTM was proposed by Hochreiter and Schmidhuber [141] as an advancement of the Recurrent Neural Network (RNN) that avoids the limitations of the RNN by passing additional information between the memory cells. It is designed to avoid dependency issues by using gates to control the memory cells [137]. LSTM is modeled as a chain structure and can store information for an extended period. Figure 28 shows the LSTM architecture [137], [142], [143]. The sigmoid function (σ) takes the output from the last cell and the current input for processing. The sigmoid function also determines which part of the previous cell output should be eliminated from an individual cell. The authors of [126] use LSTM to extract in-depth features from the multi-sensor time-series data and combine them with temporal features to construct a new input vector, which is provided to a nonlinear regression model for tool wear prediction. The models were validated on the PHM 2010 [126] and NASA milling datasets [126]. Table 9 shows the different decision-making models with their applications, benefits, limitations, and percent accuracy.
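The gating described above can be sketched with a single scalar LSTM step. The equations follow the widely used formulation with forget, input, and output gates; the parameter dictionary layout and all numeric values are hypothetical and chosen only to make the effect of the forget gate visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = p[name]
        return w_x * x + w_h * h_prev + bias
    f = sigmoid(pre("forget"))        # share of the old cell state to keep
    i = sigmoid(pre("input"))         # share of the candidate to write
    o = sigmoid(pre("output"))        # share of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # updated cell state (memory)
    h = o * math.tanh(c)              # updated hidden state (output)
    return h, c

# Hypothetical parameters: a large forget bias keeps the memory, while a
# negative input bias blocks new content from being written.
keep = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
# Same parameters, but the forget gate is driven towards zero
drop = dict(keep, forget=(0.0, 0.0, -10.0))

_, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=keep)
_, c_drop = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=drop)
```

With the forget gate open, the previous cell state of 2.0 is carried over almost unchanged; with the gate closed, it is almost entirely erased. In a trained network these gate values are learned and depend on the input and the previous output.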

IX. MILLING DATASETS FOR MODEL ACCURACY PREDICTIONS
Very few milling datasets to which RUL prediction has been applied are publicly available. Most researchers use the NASA and PHM 2010 milling datasets for RUL prediction. These available milling datasets are considered for checking the accuracy of prediction models.

A. NASA DATASET FOR MILLING
The NASA dataset [144] was generated by considering various operating conditions on a milling machine (Matsuura Machining Center MC-510V). During experimentation, cast iron and steel are used as workpiece materials, and a 60 mm face mill with six KC710 inserts is used for machining. A constant cutting speed (200 m/min) and variable depth of cut (1.5 mm and 0.75 mm) and feed rate (0.5 mm/rev and 0.25 mm/rev) are considered. Sixteen different cases are considered, each with a different number of runs. Acoustic, vibration, and current sensors are used to capture the signals during machining.
Acoustic emission and vibration sensors are mounted on the spindle and worktable, while the current probe is attached to the spindle motor of the milling machine for capturing the signals. The authors of [126] apply the LSTM algorithm to the available NASA milling dataset. Compared to other models, LSTM shows good results for the prediction of tool wear.
In the PHM 2010 dataset [126], an AE sensor, an accelerometer, and a dynamometer are used to capture tool condition signals during machining. The AE sensor and accelerometer are mounted on the workpiece, while the dynamometer is placed at the interface between the workpiece and the worktable. A microscope is used to measure the flank wear of each flute. Seven different signals (vibration along x, y, and z, cutting force along x, y, and z, and AE RMS) are captured. Signals are captured for six different cutters (C1 to C6), and the corresponding tool wear is available only for cutters C1, C4, and C6 in the dataset. The LSTM model achieves high accuracy, around 92.54%, 92.04%, and 89.56% for cutters 1, 4, and 6, respectively, on the PHM dataset. Table 10 shows the accuracy of different models for predicting tool conditions [126].
A few more publicly available datasets, such as the NUAA Ideahouse milling machine tool wear dataset [160] and the ''System-level Manufacturing and Automation Research Testbed'' (SMART) at the University of Michigan [161], can be used for RUL prediction in the future.

X. DISCUSSION
This data-driven predictive maintenance approach to estimating the useful life of the tool provides valuable and critical information about complex machining operations. From the literature, it was found that sensors such as the accelerometer, dynamometer, current sensor, and acoustic emission sensor are effective and preferable in data-driven condition monitoring. Even though the initial setup cost increases due to expensive sensors and data analytics, the overall benefits of decreased downtime and increased industrial productivity are significant.

A. THE SURVEY OUTCOME
This survey helps to understand the importance of data-driven PdM for RUL estimation in milling. The RUL of a machine is the amount of time it will likely run before it has to be repaired or replaced. Accurate RUL estimation enables engineers to schedule their maintenance activities, optimize the use of maintenance resources, and avoid unnecessary delays due to machine downtime. As a result, estimating the RUL as accurately as possible is essential in predictive maintenance plans. From an extensive literature survey, it is found that the use of multiple sensors gives more promising prediction results compared to a single-sensor technique. AI-based decision-making algorithms such as ANN, SVM, and LSTM show good prediction accuracy.

B. CHALLENGES AND LIMITATIONS IN THE ESTIMATION OF RUL
From the literature survey, the authors found some challenges and limitations in this area, which are as follows:
• In-depth RUL estimation needs to be done by considering the machine performance from a multiple-fault perspective. These faults can be analyzed by collecting data from different sensors. However, this multi-sensor data varies in format, size, and measurement units, making it difficult to investigate using one common analysis framework. So, the development of technical AI-based frameworks and algorithms for effective utilization of multi-sensor data is challenging and needs more attention in future research on RUL estimation.
• Data captured via sensors play a major role in implementing the intelligent RUL estimation setup. However, environmental factors such as factory floor noise, environmental temperature, working conditions (flood lubrication, machining chips, etc.) affect the input signals of the sensors leading to the generation of noisy data. This noisy data affects the accuracy of AI-based RUL predictions. So, effective data pre-processing techniques, outcome validation metrics, and autocorrecting AI algorithms are required.
• In order to develop an unbiased AI-based RUL estimation model, a large amount of historical data is required, which would have samples from various fault scenarios. The collection of such a large amount of data is sometimes unfeasible from the cost and time perspective. So, data augmentation techniques for the generation of synthetic data would be required.
• It is observed that the same prediction algorithm cannot be applied to different fault data captured under different conditions. This would require an amalgamation of multiple fault prediction algorithms within one AI system.
• Even though multiple sensors give a high-confidence decision-making model, it is difficult to identify the redundant and noisy signals from different sensors while performing data preprocessing and feature extraction.
• The results of the RUL estimation AI models must be more interpretable and logically understandable for users to comprehend why a certain RUL prediction was made at a certain instance of time and how the value is calculated.

XI. ADVANCEMENTS IN RUL PREDICTION
RUL prediction using Artificial Intelligence techniques has undergone major evolution over the past few years, ranging from shallow-structure-based machine learning techniques to deep learning techniques based on n hidden layers. In recent years, AI advancements have further strengthened RUL estimation strategies. AI-led techniques such as generative adversarial networks, explainable AI, transfer learning, digital twin, adversarial machine learning, domain adaptation, and multi-modal data fusion will help to resolve some of the open challenges faced in RUL estimation in predictive maintenance. Figure 29 and Table 11 highlight some of these open issues and the solutions provided by these techniques with references.

1) GENERATIVE ADVERSARIAL NETWORKS (GAN)
In the manufacturing industry, sensors mounted to collect condition data of machines can malfunction due to an inconsistent power supply and various similar issues. In such cases, there could be a data deficit. Figure 30 shows the working principle of the Generative Adversarial Network (GAN) [183]. GANs can generate synthetic data in place of sensor values that are missing due to sensor failure. Shuai Zheng and Chetan Gupta propose a discriminant GAN for equipment health classification to generate more separable data samples belonging to different health degradation stages of machinery [184]. Recently, many researchers have proposed the use of GANs for the generation of anomalous data or anomalous features [185], [186]. However, most of these techniques involve the conversion of vibration-based signals into images. A potential research approach is to see how such methods can be applied to more complex datasets consisting of vibration and time-series data [187]. A more thorough study of the physical credibility of the generated samples and of the impact of these synthetically produced multiple faults on algorithm results is needed.

2) EXPLAINABLE AI (XAI)
Most current machine learning models do not explain the predictions they make. Explainable AI (XAI) techniques are an efficient tool for interpreting model predictions and help machine supervisors better understand fault diagnosis and prognosis. Figure 31 shows the working principle of XAI [188]. The authors of [189] have demonstrated the power of combining explainable AI techniques such as ELI5 and LIME with domain knowledge for RUL estimation in industrial machinery.
Explainable AI has a promising future in machine diagnosis, and various research directions can be envisioned. The XAI interpretability results need to be further evaluated based on the quality, utility, and satisfaction of the explanations and the effect of explanations on the model's success and the supervisor's confidence and reliance [190]. Several XAI evaluation measurement techniques have been proposed recently, such as explanation satisfaction scale, utility checklist, explanation trustworthiness, and many more [191], [192]. In the future, counterfactual explanations will help the industry take corrective measures [193]. Counterfactual Explanations show how to make the smallest modifications to the input data to get a particular outcome. Consider a case wherein the model predicted an anomaly in machinery's working, resulting in decreased RUL of the machine. Counterfactual explanation in such a case would tell the machine supervisor what changes in the operation of the machinery (input) would have avoided the anomaly and further improved its RUL [194].
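The notion of a counterfactual explanation can be sketched with a toy model. Everything in this example is hypothetical: the linear anomaly score, its coefficients, the two input names, and the simple one-dimensional search are stand-ins chosen for illustration, not the method of [193] or [194]. The sketch looks for the smallest reduction of one operating input that changes the model output from "anomaly" to "normal".

```python
def predict_anomaly(spindle_load, vibration_rms):
    """Hypothetical stand-in model: anomaly if a linear score exceeds 1.0."""
    return 0.6 * spindle_load + 0.4 * vibration_rms > 1.0

def counterfactual_load(spindle_load, vibration_rms, step=0.01, max_steps=1000):
    """Smallest reduction of spindle_load (in multiples of step) that
    clears the anomaly while the other input is held fixed."""
    for k in range(max_steps + 1):
        candidate = spindle_load - k * step
        if not predict_anomaly(candidate, vibration_rms):
            return candidate
    return None  # no counterfactual found within the search range

# Hypothetical operating point that the model flags as anomalous
flagged = predict_anomaly(spindle_load=1.5, vibration_rms=1.0)
new_load = counterfactual_load(spindle_load=1.5, vibration_rms=1.0)
```

The returned value tells the supervisor how far the load would need to be lowered, with vibration unchanged, for the model to report normal operation. Practical counterfactual methods search over several inputs at once and add constraints so that the suggested change is physically feasible.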

3) TRANSFER LEARNING (TL)
Dynamic operating environments of the machinery can affect the model prediction. Transfer Learning (TL) algorithms can help improve model accuracy for pre and post-model deployment in dissimilar data distribution across the source and target domains. Figure 32 shows the working principle of TL [195].
The authors of [177] propose a novel transfer learning technique based on the multi-layer perceptron (MLP) for dissimilar data distribution problems in the RUL prediction of bearings. However, there is still considerable scope for using self-supervised learning [196] and self-supervised contrastive learning [197] algorithms that are fine-tuned on limited data. It has been shown that self-supervised algorithms work better in situations where data is scarce and where data labeling is time-consuming and costly. In self-supervised learning, the learning model trains itself by using one portion of the data to predict the other, producing labels automatically. Contrastive learning approaches are a class of self-supervised algorithms that learn to encode what makes two samples identical or different in order to construct representations. Contrastive learning is a discriminative method for grouping related samples together and separating diverse samples.
These approaches are particularly useful in transfer learning, wherein the model trains only on the distinctive high-level features in the source domain, thereby reducing training time.

4) DIGITAL TWIN (DT)
A digital twin (DT) is a hybrid simulated version of the physical and data-driven machinery setup. It can help provide real-time condition monitoring of machinery over cloud infrastructure. Figure 33 shows the digital twin approach used in milling machines [198]. Digital twins incorporate multi-physical, multi-probability variables from various domains by using sensor technology, a physical model, and a simulation model [199]. A DT model for a milling machine can be divided into three parts: a DT descriptive model, which describes the structure and the mathematical equations based on parameters and experience; a DT mapping model, which maps the real-time working condition to the DT system; and a DT intelligent model, which identifies irregularities in the system and predicts faults with the help of artificial intelligence algorithms [200]. In the Cyber-Physical System (CPS) scope, a digital twin may be described as the digital mapping model of the actual product [201]. DT is widely used for predictive maintenance, fault diagnosis, detecting anomalies present in systems, inferring the quality of the product, real-time monitoring of the system, etc. [35], [198], [202]-[204]. The authors of [178] propose a deep-learning-based digital twin model for a lithium-ion battery to map the relationship between various health indicators, such as the cell voltage and the cell state-of-charge (SOC), and the RUL estimate. The authors of [203] use a physics-based simulation model and the digital twin concept to calculate the RUL to enable predictive maintenance of the machine.
A Digital Twin is the model and data carrier that can carry out physical mapping in digital or virtual space and thereby bridge the digital and real worlds. Along with a predictive maintenance approach, one can develop a Digital Twin (DT) for the milling machine or for a critical part of the milling machine. A DT can simulate the whole machining process using real-time process parameters while taking machine degradation into account. In the context of RUL estimation, a twin model can be used to predict the useful functional life of the critical parts of the system through real-time simulation.
Because data exchange between the physical and digital systems in a digital twin is bi-directional, a DT can provide a more accurate RUL estimation with higher reliability. The DT-based approach provides more insightful information about the system by providing feedback between the real and digital worlds at every stage. If there is an anomaly in the machining process, the digital twin provides feedback to the controller for making the necessary changes. The DT approach can also help to increase the functional RUL of equipment by taking action against identified abnormalities in the system or by optimizing parameters at an early stage. So, digital twin-assisted predictive maintenance with the hybrid modeling approach can be used to predict the RUL of the system more accurately.

5) ADVERSARIAL MACHINE LEARNING (AML)
Some machine learning models are efficient in making predictions but might not be effective against illegal intrusions. Adversarial Machine Learning (AML) models secure the model structure against adversarial attacks that can jeopardize the robustness of the predictive maintenance framework. Figure 34 shows the working principle of Adversarial Machine Learning (AML) [205], [206]. Machine learning models can be vulnerable to adversarial attacks, which can hamper RUL estimation to a large extent [169]. Leveraging the benefits of blockchain technology can be one of the future research directions for building a trustworthy XAI model against adversarial attacks. Decentralized AI systems are enhanced by blockchain, which provides an open-source and freely available digital ledger distributed among AI agents through peer-to-peer networks [207].
Since blockchain makes AI decisions transparent and visible to all AI nodes on the network, it becomes more difficult for AI agents to change or reject them [208]. Blockchain-enabled RUL estimation models can be resilient against security attacks, as the RUL data can be decentralized and the integrity of the data can be maintained on the blockchain network.

6) DOMAIN ADAPTATION (DA)
RUL models are built considering a particular machinery setup, but a scenario might occur in which they need to be applied to another machinery setup. This new machinery setup is generally different from the previous one, and the model prediction accuracy might suffer. Domain Adaptation (DA) can help in efficient feature extraction from unlabeled machinery data, a common challenge faced by most real-time industries. Figure 35 shows the working principle of Domain Adaptation (DA) [209]. The authors of [172] propose a contrastive adversarial domain adaptation (CADA) method for cross-domain RUL prediction; such techniques can help the model remain robust against varying setups.
Domain adaptation analysis has mostly focused on homogeneous cases in which the source and target input spaces share the same characteristic feature set. However, real-time complex industrial applications are heterogeneous, consisting of varied condition monitoring scenarios. Sensor setups are also heterogeneous in nature, with variations in the type, location, and number of sensors deployed. The research on heterogeneous unsupervised domain adaptation, particularly when applied to complex physical structures, is still at a nascent stage, but it has considerable potential, especially for industrial applications. Another prospective research direction would be the use of simulation technology to create the source domain and adapt it to the real-life target domain.

7) MULTI-MODAL/MULTI-SENSOR DATA FUSION
Different types of sensors, instruments, measuring methods, experimental setups, and other sources are used to collect information about a phenomenon, for example when predicting RUL. Multi-modal data fusion provides numerous benefits, such as achieving a more coherent and global view of the system in question, enhancing decision making, analyzing specific scenarios about the system across different modalities or over time, and extracting information from data for varied purposes. Figure 36 shows the multi-sensor data collection using Multi-Modal Data Fusion (MMDF) [210]. He and Jin implemented multi-modal data fusion on the Ion-Mill Etching process by collecting multi-sensor data from different run-to-failure cycles [211]. The designed method presented a more systematic failure prediction methodology using heterogeneous sensory and operational data under diverse operating conditions and contexts. One future research direction in multi-modal data fusion would be to accurately rate the important sensor modalities while simultaneously distinguishing the important elements within each modality. Such a technique can inform the RUL estimation system about the contribution of each sensor, leading to better diagnosis and prognosis. Also, most multi-modal data in a smart manufacturing setup needs to be collected in dynamic environments, which implies variation in the data itself. Hence, the design of online and incremental data fusion models that can learn new knowledge without losing historical knowledge is needed as part of future research work. In addition, the quality of multi-modal data may be poor, and the data can contain a lot of noise. Hence, deep learning models for low-quality, noisy multi-modal data need to be developed urgently [181].
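The idea of online, incremental fusion mentioned above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the method of [211]: each modality is assumed to deliver a fixed-length feature vector per sample, fusion is done at the feature level by standardizing and concatenating the vectors, and the normalization statistics are updated one sample at a time with Welford's algorithm so that earlier data never has to be stored. The class names `RunningStats` and `OnlineFeatureFusion` and the modality names are hypothetical.

```python
import math


class RunningStats:
    """Welford's online mean/variance for a single scalar feature."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population standard deviation; 0.0 until two samples are seen.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

    def zscore(self, x):
        s = self.std()
        # A constant (or unseen) feature carries no information: map to 0.
        return (x - self.mean) / s if s > 0 else 0.0


class OnlineFeatureFusion:
    """Feature-level fusion with per-modality running normalization.

    `modalities` maps a (hypothetical) modality name to its feature count.
    """

    def __init__(self, modalities):
        self.order = list(modalities)  # fixed concatenation order
        self.stats = {
            name: [RunningStats() for _ in range(count)]
            for name, count in modalities.items()
        }

    def update(self, sample):
        """Incrementally absorb one multi-sensor sample (dict of lists)."""
        for name in self.order:
            for stat, value in zip(self.stats[name], sample[name]):
                stat.update(value)

    def fuse(self, sample):
        """Return one standardized, concatenated feature vector."""
        fused = []
        for name in self.order:
            fused.extend(
                stat.zscore(value)
                for stat, value in zip(self.stats[name], sample[name])
            )
        return fused


fusion = OnlineFeatureFusion({"vibration": 1, "current": 2})
fusion.update({"vibration": [1.0], "current": [10.0, 0.0]})
fusion.update({"vibration": [3.0], "current": [20.0, 0.0]})
fused_vector = fusion.fuse({"vibration": [3.0], "current": [20.0, 0.0]})
```

Standardizing each modality separately keeps a sensor with large raw amplitudes from dominating the fused vector. Rating the importance of each modality, as discussed above, would require an additional learned weighting step that this sketch does not attempt.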

XII. RECOMMENDATIONS FOR FUTURE WORK
Apart from the above-mentioned future research work in each existing advancement, the authors would also like to put forth a few more potential research directions in RUL estimation:
• A hybrid modeling and decision-making approach for RUL: Many researchers consider either the data-driven model or the model-based approach in isolation to calculate the RUL of the tool, which may introduce prediction errors due to the uncertainties of the individual model. A combined data-driven and model-based approach, along with hybrid decision-making algorithms, may decrease the errors in RUL prediction.
• Machining parameters optimization: Condition monitoring during predictive maintenance can also help optimize the input parameters of the machine to extend the RUL of the system. Researchers can consider the real-time process parameters and the degraded state of the machine when optimizing the input process parameters.
• Integrated de-noising method: Sensor signals are contaminated by changes in sensor working conditions, disturbances due to the startup of large machinery, high-frequency interference, etc. It is challenging to remove or filter the noise from the raw signals so that the original features can be extracted reliably and accurately. To address de-noising of industrial sensor signals, one can use integrated de-noising based on energy-correlation analysis and the wavelet packet transform.
• Robust Condition-Based Predictive Maintenance (CBPM): In a complex system, CBPM is still a challenging area due to heterogeneous data, remote location monitoring, and network infrastructure. The data collected from the system comes in heterogeneous (discrete) forms, such as system state data, system error data, system and environmental sensor data, manually collected operator observation data, and maintenance action data. To implement robust CBPM for a complex system, researchers can use smart sensors, a hybrid predictive analysis model, and secure network infrastructure. Smart sensors are capable of handling heterogeneous data. Hybrid predictive analysis models help analyze the data to produce prognostic alarms, estimate the RUL of key components, identify the maintenance action needed, and support comprehensive health management of the system. Secure network infrastructure helps provide an extensible and flexible framework for applying CBPM to complex systems successfully.
• Prescriptive maintenance: The prescriptive maintenance approach aims to automate the maintenance process. It not only monitors, predicts, and provides maintenance recommendations but can also decide on its own maintenance steps with the help of advanced ML/DL and AI techniques.
• Reinforcement Learning: Reinforcement learning is a type of machine learning in which a program learns to perform a task by repeatedly interacting with a complex environment. Figure 37 shows the working principle of Reinforcement Learning (RL) [212]. The agent explores the environment using an iterative trial-and-error method. This exploration produces evidence that the agent uses to decide the best course of action to complete its task. Reinforcement learning can be utilized to provide real-time decision-making capability in predictive maintenance techniques. The reinforcement agent can be used to optimize model predictions for RUL while simultaneously achieving high utilization of resources [213]-[215].
• PHM as a Service: Cloud Manufacturing applies cloud computing technology in the manufacturing domain [216]. Cloud Manufacturing is a customer-centric manufacturing paradigm that takes advantage of on-demand access to a shared pool of diversified and dispersed manufacturing tools to form a single product [217]. Prognostics and Health Management (PHM) can be offered as a service on the cloud through SaaS, PaaS, and IaaS facilities. The service provider can supply cloud-based data acquisition software and models for prognostic applications. The manufacturer can build a maintenance model using the available platforms and leverage cloud infrastructure (storage and networking resources) to implement solutions [216].
• Big data sensing: In a data-driven model, data signals are collected using sensors. In the multi-sensor technique, as the number of data-generating sensors increases, a large amount of sensing data is collected. This large amount of sensing data is difficult to handle using traditional methods. Big data sensing techniques are required to handle such a large amount of data for sensing applications. Mature infrastructure needs to be developed to collect, analyze, and process such large data by further exploring Big Data sensing techniques.
• Physics-induced deep learning prediction: Physics-induced machine learning is a promising approach to promoting interpretability in machine learning models, especially for applications beyond the image processing domain where visualizations cannot be easily extracted. Prior knowledge about the system's physical mechanics, integrated with deep learning-based knowledge, can help improve both the performance of the system and its interpretability [187], [218].
• Generation of representative/benchmarking datasets: One of the key demands of any deep learning application is the need for representative or benchmarked datasets which can be used to represent real-world scenarios. Computer vision and natural language processing domains have ample representative datasets, which are key drivers for exemplary research in those domains. However, in the context of Predictive Maintenance, the lack or insufficiency of representative datasets has discouraged the application of deep learning approaches in industrial applications to a certain extent. Generation of representative datasets using data augmentation techniques can be one of the potential research directions.
• Federated Learning: Centralizing data in order to apply machine learning and deep learning models can be a practical challenge for real-time manufacturing industries. Consider the case of a milling machinery company that wishes to predict the RUL of a costly milling machine. First of all, the models require training data. However, the company would have to run many milling machines to failure to obtain that data. A less expensive solution would be to obtain operating data from clients' milling machines, which represents real-world scenarios and operating setups for the machinery. Client training data would be a practical and cheaper solution. However, the client might be apprehensive about sharing its data with the company because of privacy concerns and regulatory impediments. Another challenge could be that the client is located in another country, and sharing such enormous volumes of sensory data would be infeasible. Federated learning comes to the rescue in such scenarios. In federated learning, a server coordinates a network of nodes, each of which holds training data that it cannot exchange directly. Each node trains its own model and then exchanges the model with the server. Figure 38 shows the framework of FL in the context of Industry 4.0 [219]. Federated learning aims to preserve privacy and reduce communication costs by not transferring the data itself. Since federated learning allows for training on a large volume of private data by transmitting only small models across the network, it has a lot of potential for industrial predictive maintenance [220].
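
The node/server exchange described in the federated learning item above can be sketched as follows. This is a minimal illustration, not the framework of [219] or [220]: it assumes one common aggregation rule, federated averaging, in which the server averages client models weighted by local sample count, and it uses a toy one-feature linear model as a stand-in for an RUL regressor. The function names (`local_train`, `federated_average`, `run_fedavg`), the hyperparameters, and the synthetic client data are all hypothetical.

```python
def local_train(weights, data, lr=0.05, epochs=5):
    """Client side: refine the global model on private (x, y) pairs.

    The model is y_hat = w * x + b, trained by batch gradient descent on
    mean squared error. Only the updated (w, b) leaves the client.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server side: sample-count-weighted average of client models."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def run_fedavg(clients, rounds=300):
    """Alternate local training and server aggregation for some rounds."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        # Raw client data never reaches the server, only trained weights.
        updates = [local_train(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Synthetic private datasets for two clients, both drawn from y = 2x + 1.
client_a = [(0.0, 1.0), (1.0, 3.0)]
client_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w_final, b_final = run_fedavg([client_a, client_b])
```

In this toy setting both clients share the same underlying relationship, so the averaged model approaches it. With real machines operating under different conditions, client data distributions differ, and plain averaging may need to be adapted.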

XIII. CONCLUSION
This paper reviews data-driven predictive maintenance for the RUL estimation of the milling cutting tool. Existing literature shows that RUL prediction is an emerging area and has a lot of scope for development in Industry 4.0. The paper also explores various open research questions faced by PHM researchers in this domain. The authors have discussed different data-driven monitoring methods, feature extraction methods, and decision-making models as well. Also, the paper covers datasets related to milling under various operating conditions to compare the accuracy of the prediction model for tool wear estimation. Effective RUL estimation serves the purpose of Predictive Maintenance (PdM): identifying the RUL of machinery helps to plan the predictive maintenance activities for that machinery. Accelerometers, acoustic sensors, dynamometers, and current sensors are the sensors mainly used for collecting data signals from the milling machine. The multi-sensor technique provides better prediction and more trustworthy results than the single-sensor technique. Due to the non-stationary behavior of the acquired signals, time-frequency domain wavelet analysis is preferable for milling feature extraction. ANN, SVM, and LSTM are generally used as decision-making algorithms for condition monitoring and RUL prediction of the tool during the milling operation. The paper also presents challenges, limitations, AI advancements in RUL prediction, and future directions related to this area.