Contact Force Detection of Grinding Process Using Frequency Information and Differential Feature on Force Signal

In grinding tasks, the contact force has a significant impact on the surface quality of the product. Therefore, force-sensing technology for detecting the contact force is important. Although force sensors are widely used for contact force detection, their response includes sensor-specific errors such as offset. In this paper, we propose a contact force detection method that combines frequency information with the differential feature (ΔF) of the force signal. The use of high-frequency information reduces the influence of force sensor-specific errors. However, contact force detection using only high-frequency information delays the detected value relative to the measured value, depending on the frame size of the time window used for frequency analysis. To reduce this time delay, the high-frequency information and ΔF are integrated by feeding both into a long short-term memory (LSTM)-based force detection model. To verify the effectiveness of the proposed method, we compared it with force detection models based on a feedforward neural network (FNN) and a convolutional neural network (CNN) on a dataset of plane grinding tasks. The detection accuracy of the LSTM-based model was superior to that of the FNN and CNN models. Compared with the LSTM model using only high-frequency information as input, the detection accuracy was 26% higher when the error was small and 57% higher when the error was large. In addition, using ΔF as an input reduced the time delay from 166 ms to 30 ms. Because the frequency information and ΔF are computed from the same force data, no additional dataset is required.


I. INTRODUCTION
Automation of tasks with changing contact states, such as assembly and grinding, which are called contact-rich tasks, has been widely studied in robotics [1], [2], [3], [4]. In contact-rich tasks, the contact force with the environment changes significantly depending on the contact state; hence, force-sensing methods for detecting the contact force are important. In grinding tasks, the contact force has a significant impact on the quality of the product surface; hence, the contact force must be controlled to the desired value [5], [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Sotirios Goudos .
Studies have been conducted to control the contact force during grinding tasks by attaching mechanically compliant instruments to the end effector (EEF) [7], [8], [9]. However, because such instruments must be designed for each task, their applicability is limited. Therefore, force control methods such as hybrid control [10] and impedance control [11] have been widely used to improve the performance of grinding tasks [12], [13], [14], [15], [16]. Although these methods improve contact force control through the design of the control system, the control performance strongly depends on the ability to detect the contact force.
Hence, improving force control by using force information with less noise is an active research topic. The reaction force observer detects the external force from the joint torque using an observer instead of a force sensor [17], [18]. In [17], a notch filter was applied during data processing to improve force detection in a polishing task. Force sensors are widely used to detect contact forces directly [19], [20]. However, force sensors have sensor-specific errors such as temperature drift and offset. Calibrating these errors requires considerable effort [21], [22], [23], and addressing all sensor-specific errors is difficult.
In grinding tasks, the response of force sensors includes the vibration of the grinding tool in addition to the contact force and sensor-specific errors. To remove the influence of vibrations, a low-pass filter (LPF) is generally used for force control. However, because both the contact force and the sensor-specific errors exist in the low-frequency band, detecting only the contact force using the LPF is difficult.
In this study, we focused on the frequency information of the vibration caused by the tool. A relationship exists between the frequency information of the force signal and the surface roughness [24]. In a sanding task, Nguyen et al. calculated control command values using the surface roughness estimated from frequency information [25]. The contact force has also been estimated from the frequency information of the force signal [26]. In [26], a force detection method was proposed that reduces the influence of sensor-specific errors by using the frequency information of the force sensor response. However, when the sensor-specific error is small, contact force detection using only the vibration information of the tool is less accurate than using the force sensor response directly. Moreover, there is a time delay between the actual contact force and the detected value.
To solve these problems, we propose a contact force detection model that combines the frequency information and the differential feature (ΔF) of the force signal. Table 1 summarizes the characteristics of each type of information for contact force detection. Using the method in [26], the contact force can be estimated from high-frequency information without the effect of drift. However, this estimate is often affected by noise, which deteriorates the detection accuracy. In addition, a time delay occurs between the actual contact force and the detected value, depending on the frame size of the time window used for frequency analysis [27], [28]. To address these problems, contact force detection was performed by combining ΔF with high-frequency information. ΔF is insensitive to drift and noise and provides a stable output. Although the absolute value of the contact force cannot be detected from ΔF alone, it can be obtained from the high-frequency information. Differential information is commonly used to compensate for time delays, so the use of ΔF is expected to reduce the delay that occurs when only high-frequency information is used.
The main contribution of this study is to show that combining frequency information with ΔF improves the detection accuracy of the contact force and reduces the time delay of the detected value. Fig. 1 shows an overview of the proposed force detection model. In the proposed model, ΔF is input to the NN in addition to the high-frequency information derived from the tool. The differential feature of the raw force information contains all signal components other than the DC component; therefore, it includes not only the contact force but also the vibration of the grinding tool. To extract only the change in contact force, we calculated ΔF by differentiating the LPF-processed raw force information. Because the frequency information and ΔF are independent features obtained from the same dataset, no additional teacher data are required. The advantages of the proposed method are therefore as follows:
• The proposed method improves the detection accuracy using features calculated from the same dataset as the frequency information.
• The delay caused by the frequency analysis is compensated by inputting the differential information to the NN.
For frequency analysis, a Mel-spectrogram (MS) [26], [29], [30] was used to account for time variation. In [26], the MS provided higher detection accuracy than the short-time Fourier transform (STFT). In addition, the MS represents frequency information with fewer dimensions than the STFT, which reduces the number of network parameters. Although the correlation between the frequency information and the contact force is strong, the relationship is highly nonlinear. Additionally, force control imposes strict timing requirements. NN models such as the 1D-CNN [26], [31] and long short-term memory (LSTM) [30], [31] have been used to solve nonlinear regression problems with limited time-series data. However, convolutional layers are known to degrade performance under very noisy conditions [34]. In this study, we therefore used an LSTM-based model for contact force detection.
We evaluated the force detection accuracy in grinding a flat surface with a 6-DOF manipulator and demonstrated the effectiveness of the proposed method. The remainder of this paper is organized as follows. Section II describes the experimental setup. Section III explains the principles of the proposed method. Section IV presents experiments and results demonstrating the effectiveness of the proposed method. Finally, Section V summarizes the paper.

II. ROBOT AND CONTROLLER
A. EXPERIMENTAL SETUP
Fig. 2 shows the experimental setup used in this study. A 6-DOF manipulator was used, and a 6-axis F/T sensor (EEF-type F/T sensor) was attached to the tip of the hand. A grinder is attached to the end of the F/T sensor. Training the NN requires the contact force as a label. In this study, we adopted the force information obtained from a board-type F/T sensor [35] as the correct label. Because this sensor is not attached to the robot, it is not affected by posture-dependent offsets. In addition, because the board-type F/T sensor is not directly attached to the grinder, it is less affected by the vibration of the grinding tool. The sampling period of both the EEF-type and board-type force sensors was 1 ms.
Fig. 3 shows a block diagram of the control system, and Table 2 lists the control parameters. In the figure, θ and τ represent the angle and torque of each joint, respectively, and P and F are six-dimensional vectors representing the position/posture and force/torque, respectively. The subscripts res and cmd denote response and command values, respectively, and the superscripts s and w denote the sensor and absolute coordinate systems, respectively. This is a general force/position control system in which the position, velocity, and force are fed back with gains $K_p$, $K_d$, and $K_f$, respectively. Because we focused on the 1D contact force, $K_f$ was set only in the Z-axis direction. $F^{s}_{\mathrm{res}}$ is the force/torque in the sensor coordinate system, and $F^{w}_{\mathrm{res}}$ is the force/torque in the absolute coordinate system obtained by the coordinate transformation of $F^{s}_{\mathrm{res}}$. $F^{w}_{\mathrm{res}}$ is the raw response of the force sensor and includes the vibration of the tool attached to the tip of the sensor. Therefore, the signal processed by a first-order LPF with a cutoff frequency of 5 Hz was used for force control.
A disturbance observer (DOB) [36] was used to make the position and force control system robust against disturbances.
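The 5 Hz first-order LPF applied to $F^{w}_{\mathrm{res}}$ before force control can be sketched as follows. This is a minimal sketch, assuming the 1 ms sampling period stated above and an exact (zero-order-hold) discretization of $1/(Ts+1)$ with $T = 1/(2\pi f_c)$; the paper does not state how the filter was discretized, and the function name `first_order_lpf` is hypothetical.

```python
import math
from typing import Iterable, List


def first_order_lpf(samples: Iterable[float], cutoff_hz: float = 5.0,
                    dt: float = 1e-3) -> List[float]:
    """First-order low-pass filter y[k] = y[k-1] + a * (x[k] - y[k-1]).

    Assumption: exact discretization of 1 / (T s + 1) with T = 1 / (2*pi*fc),
    which gives a = 1 - exp(-dt / T). The filter state is initialized with the
    first sample to avoid a start-up transient.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz * dt)
    out: List[float] = []
    y = None
    for x in samples:
        # First sample initializes the state; later samples move it toward x.
        y = x if y is None else y + a * (x - y)
        out.append(y)
    return out
```

For example, `first_order_lpf(raw_fz)` (with `raw_fz` a hypothetical list of z-axis force samples) returns the smoothed force. With $f_c$ = 5 Hz the time constant is $1/(2\pi \cdot 5) \approx 31.8$ ms, so a step input reaches about 63% of its final value after roughly 32 samples.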

III. METHODS
In this section, we first describe the relationship between the contact force and frequency information in grinding tasks. Next, we describe the frequency information and ΔF used as inputs to the NN. Finally, we present the LSTM-based force detection model.

A. MEL-SPECTROGRAM
In tasks such as grinding and cutting, the vibration of the tool also affects the force information. Therefore, features must be extracted from the oscillatory signals. Because the frequency content changes over time, we used the MS as a feature. The MS is calculated from the STFT by applying a nonlinear transformation to its frequency axis, and it is expected to represent the frequency information with fewer dimensions. The mel-scale value $m$ [mel] is converted from the frequency $f$ [Hz] using the following equation:
$$m = m_0 \ln\left(\frac{f}{f_0} + 1\right) \tag{1}$$
Here, $f_0$ and $m_0$ are the break frequency and the scale factor, respectively. $f_0$ was set to 700 Hz, and $m_0$ is a parameter that depends on $f_0$.
Because the force information is detected online, the number of acquired samples is limited. Therefore, setting the sample size used to calculate the STFT is important. A large sample size increases the computational cost. In addition, the force detection accuracy deteriorates for instantaneous changes, because the analysis spans a longer history of the signal. When the sample size is small, the resolution of the frequency axis is reduced and the accuracy of the analysis deteriorates, which in turn worsens the force detection accuracy. In this study, the STFT was calculated from 512 samples (512 ms) of force information using a frame size of 256 ms, a frame hop of 32 ms, and the Hann window function. A 64-channel mel filter bank was applied to the resulting STFT. Therefore, the MS is a 9 × 64 matrix with nine time frames and 64 frequency channels.
Fig. 4 shows the force and frequency information for each force sensor. The left figure shows the case without the offset error, and the right figure shows the case with the offset. Fig. 4(a) shows the raw information of the EEF-type F/T sensor, the information after LPF processing, and the response of the board-type F/T sensor. Fig. 4(b) shows the MS calculated from the raw signal in (a), and Fig. 4(c) shows an expanded view of (b) from 0 Hz to 40 Hz. From (a) and (b), we confirmed that the MS varies with the contact force; therefore, we used this relationship to detect the contact force from the MS. However, as shown in the right figure of (c), the low-frequency information is affected by the offset; hence, it must be removed from the MS. How the low-frequency information is removed should be designed for each task. In this study, we removed the lowest 5 channels from the 64-channel MS. In addition, the highest 14 channels were removed in the same manner to reduce the computational cost. Because the 6th channel corresponds to 40 Hz and the 50th channel corresponds to 400 Hz, only the information from 40 Hz to 400 Hz was input.
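The MS computation described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the mel conversion is taken as $m = m_0 \ln(f/f_0 + 1)$ with $m_0$ chosen so that 1000 Hz maps to 1000 mel (the usual convention, giving $m_0 \approx 1127$ for $f_0$ = 700 Hz), the 64 triangular filters are spaced uniformly on the mel axis between 0 Hz and the 500 Hz Nyquist frequency, and power (not log-power) spectra are used. The paper does not specify these details, and all function names are hypothetical. With a 512-sample buffer, a 256-sample frame, and a 32-sample hop, the number of frames is $1 + (512 - 256)/32 = 9$, which reproduces the 9 × 64 shape.

```python
import numpy as np

FS = 1000      # sampling frequency [Hz] (1 ms sampling period)
BUFFER = 512   # samples per analysis buffer (512 ms)
FRAME = 256    # STFT frame size (256 ms)
HOP = 32       # frame hop (32 ms)
N_MELS = 64    # mel filter-bank channels
F0 = 700.0     # break frequency f0 [Hz]
# Assumption: m0 is chosen so that 1000 Hz maps to 1000 mel.
M0 = 1000.0 / np.log(1000.0 / F0 + 1.0)


def hz_to_mel(f):
    """Mel conversion m = m0 * ln(f / f0 + 1)."""
    return M0 * np.log(np.asarray(f, dtype=float) / F0 + 1.0)


def mel_to_hz(m):
    """Inverse of the mel conversion."""
    return F0 * (np.exp(np.asarray(m, dtype=float) / M0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=FRAME, fs=FS):
    """Triangular filters spaced uniformly on the mel axis from 0 Hz to fs/2."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)            # 129 bins, 0..500 Hz
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_spectrogram(buffer, fb=None):
    """Return a (frames x mel channels) power MS for one 512-sample buffer."""
    x = np.asarray(buffer, dtype=float)
    if x.shape != (BUFFER,):
        raise ValueError(f"expected {BUFFER} samples, got {x.shape}")
    fb = mel_filterbank() if fb is None else fb
    window = np.hanning(FRAME)                                 # Hann window
    n_frames = 1 + (BUFFER - FRAME) // HOP                     # = 9
    frames = np.stack([x[k * HOP:k * HOP + FRAME] * window for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (9, 129)
    return power @ fb.T                                        # (9, 64)


def select_mid_band(ms, lo=5, hi=50):
    """Drop the lowest 5 and highest 14 channels, keeping channels 6-50 (1-based)."""
    return ms[:, lo:hi]
```

In use, `select_mid_band(mel_spectrogram(latest_512_samples))` (with `latest_512_samples` a hypothetical buffer of raw EEF-type force) yields the 9 × 45 matrix that corresponds to the 40 Hz to 400 Hz band fed to the network.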

B. DIFFERENTIAL FEATURE OF FORCE INFORMATION
The contact force lies in the low-frequency band, which also contains the sensor-specific errors. In this study, the low-frequency information was removed from the MS to reduce the influence of these errors, but this process also removes the contact force component and consequently reduces the detection accuracy. To improve the detection accuracy, features related to the contact force must be calculated separately from the MS. Fig. 5 shows the MS when the frequency resolution was set to approximately 0.5 Hz. With the improved frequency resolution, instantaneous changes in the frequency information were observed at 1-2 Hz, caused by changes in the contact force. Thus, increasing the frequency resolution makes changes in the contact force detectable. However, a finer frequency resolution requires more samples for the frequency analysis, whereas the number of samples available for online contact force detection is limited. Therefore, detecting changes in the contact force by frequency analysis is impractical. Instead, to recognize changes in the contact force, we used the differential feature of the force information (ΔF).
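The trade-off between frequency resolution and window length follows from the DFT bin spacing $\Delta f = f_s / N$. The helpers below are a hypothetical illustration, assuming the 1 kHz sampling rate of the setup: a 256-sample STFT frame gives about 3.9 Hz resolution, whereas the 0.5 Hz resolution of Fig. 5 needs about 2000 samples (2 s), far more than the 512-sample buffer available online.

```python
FS_HZ = 1000.0  # 1 ms sampling period


def bin_spacing_hz(n_samples: int, fs_hz: float = FS_HZ) -> float:
    """DFT frequency resolution: delta_f = fs / N."""
    return fs_hz / n_samples


def samples_needed(resolution_hz: float, fs_hz: float = FS_HZ) -> int:
    """Number of samples N required for a target resolution delta_f."""
    return int(round(fs_hz / resolution_hz))


frame_res = bin_spacing_hz(256)   # resolution of the STFT frame used for the MS
n_fine = samples_needed(0.5)      # samples needed for the resolution used in Fig. 5
window_s = n_fine / FS_HZ         # corresponding window length in seconds
```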
Let $\bar{F}^{w}_{\mathrm{res}}$ be the signal obtained by LPF processing of the raw force information from the EEF-type force sensor, $t$ be the current sample time, and $\tau$ be the frame size of the differential feature. ΔF at time $t$ is expressed as follows:
$$\Delta F(t) = \bar{F}^{w}_{\mathrm{res}}(t) - \bar{F}^{w}_{\mathrm{res}}(t - \tau) \tag{2}$$
VOLUME 10, 2022
Fig. 6 shows the FFT of ΔF calculated from an impulse input with an amplitude of 1, with τ set to 512 ms. The frequency band that ΔF extracts from the raw force information changes with τ; therefore, τ was set according to the frequency content of the contact force. In this study, τ was set to 512 ms because the contact force contained information below 2 Hz, as shown in Fig. 5.
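The role of τ can be checked numerically. The sketch below assumes that ΔF takes the form $\Delta F(t) = \bar{F}(t) - \bar{F}(t - \tau)$, with τ given in samples at the 1 ms sampling period (512 samples = 512 ms) and the input already low-pass filtered. It computes ΔF and the magnitude response of the difference operator alone, $|1 - e^{-j2\pi f\tau}| = 2|\sin(\pi f\tau)|$. For τ = 512 ms the gain peaks near $1/(2\tau) \approx 0.98$ Hz and has its first null at $1/\tau \approx 1.95$ Hz, consistent with choosing τ for a contact force concentrated below 2 Hz. The 5 Hz LPF is ignored in the gain, and the function names are hypothetical.

```python
import numpy as np

FS_HZ = 1000  # 1 ms sampling period


def delta_f(f_lpf, tau=512):
    """dF(t) = F_lpf(t) - F_lpf(t - tau) for every t >= tau (tau in samples)."""
    f_lpf = np.asarray(f_lpf, dtype=float)
    if f_lpf.size <= tau:
        raise ValueError("need more than tau samples")
    return f_lpf[tau:] - f_lpf[:-tau]


def delta_f_gain(freq_hz, tau_s=0.512):
    """Magnitude response of the difference operator: 2 * |sin(pi * f * tau)|."""
    return 2.0 * np.abs(np.sin(np.pi * np.asarray(freq_hz, dtype=float) * tau_s))


# Numerical check: FFT of the difference operator's impulse response [1, 0, ..., 0, -1].
h = np.zeros(8192)
h[0], h[512] = 1.0, -1.0
freqs = np.fft.rfftfreq(h.size, d=1.0 / FS_HZ)
gain_fft = np.abs(np.fft.rfft(h))
```

The FFT of the impulse response matches the closed-form gain, which is the kind of curve Fig. 6 reports for an amplitude-1 impulse.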

C. PREDICTION MODEL BASED ON LSTM
The 1D-CNN and LSTM are generally used in NNs that take frequency information as input. In this study, the LSTM was used as the contact force detection model so that the temporal variation of the frequency information is taken into account in force detection. A similar effect is expected with a 1D-CNN. However, because a 1D-CNN emphasizes changes within the kernel, the LSTM-based model can better take long-term changes into account in tasks with few instantaneous changes, such as grinding. Fig. 7 shows an overview of the force detection model used in this study, and Fig. 8 shows the details of the input and output of the LSTM. The MS and ΔF inputs to the NN were normalized using the following equations:
$$\widehat{MS}(m, \omega) = \frac{MS(m, \omega) - \min(MS)}{\max(MS) - \min(MS)} \tag{3}$$
$$\widehat{\Delta F} = \frac{\Delta F - \min(\Delta F)}{\max(\Delta F) - \min(\Delta F)} \tag{4}$$
Here, $m$ and $\omega$ are the indices in the time and frequency domains of the MS, respectively, $\widehat{MS}$ and $\widehat{\Delta F}$ are the normalized values, and min and max denote the minimum and maximum values, respectively. As described in Section III-A, we extract channels 6-50 of the frequency information of $\widehat{MS}$, denoted as $\widehat{MS}'$, and input them into the LSTM. The last output of the LSTM and $\widehat{\Delta F}$ are then concatenated and input to the FNN, whose output is the detected contact force in the z-axis direction ($F_z$). The LSTM consists of two hidden layers with 50 nodes each. The FNN placed after the LSTM also consists of two hidden layers with 50 nodes each. ReLU was used as the activation function for the LSTM and FNN. Adam [39] was used for parameter optimization with a learning rate of 0.001, decay rates of 0.9 and 0.999 for the first- and second-order moments, respectively, and a value of $10^{-8}$ to prevent division by zero. Underfitting and overfitting are common problems in machine learning. Because the number of parameters required for contact force detection is not large, underfitting due to too few parameters is not a concern. We addressed overfitting by choosing an appropriate number of epochs, following the idea of early stopping.
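The data flow of the detection model (min-max normalization, a two-layer LSTM over the 9 × 45 MS, concatenation of the last LSTM output with the normalized ΔF, and a two-hidden-layer FNN) can be sketched as a NumPy forward pass. This is an illustrative sketch, not the trained model: weights are random, training is omitted, and the min/max values are assumed to come from the training set. The paper reports ReLU activations for the LSTM but does not say where they enter the cell, so this sketch keeps the standard sigmoid/tanh LSTM gates and uses ReLU only in the FNN head. Class and function names are hypothetical.

```python
import numpy as np


def minmax(x, x_min, x_max):
    """Min-max normalization (x_min/x_max assumed to come from the training data)."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(z, 0.0)


class LSTMLayer:
    """Standard LSTM layer (input, forget, candidate, output gates)."""

    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_in))
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def forward(self, xs):
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x in xs:  # iterate over MS time frames
            i, f, g, o = np.split(self.W @ x + self.U @ h + self.b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)  # (time, n_hidden)


class ForceDetector:
    """2-layer LSTM (50 units) -> concat with normalized dF -> FNN (50, 50) -> F_z."""

    def __init__(self, n_freq=45, n_hidden=50, seed=0):
        rng = np.random.default_rng(seed)
        self.lstm = [LSTMLayer(n_freq, n_hidden, rng),
                     LSTMLayer(n_hidden, n_hidden, rng)]
        sizes = [(n_hidden, n_hidden + 1), (n_hidden, n_hidden), (1, n_hidden)]
        self.dense = [(rng.uniform(-1.0, 1.0, shape) / np.sqrt(shape[1]), np.zeros(shape[0]))
                      for shape in sizes]

    def __call__(self, ms_hat_mid, df_hat):
        seq = np.asarray(ms_hat_mid, dtype=float)  # (9 frames, 45 channels)
        for layer in self.lstm:
            seq = layer.forward(seq)
        z = np.concatenate([seq[-1], [df_hat]])    # last LSTM output + normalized dF
        for k, (W, b) in enumerate(self.dense):
            z = W @ z + b
            if k < len(self.dense) - 1:            # ReLU on hidden layers only
                z = relu(z)
        return float(z[0])                         # detected F_z
```

In a framework such as PyTorch, the reported optimizer settings would correspond to `torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)`.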
Frequency-domain features are also expected to help prevent overfitting [37]. Models that use time-domain features have been suggested to require more network parameters [38]; using frequency-domain features instead reduces the number of network parameters and consequently helps prevent overfitting.

IV. EXPERIMENT
In this section, we first describe the datasets used for validation. Next, to verify the effectiveness of our method, we compare different inputs and NN models. Finally, we describe the scope of its application.

A. PREPARATION OF DATASETS
We obtained datasets for three cases to verify the detection performance of the model. Dataset1 consists of a simple grinding task without movement in the X-Y plane. In this dataset, as shown in the left figure of Fig. 4(a), there is no deviation between the signals obtained from the EEF-type and board-type F/T sensors. In contrast, Dataset2 includes deviations between the force information obtained from the two sensors, to verify the robustness against sensor-specific errors. In this study, we reproduced the sensor-specific errors by programmatically shifting the offset by +2 N. Dataset3 consists of linear movements in the X-Y plane after contact, starting from the point of contact and moving in the Y-axis direction at 0, 1.5, 3.0, 4.5, or 6.0 mm/s. This dataset was used to verify how the vibration due to lateral movement and the dynamic friction between the tool and the surface affect the detection accuracy. Table 3 lists the combinations of training and test data. In condition b), we tested whether the model trained on data from the simple grinding task was affected by the offset. In condition c), Dataset2, which includes offsets, was used for training in addition to Dataset1. In this study, ΔF is input to the NN in addition to the frequency information to improve the detection accuracy. However, an NN could also learn to treat the low-frequency information as a differential feature by itself. We compared this case with the case where ΔF is explicitly input to the NN as a feature. In condition d), we compared training data with and without movement and verified the effect of movement during grinding on the detection accuracy.
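The offset injection used to build Dataset2, and why ΔF is unaffected by it, can be illustrated as follows. This is a hypothetical sketch: the helper names are illustrative, a synthetic signal stands in for measured data, ΔF is assumed to be the difference of the filtered force over τ samples, and only the EEF-type sensor signal is shifted while the board-type label is left unchanged. Because ΔF subtracts two samples of the same signal, a constant offset cancels, whereas the force level itself is biased by the full +2 N.

```python
import numpy as np

TAU = 512  # frame size of the differential feature [samples] (512 ms at 1 ms)


def add_offset(f_eef, offset_n=2.0):
    """Reproduce a sensor-specific offset by shifting the EEF-type signal (labels unchanged)."""
    return np.asarray(f_eef, dtype=float) + offset_n


def delta_f(f_lpf, tau=TAU):
    """Difference of the filtered force over tau samples."""
    f_lpf = np.asarray(f_lpf, dtype=float)
    return f_lpf[tau:] - f_lpf[:-tau]


# Synthetic 17 s trial: a slow contact-force ramp plus small noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(17_000) / 1000.0
f_clean = 10.0 * np.clip(t - 3.0, 0.0, 1.0) + 0.05 * rng.standard_normal(t.size)
f_shifted = add_offset(f_clean)

level_error = np.mean(f_shifted - f_clean)                         # +2 N bias in the level
df_error = np.max(np.abs(delta_f(f_shifted) - delta_f(f_clean)))   # ~0: offset cancels
```

Since a linear LPF passes a constant with unit gain, the same cancellation holds for the filtered signal once the filter transient has decayed.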
Because the robot trajectory generation is highly reproducible and the number of samples that can be obtained is not large, collecting datasets under a single condition would only capture a specific relationship between the frequency and force information. Therefore, we obtained data for multiple contact forces by giving five position commands in the z-axis direction ($P_z$), from 0 to 4 mm at 1 mm intervals. For each command, we obtained 7 trials for training and 3 for testing. Therefore, the numbers of training and test trials were 35 and 15 for conditions a) and b), 70 and 30 for condition c), and 175 and 75 for condition d), respectively. Each trial consists of a 17 s time series including both contact and non-contact states. Because we focused on the 1D contact force in this study, only the contact force in the z-axis direction ($F_z$) was obtained.
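The dataset sizes above follow from simple bookkeeping, sketched below as a check. The assumption, consistent with the reported totals, is that condition c) combines Dataset1 and Dataset2 and that condition d) covers the five $v_y$ values of Dataset3; the names are hypothetical.

```python
from typing import Tuple

COMMANDS_MM = [0, 1, 2, 3, 4]             # z-axis position commands P_z
TRAIN_PER_CMD, TEST_PER_CMD = 7, 3        # trials per command
SPEEDS_MM_S = [0.0, 1.5, 3.0, 4.5, 6.0]   # lateral speeds in Dataset3
TRIAL_S, FS_HZ = 17, 1000                 # trial length [s], sampling rate [Hz]


def trial_counts(n_variants: int) -> Tuple[int, int]:
    """(training, test) trials when every command is repeated for n_variants datasets/speeds."""
    n_cmd = len(COMMANDS_MM)
    return n_cmd * TRAIN_PER_CMD * n_variants, n_cmd * TEST_PER_CMD * n_variants


counts = {
    "a), b)": trial_counts(1),             # a single dataset
    "c)": trial_counts(2),                 # Dataset1 + Dataset2
    "d)": trial_counts(len(SPEEDS_MM_S)),  # Dataset3, five lateral speeds
}
samples_per_trial = TRIAL_S * FS_HZ        # 1 ms sampling
```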

B. VALIDATION MODELS AND INPUTS
1) INPUTS
To verify the difference in detection accuracy depending on the input features, we compared two inputs in addition to the proposed input (MS+ΔF): the raw force information (RAW) and the MS only (MS). The MS was calculated with the parameters given in Section III-A. RAW consists of the same 512 ms of raw force information used to calculate the MS. However, the sampling period was changed from 1 ms to 4 ms to reduce the number of NN inputs. Therefore, RAW consists of 128 samples (512 ms) of raw force information.
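Building the RAW input amounts to decimating the 512-sample buffer by a factor of 4. The sketch below assumes plain sample dropping, since the paper does not mention an anti-aliasing filter; with a 4 ms period, components above 125 Hz would alias. The function name is hypothetical.

```python
import numpy as np

BUFFER = 512  # 512 ms at the original 1 ms sampling period
STEP = 4      # new sampling period of 4 ms


def raw_input(buffer_1ms):
    """Return the RAW feature: every 4th sample of the latest 512 ms of raw force."""
    x = np.asarray(buffer_1ms, dtype=float)
    if x.shape != (BUFFER,):
        raise ValueError(f"expected {BUFFER} samples, got {x.shape}")
    return x[::STEP]  # 128 samples spanning the same 512 ms
```

The resulting 128 values match the 128-node input layer of the FNN baseline.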

2) MODELS
To verify the detection accuracy of the models, detection models based on an FNN [6] and a 1D-CNN [26] were used in addition to the LSTM. The FNN-based model consists of an input layer, two hidden layers with 100 nodes each, and an output layer with one node. The input layer has 128 nodes to match RAW. In [6], the output is the surface roughness of the product after completion of work; here, it was modified to output the contact force. As with the LSTM, the 1D-CNN model was evaluated with RAW, MS, and MS+ΔF as inputs. Fig. 9 shows the 1D-CNN-based force detection model with MS+ΔF as the input. It consists of two convolutional layers and one max-pooling layer, followed by an FNN with two hidden layers of 50 nodes each. In each model, ReLU was used as the activation function and the mean squared error (MSE) as the loss function. Each model was trained for 1000 epochs.
Table 4 compares the root MSE (RMSE) [N] on the test dataset for each model under conditions a) and b). Condition a) shows that the LSTM-based force detection model is superior to the FNN- and CNN-based models. Comparing the LSTM results in condition a), the detection accuracy is higher when ΔF is included in the input than when RAW is used, confirming that inputting values related to the contact force improves the detection accuracy. Condition b) shows that force detection remains possible with ΔF in the input, with the effect of the offset removed as in the conventional method. Compared with using MS only, the detection accuracy improved by 53% in condition a) and by 26% in condition b).
Fig. 10 shows the detected values and the correct label for the test data in condition b) when the LSTM is used as the detection model, with MS only and MS+ΔF (proposed method) as inputs. Fig. 10(a) shows the detected values for one trial, and Fig. 10(b) shows the response in the range of 3.5-4.0 s of the trial in Fig. 10(a). As shown in Fig. 10(b), our proposed method detected the rise of the contact force more consistently with the correct label than the MS-only input. The time delay at the rise is 30 ms with MS+ΔF as the input and 166 ms with MS only. ΔF is a first-order difference of the force information and captures the characteristics at the moment of switching between contact and non-contact; therefore, the time delay is considered to be reduced compared with the MS-only case. The results also suggest that the simultaneous input of ΔF enables more stable force detection during contact than the MS-only input.
Fig. 11 shows the RMSE [N] in condition b) when the number of training trials per command value is changed to 2, 3, 5, 7, 10, and 15, so that the training dataset contains 10, 15, 25, 35, 50, and 75 trials. Here, we used the LSTM-based detection model with MS and MS+ΔF as inputs. For both inputs, the detection accuracy improves as the amount of training data increases. However, the detection accuracy is better with MS+ΔF regardless of the amount of training data; reaching the same accuracy with MS only would require collecting more training data. Because the amount of data a robot can collect is limited, the proposed method is advantageous in that it improves the detection accuracy with a small amount of data.
To verify whether the NN recognizes the low-frequency information of the MS as a differential feature, we compared the following inputs: the full MS calculated from the force information (MS(ALL)), the MS from 40 Hz to 400 Hz (MS(MID)), and MS(MID) plus ΔF (MS(MID)+ΔF). Table 5 shows the RMSE [N] under conditions a)-c) when the LSTM was used as the detection model.
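The evaluation quantities in this subsection can be computed as below. RMSE follows its standard definition; the relative improvement is assumed to be the fractional RMSE reduction with respect to the MS-only model, and the rise delay is assumed to be the difference between the times at which the detected value and the label first reach a chosen threshold. The paper does not state either definition, so the threshold and all function names are hypothetical.

```python
from typing import Optional

import numpy as np

DT_MS = 1.0  # 1 ms sampling period


def rmse(pred, label) -> float:
    """Root mean squared error [N]."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.sqrt(np.mean((pred - label) ** 2)))


def relative_improvement(rmse_baseline: float, rmse_new: float) -> float:
    """Fractional RMSE reduction, e.g. 0.26 for a 26% improvement (assumed definition)."""
    return (rmse_baseline - rmse_new) / rmse_baseline


def first_crossing_ms(signal, threshold: float, dt_ms: float = DT_MS) -> Optional[float]:
    """Time of the first sample at or above the threshold, or None if never reached."""
    idx = np.flatnonzero(np.asarray(signal, dtype=float) >= threshold)
    return None if idx.size == 0 else float(idx[0]) * dt_ms


def rise_delay_ms(detected, label, threshold: float, dt_ms: float = DT_MS) -> Optional[float]:
    """Delay of the detected rise relative to the label (assumed threshold-crossing definition)."""
    t_det = first_crossing_ms(detected, threshold, dt_ms)
    t_lab = first_crossing_ms(label, threshold, dt_ms)
    if t_det is None or t_lab is None:
        return None
    return t_det - t_lab
```

A threshold at a fixed fraction of the steady contact force is one reasonable choice; a different threshold would change the measured delay, so it should be reported alongside the result.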
The MS(ALL) results in condition b) show that when the training data do not include the offset, using all frequency information as input deteriorates the detection accuracy. Therefore, the NN is simply trained to detect the contact force from the input MS and does not recognize the low-frequency information as a differential feature. However, when data containing the offset are added to the training data, as in condition c), the offset can be removed and accurate force detection becomes possible. The detection accuracy of MS(ALL) in condition c) was better than that of MS(MID) and comparable to that of MS(MID)+ΔF. Therefore, inputting ΔF is as effective as learning the differential features of the low-frequency information. However, learning differential features from frequency information alone, as in MS(ALL), requires training data that include sensor-specific errors. The proposed method can deal with such errors using only a limited amount of data from simple grinding tasks. Fig. 12 shows the loss evolution for each input under condition c). Explicitly inputting the differential feature as ΔF to the NN also reduces the number of epochs required for convergence.

C. PERFORMANCE EVALUATION
Finally, the influence of movement in the X-Y plane on the detection accuracy was verified. Tables 6 and 7 show the RMSE [N] of the test data under condition d). In Table 6, only the data with $v_y$ = 0.0 mm/s were used for training, whereas in Table 7, the data with $v_y$ = 0.0 and 6.0 mm/s were used for training. Blue-shaded cells represent trained conditions, and white cells indicate untrained conditions. Table 6 confirms that the model trained only with stationary data has worse detection accuracy on moving data. This is attributed to changes in friction and robot vibration caused by the movement. However, as shown in Table 7, the detection accuracy improved when data obtained during movement were included in training.

V. CONCLUSION
In this paper, we proposed an LSTM-based contact force detection model that takes a combination of frequency information and the differential feature of the force signal as input. This study confirmed that the model reduces the time delay and improves the detection accuracy. Because ΔF and the frequency information are calculated from the same force response, no additional dataset is required. To verify the effectiveness of the proposed method, we compared it with force detection models based on an FNN and a CNN on a dataset of plane grinding tasks. The detection accuracy of the LSTM-based model was superior to that of the FNN and CNN models. Compared with the LSTM model using only frequency information as input, the detection accuracy was 26% higher when the error was small and 57% higher when the error was large. In addition, using ΔF as an input reduced the time delay from 166 ms to 30 ms. However, the detection accuracy of this method is degraded by movement during the grinding process; this problem is mitigated by including movement data in the training data. In this study, only the single-axis contact force was detected in plane-grinding operations. In future work, force detection must be extended to three axes to support the grinding of curved surfaces.