Enhanced Equation Discovery of 3-DoF Robotic Manipulator Dynamics Using LASSO Model Selection Criteria With Variable Segregation Algorithm

The challenge in controlling a manipulator robot is to model the system to obtain an efficient control system design. One approach that can be used to model the dynamics of a manipulator robot is data-driven modeling. However, in its implementation, data-driven modeling is highly sensitive to sensor noise, which significantly affects the accuracy of the system identification. In addition, the existing approach yields only a generalized form of the differential equation for each joint, which has not been divided into inertial, Coriolis, and gravitational variables that can be used for other purposes. In this study, a LASSO model selection criteria with a variable segregation algorithm (LMSCVS) is proposed to derive the dynamic equation of a 3-DoF manipulator robot, segregating the generalized form variables into Coriolis and centrifugal, inertia, and gravitational variables. Additionally, a Dynamic Expression Nonlinearization (DEx-N) algorithm is introduced to generate nonlinear candidates more efficiently to express the dynamics of the robot manipulator. The experimental results on the ROB3 hardware demonstrate that the proposed method successfully discovers mathematical equations, resulting in higher accuracy and sparsity compared to the previous method. The processing time of the proposed method is also significantly faster. Based on these results, the proposed method has a better performance in identifying real systems that usually have noise in the sensor data and in discovering the equation of robot manipulator dynamics for broader purposes.


I. INTRODUCTION
System modeling of a robot manipulator plays an important role in the development of robotic control systems.By using the mathematical equations of the models enabled us to gain an in-depth understanding of the characteristics and dynamics of the robot manipulator, which in turn enabled the design of effective and optimized control systems [1], The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Alshabi .[2].The dynamic modeling of a robot manipulator can be performed using three approaches: manual modeling [3], [4], computer-based modeling [5], [6], and hybrid modeling [7].Robot manipulator dynamics modeling using manual or analytical approaches, such as the Lagrange-Euler [8] and Newton-Euler formulation [9], require a high level of knowledge of dynamics and physics laws.If the parameters are not modeled correctly, there will be a discrepancy between the model and actual dynamics of the robot.Typically, some simplifications are carried out in modeling analysis [10], which can lead to the omission of certain parameters and result in reduced model accuracy because some parameters are removed from the model.An easier solution is to use computer-based or data-driven modeling, which uses a computer to process the input and output data into mathematical equations or universal approximators [11].This is easier because we only need to prepare a dataset from data measurements.The hybrid approach is a combination of analytical and computer-based modeling approaches, which is also known as parameter estimation, where the model equations are obtained through an analytical approach, and the parameters are estimated through the process of a computer-based modeling approach.However, owing to the presence of analytical modeling, parameter estimation is less effective when applied to robot manipulator dynamics modeling, because obtaining complex equations through an analytical approach is difficult [7], particularly if the degrees of freedom, which are determined by the number of joints, are increased.System identification refers to the use of a computer to build mathematical models of a system based on the measured input and output data [12].Hence, this technique can be used in computer-based or hybrid modeling.An effective and accurate system identification technique is required to obtain a mathematical model that fits an identified robot.Therefore, research and development of more accurate and effective system identification techniques are essential to improve the quality, reliability, and usability of robot manipulators in various industrial and research applications.
System identification methods can be classified as linear and nonlinear systems.Linear system identification was proposed in 1950 [13] and is widely used for various purposes.Linear system identification produces linear equations such as the state space [14], continuous transfer functions [15], digital transfer functions using Steiglitz-McBride regression [16], linear difference model [12] using Autoregressive with Exogenous Input (ARX) [17] and Autoregressive Moving Average with Exogenous Input (ARMAX) [18].Nonlinear system identification occurs when linear system identification is unable to present the important aspects of a nonlinear system [13].This causes the resulting model to be inaccurate, which also has implications for an inefficient control system design.System identification with a linear model structure [19] is unsuitable for applications in robot manipulators with nonlinear and highly coupled characteristics [21].Lu et al. proposed a deep neural network-based system identification method to efficiently and accurately identify differential equations using nonlinear operators from data, which is called deep operator neural network (deepOnets) [22].DeepOnets can learn 16 types of operators, both linear and nonlinear operators.However, deepOnets is not effective in control system analysis implementation because it produces a universal approximator instead of mathematical model equations that can be used for analytic purposes.
The sparse Identification of Nonlinear Dynamics(SINDy) [23], followed by SINDy with control (SINDYc) [24] has been proposed to deal with the dynamics of nonlinear systems and can produce nonlinear mathematical model equations by selecting a nonlinear candidate function that has a value above the threshold.However, the accuracy of SINDy still needs to be improved, particularly when an imprecise threshold causes the equation produced by SINDy to be inaccurate.Ensemble SINDy [25] improves SINDy with the ability to deal with noise.However, this method is time-consuming because it must be broken down into several models.Sparse Identification of Nonlinear Dynamics with Akaike Information Criteria (SINDy-AIC) [26] was proposed to improve SINDy accuracy by increasing the number of regressions with different thresholds, and the final regression results were selected with the smallest sum of squared error (SSE) and the least number of candidates.SINDy-PI [27] with AIC, which uses a combination of least-squares and least-norm solutions [28] is also used to discover the equation of 2-DoF robot-manipulator dynamics, which can produce mathematical equations similar to those of the identified system.However, research carried out using computer simulations has not been tested in the presence of noise, which is typically present in real systems, and the method uses least-squares and least-norm solutions that are sensitive to noise [29].The resulting differential equation obtained in [28] is still in the form of a generalized form of differential equations, which is difficult to use for broader purposes such as the need for the inversion of dynamic equations.
This study addresses several challenges and its main contributions of this study are as follows.First, SINDy-AIC has the potential to be used for identifying robot-manipulator dynamics.However, when the least-squares method is used to identify hardware-based systems in which sensor noise is usually present, it results in an increase in the correlation between independent variables, causing the identification results to be less accurate, and making the equation discovery process more challenging.An approach is proposed to improve the accuracy of the sparse regression of robot manipulator dynamics owing to sensor noise.This approach applies LASSO regression with Akaike Information criteria-based model selection criteria to improve the identification accuracy.
Second, a comprehensive strategy is proposed to identify the nonlinear equations that represent the dynamics of robotic manipulators.This is important because robot manipulators often have complex nonlinear dynamics, which in turn complicate the modeling process.
Third, the inappropriate use of an anonymous function can result in an equation that cannot be interpreted, large number of independent variables, and longer processing time.An algorithm is proposed to generate nonlinear candidates that are more suitable for expressing system dynamics and more efficient by producing fewer independent variables.Therefore, it can speed up the computing process, minimize the risk of multicollinearity, and eliminate variables that do not match the equations that express the system dynamics.
Fourth, library functions play a crucial role in the equationdiscovery process, inappropriate functions can complicate the process and generate nonlinear candidates that can lead to overfitting.To address this issue, a library function is proposed specifically designed for robot manipulators with voltage excitations.
Fifth, generalized equations are difficult to use for broader purposes.A variable segregation algorithm was proposed for the robot manipulator dynamic equations into Inertia, Coriolis, and Gravity variables.This novel algorithm helps in understanding the components that contribute to dynamic behavior and can also be used for broader purposes, such as control systems and fault detection design.

II. METHODOLOGY A. 3-DoF ROBOT MANIPULATOR
A robot manipulator can consist of two or more links driven by a joint and motor as the driving force.The robot manipulator can move revolutionally and prismatically depending on its purpose.A 3-DoF robot manipulator, called ROB3 by P@P Picking System GmbH [30], was used to test the effectiveness of the proposed method.ROB3 has three joints driven by a gearbox DC motor equipped with a potentiometer to read the angular position of the robot joint.In this study, two experimental scenarios were used: model simulation and ROB3 robot manipulator, as shown in Fig. 1 and Fig. 2 respectively.The model of 3-Dof robot manipulator used in this scenario with coordinate arm link parameters as shown in Table 1.There are various method that can be used to model the dynamics of robot manipulator, such as Gibbs-Apell(GA) and Newton formula [31] and Kane's Formula [32].In this study, the Lagrange-Euler method was chosen because of the simplicity of the algorithms to model robot manipulators in multi-DoF and mostly used in control system.Based on Table 1, the model equations obtained using the Lagrange-Euler [9] method are presented in Table 2, where q denotes the state vector of angular positions of joints, with q i representing the state for the angular position of joint i, q denotes the state vector of angular velocity of joints, with qi representing the state for the angular velocity of joint i, q denotes the state vector of angular acceleration of joints, with qi representing the state for the angular acceleration of joint i; C(•) represents the cosine function; and S(•) represents the sine function.The state variables are explicitly written to denote the state at time t, for example q i (t).However, for the writing efficiency, the time variable (t) is omitted.
The hardware interconnection diagram used for system identification is shown in Fig. 3, where the computer was used as a monitoring or logging system to read the sensor output and actuator input, which were sent by an 8-bit microcontroller.The microcontroller was also used to control the motion of the robot by providing a pulse-width modulation (PWM) signal to the gearbox DC motor, which varied the rotational speed of the gearbox DC motor.To simplify the system identification process, the PWM duty cycle was converted into a voltage signal.The value of the voltage signal received by the motor can be calculated as follows, where V pwm denotes the voltage signal received by the motor, D cycle denotes the duty-cycle percentage, and V input denotes the voltage input of the microcontroller.However, the power provided by the microcontroller is too low to drive the motor.Consequently, the voltage signal must be transferred to a higher-power level.This problem can be solved using an H-bridge, which is commonly referred to as a motor driver.In addition, the H-bridge can be used to change the direction of DC motor rotation, which can be controlled by the microcontroller via the direction pin.The H-bridge used in this study was packaged in L298N chips.The process of identifying the robot manipulator requires sensor output and actuator input data.The input of the actuator data is the equivalent voltage signal data received by the gearbox DC motor or the voltage output from the motor driver, whereas the sensor output data is the angular position of the robot joint, which can be read using a potentiometer.The potentiometer outputs a voltage variation according to the potentiometer angle position, and the voltage value varies from 0-5Volt.The microcontroller reads all three potentiometer values using the three analog-to-digital converters (ADC) using the following equation: where ϱ is the adc value within the range of 0 − 2 n , n is the bit resolution of the adc, V ref is adc voltage refference.The V adc value was then converted to the angular position of joint (q) using the equation as follows, where V max and V min are the maximum and minimum voltages of the potentiometer, respectively, q max and q min are the maximum and minimum joint angles, respectively.The microcontroller read the actuator input and sensor output of the robot joints with an average sampling time of t s = 8ms, which was then sent via serial communication.

B. LASSO MODEL CRITERIA WITH VARIABLE SEGREGATION ALGORITHM (LMSCVS)
The proposed regression method is LASSO model selection criteria with variable segregation.A diagram of the system identification process is presented in Fig. 4. The process comprises four main procedures: data collection, nonlinearizing data, performing regression, model selection, equation construction, and variable segregation.

1) DATA COLLECTION
Data Collection procedures were carried out to prepare the data for the identification process, including the following steps: 1) Recording the measurement data: Providing input signals to the robot manipulator and then recording all input and output data that were sent via serial communication and stored in a log file.2) Deriving the datasets: The joint data collected from the measurement was only robot joint's angular position that was defined as q, We needed to obtain the angular velocity q and the angular acceleration q which can be obtained by deriving the joint's angular position.However, deriving noisy and discrete data results in angular velocity and acceleration in chattering and noisy data, respectively.These angular velocity and acceleration data are not suitable for system identification and need to be filtered to obtain better derivative data.A continuous RC low-pass filter was applied to angular velocity and acceleration.The alternative approaches of RC low-pass filter such as Butterworth Filter, Moving average and Savitzky-Golay Filter is also can be used to addressed the noise in data.

2) NONLINEARIZING DATA AND PERFORM REGRESSION
Nonlinearizing data is a process in which the original measurement data ⃗ q, ⃗ q, ⃗ q are combined and converted into a nonlinear form.1) Defining Nonlinear Candidates: The candidate of a nonlinear function should be defined by prioritizing the candidate with the highest probability of being selected.Trigonometric functions are the most likely candidates for use in robot-manipulator dynamics.One of the contributions of this study is the formulation of effective nonlinear candidates that can be applied to robot-manipulator modeling.Candidates are usually expressed in anonymous functions by providing position, velocity, and acceleration data to the anonymous functions to produce new independent variables.If the anonymous function is not well designed, it results in inefficient candidates, and the resulting equation cannot be interpreted as a robot manipulator equation.
It is necessary to apply the following rules to define an anonymous function: Rule 1: The anonymous function @(p,r,s,u,v,z) is a library function where p,r, and s are used to denote the joint position variables (q i ); u and v are used to denote the joint angular velocity (q i ); and z are used to denote the joint angular acceleration (q i ). 2) Nonlinearizing Data: In this study, the Dynamics Expression(DEx) Nonlinearization Algorithm as presented in Algorithm 1 is proposed based on Rule 1 which returns the result of the nonlinearized input data X :,k , where colon denotes all elements of rows or columns of the matrix variables, and k represents the library function number.This algorithm allows for effective nonlinearization of data using anonymous functions, eliminating the need for unnecessary variables in the function.3) Threshold Variation: Equation ( 4) was used as the basis for achieving threshold variation [26] and allowed for the generation of diverse thresholds in a vector.This equation is called a logarithmic-spaced vector, and is defined as follows: where a denotes the threshold start, n denotes the number of variations, and b denotes the threshold end.3) objective function of LASSO that is shown in (5) was performed to obtain the independent variable gain coefficient .The LASSO objective function was also implemented in MATLAB lasso().The LASSO objective function [33] is defined as: where N is number of data points, Y i is the dependent variable which is the output of the system at point i, Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

3) MODEL SELECTION
Model selection was performed to select the best λ with the lowest absolute error for each state.1) Inverse dynamic simulations: Inverse dynamic simulations are faster and more computationally efficient than forward dynamics that requires the integration of Ordinary Differential Equation (ODE) simulation.The inverse dynamic calculation is defined as follows: This inverse dynamics simulation using (6) was performed to obtain the predicted value Ŷλ for each gain coefficient λ and nonlinearized input data X .2) Sum of Squared Error calculation: The error was calculated by using the gain coefficcient from the regression results and nonlinearized input data X and output Y that is defined as follows: 3) AIC Calculation: The corrected AIC [26] was used to select the best λ with the smallest absolute error and number of coefficients.The AIC equation is defined as: where e is the error from (7), N is the number of samples, η is the number of nonzero elements in λ .4) Model Selection: The gain coefficient λ with the best criteria has the lowest AIC score and can be obtained using min function in MATLAB.

4) EQUATION CONSTRUCTION
The equations for the robotic manipulator were constructed using the best gain coefficient , and the construction algorithm was similar to that used to define the library, as shown in Algorithm 1.In the equation construction process, the anonymous function in the nonlinearizing data step was manually changed into a string variable and then constructed using the following equation: where X names (.) is the nonlinear library in the string variable, best is the best based on the results of the AIC calculations, and Ŷ is the resulting differential equation of the robot manipulator dynamics.

5) VARIABLE SEGREGATION
Based on derivation using Lagrange-Euler, Newton-Euler, and Generalized d'Almbert formulation [9], [34], the rotation of the robot manipulator is influenced by three components: Coriolis and centrifugal H , gravity C and the moment of inertia D torque.The inverse dynamics of the robot manipulator equation using the torque τ as the input are defined as τ = D(q) q + H (q, q) + G(q) (10) where τ ∈ R 1xn denotes the joint torque vector, D ∈ R nxn is the acceleration-related symmetric matrix used in inertial terms, H ∈ R n denotes the Coriolis and centrifugal matrices, and C ∈ R n denotes the Gravity terms.In this study, the joints were actuated by a brushed DC motor with a torque that was considered proportional to the motor currents [34], [35].The torque can then be expressed as τ = KV pwm (11) where K is the voltage-to-torque constant that can be included in D, H , and G. Equation (10) with voltage-based excitation can be expressed as: where D, H , Ḡ are D, H , G variables that were multiplied by K −1 .Equation ( 12) was used as the model to be discovered by the proposed method.The following proposition of Lagrange-Euler-based dynamic of robot manipulator is used to derive the library and also the variable segregation algorithm: Proposition 1: The Inertial terms in the robot manipulator dynamics equation involve the joint angular acceleration and are devoid of the joint angular velocity variable.
proof: Let us consider the inertial terms of the Lagrange-Euler-based robot manipulator dynamics equation [8], [9], [36] as: where Tr (•) is the trace of a matrix expression, U jk and U ji defined in ( 14) is the effect of the joint motion, J j is the inertial tensor, which is the product of the mass and squared radius of gyration of the link and yields units of kg.m 2 , Q is the transformation matrix used to differentiate between revolute and prismatic joints, where Q does not have units, k A j−1 is the coordinate transformation matrix that relates the jth coordinate frame to kth coordinate frame, k A j−1 consists of the joint angular position within trigonometric functions.Since Tr U jk J j U T ji has units of kg.m 2 obtained from inertial tensor multiplication and qk has units rad/s 2 , it can be concluded that (13) proves that the inertial term is an equation that involves the angular acceleration variable in its calculations and are devoid of a joint angular velocity variable.□ Proposition 2: The Coriolis and centrifugal terms in the robot manipulator dynamics equation involve two joint angular velocity variables and are devoid of the joint angular acceleration variable.
proof: Let us consider the Coriolis and centrifugal terms of the Lagrange-Euler-based robot manipulator dynamics equation [8], [9], [36] defined as follows, where U ji is the effect of the joint motion defined in (14) and U jkm is defined in (16) is the interaction effect between the joints.Since Tr U jkm J j U T ji has units of kg.m 2 and is multiplied by two angular velocity variable, which results in units of rad 2 /s 2 , the units for torque are already satisfied.Then, (15) proves that the Coriolis and centrifugal terms are equations that involve the angular velocity variable and are devoid of the joint angular acceleration variable in its calculations.
□ Proposition 3: The Gravitational terms in the robot manipulator dynamics equation are devoid of angular velocity and acceleration variables.
proof: Let us consider the gravitational terms of the Lagrange-Euler-based robot manipulator dynamics [8], [9], [36] equation as: where m is the mass of the link, g is the gravitational acceleration, n is the joint number, U ji is defined in (14), and r is the distance from the center of mass of the link.Since (17) already includes an acceleration variable obtained from the gravitational acceleration g (m/s 2 ), multiplied by the mass (kg) and distance (m) from the center of mass of the link, the units for torque are already satisfied.It is proven that the gravitational terms are devoid of angular acceleration and velocity of joint variables.□ In addition, bias variables are used to model the unknown torque, and can be added to the gravitational terms.Based on these three propositions, the discovered equation can be segregated into three main variable groups of robot manipulator variables by using Algorithm 2, which can be implemented in MATLAB.

6) EVALUATIONS
This section presents the evaluation of the identification results to evaluate the effectiveness of the proposed method.Three evaluations were performed: the fitting score, Root Mean Squared Error (RMSE), and sparsity.
1) Fitting Score: The fitting score or coefficient of determination, denoted by R 2 , as shown in (18), is used to measure the goodness-of-fit of predicted output with the observed output from the dataset [37].
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.19) where SS res in (19) is the sum of squared errors (SSE), SS tot in (20) is the total sum of squares, n is the number of observations, Y i is the observed value from the dataset for the i th observation, Ȳ is the mean of the observed values of the dependent variable, and Ŷi is the predicted value obtained from the resulting simulation equation for the i th observation.2) Root Mean Squared Error (RMSE): RMSE was used to measure the accuracy of regression results.3) Sparsity: Sparsity was used to measure how many nonlinear candidates were excluded from the resulting equation by multiplying with zero element of : the more candidates that were excluded, the simpler the equation and also reduced the multicollinearity effect from other candidates.Sparsity is defined as where κ is the sparsity percentage, ∥ ∥ ∅ is the number of nonzero elements of , and Z is the number of all elements in .

III. RESULTS AND DISCUSSION
In this section, the results of the LASSO model selection criteria with the variable segregation algorithm for discovering the equation of the 3-DoF robot manipulator are presented.To measure its effectiveness, the proposed method was compared with the previous SINDy-AIC in terms of processing time, accuracy, and resulting equation.The experiment was conducted in two ways: by discovering the model equation using measurements of the simulation of the model, and measurement of the ROB3 robot manipulator.Simulation experiments were performed using a 3-DoF robot manipulator model obtained using the Lagrange-Euler method.
A. DATA NONLINEARIZATION Data Nonlinearization is a process carried out in both experiments.To prove the effectiveness of the proposed nonlinearization algorithm, dynamic Expression Nonlinearization (DEx-N) as described in Algorithm 1 will be compared to the PySINDy nonlinearization algorithm [38], [39].The definition of the library functions is based on the model equations obtained using the Lagrange-Euler method and follows Rule 1.The library functions used by both methods are listed in Table 3.One of the contributions of this study is the introduction of a library function that is required to model the dynamics of robot manipulators with voltage excitation.The proposed library function is based on the strong correlation between the voltage and motor joint velocity, where the motor joint velocity is proportional to the voltage excitation.The library function is listed in No.3 of Table 3.
Eleven library functions are available in Table 3 for modeling robot manipulator equations based on the Lagrange-Euler model.When employing PySINDy with the ''interactiononly'' option, the total number of variables reached 639.This results from a binomial coefficient combination of i, j, k, l ∈ R m , where m = nx3, and n represents the number of variables in each state to be nonlinearized.In this study, n = 3 corresponds to the number of joints, resulting in m variables including q 1,2,3 , q1,2,3 , and q1,2,3 .The number of variables can increase if the ''interaction-only'' option is set false.However, according to Rule 1, the DEx-N algorithm involves fewer combinations, totaling n k combinations.
The total number of output variables N resulting from nonlinearization can also be observed in Table 3.The input data being nonlinearized is stored in the variable X ∈ R TxN , where T is the number of data samples, and N is the number of variables produced from the data nonlinearization process.From the table, it can be observed that the DEx-N algorithm can minimize the number of nonlinear candidates generated.A reduced number of variables will have implications throughout the identification process, such as reducing the regression processing time and minimizing the risk of multicollinearity.Moreover, by using the DEx-N algorithm, the equations generated for the robot manipulator are more in line with Propositions 1, 2, and 3.For instance, variables q and q will not appear together within a single variable.Therefore, going forward in this paper, both SINDy and LMSCVS used the libraries generated by the DEx-N algorithm.

B. ROBOT MODEL SIMULATION
The mathematical model obtained using the Lagrange Euler-based approach stated in Table 2 was utilized to simulate the dynamics of the 3-DoF robot manipulator.The objective of this simulation is to determine whether the method can generate mathematical equations that resemble previously known model equations.
The scenario for the equation discovery of the robot model is as follows.1) The joint angle position is generated using a sinusoidal signal with two data variations, as shown in Table 4.This variation is necessary to ensure that the input data varies (persistent excitation), providing a clear representation of the identification system.2) The simulation diagram is shown in Figure 5.The input model consisted of three types of data: q i is the angular position of the ith joint, qi is the angular velocity of the ith joint, and qi is the angular acceleration of the ith joint.The angular velocity and angular acceleration data were obtained by calculating the derivative of the angular position for each joint.The model equations expressed in Table 2, obtained using the Lagrange-Euler method, are used to process the input into the output torque.3) The results of the input and output data generated from the simulation can be seen in Fig. 6.The simulation was performed for 60 s with a fixed time sampling of 0.01 s.
It is important to note that training data can have impulses with very large values owing to drastic changes in the data derivatives.Therefore, data cleaning is necessary to prevent disruptions caused by impulses in the identification process.Data cleaning can be performed by removing data rows with values greater than the absolute maximum value of the normal data, as shown in Fig. 6 represents the cleaned data.4) The input data q, q, q are nonlinearized using the DEx Nonlinearization algorithm.Summary of the equation discovery results are presented in Table 5.It is evident that SINDy-AIC excels in generating equations with 100% similarity to the Lagrange-Euler model.By contrast, LMSCVS yields variables that do not precisely match the Lagrange-Euler equations.Candidates with large coefficients can be modeled using LMSCVS, even if their coefficient values differ from the Lagrange-Euler-based model coefficients, and some candidates with small coefficients are replaced by others.Nevertheless, it achieved an accuracy of up to 99% when tested with validation data.In Fig. 7, a comparison of the SINDy-AIC and LMSCVS models in estimating the torque values generated by the Lagrange-Euler model can be observed.The SINDy-AIC method produces a model equation with 100% accuracy in terms of both the variables and coefficients, as shown in Table 6.This enables it to closely resemble the Lagrange-Euler model, resulting in lines that closely align with it.However, the LMSCVS method exhibits some lines that do not closely align with the Lagrange-Euler model, particularly when there are changes in the angular velocity direction.However, LMSCVS is still capable of effectively modeling 3-DoF robot manipulator equations.
Table 6 presents the equations generated by the LMSCVS and SINDy-AIC methods.From this table, it is evident that SINDy-AIC excels in terms of accuracy in the equations obtained when compared to the Lagrange-Euler model in Table 2.The LMSCVS method, as shown in Table 5, outperforms in terms of data processing speed and also results in many variables and coefficients resembling the Lagrange-Euler model.From a sparsity perspective, the LMSCVS model surpasses the SINDy-AIC.Therefore, in a deterministic scenario, SINDy-AIC can be considered as a solution for equation discovery.The equations in SINDy-AIC are presented in a generalized form, as seen in a recent study [28].The equations generated by the LMSCVS method using the proposed variable segmentation algorithm have the form that corresponds to the Lagrange-Euler equations, including variables for Inertial, Gravitational, Coriolis, and centrifugal components.

C. ROB3 ROBOT MANIPULATOR
The scenarios in the experiment using the microcontrollerbased ROB3 robot manipulator were as follows:  2) The angular positions of the joints were collected via direct measurements, processed through derivation, and refined using a low-pass filter.A comparative analysis of the velocity and acceleration data before and after filtration is presented in Fig. 9 and 10, respectively.From Fig. 9 and 10, it can be observed that the filtered data have better signal quality than the unfiltered derived joint position, and the unfiltered data exhibit a chattering phenomenon and noisy behavior that are not optimal for use in the system identification processes.The integrated outputs for the unfiltered and filtered velocity data are shown in Fig. 9.b, these results indicate that both the filtered and unfiltered data still have similar results after integration.Notably, the results of the integrated acceleration data without a filter introduced noise, which is important when the resulting equations are used in forward dynamic simulations.The complete results for the filtered data used in this experiment, which were used as the dataset are shown in Fig. 8.c and Fig. 8.d.
3) The nonlinear function candidates in a robot manipulator consist of a combination of trigonometric functions and follows Rule 1.The nonlinear candidates used by the proposed method and SINDy-AIC are listed in Table 3, the same as that used in the simulation.
The identification results obtained using the proposed method were compared with the SINDy-AIC using a combination of least-squares and least-norm solutions.To investigate the influence of the training data and library used, the experiments were divided into four case studies: case study I, using the training data shown in Figure 8 and test data shown in Fig. 11; the second case study swapping the test data with the training data; and case study III removed the most contributed library.These case studies aimed to assess how variations in the training data, testing data, and choice of library affect the results of the identification process.

1) CASE STUDY I
The libraries used are listed in Table 3, and the training data used to obtain the equation are shown in Fig. 8 and the test data are shown in Fig. 11.The use of different data for training and testing is necessary to measure the accuracy of the equations for different data.A summary of the equation discovery process and results is presented in Table 8.From Table 8 it can be observed that the sparsity value obtained using LMSCVS reached 92.98%, exceeding the SINDy-AIC, which was only 77.97%.This level of sparsity demonstrates the ability of the method to mitigate multicollinearity among the independent variables.Multicollinearity can lead to overfitting because some variables may not be necessary across different datasets.This is evident when examining the model's performance in fitting the test data, which exhibits a significant decline in the fitting scores, particularly in the case of the 3rd joint, 20584 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.where the fitting score becomes negative.In the case of LMSCVS, a similar trend is observed, with the sparsity value decreasing to 75% when evaluated using the test data.The results indicate that higher sparsity values help reduce overfitting, whereas lower sparsity values increase the risk of overfitting.The magnitude of the sparsity value can be adjusted by selecting appropriate lambda values for both methods.
In terms of the accuracy, which is stated in the fitting score, the LMSCVS method had higher values than the SINDy-AIC method.The results also show that LMSCVS is faster than SINDy-AIC is.This is caused by the least-norm solution that uses the pinv algorithm, which yields a longer processing time owing to the ill-conditioned matrix, and the SINDy sparsify algorithm, which loops ten times.LMSCVS also outperformed SINDy-AIC in terms of the RMSE.However, the RMSE value for joint 1 remained relatively high at 2.8 V.This is possibly caused by variables that have not been effectively modeled within joint 1, particularly those contributing to the position drift during each movement.Another possibility is that the training data are not sufficient   to model the dynamics of Joint 1.The data drift phenomenon is shown in Fig. 8.b, wherein the angular position of the joint 1 data consistently drifts despite consistent input values.The occurrence of this data drift can be attributed to gearbox motor conditions.Fig. 12 shows the fitting results when evaluated using the test data shown in Fig. 11.It is noticeable that the SINDy-AIC method can fit the test data, but errors frequently occur, likely owing to noise in the angular velocity and angular acceleration data.Furthermore, there were more inaccuracies in fitting the test data for joint 3.The LMSCVS method fit the test data with minimal errors in the results.However, when the ROB3 joint begins to move, a voltage input of 0-3 V proves insufficient to initiate arm movement, primarily owing to factors such as friction and gearbox constraints, which demand higher voltage and current.Consequently, an input voltage range of 0-3V was ineffective for controlling the robot joint motor in this alternating input type.If necessary, a saturation function can be introduced at the output of the dynamics to model this behavior.

2) CASE STUDY II
The experiment was performed by changing the test data shown in Fig. 11 as training data and the training data shown in Fig. 8 into the test data.The aim of this experiment was to determine the importance of the training data.A summary of the results obtained can be seen in Table 9.
Table 9 and Fig. 13 show significant differences in the results from those obtained in case study I, as shown in Table 8, especially in the results obtained using the SINDy method (AIC).The fitting Score using the test data increased to 70%, which was accompanied by a decrease in the number of gains from 185 to 70, thereby increasing the sparsity value.Based on these data, we can see the relationship between sparsity values and overfitting, namely, that increasing the sparsity value will increase accuracy, which also reduces the risk of overfitting.The SINDy-AIC processing time is also faster because the ill-condition of the matrix is also reduced.This can be caused by the training data in Case Study II, which have more informative data and fewer ill-conditioned matrices caused by singular matrices during the regression process.The fitting score data obtained with LMSCVS also showed an increase, but it was relatively small, only around 3%, which shows that the LMSCVS method is more robust to the type of training data used, even though the training data in case study 1 are difficult to use by SINDy-AIC.However, LMSC is still able to carry out regression with good results and relatively stable sparsity values with slight changes, that is, a decrease of approximately 0.5%.From these results it can be seen the importance of data variations can influence the accuracy results in the regression process, in case study 2 show that the LMSCVS method surpasses the SINDy-AIC method.
The final equation was obtained using LMSCVS and SINDy-AIC are listed in Table 10.The resulting equations using the proposed method are segregated into Inertia, Coriolis and Gravitational variables in matrix forms.In the gravity equation for joint 1, there should be no variable or G 1 = 0, because no gravity component affects joint 1, which rotates horizontally (yaw axis).However, because there are losses in the gearbox motor that inhibit joint motion and cause data drift, the resulting gravitational variable in joint 1 is used to model the dynamics that cause data drift, and is modeled with bias and several functions.The bias variable was present in all gravity terms to model the unknown torque in dynamics.

3) CASE STUDY III
This experiment aims to prove the proposed library function that contributes significantly to modeling the dynamics of robot manipulators with voltage excitation by eliminating the proposed library function from Table 3 No. 3, namely @(u)sign(u).* u. 2 , this function models the joint motor velocity without considering the joint angles.A summary of the identification results is presented in Table 11.
Table 11 shows the effect of the proposed library function on accuracy; both methods experienced a significant drop in accuracy to tens of percent.These results were derived from experiments with varying lambda values, and only the results with the optimal lambda value are presented in the table.These findings underscore the importance of incorporating the proposed library function into the modeling of robot manipulator dynamics with voltage excitation.
Based on these three case studies in discovering the ROB3 robot manipulator with voltage excitation, the proposed method has demonstrated its ability to effectively model the dynamics of a 3-DoF robot manipulator in stochastic condition.It can be observed that the LMSCVS method is capable of discovering mathematical equations more robustly, resulting in lower levels of overfitting compared with SINDy-AIC.The selection of the lambda value used as a threshold in SINDy-AIC, and as a penalty in LMSCVS, must be carefully considered.A lambda value that is too small can decrease the sparsity and increase the risk of overfitting.In addition, the choice of library functions is crucial to avoid multicollinearity and optimize the computational process.

IV. CONCLUSION
This study introduces an enhanced equation discovery of robot manipulator dynamics using LASSO model selection criteria with a variable segregation algorithm (LMSCVS) as a solution to discover the mathematical equations of robot manipulators with voltage excitation, and yield equations that are easier to use for broader purposes.SINDy-AIC, which uses least-squares regression, is sensitive to noise and the absence of an approach that can produce a mathematical equation that separates the differential equations of the robot manipulator into three variable groups: Inertial D(q)q, Coriolis and centrifugal H (q, q), and Gravitational G(q).
In this study, the Dynamic Expression Nonlinearization (DEx-N) algorithm is introduced to generates nonlinear candidate equations by nonlinearizing the input data based on the dynamics equations of the robot manipulator.A comparative analysis with the binomial coefficient algorithm commonly employed in PySINDy demonstrates that the DEx-N algorithm produces a reduced number of candidates that prove to be more efficient in modeling robot manipulator dynamics.In addition, the DEx-N algorithm helps to prevent issues such as multicollinearity and overfitting.
In experiments to discover the dynamic equation of 3-DoF robot using analytical models under noise-free conditions, the results showed that the SINDy-AIC method demonstrated an accuracy of 100% in both the fitting score and model equation.However, the LMSCVS method yields results that closely resemble those of the Lagrange-Euler model, with a 99% fitting score and similarity in the model equation and coefficients.This suggests that the LMSCVS method is a reliable choice for modeling robot manipulators.In deterministic conditions, it can be concluded that SINDy-AIC represents the optimal solution for modeling system dynamics.
In experiments to discover the dynamic equation of the ROB3 hardware, the results showed that the LMSCVS method outperformed the SINDy-AIC method in terms of accuracy, speed, and sparsity.The results also showed that a low sparsity level leads to overfitting when tested using other data.Therefore, careful selection of the lambda value is essential to achieve a high sparsity and mitigate the risk of overfitting.This study also introduces nonlinear candidate equations that play a significant role in modeling robots with voltage excitation.Without these variables, the fitting scores for both the methods were significantly reduced.
Furthermore, it is important to highlight the specific and crucial implications of the proposed method, which not only improves the accuracy, reduces multicollinearity, and reduces 20588 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
the processing time of system identification but also introduces an algorithm to discover the mathematical equations of the manipulator robot that addresses the limitations of existing methods, which has broad application potential in various fields.With this contribution, the proposed methods and algorithms have the potential to provide positive impacts and useful solutions for various applications that require the accurate and efficient modeling of robotic manipulators.

V. FUTURE WORKS
In future research, it would be valuable to extend the application of this approach to more complex robotic systems.For example, discovering the dynamic equations of mobile-based robotic systems [40], robotic systems with higher degrees of freedom, and flexible link robotic manipulators [41].However, applying the proposed method to a robot manipulator with a higher degree of freedom (DoF) or General DoF may pose several challenges, particularly in terms of accuracy, processing speed, and the effects of noise.The accuracy may diminish when implemented on a robot with higher or general DoF due to multicollinearity in independent variables caused by coupling effects at each joint.This challenge can be mitigated by incorporating variations in penalty values in LASSO to increase the strength of regularization, leading to more coefficients being exactly zero, and effectively addressing the issue of multicollinearity.
The increased amount of data resulting from a higher number of robot joints may lead to a longer processing time for the proposed method.To address this, one can consider taking sensor data with a larger sampling time, introducing more variations in joint movement within a shorter time frame.This approach helps reduce the volume of measurement data while ensuring sufficient data variation for the regression process.
The challenges such as noise with high variance can significantly impact the equation discovery process.This can be anticipated by using a lowpass filter or other type of filter to reduce noise in position, velocity and angular acceleration states.

FIGURE 6 .
FIGURE 6. Training data of robot manipulator simulation model.

FIGURE 8 .
FIGURE 8. Training data for case study I and test data for case study II and III.

FIGURE 10 .
FIGURE 10.Filter is applied on joint acceleration.

VOLUME 12, 2024 20585
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

FIGURE 11 .
FIGURE 11.Test data for Case Study I and Training data for Case study II and III.

FIGURE 12 .
FIGURE 12. Case Study I fitting comparison.

FIGURE 13 .
FIGURE 13.The fitting comparison on case study II.

TABLE 5 .
Summary of Lagrange-Euler-based model identification results.

TABLE 6 .
Discovered equation of simulation model.

TABLE 7 .
System excitation.Filter is applied on joint velocity.

TABLE 8 .
Summary of case study I identification results.

TABLE 9 .
Summary of case study II identification results.

TABLE 11 .
Summary of case study III identification results.