Data-driven Methods Applied to Soft Robot Modeling and Control: A Review

Soft robots show compliance and have infinite degrees of freedom. Thanks to these properties, such robots can be leveraged for surgery, rehabilitation, biomimetics, unstructured environment exploring, and industrial grippers. In this case, they attract scholars from a variety of areas. However, nonlinearity and hysteresis effects also bring a burden to robot modeling. Moreover, following their flexibility and adaptation, soft robot control is more challenging than rigid robot control. In order to model and control soft robots, a large number of data-driven methods are utilized in pairs or separately. This review first briefly introduces two foundations for data-driven approaches, which are physical models and the Jacobian matrix, then summarizes three kinds of data-driven approaches, which are statistical method, neural network, and reinforcement learning. This review compares the modeling and controller features, e.g., model dynamics, data requirement, and target task, within and among these categories. Finally, we summarize the features of each method. A discussion about the advantages and limitations of the existing modeling and control approaches is presented, and we forecast the future of data-driven approaches in soft robots. A website (https://sites.google.com/view/23zcb) is built for this review and will be updated frequently.


I. INTRODUCTION
S OFT robots have been developed for a large number of applications.Owing to their infinite degrees of freedom (DOFs) and flexibility, soft robots are leveraged in robot assistant surgery, especially minimally invasive surgery [1].Compared with their rigid counterparts, soft robots are relatively safe due to their softness and have significant advantages in assisting elderly and disabled people with daily tasks [2] and cooperating with humans [3].Moreover, they are used as hand recovery devices [4] and wearable devices [5] for various medical purposes like rehabilitation and human motion monitoring.Animals are composed of soft tissues, and researchers in the bioinspired area apply soft materials like silicone to build soft robots and mimic the behaviors of living beings, such as octopus [6], elephant [7], and earthworm [8].These robots produce specific motions and manipulations by imitating the structures and behaviors of soft animals.With the help of such soft biomimetic robots, some exploring tasks in various environments like underwater [9] and walls [10] can be achieved.Soft robot hands are adaptative to objects and commonly applied for grasping tasks involving fragile [11] and diverse [12] objects.To summarize, unique properties like infinite DOFs, compliance, and safety for humans lead to the high potential of soft robots.Hence, soft robot study is a research area highly attractive to robotics scholars.
Followed by the aforementioned advantages and applications, the most considerable drawback of soft robots is the challenge of modeling and control.Deformable materials lead to the nonlinear and delayed responses of soft robots.Moreover, infinite DOFs make it complicated to build accurate models for soft robots.The physical properties of soft robots will also vary due to the aging of soft materials.Owing to the above characteristics, the modeling of soft robots is more complex than that of their rigid counterparts, which also leads to challenging control tasks [13], [14].Therefore, modeling and control play essential roles in soft robot research.
The working process of soft robots is summarized in Fig. 1.Various kinds of robot designs are used in soft robots, like 1 DOF soft fingers [15], 3 DOF continuum robots [16], parallel robots [17], concentric tube robots [18] and so on.To actuate a soft robot, the researchers apply actuation approaches like fluid-driven methods [19], cable-driven method [20], electroactive polymers [21], shape memory alloy (SMA) [22], etc.Also, there are multiple categories of sensors, e.g., optical markers [23], EM trackers [24], flex sensors [25] and Fiber Bragg Grating (FBG) [26], [27].Some reviews detailedly Robot design (grey) provides a specific soft robot structure and actuation pattern.The sensing system (green) obtains robot information via the sensors of such a robot.Modeling f θ is a function that predicts the robot state p (blue), as end position and robot pose, according to the actuation variables a (red.)Control gτ aims to decide actuation variables a with the desired state p d and sensing inputs.Finally, such a robot system can achieve a variety of tasks.θ and τ represent the parameters in the data-driven methods of modeling and control, respectively.introduce soft robots with mechanical aspects [28], [29].Based on these hardware implementations, soft robot modeling can be seen as a function that takes actuation a as the input and robot state p, such as end position, orientation, and shape, as the output.Meanwhile, soft robot control is the inverse process of modeling, which takes the desired robot state p d and the sensing signal s as the input and actuation a as the output.The input and output choices may change considering the control strategy.For example, the open control strategy in [30] only requires the target positions p d for the trajectory following, but the close loop control approach in [23] utilizes both the target position p d and the previous end positions from the sensing system s.Each mapping requires a model, f or g, and the parameter θ or τ , which are partly influenced by robot design and sensing system.Finally, such control methods are leveraged for simple tasks, like target reach [31] and trajectory following [32], and challenging tasks, like interaction adaptation [3] and navigation [33].
Due to the variety of structures and the complexity of behaviors, it is challenging to propose accurate physical models and corresponding control strategies for every soft robot.However, data-driven methods have shown considerable benefits.Datadriven approaches summarize the features of data collected from robot motions without the necessity of robot design knowledge.Furthermore, compared with physical approaches based on simplifications and hypotheses, data from real robots can illustrate the real features of soft robots.Thanks to these advantages, some data-driven approaches can be proposed for robot modeling and control with optimization [34] or learning [19].Data-driven approaches can be applied for various kinds of modeling and control strategies, like kinematics modeling [35], dynamic modeling [36], open loop control [37], and close loop control [38].Moreover, they can also be utilized for observer [39] which includes modeling as prediction and sensing characterization [40] which can be applied in control.
This review aims to present and compare data-driven methods applied to soft robot modeling and control in parallel.Although some reviews are related to soft robot modeling and control [14], [41]- [43], they do not introduce modeling and control simultaneously.In this review, first we briefly introduce foundations for data-driven approaches, which are physical models and the Jacobian matrix.Physical models provide simulation environments and insight into soft robots, while the Jacobian matrix infers the data relationship for datadriven approaches.Then, we classify data-driven methods applied to soft robot modeling and control into three categories: statistical method, neural network, and reinforcement learning.We introduce the corresponding modeling and control approaches for each kind of data-driven approach parallelly and compare them within and among these categories.Finally, conclusions on data-driven approaches applied to soft robot modeling and control are proposed, and we discuss the challenges and emerging directions of this area.Fig. 2 shows the paper number of different methods cited in this review according to the publishment time.In Fig. 2-(a), one paper may fall into multiple categories if it involves several methods, but one paper only falls into one category considering the main method in this paper in Fig. 2-(b).Fig. 3-(a) and Fig. 3-(b) summarize the number of papers according to the publishment time and method category, respectively.The summary of datadriven method categories is shown in Table I.

II. FOUNDATIONS FOR DATA-DRIVEN APPROACHES
In this section, we introduce two significant foundations for data-driven methods, which are physical models and the Jacobian matrix.Both of them are very popular and effective in rigid robot modeling and control.In this case, the researchers also try to employ them in soft robots.They stimulate the development of data-driven approaches by providing simulators or inferring data relationships.

A. Physical Models
Physical models are significant in soft robotics because they can illustrate the nature of soft robot motion and deformation.Also, they are fundamental to data-driven methods by providing simulation environments and data.Discretization methods like FEM have been applied for robot simulation.The Cosserat model and its discretization method, like geometric variablestrain (GVS), provide simulators specifically for soft robots.Piecewise constant curvature (PCC) can be seen as a special case of GVS and is widely applied to the soft robot simulation based on the assumption of deformation shape.Some specific models, like the pseudo-rigid model and the concentric tube model, are proposed for specific soft robots.Some typical physical models are shown in Fig. 4. In robot control, methods considering physical models or parameters are model-based, while approaches only employing data and not using physical relationships inherent in robots can be seen as model-free [44].
FEM is a popular discretization simulation method in robotics.This method can be applied to various soft robots, for example, soft foam robot hand [45], soft pneumatic actuator [46], and PneuNet bending actuator [47].This model can provide high-accuracy simulation with the requirement of material parameters such as mass density, Young's modulus, and Poisson's ratio.Generally, FEM works as an offline modeling method, such as producing a dataset [46] for NN learning and building an exploring environment for RL [45].FEM is applied in the simulator SOFA [48] for deformation and interaction simulation.Moreover, a series of works [17], [49], [50] focus on FEM applications for real-time control.In order to cope with this highly complicated model, methods like local linearization [17] and reduced-order control model [50] are employed for the trade-off between the model accuracy and control frequency.
Besides the discretization of 3D small elements, the discretization of 1D Cosserat rod is also applied.Cosserat rod models soft continuum robots with a series of rigid crosssections, and includes bend, twist, stretch, and shear.The simulator PyElastica [51] applies a discrete geometric form of the Cosserat rod model for soft robot simulation.The most general strain-based discretization method of the Cosserat rod is GVS [52], which discretizes the continuous Cosserat rod model into a finite set of strain basis functions.SoRoSim [53] integrates GVS for soft, rigid, and hybrid robotic system simulation.PCC is a special GVS, which only includes bending and simulates a continuum robot with several curves in some bending angles [54].Original PCC only shows the geometrical information and dynamic PCC models include spatial dynamic effects [55] and external interactions [56].Of note, PCC is a simplification of the nonlinear Euler-Bernoulli beam, which can be applied for soft robot modeling under external interaction without the constant curvature assumption [57].
There are also some physical models for specific kinds of soft robots.Considering concentric tube robots, the authors of [18] leverage rotations and translations of tubes as actuation variables for the concentric tube robot model, as shown in Fig. 4-(d).The control tasks are achieved by finding the inverse kinematic solutions.Neural networks are utilized for both modeling and control in [58] based on this model.The concentric tube model has also been used for RL control in [31] to provide action variables.Pseudo-rigid models can approximate soft robots with rigid counterparts [59] as shown in Fig. 4-(b), and the sophisticated motions of soft robots are simulated with the help of pseudo springs [60] and dampers [61].
In conclusion, the research of physical models deepens the understanding of soft robot nature and produce various soft robot simulators for data-driven approaches.A comparison of some typical papers applying physical models is shown in Table II.This table includes the simulator SOFA based on FEM [48], the simulator PyElastica based on Cosserat rod [51], the simulator SoRoSim based on GVS [53], and RL works using the specific model concentric tube model [31].

B. Jacobian matrix
The Jacobian matrix can infer the relationship between the actuation and position velocities by model linearization.Thanks to its concision and effectiveness, the Jacobian matrix is widely applied in rigid robots, whose explicit physical models are easy to propose.Also, it is possible to linearize such models.However, due to the difficulties of building accurate and explicit models for soft robots, optimization is  The Jacobian matrix estimation and control process is summarized in Fig. 5.The first work applying the Jacobian matrix in the main approach is [34].For Jacobian matrix initialization, each actuator should be actuated with a small incremental amount in order to estimate the initial Jacobian matrix J 0 [34], [62]- [64].Then, the Jacobian matrix is updated according to the actuation and end position change in the last step.The matrix update strategy can be shown as where △x k and △y k are the end effector displacement and actuation change at step k, respectively.J = [J 1 J 2 . . .J n ] is the Jacobian matrix and n is the dimension of the actuation variable y.
) is a weighting matrix and Ĵ = JW −1 is the unit Jaocbian matrix.Ĵk and Ĵk+1 are the unit Jacobian matrix at step k and k + 1.
The actuation variables are derived as the cost function in an optimization problem considering the constraints of the Jacobian matrix and target end position for control.The control strategy for step k + 1 can be shown as where △x d is the desired displacement for the end effector.The actuation value y k + △y k+1 is constrained between the minimal and maximal actuation value y min and y max according to the robot structure.Except for the optimization control method in Eq. 2, the inverse Jacobian matrix is also utilized for control [39].
Considering nonlinearity and hysteresis of soft robots, the Jacobian matrix in the last step may not be accurate and suitable for this step.It is challenging for the original Jacobian estimation method to face complex tasks due to its oversimplification.Therefore, some researchers try to adapt this method for various applications.For example, the authors of [64] also emphasize that there may be a significant deviation between the estimated and real Jacobian matrix, and they alleviate this issue by smoothing the activation values.Forcedisplacement model is included for force control in [20].When end effector and actuation displacement between only one step is involved in [34], the displacement among multiple steps is utilized in [65].Also, the Jacobian matrix is adjusted in [66] by multiplying a rotating matrix considering the difference between the intended and measured motions.The fusion of sensing information from a camera and FBG provides accurate positions in [63] for stable and precise Jacobian matrix estimation.
Although so many adaptations have been proposed, the Jacobian matrix is still an initial solution for many labs and motivates applying the other approaches, especially datadriven ones.A neural network is applied for control in [67], which is based on the Jacobian matrix estimation process.A honeycomb pneumatic networks arm is controlled by the inverse Jacobian matrix in [68], and the same soft robot is included in [69] for challenging manipulation tasks.The latter work employs the Jacobian matrix in a low-level behavior controller and compares it with RL.
In summary, the Jacobian matrix is a concise and simple method for soft robots.The matrix is updated online with a high frequency in most cases, which can be achieved due to the simple structure.Meanwhile, the oversimplified linearization necessitates online updating and high control frequency.A comparison of some typical papers using the Jacobian matrix is shown in Table III.This table contains the first paper employing the Jacobian matrix for soft robot control [34].Force control is achieved in [20] considering a force model in the control optimization problem.The authors of the last work [69] take inspiration from the Jacobian matrix in [68] and implement RL based on the same robot.

III. STATISTICAL METHOD
Statistical methods are utilized to build the mapping functions between different variable spaces with only data, and physical relationships among these spaces are unnecessary.For example, the authors of [70] infer the relationship between the actuation input and image feedback for kinematic control, and the authors of [21] plan the actuation variables based on the temporal values.There are many regression approaches utilized in soft robots, like linear regression [71], local weight regression (LWR) [72], support vector regression (SVR) [73], Gaussian process regression (GPR) [25], and local weight projection regression (LWPR) [16].Other than the regression methods, the Gaussian mixture model (GMM) [74] summarizes collected data with a joint data distribution, and the extended Kalman filter (EKF) [39] estimates robot states as an observer.

A. Regression Method
Regression methods with different models are employed in soft robot modeling and control.These methods aim to fit the training data with a specific model, like a linear function or a Gaussian process, by optimizing the parameters and decreasing the loss.Then, the trained regression model can take some observation samples as model input and predict the corresponding values.For instance, linear regression, a simple regression approach, is applied in [71] to map FBG signals into soft robot end position for sensing.A linear function is included for fitting, and the parameter matrix is optimized.Similarly, LWR employs the linear function but considers the distance between the collected data and the observation samples for fitting.In [72], data from human demonstrations is used for fitting.The temporal value is taken as input to decide action for control.
SVR is utilized in [73] for the forward kinematic modeling and in [75] for the close loop position controller.It has been proven in [75] that SVR gets better approximation accuracy than NN on a simple function, but this model requires more convergence time than NN on a large amount of data (15625 samples) in [73], which may be caused by the different optimization strategies or mature NN optimization software.The SVR modeling and control algorithm in [75] can be summarized as where d, M are the dimensions of the mapping input x i and output y i , and n is the size of the learning dataset.The function ϕ(x i )W + B aims to estimate the mapping output with a nonlinear transformation ϕ(•) : R 1×d → R 1×h and the optimized matrices is a loss function and the extended Vapnik ϵ-insensitive loss function based on L2-norm is applied in [75].C is the tradeoff parameter that adjusts the estimation errors and regularization.For kinematics modeling, SVR takes the actuation as input x and estimates y as the robot state.For control, which can be seen as the inverse process of modeling, SVR receives the desired displacement as input and decides the actuation.GPR employs the Gaussian process, a probability distribution over functions, for fitting.For modeling, GPR predicts robot states, such as position or shape parameters, based on the training dataset and current actuation variables.For example, GPR is applied in [25] to predict the actuator curvature.Given , where i represents the dimension of each input x.The prediction process based on the test input x t can be shown as where µ(x t ) and Σ(x t ) are the predictive mean and variance.k(•, •) ∈ R is the kernel function applied in GPR, a squared-exponential kernel function in most cases, and . ., N .The above prediction process supposes that the prior mean is zero, and one can preprocess the sample output Y by zeroing the mean before fitting and prediction.It should be noticed that the noise of the mapping is considered in [70], which is assumed as white Gaussian noise with zero mean and variance σ 2 n .In this case, the prediction can be derived as With the forward model derived from GPR, the authors of [4] propose a control strategy by minimizing a cost function containing the predicted errors and actuation variables.GPR is also employed in close-loop kinematics control in [70] by predicting desired actuation variables based on the robot state feedback.The authors of [15] aim to predict the difference of robot states instead of only the next states as the modeling part in optimal control.Based on LWR, which utilizes each training data as a local model, LWPR projects training data into several linear models and weighs them with Gaussian kernels.The covariances of these kernels are decided by an incremental gradient descent based on stochastic leave-one-out cross validation criterion [76], [77].To find the optimization of the parameters in linear models, which is a redundancy problem, a null-space behavior is defined as a guide.This user-defined behavior is applied in a cost function and attracts the actual behaviors.For example, the robot elongation is minimized in [16] for a relatively straight shape, and the overall inflated chamber pressures are minimized during the control process in [78].

B. Gaussian mixture model
GMM encodes collected data into a data distribution composed of multiple Gaussian components.In the fitting step, GMM parameters are optimized to fit the collected data.During the prediction step, the input data works as the prior, and the posterior of GMM under some input data will be used as prediction output.Of note, once a GMM is built, each kind of data plays the same role in principle.In such a joint probability density function, every dimension can be applied as a prior and derive expectations on the remaining dimensions.In this case, GMM in [74] produces both forward kinematics modeling and position control strategy with actuation variables and desired end positions as the prior, respectively.The catheter kinematic GMM is represented by the joint probability density function: where p[k] denotes the robot state at step k.The forward modeling process can be shown as which is the conditional mean of the model Eq. 6 given the robot state p[k] and actuation a.The control strategy is which is the conditional mean of the model Eq. 6 given the robot state p[k] and desired state In addition to modeling and control, such data encoding characteristic develops some planning solutions.For example, the authors of [21] encode pose and temporal value into a GMM for navigating through narrow holes based on human demonstration.Moreover, other task parameters like the rotation matrix of the coordinate system are included as the planning objects in the GMM of [79].Making use of its encoding ability, GMM transfers demonstrations on rigid robots to the STIFF-FLOP continuum robot in [80].

C. Extended Kalman filter
Considering one existing model and its prediction, the Kalman filter can be applied as an observer and corrects the predicted values with measurement.Due to the nonlinear responses from soft robots, most modeling process is nonlinear, and the extended Kalman filter (EKF), instead of the original linear Kalman filter, is widely applied in soft robots.P, Q, R represent the estimation covariance, process noise covariance, and measurement noise covariance, respectively.In prediction, the state at the k + 1 step is predicted as p k+1|k based on the state p k|k and actuation a k at the k step.
where f (•, •) represents the nonlinear forward modeling, A k is the local linearization of f (•, •).The correction process corrects the prediction state to p k+1|k+1 considering measurement output s k from sensing. )), where g(•) represents the measurement process, C k is the local linearization of g(•).K k is the Kalman Gain and evaluates the reliability of measurement and prediction.EKF is commonly leveraged since it adjusts the modeling process and provides an accurate robot state estimation.For example, the authors of [81] map actuator variables and segment parameters into the robot pose to build the nonlinear forward modeling with the transformation matrices, and the sensing signal from position sensors is applied for correction.The authors of [82] aim to estimate robot poses and physical parameters and apply pose measurement for correction.NNs like wavelet networks [83] are also employed as the forward model in EKF for curvature angle estimation.In their following research, the authors of [84] also estimate the external force as the unknown system input based on the state estimation from a similar EKF.Similarly, the unknown external forces are taken as the system state p and accurately estimated in [85].Furthermore, due to the modular structure of the snake robot in [86], the EKF is adapted by changing the dimension of state variables according to the advancing or retracting motion for shape estimation.

D. Summary Statistical Method
Because statistical models only consider data relationships, such models have shown high potential to cope with various kinds of data.At first, statistical models only include robot actuation for modeling [87].Then, position feedback is involved in [75] for adaptive modeling and control.Furthermore, temporal values are applied for control in [21], [72].The authors of [70] even include visual information for kinematic control.Recently, sensing information from various sensors like resistance and force sensors has been employed in [84], [88] for modeling adjustment.Most statistical approaches can be applied for both modeling and control with different inputs, like two SVR models in [75].
Besides data categories, the methods also evolve over time.For instance, the Jacobian matrix is involved in the EKF in [85], and the pseudo rigid robot model takes its place in [89].The authors of [39] apply the adaptative Kalman filter, which shows strong robustness against the model nonlinearity and uncertainty instead of EKF.Recently, the unscented Kalman filter is leveraged in [88], which can apply the implicit Gaussian process for robot modeling.Also, GPR also evolves over time.The whole working space is divided into several regions, and each part requires a single GPR model for local modeling in [70].Then, only one online learning GPR model is employed to model the whole working space in [15], and a meta-learning GPR model is employed in [90] for multiple new unknown working spaces.
Statistical methods make data distribution assumptions from the perspective of statistics.They can attain an acceptable performance even with a small amount of data and become more effective with more data.Moreover, most of them can be leveraged for both modeling and control.A comparison of some typical papers applying statistical methods is shown in Table IV.This table first includes a simple regression model SVR in [75], then includes two regression approaches GPR in [15] and LWPR in [16].Finally, the observer EKF in [81] is introduced.

IV. NEURAL NETWORK
Considerable efforts have been focused on NN applications in the soft robot field.In the early years, extreme learning machines (ELM) [91] and radial basis function (RBF) [35] were popular choices.Nowadays, researchers prefer multilayer perceptron (MLP) [92] and recurrent neural network (RNN) [93] due to their generalized and sequence-related structures, respectively.Moreover, for some special proposes like image processing, autoencoder (AE) [94] and convolutional neural network (CNN) [95] are utilized.Some typical NNs are shown in Fig. 6.

A. ELM and RBF
ELM only comprises an input layer and an output layer.The input layer weights and bias are randomly assigned before training and fixed during training, while the output layer weights are trained to decrease loss.A simple loss is utilized in [96] for kinematics control, which only aims to decrease the estimation error.The ELM is where â, p denote the estimated actuation value and the robot state, W out , W inp denote the output and input layer weights, B denotes bias, and f (x) = (1 + e −x ) −1 is the sigmoid activation function.The training process of the original ELM in [96] can be shown as where A and P are all the real actions and input layer output in the training dataset, and Â is the ELM estimation based on P .The optimized output layer weights are Ŵ out = A • P † , where P † is the pseudo-inverse of the input layer output P .In addition to estimation error, the authors of [97], [98] involve the norm of the output weight matrix into optimization error as a regularization term to avoid too large output weights for kinematics control.Constraints on the range of outputs are   In the network feedforward, the channel number improves while the width and height decrease using kernels.
Finally, CNN outputs a matrix with dimension (Cn, 1, 1), and a fully connected layer is employed to map to the target dimension k.
applied in the ELM controller [3] according to the constraints of the real actuation.The training process can be shown as min where a is the real actuation value, α is the regulation parameter, N is the size of the training dataset, n is the degrees of actuation, and Ω represents a set of samples basically cover the input space.The second condition guarantees that ELM has the same actuation direction as the real conditions, and the third is the actuation range constraint.Recently, ELM has been utilized for online pose estimation in [40] thanks to its simple structure and fewer parameters compared with other complicated NNs.
Based on the structure of ELM, RBF changes the activation function from a sigmoid activation function to multiple Gaussian functions in [35] for forward kinematic modeling.Clustering can be applied to the training dataset to select the center of the Gaussian functions [98].Although ELM and RBF include some basic elements of NN, e.g., activation functions and neurons, they contain some designed constraints for specific tasks and waste a part of their potential mainly due to the fixed parameters.

B. MLP
MLP is the most popular NN.The diagram of MLP is shown in Fig. 6-(a).Generally, MLP can be shown as where W n , B n represent network weight and bias of n-th layer, and y, x are the output and input of MLP.To model a robot, the input is actuation, and the output is the robot state.For control, the desired and real robot states are input, and MLP can provide the estimated actuation.The first paper utilizing MLP in the soft robot field is [99], which is also the first paper utilizing NN.This work designs a particular parameter updating strategy for control instead of the backpropagation widely applied now.Similarly, training data is normalized in [92] with principal component analysis (PCA) instead of batch and layer normalization, which are widely employed currently.MLP has many hyperparameters and changeable components, and plenty of papers have been conducted on them.For example, the authors of [100] compare the performance of MLPs composed of different neuron numbers on position control tasks.Three activation functions (i.e., log-sigmoid, linear, and tan-sigmoid) are tested in [47] for pneumatic robot modeling.The stochastic gradient descent (SGD) optimizer is applied in [101] instead of the popular Adam optimizer.The authors of [24] comprehensively investigate the influence of hidden layer number, neuron number, and batch size on training time and validation loss of sensing tasks.In general, there is no optimal combination of the hyperparameters and component structures for all tasks, but most choices can obtain acceptable performance on most tasks.The detailed influence of each hyperparameter on the MLP performance has yet to be fully explored and explained.Some uncommon MLPs have been applied to soft robots.The authors of [22] provide targets to the hidden neurons of the MLP directly, similar to the conditional generative adversarial network (CGAN), which takes advantage of the information that is indirectly related to the mapping but restricts the potential of NN.A U-Net-like MLP is leveraged in [102] for robot modeling in model predictive control (MPC), which connects the former and latter layers sequentially.Multiple MLPs are connected in [103], which have different applications, e.g., forward kinematics modeling in simulation and sim-to-real transfer learning.The combined MLP can collect actions in simulation and estimate the corresponding robot states.
For soft robot modeling and control, some unique features of MLP are developed.For example, the inputs of some MLPs take physical models into consideration.Based on the concentric tube model, the translations and rotations of the concentric tubes are employed into MLP inputs in [104] for modeling.Furthermore, the authors of [105] compare different joint space forms of the concentric tubes as input on the forward modeling estimation tasks.Similarly, PCC is included in [54] by taking curvatures and curve lengths as MLP input.The authors of [106] utilize MLP to estimate physical parameters like mass inertia matrix, which is a model-based approach with NN.Due to the time-delayed motion, end positions in the current and past timesteps are fed into the MLP in [107] for control.

C. RNN
RNN is a kind of NN that is designed specifically for sequential data.The diagram of RNN is shown in Fig. 6-(b).Although MLP can also receive temporal data, it is fed into networks simultaneously and fails to infer the sequential relationship.RNN takes data in sequence using recurrent structure.The n-th bar in Fig. 6-(b) can be represented as where W * , b * are the RNN weight and bias parameters, S n is the hidden state of the step n, and f (), g() are activation functions.As shown in Fig. 6-(b), the network of each step takes the hidden state from the previous step.Such a structure infers the sequential relationship of data while the other NNs take the data input simultaneously.In this case, RNN is more suitable than the other networks like MLP and CNN for soft robots, which provide delayed responses.The modeling and control with RNN share the same data category requirement with MLP mentioned in Eq. 14, but RNN needs data from multiple time steps instead of a single time step.The first RNN applied is modified Elman NN in [108], which restores information in previous steps with context nodes for dynamic control.Then, researchers leverage a nonlinear autoregressive network with exogenous inputs (NARX) in a series of papers [19], [67], [93] for dynamic control.This kind of RNN receives outputs in previous steps as a part of input in the current step.
Long short-term memory (LSTM) is a kind of RNN proposed for long-term dependence issues.This network has been used on endoscopic robot distal force prediction [109], [110] and external force position and magnitude prediction [111].Moreover, such a network can cope with sensing signals from nonlinear time-variant soft sensors and achieves tasks like position prediction [112], object recognition [113], and shape reconstruction [94].In addition to perception, the authors of [32] dynamically control a robotic catheter with LSTM to decrease contact force.The authors of [114] employ LSTM as an offline dynamic controller to cope with the nonlinear behaviors.In summary, RNNs perform satisfactorily on various tasks due to their sequential structures and memory ability.

D. Special NN
There are some NNs that take advantage of visual information in soft robot control.For example, AE is utilized to extract features from the images of soft robots in [94] and estimate the robot's shape.CNN estimates shapes [115] and joint values [116] based on robot images.The diagram of CNN is shown in Fig. 6-(c).Also, CNN can predict the orientation of the placenta in [95] and encodes robot deformation for shape estimation with the help of markers inside the chamber in [117].Furthermore, although RNN is the most popular choice for sequential information processing, CNN can also be applied for dynamic modeling using rearranged pressure inputs [118].The space relationship in a matrix is leveraged to infer the sequential relationship of an actuation sequence for hysteresis modeling.A 3D NN is employed in [119] for a segmental surgical manipulator, which considers the time sequence between layers and the segmental sequence between neurons for planning.Spatial sequences of soft modular robots are considered in [120] by utilizing bidirectional RNN.A generative adversarial network (GAN) is utilized for synthetic data in [121].

E. Summary Neural Network
The application of NN in soft robots started with [99].First, some simple networks like ELM and RBF are included.Then, with the development of NN, the researchers give up the constraints of ELM, change the activation function of RBF, and enlarge the network.In this case, MLP becomes a good tool for both modeling and control.Furthermore, due to the hysteresis of soft robots, RNN is applied to deal with timerelated data.AE and CNN are employed to process images.
To summarize, owing to the large variety of structures, NN is attractive to soft robot research.For most issues in soft robot modeling and control, it is highly possible to find a related NN solution.However, such models require a large amount of data due to their complicated structures, and it is challenging to update them online.A comparison of some typical papers applying NNs is shown in Table V.This table begins with a simple NN, RBF [122].Then, a common NN, MLP [24], is included in this table.Finally, LSTM for control [32] and CNN for modeling [118] are summarized in Table V.
V. REINFORCEMENT LEARNING RL copes with high-level tasks by exploring the environment and exploiting data collected during exploration.This strategy trains an agent for complicated tasks and requires a long learning time and a massive amount of data.The agent is trained considering defined reward functions.This approach cannot be used for modeling.Moreover, it requires exploring environments, which may be provided by modeling methods.

A. RL implementation
In the early years, statistical models were applied as the agents in RL instead of NN.A GPR model named Gaussian process temporal difference method is employed in [123] to control an octopus arm.As the RL agent, a GMM is trained to estimate robot shape and contact in [124] and control a flexible surgical robot to go through a tube in [125].For robotic catheter control inside a narrow tube, a joint probability distribution is learned considering various variables like tip and entrance points, touch state, and action in [33].Q-learning is implemented with a Q table or Q function as the agent.The agent decides the action A based on the state S and a certain policy like ϵ greedy.The training process of the Q table or function can be shown as where S t is the state at time t, A t is the action decided by the Q table or function according to the state S t , and R(S t , A t ) is the reward function.α and γ are the learning rate and discount rate, respectively.max a Q(S t+1 , a) means the maximal Q value for the state S t+1 and every possible action a.For example, the authors of [69] exploit Q-learning for many sophisticated control tasks like turning a handwheel, unscrewing a bottle cap, drawing a line with a ruler, and so on.
With the help of NN as an agent, RL not only achieves simple tasks like position reach [126] and trajectory following [127] but also addresses some complex issues like gait design [128].A soft snake robot is controlled to move on the ground and arrive at target positions in [128].It is challenging to control the robot gait with traditional control methods, but RL is utilized for gait design and obstacle avoidance in [129].The authors of [130] fuse the visual and shape information with NN in RL and control a flexible endoscopy to navigate.

B. RL in soft robots
Soft robots can take advantage of various unique RL strategies specifically for soft robots to cope with some challenges.
The infinite DOFs lead to wide actuation and state spaces, which bring a burden to explore the whole environment.In this case, most researchers discretize these spaces.The action space in [123] is restricted to only six available actions, and the authors of [37] discretize the workspace into a 3D grid with a resolution of 0.01 m.Although discretization limits the RL potential, it produces simple spaces and decreases the training time.The soft robot in [131] is able to keep the end position invariant while changing the orientation with the help of RL.
One of the most considerable challenges of RL is exploring the real world, which has a high time cost and may damage robots.Therefore, modeling in simulation, especially with physical methods, is widely utilized in the training process.The authors of [132] first train an RL strategy to control a slave robot in a simulator named CoppeliaSim and then test it on the real robot.Constant curvature (CC), a soft robot modeling method, provides a simulation environment for [133] at first, and the NN agent continues to learn in the real world using the Deep Deterministic Policy Gradient method (DDPG).In most cases, the explored environment is modeled by a trained NN.For example, MLP is applied for modeling in [134] as the exploring environment.Also, RNNs are utilized in [23], [38], which are NARX and LSTM respectively, for forward modeling of segmented pneumatic robots.Then, RL agents are trained and validated in reality.

C. Summary Reinforcement Learning
RL application in soft robots first starts with the help of statistical models as agents, and then NNs take the place due to their generalization.Discretization is widely applied for RL in soft robots.Meanwhile, RL leverages soft robot simulators and stimulates their development.With such RL strategies, now soft robots can achieve some complicated tasks.

VI. CONCLUSIONS AND DISCUSSIONS
In this section, we summarize the foundations, data-driven methods, and their representative papers in Subsection VI-A.The benefits and limitations of data-driven approaches involved in the review are included and compared in Subsection VI-B.Finally, we forecast the emerging directions for soft robot research in Subsection VI-C.

A. Summary
This review summarizes the data-driven approaches applied in soft robot modeling and control.The physical approaches provide simulators for data-driven methods, like SOFA [48], Pyelastica [36], and SoRoSim [53].The Jacobian matrices describe soft robots and control the robots with the inverse Jacobian matrix or optimization.The authors of [34] firstly utilize the Jacobian matrix in the main approach.
Statistical models aim to achieve modeling and control with datasets.Regression methods like GPR [70] and LWPR [16] can estimate the mapping functions, and each trained model can be applied for either modeling or control according to the input of the training data.GMM [74] encodes the dataset into a joint data distribution, which can be applied for both modeling and control.Observers like EKF [88] can estimate robot states based on one existing modeling method.Recently, NN has been the most compelling tool for soft robots.ELM [3] was a popular choice in early years, and now MLP [100], which is more complicated, and RNN [19], which can cope with time-related data, have shown their benefits.Similar to statistical models, NN can also be applied for both modeling and control.RL shows good performance on position control [38], planning [31], and even some sophisticated manipulation tasks [69].Generally, RL does not provide a robot model but exploits an existing environment.The papers based on datadriven methods, which are cited in this review and published after 2019, are summarized in Table VII.
It is apparent that sophisticated models like RL require a larger amount of data while achieving better performance, but they also improve the computation cost.Oversimplified approaches like the Jacobian matrix are only feasible for limited simple tasks, but they are easy to understand and can achieve a high control frequency.Considering both cost and performance, each model has its own advantages and disadvantages, and there is no optimal solution for all tasks.

B. Advantages and Limitations
Statistical modeling and control approaches are moderate and flexible approaches in these methods.Generally, they only require a moderate amount of data, which is less than the NN requirement but more than the requirement of the Jacobian matrix.Also, the control frequency is lower than that of the Jacobian matrix but higher than the RL frequency.One statistical model can be applied for both modeling and control online, but some models are only local models and lack generalization ability, which can be proven by the working space segmentation in [70].NN is very suitable for soft robot modeling and control due to the nonlinear activation functions and complex network structure.This model shows its general applicability.However, the data amount requirement accelerates the aging of soft robots and takes a long time for data collection.NN cannot be a fast and online approach due to the slow training process.By exploring the environment and exploiting the data, RL can achieve some high-level tasks like navigation.Careful planning is not required even for sophisticated tasks.Meanwhile, RL has a high requirement for time and computation resources.Exploring the real world may damage the robots, and some environments, like the human body, cannot be applied for exploring.Highly realistic simulators are required if the RL agent is trained in simulation.To summarize, RL is a useful but consuming approach.

C. Emerging Directions
In the other research areas like computer vision and natural language processing, the original NN is simple at first [135].Then, more complex NNs with large sizes and different structures are proposed, like ChatGPT, BERT [136] and YOLO [137].Similarly, the size and complexity of NN in soft robots will improve.Such models can achieve more challenging tasks compared with simple NNs.However, large models lead to low frequency for control implementation, and one should consider balancing the model complexity and computation cost based on the modeling and control requirement.
The research on soft robots begins with the application of one single approach and simple tasks.Recently, there have been some papers combining multiple methods to solve difficult problems like model mismatch.For example, the authors of [114] apply offline RNN and online kinematics model for control.MPC and iterative learning controller are combined in [60].Two NNs are included in [138] for RL agent and model mismatch adjustment.As discussed in Subsection VI-B, each method shows its own advantages and disadvantages.The usage of multiple methods can take advantage of every approach and achieve better performance.
The medical environment, as one of the most significant applications of soft robots, has a high standard for safety, efficiency, and convenience.To involve soft robots in medical applications, some advanced modeling and control strategies are required.Although so many works achieve controlling the robot end pose, it is challenging to control the whole robot shape and avoid contact which may damage the human body.NN and RL lack interpretability and are challenging to apply in real surgery.Also, it is impossible for RL to explore in vivo, and RL agents can only be trained in simulation or physical simulators.The automatic medical soft robot control is still in its nascence from the aspect of safety and data requirements.Cooperation among robotics researchers, doctors, and related departments is required to address these issues.

Fig. 1 .
Fig.1.Diagram of soft robot modeling and control processes.Robot design (grey) provides a specific soft robot structure and actuation pattern.The sensing system (green) obtains robot information via the sensors of such a robot.Modeling f θ is a function that predicts the robot state p (blue), as end position and robot pose, according to the actuation variables a (red.)Control gτ aims to decide actuation variables a with the desired state p d and sensing inputs.Finally, such a robot system can achieve a variety of tasks.θ and τ represent the parameters in the data-driven methods of modeling and control, respectively.

Fig. 4 .
Fig. 4. Diagrams of (a) FEM, (b) PCC with pseudo-rigid model, (c) Cosserat rod model, and (d) concentric tube model.Element motion and deformation in (a) represent the soft robot deformation.The grey soft robot in (b) is represented by a series of constant curves, and the augmented rigid robots model the soft robot dynamically.The Cosserat rod model in (c) includes forces and moments inherently.The concentric tube model in (d) represents the robot motion with the configurations l 1 , l 2 , θ 1 and θ 2 .

Fig. 5 .
Fig.5.diagram of the Jacobian matrix estimation and control.Based on the Jacobian matrix initialization, the Jacobian matrix is updated based on the sensing s.Then, the estimated Jacobian matrix is applied for control.

Fig. 6 .
Fig. 6.Diagrams of (a) MLP, (b) RNN, and (c) CNN.MLP is composed of multiple layers.Parameters W * , B * in one layer are in parallel and can be trained simultaneously.RNN takes the input data x sequentially, and each bar shares the same weights W S , Wx, Wy, bx, and by.CNN obtains matrix with channel C 1 , height H 1 , and width W 1 as input.In the network feedforward, the channel number improves while the width and height decrease using kernels.Finally, CNN outputs a matrix with dimension (Cn, 1, 1), and a fully connected layer is employed to map to the target dimension k.
[34]included robot model is just a general model like p = f θ (a) instead of an explicit and physical one, and it is only employed for illustration instead of calculation[34].