Learning From Demonstration Based on Environmental Constraints

We present a novel learning from demonstration approach which uses environmental constraints as the underlying representation to interpret and reproduce demonstrations. This representation based on environmental constraints separates the information that facilitates generalization from the information specific to object instances. Combined with adaptive controllers which fill in the instance-specific details during execution through explorative interaction, our approach generalizes from a single demonstration on an articulated object to different instances of the same object type. We test our approach in real-world experiments on contact-rich manipulation, using a series of mechanical locks as well as drawers and doors. The high success rate of 95% across all of these experiments provides strong evidence that environmental constraints are a powerful inductive bias for general and robust learning from demonstration.


I. INTRODUCTION
WE PRESENT an approach to kinesthetic learning from demonstration (LfD) in contact-rich manipulation tasks. The approach generalizes from a single demonstration of a task using an articulated object to other objects with similar but not identical kinematic structures, different sizes, and different object placements. To achieve such generalization with only a single demonstration, we must extract from the demonstration the information that facilitates generalization, ignoring information specific to the object instance. The key insight for achieving this stems from human manipulation. Humans extensively leverage contact with the environment during manipulation. We refer to this as the exploitation of environmental constraints [1]. We show that environmental constraints (ECs) are the appropriate representation for LfD, enabling generalization and robustness.
The function of many objects is encoded in their kinematic structure: doors, drawers, scissors, etc. This structure constrains the relative motion of the object's parts. To the robot interacting with these objects, the kinematic model appears as an environmental constraint [1], i.e., a feature of the environment that dictates aspects of the motion the robot can perform while in contact with that object. It is an important observation that different objects of the same type can differ in the parameters of their kinematic model, but the type of ECs that result from the model remains the same. This enables humans to transfer experience between objects of the same type. By using ECs as the underlying representation for the information extracted from human demonstrations, we can impart similar transfer abilities to robots. Environmental constraints also play an important role in the robust execution of policies learned from demonstrations. When manipulating articulated objects with tight clearances, small errors during motion execution can cause the robot to be blocked. The associated contact forces lead to perceptual aliasing of contact states relevant to the policy. To address these problems, our approach leverages the ECs extracted from the demonstration as an inductive bias for motion execution.
We leverage these insights to devise an approach to LfD based on ECs. We demonstrate the learning and transfer of a policy from a single demonstration with a lock, such as the one shown in Fig. 1, to other locks of the same type but greatly differing physical properties, achieving success in 49 out of 50 trials. We show that success remains high, even when we rotate the lock relative to the lock's orientation in the demonstration. We also demonstrate the approach on a cabinet with three compartments, each having a different opening mechanism. Again, a single demonstration suffices to open all three. Our key contribution is to present this LfD system based on ECs and demonstrate that ECs are a suitable representation to achieve transfer and robustness in LfD for contact-rich manipulation tasks.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. RELATED WORK
Learning from demonstration is a promising way to teach a robot various manipulation skills efficiently [2]. One common paradigm for LfD is behavior cloning, which learns a policy to map raw observations into low-level actions from demonstrations. This paradigm usually uses all available information in demonstrations. We argue that separating general information from unnecessary details is important for LfD. In this work, we achieve this separation by using ECs as inductive biases.
Another important paradigm of LfD is to learn movement primitives from human demonstrations [3]. These methods assume that the motion characteristics in demonstrated movements encapsulate the essential information to reproduce the demonstrated behaviors [4], [5]. However, Kalakrishnan et al. [6] found that simply playing back a demonstrated trajectory on a robot does not accomplish contact-rich manipulation tasks. One reason is that it is difficult to track trajectories in a chaotic environment. In addition, demonstrations do not contain information about friction, which must be compensated for to reproduce the demonstrated in-contact skills [7]. We thus hypothesize that demonstrated motion characteristics are not the right representation for contact-rich manipulation tasks. Therefore, our work focuses on deriving a policy based on ECs from the demonstration.
Environmental Constraint Exploitation (ECE) was introduced by Eppner et al. [1] to describe behaviors which leverage useful features provided by the environment. This concept has been successfully applied in various robotic applications, including grasping [8], [9], motion planning [10], and in-hand manipulation [11]. Our research is inspired by [8], [11], in which manipulation tasks are decomposed into a sequence of ECEs. In this study, we leverage this insight for LfD. Two characteristics distinguish our work from these works. First, we propose to extract a sequence of ECEs from a human demonstration rather than using manually designed actions or highly engineered controllers. Second, we consider the problem in which it is difficult to derive knowledge of ECs from visual information. The robot has to reveal additional information by actively interacting with the environment [12].
Recently, Klingebeil et al. showed experimentally that humans tend to explicitly control and explore only a few contact states during manipulation [13]. Several works investigate extracting contact states using geometric information [14] or data-driven approaches [15]. In contrast, we tackle a problem that involves a large number of contact states due to the complex geometric model. Moreover, different contact states can produce similar sensory measurements in a chaotic real-world environment. Therefore, applying these approaches to our problem is challenging.
One application of ECE for operating unknown mechanisms is introduced in [16]. The idea is to use an adaptive controller to find and follow the admissible motion direction restricted by ECs. This concept exploits the helpful guidance provided by ECs and does not require object-specific information, which ensures generalization. However, the major problem of this concept is that the adaptive controller cannot select a specific EC to exploit. Consequently, using this concept to operate articulated objects with multiple DOFs is challenging. To overcome this problem, we propose a method for extracting the necessary parameters from a human demonstration to instantiate a sequence of adaptive controllers that exploit different ECs. In addition, we present the idea of deliberately exerting a force to maintain contact. This reveals useful information about contact-changing events, which facilitates the transitions between ECs.
Research on learning geometric constraints of ECs from demonstrations is related to our work [17], [18]. Our experiments show that it is hard to explicitly learn the geometric constraints for ECs. Furthermore, detailed geometric models are specific to the object instance. Using this information undermines the generalization ability. By contrast, our approach concentrates on extracting general information about ECs, leaving unnecessary details to the controller. In this way, our approach carefully balances learning and control, thus delivering substantial generalization.

III. ENVIRONMENTAL CONSTRAINTS IN LEARNING FROM DEMONSTRATION
We now present the key arguments for ECs as the appropriate representation for robust and general LfD. While this letter demonstrates this only in the context of contact-rich manipulation, we believe that our general arguments transfer to LfD as a whole.
Environmental constraints encode essential information about the intended interactions between an agent and its environment. There are three possible reasons for this. First, this can be the consequence of human design: the functionality of articulated objects (scissors, laptops, drawers, can openers, ...) is encoded in and, most importantly, produced by their articulation. Each articulation provides an environmental constraint that is operated when using the object. Second, environmental constraints are relevant to the robust execution of many skills, for example, when we slide a credit card to the edge of the table to pick it up reliably [1]. And finally, even contact-free actions are naturally expressed through environmental constraints, if one includes visual constraints, such as the servoing of an end-effector to a particular placement (which acts as a constraint). This fundamental interwovenness of environmental constraints with the interactions of embodied agents with their environment justifies the central role of ECs in LfD.
Interpreting demonstrations as sequences of ECs allows us to separate general information (associated with the EC) from instance-specific information. The properties of the EC (motion and force directions and contact-changing events) will be identical across different object instances of the same type. In contrast, the exact trajectory, the distance the motion must cover, and the magnitude of the required force vary across object instances. These instance-specific properties are not part of the description of the EC. The use of ECs as the underlying representation for LfD thus enables the identification of motion patterns that generalize within object categories.
The general but incomplete information captured by ECs must be completed before a demonstration can be transferred successfully. Of course, a transferred demonstration must result in a fully specified trajectory. This does require the specification of information not captured by ECs, such as motion distances, force thresholds etc. To achieve this, we rely on adaptive control. We augment the motion template represented as ECs with information obtained directly from the current object instance by these adaptive controllers. This balances the responsibilities between the learning component and the execution component of LfD.
Each component addresses one half of the problem: the learning component extracts the generalizable motion template, and the execution component uses sensing on the specific object instance to fill in the missing details.
The key advantage lies in the inclusion of adaptive controllers with explorative behavior in the LfD pipeline. Rather than executing a given trajectory, they explore interactions with the world to obtain important information about the object instance that was not present in the EC-based representation. This contributes greatly to the robustness of LfD, as can be illustrated by the following examples. When the execution of a fixed trajectory would get "stuck" due to modeling or execution inaccuracies, the explorative movement of the adaptive controllers fills in the right details based on sensing, allowing the robot to get "unstuck" again. The explorative movement of the controllers also alleviates the effects of perceptual aliasing. Even though different contact states might give very similar sensor feedback in a particular instance, explorative behavior can quickly disambiguate these states.
For the reasons laid out above, ECs should be used as a representation to achieve robust and general LfD. In the remainder of the letter, we demonstrate this in detail and successfully for contact-rich manipulation.

IV. DEMONSTRATION SYSTEM
We explain here how to extract general information (i.e., motion and force directions) from a human demonstration. To acquire demonstration data, we use a Franka Emika Panda robot arm with a 1-DOF gripper as the robot platform, as shown in Fig. 1. The fingertips of the gripper are covered with silicone to allow the robot to grasp articulated objects with irregular shapes. We mount a force-torque sensor on the robot's wrist to record the interaction force. The black square handle mounted on the wrist is used in the kinesthetic teaching phase, in which the user holds the handle and moves the robot to complete a task. Neither vision nor a motion capture system is used. The demonstration is recorded at 100 Hz. We record the end-effector position $p \in \mathbb{R}^3$ and, using the force-torque sensor, the interaction forces $f \in \mathbb{R}^3$. Both position and force measurements are expressed in the robot's reference frame.
Given a demonstration, we first decompose it into a sequence of segments. The transition between two environmental constraints is a distinct contact-changing event. For example, suppose the robot is sliding the knob to the left. The sliding motion terminates when the pin on the bar hits the constraint, and the robot's velocity suddenly decreases. Based on this insight, we use such events to segment the demonstrations. Concretely, we add a segment boundary when the velocity suddenly drops or the motion direction changes rapidly. We use zero-velocity crossings (ZVCs) [19] as the segmentation algorithm. As ZVCs often over-segment the demonstration, we filter out segmentation points that are very close to each other or that separate segments with similar motion directions. We then obtain $T_r = \{D_1, D_2, \ldots, D_k\}$ with $k$ segments. Given a segment $D_h$ with $T$ timesteps, we calculate the relative motion direction $\hat{m}_h$ as a 3D unit vector using the initial position $p_{start}$ and the end position $p_{end}$.
We calculate the force direction $\hat{f}_h$ as a 3D unit vector perpendicular to $\hat{m}_h$ to maintain contact. A human demonstration is thus interpreted as a sequence of unit vector pairs of motion direction $\hat{m}$ and force direction $\hat{f}$. We extract an ECE from each segment, i.e., $\mathrm{ECE} = (\hat{m}, \hat{f})$, as shown in Fig. 3. In the next section, we explain how to use these motion and force directions to instantiate adaptive controllers.
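The extraction steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this letter: the speed threshold, the minimum segment length, the projection of the mean measured force onto the plane orthogonal to the motion direction, the sign convention of the force, and all function names are assumptions made for illustration. The filtering of boundaries with similar motion directions is omitted for brevity.

```python
import numpy as np

def segment_by_velocity(positions, dt=0.01, speed_eps=0.005, min_len=20):
    """Split a trajectory wherever the end-effector speed drops to (near) zero.

    A simplified zero-velocity-crossing rule; thresholds are illustrative.
    Returns (start, end) index pairs into `positions`.
    """
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    moving = speeds > speed_eps
    segments, start = [], None
    for i, is_moving in enumerate(moving):
        if is_moving and start is None:
            start = i                      # motion begins
        elif not is_moving and start is not None:
            if i - start >= min_len:       # drop very short segments
                segments.append((start, i))
            start = None
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments

def extract_ece(positions, forces, start, end):
    """Return the unit motion and force directions of one segment."""
    motion = positions[end] - positions[start]
    m_hat = motion / np.linalg.norm(motion)
    # Assumption: use the mean measured force and remove its component
    # along the motion so that the result is perpendicular to m_hat.
    f_mean = forces[start:end + 1].mean(axis=0)
    f_perp = f_mean - np.dot(f_mean, m_hat) * m_hat
    f_hat = f_perp / np.linalg.norm(f_perp)
    return m_hat, f_hat
```

With data sampled at 100 Hz, `dt=0.01` matches the recording rate; the remaining thresholds would have to be tuned for a real demonstration.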

V. MODELING ENVIRONMENTAL CONSTRAINT EXPLOITATION
In this section, we describe how to model an ECE as an adaptive controller which incorporates the idea of Interactive Perception [12]. In addition, we explain why our ECE controller can take care of the object-specific details. Our idea comes from the observation of how humans open a door with a handle. First, they rotate the handle and let the kinematic constraint guide the motion. Simultaneously, they apply a force in the direction of pulling the door. This force reveals a measurable contact-changing event by maintaining contact to help people switch from rotating to pulling.
Therefore, an EC exploitation consists of two parts: 1) a controller which can follow constraints, and 2) a deliberately exerted force that maintains contact and thereby reveals useful information about contact changes. Note that the force direction $\hat{f}$ extracted from the demonstration is used to reproduce this contact-maintaining behavior.

A. Constraint Following
We explain how to reproduce the constraint-following behaviors. Our approach is inspired by the idea of following the motion direction with the least resistance [16]. Our adaptive controller consists of a velocity-based impedance controller and Particle Filter Optimization (PFO) [20]. The main difference between PFO and a conventional particle filter is that PFO actively samples and executes an action from the estimated distribution and uses the outcome of the action as the measurement to update the distribution. Instead of estimating a particular distribution, the aim of PFO is to find the optimal particle $p^*$ based on sampling. We parameterize the motion direction $\hat{m}$ as azimuth $\theta$ and elevation $\phi$ in the spherical coordinate system. Each particle in PFO represents a motion direction, namely $p = (\theta, \phi)$. The optimal particle $p^* = (\theta^*, \phi^*)$ is the admissible motion direction. In addition, the optimal particle has the largest weight and vice versa.
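To make the parameterization concrete, the following Python sketch converts between a unit motion direction and its azimuth and elevation angles. The angle convention (azimuth measured in the x-y plane from the x-axis, elevation measured from the x-y plane toward the z-axis) and the function names are assumptions; the text does not state which convention is used.

```python
import numpy as np

def direction_from_angles(theta, phi):
    """Unit vector for azimuth theta and elevation phi (radians)."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def angles_from_direction(m):
    """Azimuth and elevation (radians) of a non-zero 3D vector."""
    m = np.asarray(m, dtype=float)
    m = m / np.linalg.norm(m)
    theta = np.arctan2(m[1], m[0])
    phi = np.arcsin(np.clip(m[2], -1.0, 1.0))  # clip guards against rounding
    return theta, phi
```

Under this convention, a particle $p = (\theta, \phi)$ and a unit vector are interchangeable, so sampled particles can be passed directly to a Cartesian velocity controller.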
The PFO algorithm works as follows. During one episode, PFO perturbs the particles by adding white noise $w \sim \mathcal{N}(0, \sigma^2)$ (prediction step). Following that, PFO samples a motion direction $x_{sample}(\theta, \phi)$ and executes it, i.e., it moves the end-effector in this direction. PFO then evaluates the action's outcome and assigns a new weight to each particle (update step). The evaluation criterion is based on the observed movement. If the movement is larger than a threshold (2 mm), PFO increases the weights of the particles surrounding the observed motion direction according to $P(\cdot \mid x_k)$. Here, $w^i_{k+1}$ denotes the weight of the $i$-th particle at iteration $k+1$, and the motion direction $x_k$ is derived from the observed end-effector movement $\Delta p$. The probability density function $P(\cdot \mid x_k)$ is defined as a normal distribution with mean $x_k$ and predefined variance $\sigma$.
The key idea is this: if PFO finds an admissible motion direction, the particles will concentrate around this admissible motion direction. Consequently, subsequent samples will be close to the admissible motion direction. Therefore, PFO will drive the robot in the admissible motion direction, i.e., it follows the constraint. On the other hand, if the sampled action does not result in motion, the particle distribution becomes broader, as its range is increased by the Gaussian noise. As a result, sampling from this broad distribution is explorative. Unlike approaches that only use the estimated admissible motion direction to move the robot [16], [21], our approach introduces the explorative property by sampling actions from the particle distribution. This extension allows the robot to generate probing actions that actively reveal information about the admissible motion direction, which is an example of Interactive Perception [12]. It also avoids the problem of getting stuck in the friction cone.
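The predict, sample, execute, and update cycle described above can be sketched in Python as follows. This is a simplified illustration, not the controller used in this letter: it treats the angles $(\theta, \phi)$ as Euclidean coordinates, resamples the particles immediately after weighting instead of storing explicit weights, and replaces the robot with a toy environment (`simulate_step`) in which the end-effector can only slide along a single admissible direction. The class name, all parameter values other than the 2 mm threshold, and the environment model are assumptions.

```python
import numpy as np

def to_vector(theta, phi):
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def to_angles(v):
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    return np.array([np.arctan2(v[1], v[0]),
                     np.arcsin(np.clip(v[2], -1.0, 1.0))])

class DirectionPFO:
    """Particles are (azimuth, elevation) pairs describing motion directions."""

    def __init__(self, init_direction, n_particles=200,
                 sigma_noise=0.05, sigma_obs=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.particles = np.tile(to_angles(init_direction), (n_particles, 1))
        self.sigma_noise = sigma_noise
        self.sigma_obs = sigma_obs

    def predict(self):
        # Prediction step: diffuse the particles with white noise.
        self.particles += self.rng.normal(0.0, self.sigma_noise,
                                          self.particles.shape)

    def sample(self):
        # Draw one particle and return it as a unit motion direction.
        theta, phi = self.particles[self.rng.integers(len(self.particles))]
        return to_vector(theta, phi)

    def update(self, observed_movement):
        # Update step: weight particles by a Gaussian centered on the
        # observed motion direction, then resample.
        diff = self.particles - to_angles(observed_movement)
        diff[:, 0] = (diff[:, 0] + np.pi) % (2.0 * np.pi) - np.pi  # wrap azimuth
        weights = np.exp(-0.5 * np.sum(diff ** 2, axis=1)
                         / self.sigma_obs ** 2) + 1e-12
        weights /= weights.sum()
        keep = self.rng.choice(len(weights), size=len(weights), p=weights)
        self.particles = self.particles[keep]

    def mean_direction(self):
        vectors = np.array([to_vector(t, p) for t, p in self.particles])
        mean = vectors.mean(axis=0)
        return mean / np.linalg.norm(mean)

def simulate_step(direction, admissible, step=0.01):
    """Toy environment: motion is only possible along `admissible`."""
    return max(0.0, float(np.dot(direction, admissible))) * step * admissible

def follow_constraint(pfo, admissible, iterations=50, threshold=0.002):
    for _ in range(iterations):
        pfo.predict()
        delta_p = simulate_step(pfo.sample(), admissible)
        if np.linalg.norm(delta_p) > threshold:   # 2 mm movement threshold
            pfo.update(delta_p)
    return pfo.mean_direction()
```

When no movement is observed, only `predict` runs, so the particle cloud widens and later samples become more explorative, which mirrors the behavior described in the text.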

B. Maintaining Contact
We then introduce how to use the force direction extracted from the demonstration to maintain contact while following an EC. As the force direction depends on the motion direction, as in (1), we first calculate the observed motion direction $\hat{m}_{obs}$ by averaging the observed movements:
$$\hat{m}_{obs} = \mathrm{norm}\left(\frac{1}{N}\sum_{i=1}^{N} \Delta p_i\right),$$
where $N$ is the number of observed movements during the execution of $\mathrm{ECE}_h$, $\Delta p_i$ is the $i$-th observed end-effector movement, and norm is the normalization operator. We resample an action if $\hat{m}_{obs}^T \cdot x_{sample} \le 0$. We do so to prevent the robot from moving back and forth while following an EC.
Once $\hat{m}_{obs}$ is known, we can calculate the corresponding force direction. Assuming that the robot is executing $\mathrm{ECE}_h$, we calculate the rotation matrix $R \in SO(3)$ which maps the extracted motion direction $\hat{m}_h$ to the observed direction $\hat{m}_{obs}$ along the sphere geodesic (shortest path) under the condition $\hat{m}_h \neq \hat{m}_{obs}$ [22]. We apply this rotation matrix to align the force direction $\hat{f}_h$ with the current motion direction, $\hat{f} = R \cdot \hat{f}_h$, where $\hat{f}$ denotes the force direction corresponding to the observed motion direction $\hat{m}_{obs}$, as shown in Fig. 4.
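The alignment step can be illustrated with the following Python sketch, which builds the rotation from an axis and an angle using Rodrigues' formula. The choice of Rodrigues' formula, the handling of the parallel and antiparallel cases, and the function names are assumptions of this sketch; the text only specifies that the rotation follows the shortest path on the sphere.

```python
import numpy as np

def observed_direction(movements):
    """Normalized mean of the observed end-effector movements."""
    mean = np.mean(np.asarray(movements, dtype=float), axis=0)
    return mean / np.linalg.norm(mean)

def should_resample(m_obs, x_sample):
    """Reject sampled actions that point against the observed motion."""
    return float(np.dot(m_obs, x_sample)) <= 0.0

def rotation_between(a, b):
    """Rotation matrix taking unit vector a to unit vector b (shortest path)."""
    a = np.asarray(a, dtype=float) / np.linalg.norm(a)
    b = np.asarray(b, dtype=float) / np.linalg.norm(b)
    axis = np.cross(a, b)
    sin_psi = np.linalg.norm(axis)
    cos_psi = float(np.dot(a, b))
    if sin_psi < 1e-9:
        if cos_psi > 0.0:
            return np.eye(3)               # already aligned
        raise ValueError("antiparallel vectors: rotation axis is undefined")
    k = axis / sin_psi
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])     # cross-product matrix of the axis
    return np.eye(3) + sin_psi * K + (1.0 - cos_psi) * (K @ K)

def aligned_force_direction(m_h, f_h, m_obs):
    """Rotate the demonstrated force direction into the observed frame."""
    return rotation_between(m_h, m_obs) @ np.asarray(f_h, dtype=float)
```

Because a rotation preserves angles, the rotated force direction stays perpendicular to the observed motion direction whenever the demonstrated pair was perpendicular.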

C. Executing an Environmental Constraint Exploitation
Fig. 4. We use $\hat{m}_h \times \hat{m}_{obs}$ as the rotation axis and $\psi$ as the rotation angle to calculate the rotation matrix that aligns $\hat{m}_h$ to $\hat{m}_{obs}$ as well as $\hat{f}_h$ to $\hat{f}$.
When the observed motion direction $\hat{m}_{obs}$ and the corresponding force direction $\hat{f}$ are available, we can calculate a virtual force $F_v \in \mathbb{R}^3$ to move the robot's end-effector, where $K_m \in \mathbb{R}^{3 \times 3}$ and $K_f \in \mathbb{R}^{3 \times 3}$ are positive symmetric controller gain matrices for motion and force, respectively, and $v_d \in \mathbb{R}^3$ is the desired velocity magnitude. We do not control the orientation. Therefore, we set the angular terms of the desired external force $F_{ext} \in \mathbb{R}^6$ to zero, namely $F_{ext} = [F_v^T, 0^T]^T$. We map $F_{ext}$ to the motor torques $\tau_c \in \mathbb{R}^7$ of the robot arm under the operational space formulation [23] according to (5), where $J \in \mathbb{R}^{6 \times 7}$ is the Jacobian, $J^{T+} \in \mathbb{R}^{6 \times 7}$ is the pseudoinverse of the Jacobian transpose, $\tau_{dyn}(q, \dot{q}, \ddot{q})$ represents the dynamic forces to be compensated, such as the gravity and Coriolis forces of the arm, and $\tau_{null} \in \mathbb{R}^7$ represents the null-space torques.
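As an illustration of the torque mapping, the sketch below assumes the commonly used operational-space form $\tau_c = J^T F_{ext} + \tau_{dyn} + (I - J^T J^{T+})\,\tau_{null}$. This specific form is an assumption based on the symbols defined above and may differ from the exact expression used in this letter; the virtual force $F_v$ is taken as a given input because its defining equation is not reproduced here. The function name is hypothetical.

```python
import numpy as np

def joint_torques(J, F_v, tau_dyn, tau_null):
    """Map a translational virtual force to joint torques.

    J        : 6x7 Jacobian
    F_v      : 3-vector virtual force (orientation is not controlled)
    tau_dyn  : 7-vector of dynamics compensation torques
    tau_null : 7-vector of desired null-space torques
    """
    F_ext = np.concatenate([np.asarray(F_v, dtype=float), np.zeros(3)])
    Jt = J.T                                   # 7x6
    Jt_pinv = np.linalg.pinv(Jt)               # 6x7, pseudoinverse of J^T
    # Assumed null-space projector: keeps tau_null from producing
    # any force at the end-effector.
    null_projector = np.eye(J.shape[1]) - Jt @ Jt_pinv
    return Jt @ F_ext + tau_dyn + null_projector @ tau_null
```

With a full-rank Jacobian, applying the pseudoinverse of $J^T$ to the resulting torques (without dynamics compensation) recovers $F_{ext}$, which confirms that the null-space term does not disturb the task-space force in this sketch.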

VI. CHAINING A SEQUENCE OF ECES
Given a sequence of ECEs, we construct a hybrid automaton [24] to reproduce the demonstration. Each state in the automaton represents an ECE. The edges are transition events between ECEs. These transitions are based on a contact-changing event that involves two parts: 1) The robot observes movement (≥ 10 mm) in the force direction. 2) The robot hits a constraint i.e. it fails to move after 30 sampling trials.
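A minimal Python sketch of such a chain is given below. It assumes that either of the two listed conditions is sufficient to trigger a transition, and that the displacement is measured from the start of the active ECE; both points, as well as the class and function names, are assumptions of this sketch. Observations are scripted here so that the transition logic can be examined without a robot.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ECE:
    motion_dir: np.ndarray   # unit motion direction
    force_dir: np.ndarray    # unit force direction

def transition_event(displacement, force_dir, failed_samples,
                     move_threshold=0.010, max_failed=30):
    """Detect a contact-changing event for the active ECE."""
    # 1) Movement of at least 10 mm along the force direction.
    moved_along_force = float(np.dot(displacement, force_dir)) >= move_threshold
    # 2) No movement after 30 sampled actions: a constraint was hit.
    hit_constraint = failed_samples >= max_failed
    return moved_along_force or hit_constraint

def run_chain(eces, observations):
    """Step through the ECE sequence given (displacement, failed_samples) pairs.

    Returns the index of the next ECE to execute and the trace of
    active states, one entry per processed observation.
    """
    state, trace = 0, []
    for displacement, failed_samples in observations:
        if state >= len(eces):
            break                         # all ECEs have terminated
        trace.append(state)
        if transition_event(displacement, eces[state].force_dir, failed_samples):
            state += 1
    return state, trace
```

A returned state equal to `len(eces)` indicates that every ECE in the chain has terminated.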
Once $\mathrm{ECE}_{k-1}$ terminates and the subsequent $\mathrm{ECE}_k$ is to be instantiated, the observed motion direction $\hat{m}_{obs}$ can be used to predict the motion direction $\hat{m}_k$ for $\mathrm{ECE}_k$, i.e., to choose the next EC to exploit. Similar to the calculation of $\hat{f}$ as in (1), we calculate the rotation matrix $R$ which maps the motion direction $\hat{m}_{k-1}$ of $\mathrm{ECE}_{k-1}$ to $\hat{m}_{obs}$. We then get the subsequent motion direction by $\hat{m}_k = R \cdot \hat{m}_{obs}$, which is used to initialize the particles for the PFO.
Maintaining contact plays an important role in executing chains of ECEs. First, having to maintain contact reduces the set of admissible motion directions. This facilitates a single ECE. Second, maintaining contact leverages the idea of Interactive Perception [12] to exploit the correlation between forceful interaction and contact-changing events, which indicates the

VII. EXPERIMENTS AND RESULTS
We evaluate our approach on lock-opening and cabinet-opening tasks. In all experiments, we assume the robot knows how to grasp the knobs or handles. The quantitative results demonstrate the generalization and robustness achieved by exploiting environmental constraints in LfD for contact-rich manipulation tasks. We also compare with approaches which use geometric parameters as the representation.

A. ECs Enable Generalization in LfD
We examine generalization in the context of a lock-opening task. Based on a single demonstration with a single lock (see Fig. 5), we test the transfer of the extracted policy to other locks. We use five locks that vary in appearance and size but share the same locking principle (see Fig. 2). To unlock, we must first slide the knob to the left while pushing down. This motion ends when the pin on the bar hits the constraint (see inset in Fig. 1). We then must lift the knob while pushing against the constraint. Finally, the pin passes the small slot, and the knob is put down to the open position. The video supplement illustrates this functionality.
After recording a single demonstration on lock 1, our LfD approach segments the recorded data into three parts as described in Section IV. Each segment extracts an ECE, consisting of a motion direction and a force direction. To execute our approach in transfer experiments, the desired velocity v d is set to 0.13 m/s. The control gains are K m = diag(150) and K f = diag(75). Each transfer experiment starts with a known grasp location. Everything else is determined by the extracted policy. We define a transfer trial to be successful if the lock ends up in the opening position.
We use two versions of Dynamic Movement Primitives (DMP) as baselines, one based on position control (DMP without force) and one based on position/force control (DMP with force) [25]. All three methods use the same demonstration data. We run 10 trials per lock for each method, for a total of 50 transfers on all locks, based on a single demonstration. The results are given in Table I. Our approach succeeds 49 out of 50 times. The single failure occurs with lock 3, and we discuss it in detail in Section VII-C. This demonstrates that our method generalizes very well, overcoming the three challenges we outlined in Section III. We interpret this as confirmation that environmental constraints are indeed the proper representation to achieve generalization in LfD. Fig. 6 illustrates the working of the proposed method for a transfer experiment. The second row of that figure shows the distribution of particles for sampling actions. We can see that sampling actions from these distributions is explorative, which allows the robot to actively probe the environment and compensate for modeling and actuation errors.
Fig. 5. This figure shows a kinesthetic demonstration on lock 1. We first slide to the left while pushing down (left). We then lift the knob while pushing against the constraint (middle). Finally, we put the pin down while pushing to the left (right). By providing such a demonstration, we can extract useful (force) information to exploit environmental constraints.
Fig. 6. This figure shows five steps in one opening trial on lock 2. The pin is labeled with a red circle. The second row is the particle distribution. From (a) to (b), the robot hits constraints after sliding. It then tries to sample actions near the sliding direction but fails to observe movements. Therefore, the distribution gradually broadens.
The baseline approaches based on DMPs do not transfer well (see Table I). We attribute this to the specific nature of contact-rich manipulation tasks, in which retracing a position-based demonstration is insufficient for generalization. Even worse, the two baselines do not even successfully open the same lock on which the demonstration is provided. The failure modes will be discussed in detail in Section VII-C. This confirms the results obtained in prior works [6], [26].

B. Execution Based on ECs is Robust
We test the robustness of transfer with respect to variations in lock placement. As our method relies on relative motion directions, we only need to test variations in lock orientation; generalization to positional placements is automatic. Here, too, we claim that robustness is a consequence of exploiting environmental constraints. We place each lock at varying orientations (20°, 40°, and 60°) relative to the orientation during the demonstration. For each of the five locks, we run 10 trials per orientation. The setup is identical to the experiments described in Section VII-A, and we use the same demonstration. Table II contains the results.
Our LfD method achieves high success rates even in the presence of orientation errors up to 40°. It opens the lock successfully in 43 out of 50 trials. In 5 out of 7 unsuccessful trials, the method opened the lock by passing the pin through the slot but failed to achieve the final opening position (see Fig. 1). Even when the locks are rotated by 60°, our approach robustly manipulates lock 2 and lock 5. These results show that the exploitation of ECs enables our approach to compensate for perception and actuation errors, leading to robust transfer of a single demonstration.

C. Failure Analysis
All observed failure cases result from a misestimation of the motion direction associated with the active EC. This false estimation results in a misalignment of the expected contact-changing event associated with the EC. The motion is thus terminated prematurely and the remainder of the execution is equally misaligned.
There are two reasons for the false estimation of the motion direction. First, significant play between the moved part and the environment can lead to the detection of false motion directions. This happens in the single failure case of our approach in Table I. This type of failure can be avoided by adjusting the motion threshold for detecting a motion direction ($\Delta p$) to the amount of play measured by the controller. The second reason for the false estimation of the motion direction is variability in the environment. When the expected motion direction based on the demonstration differs significantly from the actual motion direction, our controller has a difficult time finding the correct direction. This effect explains the data in Table II. The average success rate decreases with increasing orientation error of the lock placement. This type of failure emphasizes the need for visual information to compensate for the inaccuracy [27].
Fig. 7. Extraction of geometric parameters for lock 1 (left) and lock 4 (right), given a demonstration of a lifting action: end-effector trajectory (black line); orientation of the end-effector (coordinate frames); estimated lock axis (orange arrow); ground truth lock axis (red arrow); note the significant modeling inaccuracies (difference between orange and red arrows). The orientation error is computed as the relative angle between estimated and ground-truth orientations. The estimation errors of the axis of the lock are the deviation of the rotation angles.

D. Comparison to Explicit Geometric Constraints
ECs are derived from geometric information but discard some object-specific geometric details in order to generalize better. In this section, we present experiments to show that an attempt to estimate the motion geometry of the lock does not lead to a successful generalization of the demonstration. We use least-squares regression [18], [28] to estimate the position and orientation of the lock and the rotation angle of the knob. This estimation relies on a single demonstration (so as to be comparable to our method) with locks 1 and 4. The estimation error compared to hand-measured ground truth is given in Fig. 7 and Table III. Given the estimated geometric parameters, we attempt to open the locks by generating trajectories for sliding and lifting actions. We track the trajectories using the same impedance controllers and settings as with our method. However, this method fails to open lock 1: the robot gets stuck while lifting the knob, as the protrusion of the black pin constrains the motion of the silver-colored pin (see middle inset in Fig. 5). Furthermore, the method also fails to open lock 4. Here, the estimated rotation angle is not precise enough for the pin to pass through the small slot. These experimental results illustrate how difficult it is to extract geometric parameters of sufficient precision to enable generalization in learning from demonstration, even when generalization should happen on the same object.
In addition to this empirical evidence, there are also fundamental challenges to the estimation of exact geometric information from demonstrations. Visual information is subject to occlusions, in particular when manipulating smaller objects. Proprioceptive and tactile information is often obscured by slippage between the robot and the manipulated object. By relying on ECs as a representation for demonstrations, we are able to represent the geometric information that supports generalization while relying on controllers to fill in the object-specific details during execution.

E. Other Manipulation Tasks
We also conducted experiments with opening the compartments of a cabinet, each with a different mechanism. We have a prismatic mechanism (drawer), a revolute mechanism (door with a handle), and a combined mechanism (drawer that must be opened with a door handle), as shown in Fig. 8. We only record a single demonstration for the combined mechanism. Our approach derives the policy from this demonstration, which successively exploits two ECs provided by the revolute joint of the door handle and a prismatic joint. To compute the success rate, we run 15 trials per compartment with three different grasp locations. Our approach succeeds 45/45 times. This superior success rate stems from the fact that these compartments are fully constrained to 1 DOF, unlike the locks, which have multiple DOFs. In such cases, there is no ambiguity about the admissible motion direction. Therefore, estimation of the motion direction becomes easier and our approach circumvents the problem of misalignment, as introduced in Section VII-C. Overall, these experiments provide additional evidence that our approach effectively leverages the information obtained in human demonstrations to produce policies that transfer robustly to similar mechanisms.

VIII. LIMITATIONS
The application domain of our approach is restricted due to two crucial assumptions. First, our approach leverages the guidance provided by the environmental constraints, which assumes the manipulated objects are fully constrained and demonstrations can be segmented using contact-changing events. Therefore, our approach is restricted to the manipulation of articulated objects, such as doors, drawers, or locks. However, we could still apply our central idea of using ECs to a wide range of LfD tasks. For example, learning in-hand manipulation skills from human demonstrations should exploit ECs provided by the fingers to funnel the action outcome [11], and learning grasping strategies from humans should not ignore the insight that humans extensively exploit the constraints present in the environment [1]. The second crucial assumption is that the demonstration is informative, i.e., that it contains enough information about ECs. However, if humans are familiar with the manipulated objects, they might give demonstrations which contain much less interaction with the environment [29]. For this reason, it is essential not only to rely on the information from human demonstration, but also to actively interact with the environment to fill in the missing information [12], or to adapt the imperfect information online to the uncertainty of the environment [7], [30].

IX. CONCLUSION
We demonstrate the efficacy of environmental constraints as an underlying representation for general and robust learning from demonstration. Environmental constraints enable the extraction of generalizable information from human demonstrations, separating out instance-specific information that hinders generalization. Using this approach, a single demonstration suffices to achieve generalization to a variety of different instances of the same object type. The proposed method then augments the information by using adaptive controllers. These controllers acquire information about the novel instance through explorative interaction. This combination of general, transferable motion templates and filling in the instance-specific details during execution based on interactive sensing leads to highly robust generalization. We demonstrate this in the context of contact-rich manipulation tasks with articulated objects. Our results validate environmental constraints as a key ingredient for general and robust learning from demonstration.