Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies

Isidoros Marougkas*, Dhruv Metha Ramesh*, Joe H. Doerr, Edgar Granados, Aravind Sivaramakrishnan, Abdeslam Boularias, Kostas E. Bekris
In Submission ICRA 2025

The proposed solution from left to right: A model-based policy is defined first by a potential function that provides a vector field for insertion under full observability. The output action of the potential function is integrated with an action given by a residual RL component. The residual RL policy is trained in simulation given noisy pose observations. A sparse reward is awarded only upon successful insertion without penetration. The final policy π is then zero-shot transferred to the real world, where observations come from an RGB-D-based pose tracking module and a controller translates the policy’s actions into robot joint controls.
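The integration step in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: it handles positions only (no orientation), assumes a simple quadratic attractive potential, and sums the two actions directly. The names `potential_field_step` and `integrated_step`, and the parameters `gain`, `max_step` and `residual_limit`, are hypothetical and chosen only for illustration.

```python
import math


def potential_field_step(plug_pos, socket_pos, gain=1.0, max_step=0.01):
    """Translation step from an assumed quadratic potential U = 0.5 * gain * ||p - g||^2.

    The step is the negative gradient of U, capped at max_step (metres).
    """
    step = [-gain * (p - g) for p, g in zip(plug_pos, socket_pos)]
    norm = math.sqrt(sum(s * s for s in step))
    if norm > max_step:
        step = [s * max_step / norm for s in step]
    return step


def integrated_step(plug_pos, socket_pos, residual, residual_limit=0.002):
    """Add a bounded residual correction (e.g. from an RL policy) to the model-based step."""
    base = potential_field_step(plug_pos, socket_pos)
    # Assumption: the residual is clipped per axis so it can only nudge the base action.
    bounded = [max(-residual_limit, min(residual_limit, r)) for r in residual]
    return [b + r for b, r in zip(base, bounded)]


# Plug 10 cm above the socket, with a residual that asks for a large sideways move.
action = integrated_step((0.0, 0.0, 0.1), (0.0, 0.0, 0.0), (0.01, 0.0, 0.0))
```

In this sketch the model-based term always points toward the socket, and the residual term can only shift the result by up to `residual_limit` along each axis.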


Zero-shot transfer of the policy learned in simulation to a real forceful insertion of an unseen plug and socket.

Abstract

Object insertion under tight tolerances (< 1 mm) is an important but challenging assembly task as even slight errors can result in undesirable contacts. Recent efforts have focused on using Reinforcement Learning (RL) and often depend on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved accuracy given training of the policy exclusively in simulation and zero-shot transfer to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with a residual RL one, which is trained in simulation given only sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is proposed for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug’s SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions both in simulation and reality. The proposed approach outperforms state-of-the-art RL methods in this domain, as well as prior efforts in hybrid policies. Ablations highlight the impact of each component of the approach.
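The two training ingredients named in the abstract, a curriculum over observation noise and action magnitude and a sparse goal-reaching reward, can be sketched as follows. This is an illustration under stated assumptions, not the paper's schedule: the linear interpolation, the numeric ranges, the direction in which each quantity changes, and the names `CurriculumStage`, `stage_at`, `noisy_position` and `sparse_reward` are all hypothetical. Noise is applied to positions only, whereas the paper works with full SE(3) poses.

```python
import random
from dataclasses import dataclass


@dataclass
class CurriculumStage:
    obs_noise_std: float  # std. dev. of Gaussian noise added to observed positions (m)
    action_scale: float   # multiplier applied to the residual action magnitude


def linear_schedule(progress, start, end):
    """Interpolate from start to end as progress goes from 0 to 1 (clamped)."""
    t = min(1.0, max(0.0, progress))
    return start + t * (end - start)


def stage_at(progress, noise=(0.0, 0.002), scale=(0.5, 1.0)):
    """Curriculum stage for a given training progress; the ranges are placeholder values."""
    return CurriculumStage(
        obs_noise_std=linear_schedule(progress, *noise),
        action_scale=linear_schedule(progress, *scale),
    )


def noisy_position(position, noise_std, rng):
    """Perturb each coordinate with zero-mean Gaussian noise."""
    return [p + rng.gauss(0.0, noise_std) for p in position]


def sparse_reward(inserted, penetrated):
    """Reward only a successful insertion that involved no penetration."""
    return 1.0 if inserted and not penetrated else 0.0


rng = random.Random(0)  # seeded for reproducibility
stage = stage_at(0.5)
observed = noisy_position([0.0, 0.0, 0.1], stage.obs_noise_std, rng)
```

Passing the start and end values as arguments keeps the sketch neutral about whether a quantity grows or shrinks during training; swapping the tuple entries reverses the direction.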

Video Presentation


Figure 6 from the paper: (Left) 3D-printed objects from IndustReal [9] with 0.5-0.6 mm tolerance. (Middle) 3D-printed custom objects with 2 mm (Easy), 1 mm (Medium), and 0.1 mm (Hard) tolerance. (Right) Real-world household objects not seen during training.


Plug insertion in simulation (top) and real world (bottom) for the same plug.

Comparison with alternative RL-based Object Insertion methods

  • IndustReal [9].
  • Combination of Model-based Control with Residual RL, with an alternative curriculum scheme [11].

Ablation Study

  • Learning the scaling parameters w of the Potential Field components with the Residual RL module.
  • Learning residual actions with the Residual RL module (set β=1).
  • Learning both the residual actions and the weight β with the Residual RL module.

NIST Taskboard Challenge-inspired objects - Sim2Real Transfer

Custom Objects - Sim2Real Transfer

Real Objects (unseen during training)

Forceful Insertion (comparison with human)

3-prong robotic insertion
3-prong human-assisted insertion

Comparison of the trained policy deployed in the real world on a task that requires a high and complex force profile with the same task performed by a human (in both the assembly and disassembly tracks). The discomfort shown by the human illustrates how challenging the task is.


Potential Field, Asymmetric Actor-Critic Inputs, and PPO Parameter selection.


Insertion success percentage (out of 10 trials) when the policy is deployed on real household objects never seen during training.


Number of successful insertions over 10 real-world trials each. (Left) Comparison with IndustReal. (Right) Across different difficulty levels for the objects of Fig. 6 (middle).