In unstructured environments, robotic manipulation tasks involving objects with constrained motion trajectories—such as door opening—often experience discrepancies between the robot's vision-guided end-effector trajectory and the object's constrained motion path. Such discrepancies generate unintended harmful forces that, if left uncorrected, may lead to task failure and damage to the manipulated object or the robot itself. To address this issue, this paper introduces a novel diffusion framework, termed SafeDiff. Unlike conventional methods that sequentially fuse visual and tactile data to predict future robot states, our approach generates a prospective state sequence from the current robot state and visual context observations, using real-time force feedback as a calibration signal. This implicitly adjusts the robot's state within the state space, raising task success rates and significantly reducing harmful forces during manipulation, thereby ensuring manipulation force safety. Additionally, we develop a large-scale simulation dataset named SafeDoorManip50k, offering extensive multimodal data to train and evaluate the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces in the door-opening task, in both simulated and real-world settings.
SafeDiff is a visual-tactile diffusion policy that generates short-horizon robot state sequences (not actions) for safe door manipulation. An encoder of stacked Vision-Guided Mapping Modules (VMMs) conditions Gaussian noise on the current image and robot state via FiLM and temporal attention to propose a trajectory. A decoder of Tactile-Guided Calibration Modules (TCMs) then refines this trajectory using force feedback with cross-attention, applying residual corrections that suppress harmful forces while keeping motion smooth and feasible.
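The encoder/decoder split above can be illustrated with a minimal NumPy sketch. All shapes, weight matrices, and function names here are illustrative assumptions, not SafeDiff's actual implementation: FiLM conditioning stands in for the VMM encoder, and a single-head cross-attention residual stands in for the TCM decoder's force-based calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 16, 7          # prediction horizon and robot-state dimension (assumed)
F_DIM, C_DIM = 6, 32  # force/torque reading size and visual-context feature size (assumed)

def film(h, cond, W_gamma, W_beta):
    """FiLM: scale-and-shift hidden states with learned projections of the condition."""
    gamma = cond @ W_gamma
    beta = cond @ W_beta
    return h * (1.0 + gamma) + beta

def cross_attention(queries, keys, values):
    """Single-head cross-attention: trajectory states attend to force tokens."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Random stand-ins for learned weights.
W_gamma = rng.normal(0, 0.1, (C_DIM, D))
W_beta = rng.normal(0, 0.1, (C_DIM, D))
W_q = rng.normal(0, 0.1, (D, D))
W_k = rng.normal(0, 0.1, (F_DIM, D))
W_v = rng.normal(0, 0.1, (F_DIM, D))

def denoise_step(noisy_traj, visual_feat, force_readings):
    # Encoder (VMM-like): condition the noisy state sequence on visual context via FiLM.
    proposal = film(noisy_traj, visual_feat, W_gamma, W_beta)
    # Decoder (TCM-like): force tokens calibrate the proposal through cross-attention,
    # applied as a residual so the trajectory is only gently corrected.
    q = proposal @ W_q
    k = force_readings @ W_k
    v = force_readings @ W_v
    return proposal + cross_attention(q, k, v)

traj = rng.normal(size=(T, D))        # Gaussian noise to be denoised
visual = rng.normal(size=(C_DIM,))    # embedding of current image + robot state
forces = rng.normal(size=(4, F_DIM))  # recent force/torque readings
refined = denoise_step(traj, visual, forces)
print(refined.shape)  # (16, 7)
```

In the full model this step would run inside an iterative diffusion sampler; the residual form of the correction is what keeps the calibrated trajectory close to the vision-guided proposal while suppressing force-violating states.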
Real-world rollout on two different doors.
Left: seen in training; wooden texture, radius = 0.8 m.
Right: unseen in training; metal texture, radius = 0.6 m.
Anti-disturbance study in the real world: a 20 N force is applied to the robot's end-effector. The robot learned to resist the disturbance and continue opening the door.
Snapshot: Drawer opening setup. The robot opened the drawer along the linear constraint track.
Snapshot: Wheel turning setup. The robot rotated the turning wheel along the circular constraint track.
We create a large-scale door-manipulation dataset in Isaac Gym to train and evaluate SafeDiff, covering both seen and unseen doors with ground-truth contact forces.
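A training sample from such a multimodal dataset might look like the sketch below. Every field name and shape here is a hypothetical assumption for illustration; the actual SafeDoorManip50k schema is not specified on this page.

```python
import numpy as np

# One illustrative sample: camera observation, current robot state,
# ground-truth contact wrench from the simulator, and the future state
# sequence used as the diffusion target. All fields are assumptions.
sample = {
    "rgb": np.zeros((224, 224, 3), dtype=np.uint8),        # camera image
    "state": np.zeros(7, dtype=np.float32),                # robot state (e.g. joints)
    "force": np.zeros(6, dtype=np.float32),                # end-effector wrench
    "target_states": np.zeros((16, 7), dtype=np.float32),  # future state horizon
    "door": {"texture": "wooden", "radius_m": 0.8},        # door parameters as in the demos
}
print(sorted(sample.keys()))
```

Simulated data of this form makes force safety learnable without risking hardware, since the simulator provides clean ground-truth contact forces that are hard to measure densely in the real world.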
@article{wei2024ensuring,
title={Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration},
author={Wei, Lai and Ma, Jiahua and Hu, Yibo and Zhang, Ruimao},
journal={arXiv preprint arXiv:2412.10349},
year={2024}
}