In unstructured environments, robotic manipulation tasks involving objects with constrained motion trajectories—such as door opening—often experience discrepancies between the robot's vision-guided end-effector trajectory and the object's constrained motion path. Such discrepancies generate unintended harmful forces that, if left uncorrected, may lead to task failure and damage to the manipulated object or the robot itself. To address this issue, this paper introduces a novel diffusion framework, termed SafeDiff. Unlike conventional methods that sequentially fuse visual and tactile data to predict future robot states, our approach generates a prospective state sequence from the current robot state and visual context observations, using real-time force feedback as a calibration signal. This implicitly adjusts the robot's state within the state space, raising task success rates and significantly reducing harmful forces during manipulation, thereby ensuring manipulation force safety. Additionally, we develop a large-scale simulation dataset named SafeDoorManip50k, offering extensive multimodal data to train and evaluate the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces in the door-opening task, in both simulated and real-world settings.
SafeDiff is a visual-tactile diffusion policy that generates short-horizon robot state sequences (not actions) for safe door manipulation. An encoder of stacked Vision-Guided Mapping Modules (VMMs) conditions Gaussian noise on the current image and robot state via FiLM and temporal attention to propose a trajectory. A decoder of Tactile-Guided Calibration Modules (TCMs) then refines this trajectory using force feedback with cross-attention, applying residual corrections that suppress harmful forces while keeping motion smooth and feasible.
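The encoder/decoder split above can be illustrated with a minimal NumPy sketch. All shapes, weight matrices, and function names here are illustrative assumptions, not SafeDiff's actual implementation: FiLM conditioning stands in for the VMM encoder, and a single-head cross-attention residual stands in for the TCM decoder's force-based calibration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 16, 7          # prediction horizon and robot-state dimension (assumed)
F_DIM, C_DIM = 6, 32  # force/torque reading size and visual-context feature size (assumed)

def film(h, cond, W_gamma, W_beta):
    """FiLM: scale-and-shift hidden states with learned projections of the condition."""
    gamma = cond @ W_gamma
    beta = cond @ W_beta
    return h * (1.0 + gamma) + beta

def cross_attention(queries, keys, values):
    """Single-head cross-attention: trajectory states attend to force tokens."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

# Random stand-ins for learned weights.
W_gamma = rng.normal(0, 0.1, (C_DIM, D))
W_beta = rng.normal(0, 0.1, (C_DIM, D))
W_q = rng.normal(0, 0.1, (D, D))
W_k = rng.normal(0, 0.1, (F_DIM, D))
W_v = rng.normal(0, 0.1, (F_DIM, D))

def denoise_step(noisy_traj, visual_feat, force_readings):
    # Encoder (VMM-like): condition the noisy state sequence on visual context via FiLM.
    proposal = film(noisy_traj, visual_feat, W_gamma, W_beta)
    # Decoder (TCM-like): force tokens calibrate the proposal through cross-attention,
    # applied as a residual so the trajectory is only gently corrected.
    q = proposal @ W_q
    k = force_readings @ W_k
    v = force_readings @ W_v
    return proposal + cross_attention(q, k, v)

traj = rng.normal(size=(T, D))        # Gaussian noise to be denoised
visual = rng.normal(size=(C_DIM,))    # embedding of current image + robot state
forces = rng.normal(size=(4, F_DIM))  # recent force/torque readings
refined = denoise_step(traj, visual, forces)
print(refined.shape)  # (16, 7)
```

In the full model this step would run inside an iterative diffusion sampler; the residual form of the correction is what keeps the calibrated trajectory close to the vision-guided proposal while suppressing force-violating states.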
Real-world rollout on two different doors.
Left: seen in training; wooden texture, radius = 0.8 m.
Right: unseen in training; metal texture, radius = 0.6 m.
Anti-disturbance study in the real world: a 20 N force is applied to the robot's end-effector. The robot learned to resist the disturbance and continue opening the door.
Snapshot: Drawer opening setup. The robot opened the drawer along the linear constraint track.
Snapshot: Wheel turning setup. The robot rotated the turning wheel along the circular constraint track.
We create a large-scale door-manipulation dataset in Isaac Gym to train and evaluate SafeDiff, covering both seen and unseen doors with ground-truth contact forces.
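A training sample from such a multimodal dataset might look like the sketch below. Every field name and shape here is a hypothetical assumption for illustration; the actual SafeDoorManip50k schema is not specified on this page.

```python
import numpy as np

# One illustrative sample: camera observation, current robot state,
# ground-truth contact wrench from the simulator, and the future state
# sequence used as the diffusion target. All fields are assumptions.
sample = {
    "rgb": np.zeros((224, 224, 3), dtype=np.uint8),        # camera image
    "state": np.zeros(7, dtype=np.float32),                # robot state (e.g. joints)
    "force": np.zeros(6, dtype=np.float32),                # end-effector wrench
    "target_states": np.zeros((16, 7), dtype=np.float32),  # future state horizon
    "door": {"texture": "wooden", "radius_m": 0.8},        # door parameters as in the demos
}
print(sorted(sample.keys()))
```

Simulated data of this form makes force safety learnable without risking hardware, since the simulator provides clean ground-truth contact forces that are hard to measure densely in the real world.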
@article{wei2024ensuring,
title={Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration},
author={Wei, Lai and Ma, Jiahua and Hu, Yibo and Zhang, Ruimao},
journal={arXiv preprint arXiv:2412.10349},
year={2024}
}