'In-the-wild' mobile manipulation aims to deploy robots in diverse real-world environments, which requires the robot to (1) have skills that generalize across object configurations; (2) be capable of long-horizon task execution in diverse environments; and (3) perform complex manipulation beyond pick-and-place. Quadruped robots with manipulators hold promise for extending the workspace and enabling robust locomotion, but existing results do not investigate these capabilities together. This paper proposes WildLMa, which addresses these issues with three components: (1) a learned low-level controller for VR-enabled whole-body teleoperation and traversability; (2) WildLMa-Skill, a library of generalizable visuomotor skills acquired via imitation learning or an analytical planner; and (3) WildLMa-Planner, an LLM planner that interfaces with and coordinates these skills. WildLMa exploits CLIP for language-conditioned imitation learning that empirically generalizes to objects unseen in training demonstrations. We then show that these skills can be effectively interfaced with an LLM planner for autonomous long-horizon execution. Besides extensive quantitative evaluation, we qualitatively demonstrate practical robot applications, such as cleaning up trash in university hallways or on outdoor terrains, operating articulated objects, and rearranging items on a bookshelf.
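The abstract describes WildLMa-Planner as an LLM that interfaces with and coordinates a library of language-indexed skills. As a rough illustration of that interface pattern (this is not the authors' code; all names here, such as `SkillLibrary`, `plan_with_llm`, and `query_llm`, are hypothetical), a skill catalog might be exposed to an LLM planner along these lines:

```python
# Minimal sketch of an LLM planner coordinating a skill library.
# Hypothetical API for illustration only; not the WildLMa implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Skill:
    """A language-indexed skill, e.g. a learned visuomotor policy or an analytical planner call."""
    name: str
    description: str
    execute: Callable[[str], bool]  # takes a target phrase, returns success/failure


class SkillLibrary:
    """Holds named skills and renders them as a catalog the LLM can choose from."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def prompt_catalog(self) -> str:
        return "\n".join(f"- {s.name}: {s.description}" for s in self._skills.values())

    def run(self, name: str, target: str) -> bool:
        return self._skills[name].execute(target)


def plan_with_llm(
    instruction: str,
    library: SkillLibrary,
    query_llm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Ask an LLM to decompose a long-horizon instruction into (skill, target) steps."""
    prompt = (
        "Available skills:\n"
        + library.prompt_catalog()
        + f"\n\nTask: {instruction}\n"
        "Reply with one 'skill_name | target' pair per line."
    )
    reply = query_llm(prompt)  # any chat-completion callable can be plugged in here
    steps: List[Tuple[str, str]] = []
    for line in reply.strip().splitlines():
        name, _, target = line.partition("|")
        steps.append((name.strip(), target.strip()))
    return steps
```

In this sketch, long-horizon autonomy comes from the planner sequencing short-horizon skills rather than from any single end-to-end policy; each `(skill, target)` step would then be executed by the corresponding visuomotor skill.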
@article{qiu-song-peng-2024-wildlma,
title={WildLMa: Long Horizon Loco-Manipulation in the Wild},
author={Ri-Zhao Qiu and Yuchen Song and Xuanbin Peng and Sai Aneesh Suryadevara and Ge Yang and Minghuan Liu and Mazeyu Ji and Chengzhe Jia and Ruihan Yang and Xueyan Zou and Xiaolong Wang},
journal={arXiv preprint arXiv:2411.15131},
year={2024}
}