WildLMa: Long Horizon Loco-Manipulation in the Wild

Ri-Zhao Qiu*             Yuchen Song*             Xuanbin Peng*             Sai Aneesh Suryadevara            
Ge Yang          Minghuan Liu          Mazeyu Ji          Chengzhe Jia          Ruihan Yang          Xueyan Zou          Xiaolong Wang



TL;DR: Long-horizon loco-manipulation with an LLM planner and a library of whole-body imitation learning skills

Long-Horizon Task Demonstrations

We chain imitation learning skills using an LLM planner; a minimal sketch of this idea follows the examples below.

"Dispose a cup of water on the table"

"Pickup the takeout food delivery"

"Clean the spilt juice on the table"

WildLMa-Skill: In-the-wild Skill Learning

Grasp the bottle on the table.
Pick up the trash on the ground.

Grasp the water bottle on the ground.
Pick up the wallet on the concrete ledge.

Press the button to open the door.
Call an elevator.

Enter the office.
Rearrange the book on the bookshelf.

WildLMa-Skill: Generalizability

Task 1: Grasp the bottle on the table.

In-distribution


Out-of-distribution: Different Color
Out-of-distribution: Transparent

Task 2: Press the button to open the door.

In-distribution


Out-of-distribution: Different Background
Out-of-distribution: Completely Different Scene and Lighting Conditions

Whole-body Controller for Efficient Data Collection

Extended Workspace and End-Effector-Centric Teleoperation.


Ours
Arm-only Flat Base
Decoupled Model-based

More Qualitative Teleop Results.

"Clean up the coke can."

"Pick the ball and place it into the basket."

"Good boy, hand over the ball for me."

"Hey, I am thirsty, find something in the fridge."

"Please heat it up."

Abstract

'In-the-wild' mobile manipulation aims to deploy robots in diverse real-world environments, which requires the robot to (1) have skills that generalize across object configurations; (2) be capable of long-horizon task execution in diverse environments; and (3) perform complex manipulation beyond pick-and-place. Quadruped robots with manipulators hold promise for extending the workspace and enabling robust locomotion, but existing results do not investigate such a capability. This paper proposes WildLMa with three components to address these issues: (1) a learned low-level controller for VR-enabled whole-body teleoperation and traversability; (2) WildLMa-Skill — a library of generalizable visuomotor skills acquired via imitation learning or an analytical planner, and (3) WildLMa-Planner — an LLM planner that interfaces and coordinates these skills. WildLMa exploits CLIP for language-conditioned imitation learning that empirically generalizes to objects unseen in training demonstrations. We then show these skills can be effectively interfaced with an LLM planner for autonomous long-horizon execution. Besides extensive quantitative evaluation, we qualitatively demonstrate practical robot applications, such as cleaning up trash in university hallways or outdoor terrains, operating articulated objects, and rearranging items on a bookshelf.
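
To make the language conditioning concrete, below is a minimal PyTorch sketch of feeding frozen CLIP text and image features into a small policy head. The policy architecture, feature dimensions, proprioception size, and file names are assumptions for illustration, not the WildLMa-Skill implementation.

# Minimal sketch: CLIP-based language conditioning for a visuomotor skill.
# The policy head, dimensions, and inputs below are illustrative assumptions.
import torch
import torch.nn as nn
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)


class LanguageConditionedPolicy(nn.Module):
    """Maps (image features, text features, proprioception) to an action."""

    def __init__(self, feat_dim: int = 512, proprio_dim: int = 12, action_dim: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + proprio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, img_feat, txt_feat, proprio):
        return self.mlp(torch.cat([img_feat, txt_feat, proprio], dim=-1))


policy = LanguageConditionedPolicy().to(device)

# Encode the instruction and the current camera frame with frozen CLIP encoders.
with torch.no_grad():
    tokens = clip.tokenize(["grasp the bottle on the table"]).to(device)
    txt_feat = clip_model.encode_text(tokens).float()
    frame = preprocess(Image.open("frame.png")).unsqueeze(0).to(device)  # placeholder image path
    img_feat = clip_model.encode_image(frame).float()

proprio = torch.zeros(1, 12, device=device)   # placeholder joint/base state
action = policy(img_feat, txt_feat, proprio)  # in practice, trained by imitation on teleop demos

Because the instruction is embedded in CLIP's joint vision-language space, a policy conditioned this way can, in principle, respond to object descriptions that never appeared verbatim in the training demonstrations, which is consistent with the generalization results shown above.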

BibTeX


@article{qiu2024wildlma,
    title={WildLMa: Long Horizon Loco-Manipulation in the Wild},
    author={Qiu, Ri-Zhao and Song, Yuchen and Peng, Xuanbin and Suryadevara, Sai Aneesh and Yang, Ge and Liu, Minghuan and Ji, Mazeyu and Jia, Chengzhe and Yang, Ruihan and Zou, Xueyan and Wang, Xiaolong},
    journal={tbd},
    year={2024}
}