WildLMa: Long Horizon Loco-Manipulation in the Wild

Ri-Zhao Qiu*             Yuchen Song*             Xuanbin Peng*             Sai Aneesh Suryadevara            
Ge Yang          Minghuan Liu          Mazeyu Ji          Chengzhe Jia          Ruihan Yang          Xueyan Zou          Xiaolong Wang



TL;DR: Long-horizon loco-manipulation with an LLM planner and a library of whole-body imitation learning skills

Long-Horizon Task Demonstrations

We chain imitation learning skills using an LLM planner; a minimal sketch of this idea follows the examples below.

"Dispose a cup of water on the table"

"Pickup the takeout food delivery"

"Clean the spilt juice on the table"

WildLMa-Skill: In-the-wild Skill Learning

Grasp the bottle on the table.
Pick up the trash on the ground.

Grasp the water bottle on the ground.
Pick up the wallet on the concrete ledge.

Press the button to open the door.
Call an elevator.

Enter the office.
Rearrange the book on the bookshelf.

WildLMa-Skill: Generalizability

Task 1: Grasp the bottle on the table.

In-distribution


Out-of-distribution: Different Color
Out-of-distribution: Transparent

Task 2: Press the button to open the door.

In-distribution


Out-of-distribution: Different Background
Out-of-distribution: Completely Different Scene and Lighting Conditions

Whole-body Controller for Efficient Data Collection

Extended Workspace and End-Effector-Centric Teleoperation.


Ours
Arm-only Flat Base
Decoupled Model-based

More Qualitative Teleop Results.

"Clean up the coke can."

"Pick the ball and place it into the basket."

"Good boy, hand over the ball for me."

"Hey, I am thirsty, find something in the fridge."

"Please heat it up."

Abstract

'In-the-wild' mobile manipulation aims to deploy robots in diverse real-world environments, which requires the robot to (1) have skills that generalize across object configurations; (2) be capable of long-horizon task execution in diverse environments; and (3) perform complex manipulation beyond pick-and-place. Quadruped robots with manipulators hold promise for extending the workspace and enabling robust locomotion, but existing results do not investigate such a capability. This paper proposes WildLMa with three components to address these issues: (1) a learned low-level controller for VR-enabled whole-body teleoperation and traversability; (2) WildLMa-Skill — a library of generalizable visuomotor skills acquired via imitation learning or an analytical planner, and (3) WildLMa-Planner — an LLM planner that interfaces and coordinates these skills. WildLMa exploits CLIP for language-conditioned imitation learning that empirically generalizes to objects unseen in training demonstrations. We then show these skills can be effectively interfaced with an LLM planner for autonomous long-horizon execution. Besides extensive quantitative evaluation, we qualitatively demonstrate practical robot applications, such as cleaning up trash in university hallways or outdoor terrains, operating articulated objects, and rearranging items on a bookshelf.
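
To make the language conditioning concrete, below is a minimal PyTorch sketch of feeding frozen CLIP text and image features into a small policy head. The policy architecture, feature dimensions, proprioception size, and file names are assumptions for illustration, not the WildLMa-Skill implementation.

# Minimal sketch: CLIP-based language conditioning for a visuomotor skill.
# The policy head, dimensions, and inputs below are illustrative assumptions.
import torch
import torch.nn as nn
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)


class LanguageConditionedPolicy(nn.Module):
    """Maps (image features, text features, proprioception) to an action."""

    def __init__(self, feat_dim: int = 512, proprio_dim: int = 12, action_dim: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + proprio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, img_feat, txt_feat, proprio):
        return self.mlp(torch.cat([img_feat, txt_feat, proprio], dim=-1))


policy = LanguageConditionedPolicy().to(device)

# Encode the instruction and the current camera frame with frozen CLIP encoders.
with torch.no_grad():
    tokens = clip.tokenize(["grasp the bottle on the table"]).to(device)
    txt_feat = clip_model.encode_text(tokens).float()
    frame = preprocess(Image.open("frame.png")).unsqueeze(0).to(device)  # placeholder image path
    img_feat = clip_model.encode_image(frame).float()

proprio = torch.zeros(1, 12, device=device)   # placeholder joint/base state
action = policy(img_feat, txt_feat, proprio)  # in practice, trained by imitation on teleop demos

Because the instruction is embedded in CLIP's joint vision-language space, a policy conditioned this way can, in principle, respond to object descriptions that never appeared verbatim in the training demonstrations, which is consistent with the generalization results shown above.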

BibTeX


@article{qiu2024wildlma,
    title={WildLMa: Long Horizon Loco-Manipulation in the Wild},
    author={Qiu, Ri-Zhao and Song, Yuchen and Peng, Xuanbin and Suryadevara, Sai Aneesh and Yang, Ge and Liu, Minghuan and Ji, Mazeyu and Jia, Chengzhe and Yang, Ruihan and Zou, Xueyan and Wang, Xiaolong},
    journal={tbd},
    year={2024}
}