First Order Model-Based RL through Decoupled Backpropagation (DMO)

1 New York University, 2 LAAS-CNRS, 3 ANITI

Abstract

We present Decoupled forward-backward Model-based policy Optimization (DMO), a first-order gradient RL method that unrolls trajectories using a high-fidelity simulator while computing gradients via a learned differentiable dynamics model. This decoupling avoids compounding prediction errors in model rollouts and preserves the benefits of analytical gradients without requiring differentiable physics. Empirically, DMO improves sample and wall-clock efficiency across locomotion and manipulation benchmarks and deploys on a Unitree Go2 robot for both quadrupedal and bipedal locomotion tasks with robust sim-to-real transfer.
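The core idea, stepping forward with the simulator while routing gradients through a learned model, can be sketched with a custom autograd function. This is a minimal illustration, not the authors' implementation: the toy `sim_step` dynamics, the linear `model`, and all shapes are assumptions chosen to keep the example self-contained.

```python
import torch

# Toy "simulator": ground-truth dynamics, treated as non-differentiable
# (a stand-in for a high-fidelity physics engine).
def sim_step(x, u):
    with torch.no_grad():
        return x + 0.1 * u + 0.01 * torch.sin(x)

# Learned differentiable dynamics model. Here a fixed linear map standing
# in for a trained network; the weights are illustrative, not learned.
model = torch.nn.Linear(2, 1, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[1.01, 0.1]]))

class DecoupledStep(torch.autograd.Function):
    """Forward: advance the state with the simulator.
    Backward: differentiate through the learned model instead."""

    @staticmethod
    def forward(ctx, x, u):
        ctx.save_for_backward(x, u)
        return sim_step(x, u)  # trajectory follows the simulator

    @staticmethod
    def backward(ctx, grad_out):
        x, u = ctx.saved_tensors
        x = x.detach().requires_grad_(True)
        u = u.detach().requires_grad_(True)
        with torch.enable_grad():
            pred = model(torch.cat([x, u], dim=-1))
            gx, gu = torch.autograd.grad(pred, (x, u), grad_out)
        return gx, gu  # gradients come from the model's Jacobian

# Unroll a short trajectory: states are simulator states, yet the loss
# gradient w.r.t. the action flows through the learned model.
u = torch.zeros(1, requires_grad=True)
x = torch.zeros(1)
for _ in range(5):
    x = DecoupledStep.apply(x, u)
loss = (x - 1.0).pow(2).sum()
loss.backward()
print(u.grad)  # first-order gradient, no differentiable simulator needed
```

Because the rollout states come from the simulator, the learned model is only ever queried at on-trajectory points for its Jacobian, which is what avoids the compounding prediction errors of purely model-based rollouts.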

Additional Clips

Quadrupedal Hardware Experiments (Go2 Walking)

Bipedal Hardware Experiments (Go2 Front-Legs Balancing)

Simulation Demos

BibTeX

@inproceedings{amigo2025dmo,
  title={First Order Model-Based RL through Decoupled Backpropagation},
  author={Amigo, Joseph and Khorrambakht, Rooholla and Chane-Sane, Elliot and Righetti, Ludovic and Mansard, Nicolas},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2025}
}