We present Decoupled forward-backward Model-based policy Optimization (DMO), a first-order model-based RL method that unrolls trajectories with a high-fidelity simulator while computing gradients through a learned differentiable dynamics model. This decoupling avoids the compounding prediction errors of purely model-based rollouts and retains the benefits of analytical policy gradients without requiring a differentiable physics engine. Empirically, DMO improves sample and wall-clock efficiency across locomotion and manipulation benchmarks, and policies trained with DMO transfer robustly from simulation to a Unitree Go2 robot on both quadrupedal and bipedal locomotion tasks.
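To make the decoupling concrete, below is a minimal sketch of one common way to realize this forward-backward split in PyTorch: forward values come from the (non-differentiable) simulator, while autograd routes the backward pass through a learned dynamics model via a straight-through-style substitution. The names here (sim_step, DynamicsModel, policy, reward_fn) are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    # Learned differentiable dynamics: (state, action) -> next state.
    # Hypothetical stand-in for the paper's learned model.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def decoupled_step(sim_step, model, state, action):
    # Forward values come from the simulator; gradients flow through
    # the learned model (straight-through-style substitution).
    with torch.no_grad():
        sim_next = sim_step(state, action)   # high-fidelity, non-differentiable
    model_next = model(state, action)        # differentiable surrogate
    # Value equals sim_next; gradient w.r.t. (state, action) is the model's.
    return model_next + (sim_next - model_next).detach()

def rollout_loss(sim_step, model, policy, state0, horizon, reward_fn):
    # Unroll along simulator states; backpropagate through the model.
    state, total = state0, 0.0
    for _ in range(horizon):
        action = policy(state)
        state = decoupled_step(sim_step, model, state, action)
        total = total + reward_fn(state, action)
    return -total  # minimize negative return for gradient descent

Because the forward pass always uses simulator states, the rollout never drifts into regions where model predictions compound; under this reading, the learned model is only queried for local gradients along simulator trajectories.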
Quadrupedal Hardware Experiments (Go2 Walking)
Bipedal Hardware Experiments (Go2 Front-Legs Balancing)
Simulation Demos
@inproceedings{amigo2025dmo,
  title={First Order Model-Based RL through Decoupled Backpropagation},
  author={Amigo, Joseph and Khorrambakht, Rooholla and Chane-Sane, Elliot and Righetti, Ludovic and Mansard, Nicolas},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2025}
}