We present Decoupled forward-backward Model-based policy Optimization (DMO), a first-order gradient RL method that unrolls trajectories using a high-fidelity simulator while computing gradients via a learned differentiable dynamics model. This decoupling avoids compounding prediction errors in model rollouts and preserves the benefits of analytical gradients without requiring differentiable physics. Empirically, DMO improves sample and wall-clock efficiency across locomotion and manipulation benchmarks and deploys on a Unitree Go2 robot for both quadrupedal and bipedal locomotion tasks with robust sim-to-real transfer.
Quadrupedal Hardware Experiments (Go2 Walking)
Bipedal Hardware Experiments (Go2 Front-Legs Balancing)
Simulation Demos
First-order gradient reinforcement learning (RL) computes policy updates using analytical gradients of the RL objective with respect to policy parameters. Unlike zero-order methods, which estimate gradients using sampled perturbations, first-order methods leverage the chain rule, requiring access to derivatives of both the reward and environment dynamics. This enables more informative, lower-variance policy updates and often dramatically improves sample efficiency, provided that these gradients are available.
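To make the chain-rule computation concrete, here is a minimal PyTorch sketch of backpropagating a return through a short rollout. The toy dynamics `f`, reward `r`, and horizon are illustrative assumptions, not the paper's setup; the point is only that the gradient of the return reaches the policy parameters analytically, through every step of the trajectory.

```python
import torch

policy = torch.nn.Linear(2, 1)  # toy deterministic policy a = pi_theta(s)

def f(s, a):
    # Toy differentiable dynamics: s' = f(s, a)
    return s + 0.1 * torch.cat([s[..., 1:], a], dim=-1)

def r(s, a):
    # Toy differentiable reward
    return -(s ** 2).sum(-1) - 0.01 * (a ** 2).sum(-1)

s = torch.zeros(1, 2)
ret = 0.0
for t in range(16):          # unroll a short horizon
    a = policy(s)
    ret = ret + r(s, a)
    s = f(s, a)

# First-order policy gradient: the chain rule carries d(return)/d(theta)
# through every application of f and r along the trajectory.
ret.mean().neg().backward()
print(policy.weight.grad)
```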
Previous first-order methods have taken two major paths:
- Differentiable simulators, which expose analytical gradients of the true dynamics but require differentiable physics, limiting the environments they can be applied to.
- Learned dynamics models, which are differentiable by construction but must also generate the training rollouts, so one-step prediction errors compound over the horizon.
DMO (Decoupled forward-backward Model-based policy Optimization) is a new first-order gradient RL method that decouples trajectory generation from gradient computation:
- Forward: trajectories are unrolled with a high-fidelity (and possibly non-differentiable) simulator, so rollouts never drift due to model error.
- Backward: gradients are computed by backpropagating through a learned differentiable dynamics model evaluated along those simulator states.
DMO thus combines the best of both worlds, bringing high sample efficiency, robust optimization, and reliable sim-to-real transfer. DMO can be applied on top of any first-order RL algorithm via this forward-backward decoupling.
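One simple way to realize this decoupling in an autodiff framework is a straight-through-style substitution, sketched below. Here `sim_step` and `model` are assumed placeholders for a non-differentiable simulator call and a learned one-step dynamics model; this is a minimal sketch of the idea, not the paper's actual API.

```python
import torch

def decoupled_step(sim_step, model, s, a):
    """Forward value comes from the simulator; gradient from the learned model.

    sim_step: non-differentiable, high-fidelity simulator step (assumed)
    model:    learned differentiable one-step dynamics model (assumed)
    """
    with torch.no_grad():
        s_sim = sim_step(s, a)       # exact next state, no gradient
    s_model = model(s, a)            # differentiable prediction
    # Numerically equals s_sim; backpropagates through `model` instead.
    return s_sim + s_model - s_model.detach()
```

Because every step's value is the simulator's state, the learned model is only ever queried at states the simulator actually visited, which is what keeps its one-step prediction errors from compounding over the rollout.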
We evaluated DMO across a suite of diverse continuous control benchmarks, spanning locomotion and manipulation, in the GPU-accelerated DFlex simulator, as well as on real Unitree Go2 quadruped hardware with policies trained in IsaacGym. We compared against strong baselines: PPO and SAC (both model-free) and MAAC (first-order model-based).
@inproceedings{amigo2025dmo,
title={First Order Model-Based RL through Decoupled Backpropagation},
author={Amigo, Joseph and Khorrambakht, Rooholla and Chane-Sane, Elliot and Righetti, Ludovic and Mansard, Nicolas},
booktitle={Conference on Robot Learning (CoRL)},
year={2025}
}