Disney’s 2-step technique transforms raw data into robot dance moves
Engineers at Disney Research have enabled robots to learn to dance by training on unstructured motion data.
The team used a two-stage technique that controls a character's full-body dynamics from kinematic reference motion.
First, they trained a variational autoencoder (VAE) on short motion segments sampled from unstructured data to build a latent space encoding. They then used this encoding to train a conditional policy that maps kinematic inputs to dynamics-aware outputs.
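To make the first stage concrete, here is a minimal sketch of a motion VAE's forward pass. All dimensions, weights and function names are illustrative assumptions (the paper's actual architecture is not specified here), and the weights are random stand-ins for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a short window of 10 frames with 30 joint features
# per frame, compressed into an 8-dimensional latent code.
WINDOW, FEATS, LATENT = 10, 30, 8
IN_DIM = WINDOW * FEATS

# Random matrices stand in for a trained encoder and decoder.
W_mu = rng.normal(0, 0.01, (IN_DIM, LATENT))
W_logvar = rng.normal(0, 0.01, (IN_DIM, LATENT))
W_dec = rng.normal(0, 0.01, (LATENT, IN_DIM))

def encode(window):
    """Map a flattened motion window to a latent mean and log-variance."""
    x = window.reshape(-1)
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the standard VAE reparameterization)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct the motion window from the latent code."""
    return (z @ W_dec).reshape(WINDOW, FEATS)

window = rng.normal(size=(WINDOW, FEATS))  # a random stand-in motion segment
mu, logvar = encode(window)
z = reparameterize(mu, logvar)
recon = decode(z)
print(z.shape, recon.shape)  # (8,) (10, 30)
```

Training such a model (reconstruction loss plus a KL term) is what produces the compact latent codes that the second stage consumes.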
By separating these phases, the team improved the quality of latent codes and avoided problems such as mode collapse. They demonstrated the efficiency and robustness of the method in simulations and on a bipedal robot, successfully bringing dynamic motion to life.
Efficient movement training
Physics-based character animation has improved significantly in recent years through imitation-based reinforcement learning, which allows for accurate tracking of many skills. However, current methods struggle to handle a wide range of dynamic raw motions with a single policy while achieving full-body control.
Learning-based character control methods have evolved significantly, especially in kinematic and physics-based motion synthesis. Kinematic approaches use compact motion representations or generative models to generate seamless and plausible motion, sometimes integrating physics engines to avoid artifacts.
Physics-based techniques based on deep reinforcement learning mainly focus on mimicking reference animations, but often require elaborate configurations for different skills. More recent approaches use latent spaces to train policies while balancing data diversity and control accuracy; nevertheless, this often requires custom setups or extensive retraining.
The Disney research team’s new technique efficiently trains a single policy that provides robust, diverse and highly precise full-body control.
Dynamic control framework
The proposed method for controlling character motion consists of two parts. First, a variational autoencoder (VAE) is trained to generate a latent representation of motion from randomly selected short windows of data. This latent space captures the essential motion features of a large and diverse collection of clips.
In the second phase, a reinforcement learning (RL) policy is trained to use this latent code together with the character's current motion state, with the goal of accurate tracking and smooth motion. Conditioning the policy on both the current kinematic state and the latent code helps match new inputs to learned motions.
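The conditioning described above can be sketched as a policy network whose input is the concatenation of the current state and the latent code. The network shape, dimensions and random weights below are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

STATE, LATENT, ACTION = 24, 8, 12  # hypothetical dimensions

# A small randomly initialized MLP stands in for the trained RL policy.
W1 = rng.normal(0, 0.1, (STATE + LATENT, 64))
W2 = rng.normal(0, 0.1, (64, ACTION))

def policy(state, z):
    """Condition on both the current kinematic state and the latent code
    of the target motion, then emit a bounded action (e.g. joint targets)."""
    x = np.concatenate([state, z])  # the conditioning: state + latent code
    h = np.tanh(x @ W1)
    return np.tanh(h @ W2)          # tanh keeps actions in [-1, 1]

state = rng.normal(size=STATE)   # current kinematic state
z = rng.normal(size=LATENT)      # latent code from the stage-one VAE
action = policy(state, z)
print(action.shape)  # (12,)
```

At run time, a kinematic reference is encoded into a latent code by the frozen VAE encoder, and the policy turns that code plus the character's current state into physically plausible actions.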
The authors note that training also uses rewards for tracking accuracy, survival and smoothness, together with domain randomization to improve robustness and avoid overfitting. This lets the method handle unseen inputs and maintain high-fidelity motion control for both virtual and robot characters.
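A minimal sketch of these ingredients, assuming typical formulations (exponential tracking and smoothness terms, a constant survival bonus, and per-episode jitter of physical parameters); the exact reward weights and randomized quantities here are assumptions, not the paper's values:

```python
import numpy as np

def tracking_reward(pose, ref_pose, scale=2.0):
    """Exponential reward for matching the reference kinematic pose."""
    return float(np.exp(-scale * np.mean((pose - ref_pose) ** 2)))

def smoothness_reward(action, prev_action, scale=0.5):
    """Penalize abrupt changes between consecutive actions."""
    return float(np.exp(-scale * np.mean((action - prev_action) ** 2)))

def survival_reward(fallen):
    """Constant bonus for every step the character stays upright."""
    return 0.0 if fallen else 1.0

def randomize_dynamics(params, rng, spread=0.1):
    """Domain randomization: jitter physical parameters (mass, friction,
    latency, ...) each episode so the policy does not overfit the simulator."""
    return {k: v * rng.uniform(1 - spread, 1 + spread)
            for k, v in params.items()}

rng = np.random.default_rng(0)
pose, ref = np.zeros(12), np.zeros(12)  # a perfect tracking step
r = (tracking_reward(pose, ref)
     + smoothness_reward(pose, ref)
     + survival_reward(False))
print(r)  # 3.0 for a perfect, smooth, upright step
sim_params = randomize_dynamics({"mass": 5.0, "friction": 0.8}, rng)
```

Randomizing the simulator's physics each episode is what allows a policy trained entirely in simulation to transfer to the physical bipedal robot.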
In addition, the technique scales well with motion variety and training complexity: unseen dynamic motions can be tracked precisely and combined with common animation techniques.
Robust movement techniques
Researchers claim that demonstrations on virtual and physical humanoid characters show that this method robustly executes expressive movements even at the physical limits of the hardware.
Users can precisely control the characters’ movements through the kinematic motion interface, and the two-stage training method can handle a wide range of skills. The method is intended to be combined with different control modalities and generative tasks, but this has not yet been tested directly.
However, there are problems with movements that require long-term planning, such as acrobatics, which may require more complex designs. Although the approach works well for tracking kinematic references, its generative potential is still unknown.
The researchers claim that by demonstrating expressive motion on robot hardware, this work brings together advances in computer graphics and robotics and suggests that self-supervised and RL techniques could lead to universal control policies.
ABOUT THE AUTHOR
Jijo Malay is an automotive and business journalist based in India. He holds a BA in History (with distinction) from St. Stephen’s College, Delhi University, and a PG Diploma in Journalism from the Indian Institute of Mass Communication, Delhi, and has worked for news agencies, national newspapers and automotive magazines. In his free time, he enjoys off-roading, participating in political discussions, travelling and teaching languages.