CVPR 2026 Findings · 4D Animal Motion Recovery

WildAni4D: Towards 4D Animal Mesh Reconstruction

A video-native framework for recovering temporally coherent animal meshes, articulated motion, and world-space trajectories from monocular in-the-wild videos.

Gyeongsu Cho¹, Hezhen Hu², Donghyeon Soon³, Changwoo Kang¹, Kyungdon Joo¹
¹UNIST · ²The University of Texas at Austin · ³DGIST
Synthetic animal video generation · Animal Video Transformer · DROID-SLAM camera trajectory · Sequence-level shape consistency
WildAni4D teaser: synthetic video generation and recovered world-frame 4D animal mesh motion
Abstract

Recovering animal motion in 4D.

WildAni4D tackles the data scarcity and temporal instability that make animal video reconstruction difficult.

Recovering 4D animal motion, including 3D geometry and global trajectory, is essential for quantitative biomechanics and behavioral analysis. Existing methods are held back by scarce annotated video data and by the temporal instability of per-frame prediction. WildAni4D unites a synthetic animal video generation pipeline with the Animal Video Transformer, a reconstruction model that estimates temporally coherent motion using a single sequence-level shape and per-frame pose predictions. The resulting system reduces temporal pose flicker and shape drift, enabling large-scale 4D animal reconstruction and downstream applications including motion annotation, animatable reconstruction, and text-to-motion generation.
Input: Monocular animal video
Output: Mesh, pose, shape, trajectory
Training: Synthetic videos with 3D labels
Stability: One shape per sequence
Comparative Demo

WildAni4D produces cleaner 4D recovery on Veo3 videos.

Three Veo3-generated animal samples are reconstructed with AniMer + DROID-SLAM, GenZoo + DROID-SLAM, and WildAni4D. The demo highlights temporal stability, shape consistency, and world-grounded motion recovery.

Baseline 1: AniMer + DROID-SLAM
Baseline 2: GenZoo + DROID-SLAM
Ours: WildAni4D

Across the three samples, our method preserves a consistent animal shape and predicts stable poses without jitter. In contrast, the baseline reconstructions show noticeable noise, including unstable tail motion and frame-to-frame shape fluctuations.
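
One common way to put a number on the pose jitter described above is the mean magnitude of joint acceleration, computed with a second finite difference over time. The sketch below is illustrative only: the function name `mean_joint_acceleration`, the `(T, J, 3)` joint layout, and the default frame rate are assumptions, and this page does not state which metric the authors report.

```python
import numpy as np

def mean_joint_acceleration(joints: np.ndarray, fps: float = 30.0) -> float:
    """Mean acceleration magnitude of 3D joints over a sequence.

    joints: array of shape (T, J, 3), one 3D position per joint per frame
            (layout is an assumption for this sketch).
    fps:    frame rate used to convert frame differences to time derivatives.
    """
    if joints.ndim != 3 or joints.shape[0] < 3:
        raise ValueError("expected (T, J, 3) with at least 3 frames")
    dt = 1.0 / fps
    # Central second difference: x[t+1] - 2*x[t] + x[t-1], divided by dt^2.
    accel = (joints[2:] - 2.0 * joints[1:-1] + joints[:-2]) / dt**2
    # Norm over xyz, then average over frames and joints.
    return float(np.linalg.norm(accel, axis=-1).mean())

# Usage: a smooth constant-velocity track versus the same track with noise.
rng = np.random.default_rng(0)
t = np.arange(30, dtype=float)[:, None, None]
smooth = t * np.ones((1, 4, 3)) * 0.01
noisy = smooth + rng.normal(scale=0.005, size=smooth.shape)
jitter_smooth = mean_joint_acceleration(smooth)
jitter_noisy = mean_joint_acceleration(noisy)
```

A lower value means steadier motion; the constant-velocity track scores near zero, while frame-to-frame noise raises the score sharply.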

Contributions

A complete data-and-model pipeline.

Data

WildAni4D-Gen

Scalable synthetic video generation combining dynamic textured SMAL animals, diverse 3D scenes, and realistic camera motion.

Model

Animal Video Transformer

A video reconstruction model that uses temporal features and camera trajectory estimation to recover world-grounded animal motion.
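
To make "world-grounded" concrete, the following minimal sketch lifts a per-frame animal translation from camera coordinates into a fixed world frame using a camera trajectory. It assumes the trajectory is already available as camera-to-world rotations `R_wc` and camera centers `t_wc`; the function name, the array layouts, and the `scale` factor (a stand-in for the scale ambiguity of monocular SLAM) are hypothetical and not taken from the WildAni4D or DROID-SLAM code.

```python
import numpy as np

def camera_to_world(points_cam: np.ndarray,
                    R_wc: np.ndarray,
                    t_wc: np.ndarray,
                    scale: float = 1.0) -> np.ndarray:
    """Map per-frame camera-frame points into the world frame.

    points_cam: (T, 3) animal root position in each frame's camera coordinates.
    R_wc:       (T, 3, 3) camera-to-world rotation for each frame (assumed given).
    t_wc:       (T, 3) camera center in world coordinates for each frame.
    scale:      hypothetical factor applied to the camera translation to
                account for the unknown scale of a monocular trajectory.
    """
    # p_world[t] = R_wc[t] @ p_cam[t] + scale * t_wc[t]
    return np.einsum("tij,tj->ti", R_wc, points_cam) + scale * t_wc

# Usage: the camera slides along +x while the animal stays 2 units in front.
T = 5
R_identity = np.tile(np.eye(3), (T, 1, 1))
cam_centers = np.stack([np.arange(T, dtype=float),
                        np.zeros(T), np.zeros(T)], axis=1)
animal_cam = np.tile(np.array([0.0, 0.0, 2.0]), (T, 1))
animal_world = camera_to_world(animal_cam, R_identity, cam_centers)
```

In this toy case the animal is stationary relative to the camera, so its world trajectory follows the camera's motion along x; without the camera trajectory, the same sequence would look like an animal standing still.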

Stability

Sequence-level shape

Predicting one shape for the full sequence suppresses frame-wise drift while preserving per-frame articulated motion.
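
The effect of sharing one shape across a sequence can be illustrated with a small numerical sketch. Here per-frame shape vectors are collapsed to their mean, which is only a post-hoc stand-in: the model described on this page predicts the sequence-level shape directly rather than by averaging. The names `shape_drift` and `share_shape` and the `(T, S)` layout are assumptions for illustration.

```python
import numpy as np

def shape_drift(betas: np.ndarray) -> float:
    """Average distance of each frame's shape vector from the sequence mean.

    betas: (T, S) array, one shape parameter vector per frame.
    """
    center = betas.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(betas - center, axis=1).mean())

def share_shape(betas: np.ndarray) -> np.ndarray:
    """Replace per-frame shapes with a single shared vector (here, their mean)."""
    shared = betas.mean(axis=0)
    return np.broadcast_to(shared, betas.shape).copy()

# Usage: noisy per-frame shape estimates around one underlying shape.
rng = np.random.default_rng(0)
true_shape = rng.normal(size=5)
per_frame = true_shape + rng.normal(scale=0.1, size=(20, 5))
drift_before = shape_drift(per_frame)
drift_after = shape_drift(share_shape(per_frame))
```

With a single shared vector, frame-to-frame shape variation is zero by construction, so any remaining motion in the mesh must come from the per-frame pose and translation.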

Method

Synthetic videos meet video-native reconstruction.

WildAni4D first creates fully annotated animal videos, then trains a temporal reconstruction model for stable 4D mesh recovery.

WildAni4D synthetic animal video generation pipeline

WildAni4D-Gen. Dynamic animals, textured SMAL shapes, diverse 3D scenes, and camera trajectories are rendered into annotated training videos.

Animal Video Transformer architecture

Animal Video Transformer. Temporal modeling after the ViT backbone predicts per-frame pose and translation together with a sequence-level shape parameter.
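
The split between per-frame and sequence-level outputs can be sketched with plain linear heads. This is a simplified illustration under stated assumptions, not the published architecture: the per-frame features are taken as given (standing in for the output of the ViT backbone plus temporal modeling), mean pooling over time is one plausible way to obtain a sequence-level feature, and all dimensions and weight names are hypothetical.

```python
import numpy as np

def predict_sequence(features: np.ndarray,
                     W_pose: np.ndarray,
                     W_trans: np.ndarray,
                     W_shape: np.ndarray):
    """Per-frame pose and translation, plus one shape for the whole clip.

    features: (T, D) per-frame feature vectors (assumed already temporally mixed).
    W_pose:   (D, P) linear head for per-frame pose parameters.
    W_trans:  (D, 3) linear head for per-frame translation.
    W_shape:  (D, S) linear head applied to the time-pooled feature.
    """
    pose = features @ W_pose                  # (T, P): one pose per frame
    trans = features @ W_trans                # (T, 3): one translation per frame
    pooled = features.mean(axis=0)            # (D,): pool over time (assumption)
    shape = pooled @ W_shape                  # (S,): one shape per sequence
    return pose, trans, shape

# Usage with hypothetical sizes.
rng = np.random.default_rng(0)
T, D, P, S = 8, 16, 12, 5
feats = rng.normal(size=(T, D))
pose, trans, shape = predict_sequence(
    feats,
    rng.normal(size=(D, P)),
    rng.normal(size=(D, 3)),
    rng.normal(size=(D, S)),
)
```

The output shapes carry the design choice: pose and translation keep the time axis, while shape drops it, so the mesh identity cannot change between frames.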

Why it matters

Frame-wise methods often produce plausible single frames but flicker across time. WildAni4D makes the reconstruction video-native: the animal identity is consistent across the sequence while pose and global motion evolve frame by frame.

Results

Stable reconstruction across challenging sequences.

WildAni4D qualitative comparison results

Qualitative reconstruction results on challenging animal videos.

Additional animal reconstruction comparison results

Additional comparison visualizations across frames.

Applications

From reconstruction to reusable animal motion.

Temporally coherent 4D outputs can support annotation, animation, and generation pipelines.

WildAni4D downstream applications

Downstream applications include animal motion data annotation, animatable animal reconstruction, and text-to-motion generation.

Citation

Cite WildAni4D.

@inproceedings{cho2026wildani4d,
  title     = {WildAni4D: Towards 4D Animal Mesh Reconstruction},
  author    = {Cho, Gyeongsu and Hu, Hezhen and Soon, Donghyeon and Kang, Changwoo and Joo, Kyungdon},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year      = {2026}
}