CoInteract Paper Proposes Spatially-Structured Co-Generation for Consistent HOI Video
The CoInteract paper, surfaced on HuggingFace Papers, introduces a spatially-structured co-generation approach for synthesising physically consistent human-object interaction (HOI) video. Current generative video models struggle to maintain physical plausibility when a human and an object must interact in a constrained, contact-dependent way. CoInteract addresses this by co-generating the human and object trajectories under shared spatial constraints rather than generating them independently.
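The paper is summarised here only at a high level, so the sketch below is not CoInteract's method; it is a toy illustration of the underlying principle, showing how coupling a human and an object trajectory through a shared contact constraint changes the result compared with generating them independently. The trajectory setup, the `contact_frames` window, the penalty weights, and the optimisation loop are all assumptions made for the example.

```python
# Illustrative sketch only: co-generation of a hand and an object trajectory
# under a shared spatial (contact) constraint, versus independent generation.
# Everything here (losses, frames, weights) is a hypothetical stand-in, not
# the CoInteract architecture.
import torch

T = 30                             # number of frames
contact_frames = list(range(15, 30))   # frames where hand and object must touch

def smoothness(traj):
    # Penalise large frame-to-frame jumps (a crude temporal prior).
    return ((traj[1:] - traj[:-1]) ** 2).sum()

def endpoint(traj, start, goal):
    # Pin the trajectory to a start and a goal position.
    return ((traj[0] - start) ** 2).sum() + ((traj[-1] - goal) ** 2).sum()

def contact(human, obj):
    # Shared spatial constraint: hand and object coincide during contact frames.
    return ((human[contact_frames] - obj[contact_frames]) ** 2).sum()

def optimise(couple_contact: bool):
    human = torch.zeros(T, 2, requires_grad=True)
    obj = torch.zeros(T, 2, requires_grad=True)
    opt = torch.optim.Adam([human, obj], lr=0.05)
    for _ in range(2000):
        opt.zero_grad()
        loss = (smoothness(human) + smoothness(obj)
                + endpoint(human, torch.tensor([0.0, 0.0]), torch.tensor([1.0, 1.0]))
                + endpoint(obj, torch.tensor([2.0, 0.0]), torch.tensor([1.0, 1.2])))
        if couple_contact:
            # Co-generation: both trajectories see the same contact penalty.
            loss = loss + 10.0 * contact(human, obj)
        loss.backward()
        opt.step()
    with torch.no_grad():
        gap = (human[contact_frames] - obj[contact_frames]).norm(dim=-1).mean()
    return gap.item()

print("mean hand-object gap, independent  :", optimise(couple_contact=False))
print("mean hand-object gap, co-generated :", optimise(couple_contact=True))
```

Generated independently, the two trajectories satisfy their own targets but drift apart during the contact window; adding the shared constraint pulls them into agreement, which is the kind of contact consistency the paper targets at the level of full video generation.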
Why It Matters
Plausible HOI remains a major failure mode of current video generation models, limiting practical applications in film production, training-data synthesis, and simulation. Spatially-structured co-generation enforces the interaction constraint explicitly rather than relying on the model to learn it implicitly from data, suggesting a tractable engineering path toward physically reliable video synthesis.