
A team of researchers led by Chung-Ang University and KAIST has introduced MoBluRF, a deep-learning method that reconstructs sharp 4D scenes and enables novel-view synthesis (NVS) from single-camera videos marred by motion blur, Tech Xplore reports. Blurry handheld videos are notoriously difficult to reconstruct because motion (from camera shake or from moving objects) distorts frames, making geometry, texture, and even estimated camera poses unreliable. MoBluRF addresses these issues with a two-stage pipeline built around motion decomposition.
In the first stage, Base Ray Initialization (BRI), the framework builds a coarse reconstruction of the dynamic 3D scene from the blurred video. It then refines the "base rays" (the rays corresponding to camera views), correcting the errors that arise when raw, blurry input is used directly. This gives the second stage a more reliable starting point for detailed deblurring.
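To make the idea of base rays concrete, here is a minimal sketch in NumPy of how per-pixel camera rays can be generated from a pose and then adjusted by a small learned pose correction. The function names, the pinhole-camera parametrization, and the form of the correction are illustrative assumptions, not MoBluRF's actual implementation.

```python
import numpy as np

def make_base_rays(pose, H, W, focal):
    """Generate one camera ray per pixel from a 4x4 camera-to-world pose
    (assumed pinhole model; the real method works with estimated poses)."""
    i, j = np.meshgrid(np.arange(W), np.arange(H), indexing="xy")
    # Pixel coordinates -> view directions in the camera frame.
    dirs = np.stack(
        [(i - W / 2) / focal, -(j - H / 2) / focal, -np.ones_like(i, dtype=float)],
        axis=-1,
    )
    rays_d = dirs @ pose[:3, :3].T  # rotate directions into world space
    rays_d = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    rays_o = np.broadcast_to(pose[:3, 3], rays_d.shape)  # shared camera origin
    return rays_o, rays_d

def refine_base_rays(rays_o, rays_d, delta_rot, delta_trans):
    """Apply a small pose correction (hypothetical: one learned rotation
    matrix and translation per frame) to turn rough rays into refined base rays."""
    refined_d = rays_d @ delta_rot.T
    refined_d = refined_d / np.linalg.norm(refined_d, axis=-1, keepdims=True)
    refined_o = rays_o + delta_trans
    return refined_o, refined_d
```

In a training loop, `delta_rot` and `delta_trans` would be optimized jointly with the scene representation so that the refined rays explain the observed frames better than the rays implied by the raw, blur-corrupted pose estimates.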
The second stage, Motion Decomposition-based Deblurring (MDD), uses an Incremental Latent Sharp-rays Prediction (ILSP) technique. Here, motion blur is split into two parts: global camera motion and local motion of objects. By disentangling these two sources, the algorithm can more accurately predict latent sharp rays (the true, clear view rays) for each pixel. MoBluRF also introduces two new loss functions: one separates dynamic from static regions without requiring explicit motion masks; the other enforces geometric accuracy for moving objects.
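The underlying blur model can be sketched as follows: a blurred pixel is approximately the average of sharp colors rendered along several latent rays spanning the exposure, and each latent ray deviates from the base ray by a global (camera) offset plus, only in dynamic regions, a local (object) offset. The function names and the additive-offset parametrization below are simplifying assumptions for illustration.

```python
import numpy as np

def latent_sharp_dirs(base_dir, global_offsets, local_offsets, is_dynamic):
    """Predict latent sharp ray directions for one pixel.
    Global offsets (camera shake) apply everywhere; local offsets
    (object motion) are added only where the pixel is dynamic.
    Hypothetical additive parametrization for illustration."""
    offsets = global_offsets + (local_offsets if is_dynamic else 0.0)
    dirs = base_dir + offsets
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def composite_blurred_pixel(render_fn, dirs):
    """Model blur formation: the observed blurred pixel is the average
    of the sharp colors rendered along each latent ray."""
    return np.mean([render_fn(d) for d in dirs], axis=0)
```

Training then amounts to adjusting the predicted offsets (and the scene model behind `render_fn`) so that the composited pixel matches the blurry observation, while the learned sharp renders along individual latent rays recover the clear scene.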
Tests show MoBluRF significantly outperforming existing state-of-the-art NeRF-based methods on both synthetic and real datasets, especially under heavy blur, and remaining robust across a range of blur intensities. Because it works from standard handheld video, it could improve video-based 3D capture for smartphones, drones, AR/VR content, and robotics.
MoBluRF does more than sharpen; it recovers geometry and viewpoint information that blur tends to wash out. Though computationally intensive, it points toward a future where crisp, immersive 3D scenes can be generated from everyday video captures rather than specialized setups.