
“People deserve to know whether what they’re seeing is real. And as AI gets better at faking reality, we have to get better at revealing the truth,” say researchers at UC Riverside’s Marlan and Rosemary Bourns College of Engineering.
Amit Roy‑Chowdhury, a professor of electrical and computer engineering, and doctoral candidate Rohit Kundu, both at UC Riverside, collaborated with Google scientists to build UNITE, the Universal Network for Identifying Tampered and synthEtic videos. UNITE was introduced at CVPR 2025 in a paper titled “Towards a Universal Synthetic Video Detector…”. It combines transformer-based deep learning with a novel training mechanism to detect a wide range of video forgeries.
Unlike traditional deepfake detectors that rely on facial artifacts, UNITE analyzes whole-frame content, including backgrounds, motion dynamics, and scene inconsistencies. This lets it flag fully synthetic or manipulated videos even when no face is present. It builds on the SigLIP feature extraction framework and adds an “attention-diversity loss” during training, which pushes the model to attend to multiple spatial regions in each frame rather than concentrating on facial areas.
For engineers, key architectural highlights include:
- Transformer-based temporal–spatial modeling that learns patterns across both motion and context.
- SigLIP-based feature extraction that provides general-purpose representations not tied to facial recognition.
- Attention diversification to encourage broad scrutiny across each frame.
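To make the attention-diversification idea concrete, here is a minimal sketch in plain Python. It is not the formulation from the UNITE paper: the penalty (mean pairwise cosine similarity between attention maps) and the names `attention_overlap_penalty`, `collapsed`, and `spread` are assumptions chosen only to illustrate how a training term can discourage every attention head from looking at the same region of a frame.

```python
import math
from itertools import combinations


def softmax(scores):
    """Turn raw attention scores into a distribution over patches."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def attention_overlap_penalty(head_maps):
    """Hypothetical diversity term: mean pairwise cosine similarity
    between the attention maps of different heads. A value near 1 means
    all heads focus on the same patches; lower means more spread out."""
    pairs = list(combinations(head_maps, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy frame with 4 patches and 3 attention heads (made-up scores).
# Every head fixates on patch 0, e.g. a face region.
collapsed = [softmax([4.0, 0.0, 0.0, 0.0]) for _ in range(3)]

# Each head focuses on a different patch, e.g. face, background, hands.
spread = [
    softmax([4.0, 0.0, 0.0, 0.0]),
    softmax([0.0, 4.0, 0.0, 0.0]),
    softmax([0.0, 0.0, 4.0, 0.0]),
]

collapsed_penalty = attention_overlap_penalty(collapsed)
spread_penalty = attention_overlap_penalty(spread)
print(f"collapsed heads penalty: {collapsed_penalty:.3f}")
print(f"spread heads penalty:    {spread_penalty:.3f}")
```

Added to a classification loss during training, a term like this would be lower when heads cover different parts of the frame, which is the behavior the attention-diversity loss is described as encouraging.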
Google contributed large-scale synthetic video datasets generated via text-to-video and image-to-video pipelines, and the team used them to train UNITE to generalize across forgery types that existing detectors often miss. Though still in development, UNITE shows promise for platform content moderation, fact-checking, and newsroom verification pipelines. As generative video tools become mainstream, UNITE aims to serve as a universal detection backbone for combating video-based disinformation.