
Scientific research produces millions of studies each year, but not all findings withstand scrutiny. A large-scale initiative known as the SCORE project set out to address this challenge by exploring whether artificial intelligence could predict which studies would replicate successfully. Backed by the Defense Advanced Research Projects Agency, the effort aimed to create a “credit score” for science that would help policymakers and researchers judge the reliability of published work, according to an article in The New York Times.
To build this system, researchers analyzed nearly 3,900 papers across social science disciplines and conducted replications of selected studies. The results revealed a persistent issue: only about half of the replicated studies produced the same outcomes as the originals. This finding aligns with earlier replication efforts, reinforcing concerns about the reliability of published research.
The project also examined how methodological choices influence results. When multiple teams reanalyzed the same data using different approaches, they often reached inconsistent conclusions. Even when using identical datasets and code, discrepancies still emerged, highlighting the role of coding errors, data handling issues, and analytical decisions in shaping outcomes.
Artificial intelligence, initially envisioned as a solution, fell short of expectations. While AI systems detected some patterns associated with replicability, their predictions were not reliable enough for independent use. Human experts performed better, correctly anticipating replication outcomes about three-quarters of the time, but even their judgments were far from perfect.
Beyond its original goal, the SCORE project exposed deeper structural challenges in scientific practice. Replication studies remain undervalued, often lacking funding and publication opportunities. Researchers also face cultural and institutional pressures that discourage verification work in favor of novel findings.
The project points toward potential improvements, including greater transparency in sharing data and code, pre-registering experimental plans, and creating incentives for replication. Ultimately, the findings underscore a central truth: science advances through correction as much as discovery, and ensuring reliability requires sustained effort across the research ecosystem.