CDL Reports Cielara Code Leads Coding Benchmarks

by CDL Team | May 6, 2026

Cielara uses REASONARA graph memory to check agent output before deployment

SAN FRANCISCO, CA, May 6, 2026 – Causal Dynamics Lab (CDL) published research showing that AI coding agents spend most of their work finding code, not editing it. The study found that 56.8% of agent actions involved reading files, 24.2% used grep, and less than 1% involved direct code edits. CDL also introduced Cielara Code, an AI coding tool for code localization across complex software projects.

CDL reported that Cielara Code recorded the highest code-localization accuracy among tested AI coding tools. It posted higher results than Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) across three independent tests. The data indicated that writing code was not the main failure point. Agents struggled to find the correct files and code sections to change.

“Every coding agent out there today uses grep, which is like a surgeon operating without imaging,” said Hasibul Haque, CEO at Causal Dynamics Lab. “We created Cielara Code to help agents see better: it provides a clear understanding of the working environment, making the reasons behind each change clear and verifiable.”

The 2025 DORA report linked AI coding tool use to a 7.2% drop in deployment stability. AWS CTO Werner Vogels described the issue as “dynamic verification debt.” Claude Code (GitHub issue #42796) shows a related limitation: current agents process code as flat text, without mapping file links, function calls or system-level effects.

How Cielara Code Works

Cielara Code maps a customer’s production environment through a 6-layer causal graph. The graph records code function, development rationale, ownership, known limitations, deployment location and runtime behavior. When a failure occurs, the tool can trace the incident to a specific code change, the developer who approved it and the reason for the change. Before an agent starts code search, Cielara Code builds a Code Dependency Causal Graph that tracks four relationship types, allowing the agent to follow dependency links instead of scanning files one by one.
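The idea of following dependency links instead of scanning files one by one can be illustrated with a small sketch. This is not CDL's implementation: the file names, the `imports` and `calls` edge labels, and the `reachable` helper are all hypothetical, and the release does not name the four relationship types the graph tracks. The sketch only shows how a typed dependency graph might narrow a search to files within a few hops of a starting point.

```python
from collections import deque

# Hypothetical dependency graph: file -> list of (relation, target file).
# The relation labels are illustrative assumptions; the release says four
# relationship types are tracked but does not name them.
EDGES = {
    "api/handlers.py": [("imports", "services/billing.py")],
    "services/billing.py": [
        ("calls", "db/invoices.py"),
        ("imports", "utils/money.py"),
    ],
    "db/invoices.py": [],
    "utils/money.py": [],
    "docs/readme.md": [],
}


def reachable(graph, start, max_hops=2):
    """Breadth-first walk returning {file: hop distance} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen[target] = seen[node] + 1
                queue.append(target)
    return seen


# Candidate files for an issue first observed in the API handler.
candidates = reachable(EDGES, "api/handlers.py")
print(sorted(candidates, key=candidates.get))
```

In this toy graph the walk visits four linked files and never opens `docs/readme.md`, which has no dependency path from the starting file.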

Benchmark Results

Across three independent benchmarks, Cielara Code recorded higher code-localization results than Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4). Overall localization accuracy reached 0.774, compared with 0.738 for Claude Code and 0.707 for Codex. On MULocBench (1,033 issues across 46 repositories), Cielara reached 0.752 recall@5, compared with 0.727 for Claude Code. Mean task time fell from 141.84 to 128.62 seconds. CDL said the gains cut wrong-file edits, failed runs and compute cost per task by 30% to 40%.
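For readers unfamiliar with the metric, recall@5 is commonly defined as the share of truly relevant files that appear among a tool's top five predictions, averaged over all issues. The sketch below assumes that common definition; the release does not state how MULocBench computes it. The sample rankings are invented for illustration, and only the two task times come from the release.

```python
def recall_at_k(ranked, relevant, k=5):
    """Fraction of relevant files found in the top-k ranked predictions."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each issue needs at least one relevant file")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(cases, k=5):
    """Average recall@k over (ranked predictions, relevant files) pairs."""
    return sum(recall_at_k(r, g, k) for r, g in cases) / len(cases)


# Invented example data: two issues with ranked file predictions.
cases = [
    # f.py is ranked sixth, so only one of two relevant files is in the top 5.
    (["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"], {"b.py", "f.py"}),
    # The single relevant file is ranked first.
    (["x.py", "y.py"], {"x.py"}),
]
score = mean_recall_at_k(cases)

# Relative reduction implied by the reported mean task times (seconds).
time_saving = 1 - 128.62 / 141.84

print(f"mean recall@5: {score:.2f}")
print(f"task time reduction: {time_saving:.1%}")
```

By this arithmetic, the reported drop from 141.84 to 128.62 seconds corresponds to a reduction of roughly 9.3% in mean task time.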

REASONARA: Causal Memory at Enterprise Scale

Cielara Code uses REASONARA, a graph-based causal memory layer that stores more than 125 million tokens of context and retrieves a smaller, query-specific context set. A lookup uses 1,000–2,500 tokens, compared with 23,000–115,000 for full-context retrieval methods, reducing token use by up to 98%. On independent benchmarks, REASONARA recorded 94% on UltraDomain, 92% on LoCoMo, 73% on LoCoMo-plus and 87.4% on LongMemEval. It also ran 5–8× faster than Codex in high-reasoning mode. The roadmap targets a one-billion-token context window.
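The token savings follow from simple arithmetic on the quoted ranges. The sketch below is only a check of those figures; the `reduction` helper is hypothetical, and the pairing of range endpoints is an assumption, since the release does not say which lookup size corresponds to which full-context size.

```python
def reduction(lookup_tokens, full_tokens):
    """Fractional token saving of a lookup versus full-context retrieval."""
    return 1 - lookup_tokens / full_tokens


# Endpoints of the ranges quoted in the release.
smallest_saving = reduction(2_500, 23_000)   # largest lookup, smallest full context
quoted_case = reduction(2_500, 115_000)      # largest lookup, largest full context
largest_saving = reduction(1_000, 115_000)   # smallest lookup, largest full context

for label, value in [
    ("smallest saving", smallest_saving),
    ("2,500 vs 115,000", quoted_case),
    ("largest saving", largest_saving),
]:
    print(f"{label}: {value:.1%}")
```

Under these assumptions the saving ranges from about 89% to about 99%, and the 2,500-versus-115,000 pairing gives about 98%, in line with the figure CDL cites.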

Cielara Code checks AI coding-agent output rather than replacing the agents. Eleven Fortune 100 companies and more than 40 Fortune 500 companies currently use Cielara Code on their codebases.

“Board members and auditors expect more proactive risk management. Leaders now want proof that security can anticipate risks caused by fast-moving AI and automation, instead of just reacting after incidents,” said a CISO who is also a Cielara Code customer.

Phillip Miller, vice president and global chief information security officer at H&R Block, added: “Enterprises need solutions to problems they cannot solve with people alone. Cielara’s technology is a generational leap towards the original promise of AI: tackling complexity 7×24 with acquired knowledge, deep reasoning, and unbeatable accuracy. For engineering teams, this means a single engine to discover faults in real-world deployments (including legacy, cloud) and provide clear resolution steps. When I wrote Hacking Success, I described a world where AI needs strong, directive policy (not rules / guardrails) to be safe and effective. Information Security lags behind the innovation curve, as most options rely on legacy thinking including posture, gateways, and logging. Enterprises now have an option to leverage Cielara’s models to oversee deployments of AI agents, models, and their supporting infrastructure.”

The Team

CDL’s leadership team brings experience from software infrastructure, cloud systems and AI research. CEO Hasibul Haque previously led platform engineering at Uber. CTO Ryan Turner was a staff engineer at Uber and contributed to maintaining the SPIRE Project under the Cloud Native Computing Foundation (CNCF). R&D is led by Dr. Xuchao Zhang, a former Microsoft research scientist, and Dr. Liang Zhao of Emory University. CDL also has a research partnership with Emory’s AI Lab.

“AI has already changed how people find information. The next step is to change how people make decisions by exploring possibilities, comparing options, and understanding the outcomes before making a choice,” said Matt Fisher, former co-founder and CTO of Daydream and an Adjunct Professor at Brown University. “That shift towards exploring outcomes is what CDL is focusing on.”

What’s Next

The Production World Model provides the base architecture for Cielara Code and REASONARA, its first two products. CDL said it plans to extend the model to simulate how changes in code, infrastructure, policy and operations could affect production systems. The capability would give enterprise AI agents a persistent reasoning layer to check production impact before making changes.

Source: Causal Dynamics Lab

About Causal Dynamics Lab

Causal Dynamics Lab (CDL) is an AI software company based in San Francisco, CA. Founded in 2024, the company builds validation infrastructure for AI-generated software, production systems and enterprise engineering teams. Its work centers on the Production World Model, which links code, configuration, infrastructure, policy and change decisions. Its products include Cielara Code and REASONARA. The company serves software engineering, cloud infrastructure, reliability, security and AI development teams. Its tools apply to code localization, change analysis, deployment review and production risk assessment.