Engineering brief
⚡️Making DeepSeek v4 outperform Opus 4.7 with Taste — @AhmadAwais , CommandCode.ai
The Brief
A deterministic repair layer for open-model tool calling reportedly lifts open-weight models to match or exceed closed models such as Opus.
Decision relevance
Read this for workflow impact, implementation trade-offs, and the claims that need technical scrutiny before they reach team planning.

Summary
Ahmad Awais and the CommandCode team report a critical but fixable failure pattern in open-weight models such as DeepSeek V4 and Kimi. These models exhibit 'tool confusion': when a tool call fails schema validation (e.g., with a Zod validation error), the model ignores the error and retries the exact same malformed call dozens of times. The team argues this is not a fundamental capability gap but a brittle contract between the agent harness and what the model learned in training.
The fix is a deterministic repair layer that intercepts failing tool calls, repairs the arguments so they satisfy the schema (e.g., parsing JSON strings into arrays, supplying missing offsets), returns the successful result, and attaches a repair hint. This "save first, teach second" approach reportedly cuts retries from ~50 to near zero in a single step. The operational impact is large: models previously dismissed as 'useless' become competitive with frontier closed models, which changes cost-performance calculations.
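The repair flow described above can be sketched roughly as follows. This is a minimal illustration, not CommandCode's implementation: the `read_file`-style schema, the field names `paths` and `offset`, the default offset of 0, and every function name here are assumptions made for the example, and a hand-written validator stands in for a schema library such as Zod.

```python
import json
from typing import Any

# Hypothetical schema for an illustrative file-reading tool: field -> expected type.
READ_FILE_SCHEMA = {"paths": list, "offset": int}


def validate(args: dict[str, Any]) -> list[str]:
    """Return human-readable validation errors (empty list when valid)."""
    errors = []
    for field, expected in READ_FILE_SCHEMA.items():
        if field not in args:
            errors.append(f"{field}: missing")
        elif not isinstance(args[field], expected):
            got = type(args[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    return errors


def repair(args: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Apply deterministic fixes and describe each one for the model."""
    fixed = dict(args)
    notes = []
    paths = fixed.get("paths")
    if isinstance(paths, str):
        # Common failure: an array sent as a JSON-encoded string.
        try:
            decoded = json.loads(paths)
        except json.JSONDecodeError:
            decoded = None
        if isinstance(decoded, list):
            fixed["paths"] = decoded
            notes.append("paths was a JSON string; pass a real array")
    if "offset" not in fixed:
        # Assumed default for this sketch; a real harness would pick its own.
        fixed["offset"] = 0
        notes.append("offset was missing; defaulted to 0")
    return fixed, notes


def run_tool(args: dict[str, Any]) -> dict[str, Any]:
    """Stand-in tool that echoes what it would have read."""
    return {"read": args["paths"], "from": args["offset"]}


def call_with_repair(args: dict[str, Any]) -> dict[str, Any]:
    errors = validate(args)
    if not errors:
        return {"ok": True, "result": run_tool(args)}
    fixed, notes = repair(args)
    if validate(fixed):
        # Not repairable: surface the original errors to the model.
        return {"ok": False, "errors": errors}
    # Save first (return the real result), teach second (attach the hint).
    return {"ok": True, "result": run_tool(fixed), "repair_hint": "; ".join(notes)}
```

Calling `call_with_repair({"paths": '["a.py", "b.py"]'})` returns a successful result along with a hint explaining both fixes, so the model gets useful output on the first attempt instead of looping on the same malformed call.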
The same pattern-fixing philosophy is being applied to 'design slop', the generic AI aesthetic. By encoding designer-vetted heuristics (e.g., using OKLCH color spaces, work-pattern-first composition) into deterministic or skill-based corrections, the team aims to make AI-generated designs less homogeneous. The counterintuitive finding is that many AI deficiencies are not intelligence gaps but contract and steering problems, solvable through deterministic repair and constrained context workflows. The team plans to open-source the CommandCode agent, betting that hackability and curated model selection (Apple-like, not Windows-like) are a winning strategy against both closed verticals and fully open 'every model' platforms.
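As a rough illustration of what a deterministic design check might look like, the sketch below flags hex, `rgb()`, and `hsl()` colors in generated CSS so a later step could rewrite them as `oklch()`. The rule itself, the regex, and the function name `find_legacy_colors` are assumptions for this example; the source does not describe how the team's heuristics are actually implemented.

```python
import re

# Loosely matches hex colors and rgb()/rgba()/hsl()/hsla() functions.
# Caveat: an ID selector made only of hex digits (e.g. "#add") would also match.
LEGACY_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b(?:rgb|hsl)a?\([^)]*\)")


def find_legacy_colors(css: str) -> list[tuple[int, str]]:
    """Return (line number, matched color) for every non-OKLCH color found."""
    findings = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        for match in LEGACY_COLOR.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings
```

A harness could run a check like this on model output and either rewrite the offending values or return the findings as a hint, mirroring the repair-then-teach pattern used for tool calls.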
Why It Matters
Fixing tool-calling brittleness makes cheap open models a viable alternative to expensive closed models, slashing inference costs for coding agents.
Editorial analysis
Key claims
- Dogmatic reliance on closed models is a tax; open models work if you fix the tool-calling plumbing.
Practical use cases
- Use this as input for tooling evaluation, workflow planning, and technical due diligence.
Risks / caveats
- The results are demonstrated on one specific coding agent; the repair logic itself is a transferable technique for any harness.
Who should care
- Engineering managers, tech leads, and CTOs evaluating AI or developer tooling decisions.
Bottom Line
Dogmatic reliance on closed models is a tax; open models work if you fix the tool-calling plumbing.
Video and thumbnails remain the property of their respective creators. tldw.news provides editorial analysis, commentary, and discovery links to original content.
