For nearly a year, the question "which frontier model for production coding work" had a default answer (Claude Sonnet 4.5 for most work, Claude Opus 4.6 for the hardest tasks) and a contender (OpenAI's o1/o3 family for math-heavy reasoning). That picture has been reset twice in the last nine months.
OpenAI shipped GPT-5 on August 7, 2025 as its first unified reasoning and chat model, replacing the GPT-4o / o1 / o3 family with a single model that dynamically routes between fast and deep thinking (OpenAI, Introducing GPT-5). Anthropic followed with Claude Opus 4.7 on April 16, 2026, shipping a 1M-token context window, a new top-tier xhigh effort level, and an SWE-bench Pro score of 64.3% (Anthropic, What's new in Claude Opus 4.7; llm-stats.com, Opus 4.7 benchmarks).
Both are production-ready for freelance coding work in May 2026. Both have real failure modes. The honest answer to "which one for client work" depends on the shape of the engagement, not the headline benchmark numbers.
What each one is actually good at
OpenAI's published numbers and independent benchmarks consistently show GPT-5 at or near the top on: