Claude Sonnet 4.5 Review

Bonisiwe Shabane

On September 29, 2025, Anthropic released Claude Sonnet 4.5 and immediately claimed the title of "the world's best coding model." According to official benchmarks, it scored 77.2% on SWE-bench Verified, the highest score any model has posted on that evaluation. But does that translate to real-world developer productivity? After analyzing hands-on testing, developer feedback, and verified performance data, here's everything you need to know. All data in this article is sourced from Anthropic's official releases, InfoQ technical analysis, developer testimonials, and verified benchmark leaderboards (November 2025).

Unlike previous models that lose context or start hallucinating after a few hours, Claude Sonnet 4.5 can reportedly maintain focus on complex tasks for more than 30 hours straight without degradation. Anthropic's announcement makes the pitch directly: Claude Sonnet 4.5 is the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere.

It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done. Claude Sonnet 4.5 makes this possible. We're releasing it along with a set of major upgrades to our products. In Claude Code, we've added checkpoints—one of our most requested features—that save your progress and allow you to roll back instantly to a previous state. We've refreshed the terminal interface and shipped a native VS Code extension.

We've added a new context editing feature and memory tool to the Claude API that lets agents run even longer and handle even greater complexity. In the Claude apps, we've brought code execution and file creation (spreadsheets, slides, and documents) directly into the conversation. And we've made the Claude for Chrome extension available to Max users who joined the waitlist last month. We're also giving developers the building blocks we use ourselves to make Claude Code. We're calling this the Claude Agent SDK. The infrastructure that powers our frontier products—and allows them to reach their full potential—is now yours to build with.
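
The article doesn't show what building on the Agent SDK actually looks like, so here is a minimal sketch of a streaming agent loop, assuming the Python claude-agent-sdk package and its query() entry point; the option fields and prompts are illustrative assumptions, so check Anthropic's SDK docs before relying on them.

```python
# Minimal agent-loop sketch using the Claude Agent SDK
# (pip install claude-agent-sdk). The option fields below are
# assumptions based on the SDK's initial Python release.
import anyio
from claude_agent_sdk import ClaudeAgentOptions, query

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a careful code-review assistant.",
        max_turns=3,  # cap the agent loop for this demo
    )
    # query() streams messages back as the agent reasons and acts
    async for message in query(
        prompt="Summarize the TODO comments in this repo.",
        options=options,
    ):
        print(message)

anyio.run(main)
```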

Anthropic also calls this the most aligned frontier model it has ever released, showing large improvements across several areas of alignment compared to previous Claude models. And the benchmark numbers are honestly absurd. According to Anthropic, the model scored 77.2% on SWE-bench Verified (70.6% on the official SWE-bench leaderboard), a test that throws real GitHub issues at AI models to see whether they can actually fix code. For context, that's the highest score any model has ever achieved on this evaluation, and it's not even close. But here's what makes Sonnet 4.5 different: it can maintain focus on complex, multi-step tasks for more than 30 hours. Not 30 minutes.

Not 3 hours. Thirty. Hours. Try it yourself through the Claude API using the model string 'claude-sonnet-4-5'. You can also read the published Claude Sonnet 4.5 system prompt on Anthropic's site. We'll break down more about that below.
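
For a sense of what that looks like in practice, here's a minimal sketch using Anthropic's official Python SDK; it assumes pip install anthropic and an ANTHROPIC_API_KEY in your environment, and the prompt is just a placeholder.

```python
# Minimal Claude API call targeting Sonnet 4.5.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",  # model string from the release
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain this stack trace: ..."}],
)
print(response.content[0].text)
```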

Feel like there is a groundbreaking AI announcement every other week? I get it. The fatigue is real. It’s hard to distinguish the hype from the tools that will actually change how we work. But if you only pay attention to one release this season, make it this one. Anthropic just released Claude Sonnet 4.5 (as of late September 2025).

If you look at the raw numbers, they are absurdly good. It’s crushing benchmarks left and right. However, the benchmarks aren’t the real story here. The real story is stamina. Imagine hiring a brilliant intern who forgets everything you said after 30 minutes. That’s been the reality of most AI models until now.

Sonnet 4.5 changes the game. It can maintain focus on complex, multi-step projects for over 30 hours.

TL;DR: Claude Sonnet 4.5 scores 77.2% on SWE-bench Verified (82.0% with parallel compute), 50.0% on Terminal-Bench, and 61.4% on OSWorld. It reaches 100% on AIME with Python and 83.4% on GPQA Diamond. Pricing is $3 per million input tokens and $15 per million output tokens; you can use it on web, iOS, Android, the Claude Developer Platform, Amazon Bedrock, and Google Cloud Vertex AI. Anthropic released Claude Sonnet 4.5 on September 29, 2025, as the latest model in the Claude 4 family.
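
At those rates, per-request cost is simple arithmetic; the helper below is a quick sketch, and the token counts in the example are made up for illustration.

```python
# Back-of-envelope cost at Sonnet 4.5's list pricing:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list pricing."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical request: 50k tokens of code context in, a 4k-token patch out.
print(f"${estimate_cost(50_000, 4_000):.2f}")  # 0.15 + 0.06 -> $0.21
```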

Sonnet 4.5 improves coding performance, supports long-running agent workflows, and handles computer-use tasks more reliably. Let's analyze its benchmarks, pricing, and how it compares with GPT-5 and Gemini 2.5 Pro in production use. The headline changes: fewer misaligned behaviors and stronger defenses, plus code checkpoints, a VS Code extension, and the Agent SDK.

Katie Parrott is a staff writer and AI editorial lead at Every. She writes Working Overtime, a column about how technology reshapes work, and builds AI-powered systems for the Every editorial team.

Five tests across blind comparisons, editorial standards, and deadlines: here's what changed our setup. Since GPT-5 came out three months ago, my writing workflow has been straddling LLM providers: ChatGPT for drafting, Claude for editing. The setup works, but the back-and-forth is tedious: copy a draft from one window, paste it into another, wait for feedback, then hop back to revise. I've been starting to feel a bit like a glorified traffic conductor.

Then Anthropic dropped Sonnet 4.5, and within 48 hours my workflow collapsed from two chat interfaces into one.
