Anthropic's Claude Opus 4.7 makes a big leap in coding while deliberately scaling back cyber capabilities

Key Points

- Anthropic has released Claude Opus 4.7, which delivers a major jump in autonomous coding: it scores 64.3 percent on the SWE-bench Pro coding benchmark, up from 53.4 percent for its predecessor, Opus 4.6.
- New features include triple the image resolution and deliberately throttled cybersecurity capabilities: Anthropic tried to reduce risky cyber capabilities during training and automatically blocks related requests.
- Per-token prices stay the same, but a new tokenizer maps the same text to up to 35 percent more tokens, meaning the actual cost per request can rise significantly.

Anthropic's new flagship model, Claude Opus 4.7, delivers major improvements in coding tasks.
During training, the company deliberately tried to reduce certain cybersecurity capabilities.

Anthropic has released Claude Opus 4.7, a direct upgrade to its predecessor, Opus 4.6. The company positions the model primarily as a step forward in autonomous coding. On the SWE-bench Pro coding benchmark, Opus 4.7 scores 64.3 percent, up from 53.4 percent for its predecessor and ahead of OpenAI's GPT-5.4 at 57.7 percent. Anthropic's own top model, Claude Mythos Preview, still leads by a wide margin at 77.8 percent.

Anthropic says Opus 4.7 follows instructions more precisely than its predecessor.
The company notes that prompts written for older models may now produce unexpected results: Opus 4.7 interprets instructions more literally than Opus 4.6, which sometimes interpreted them loosely or skipped parts of them entirely.

Image resolution triples for better visual understanding

Opus 4.7 processes images at up to 2,576 pixels on the long edge, which Anthropic says works out to roughly 3.75 megapixels, more than three times what earlier Claude models could handle. This isn't an API setting but a model-level change: images are automatically processed at higher resolution, though they consume more tokens as a result.
Users who don't need the extra detail can downscale images before sending them, as in the sketch below. Anthropic sees the higher resolution as a major advantage for computer-use agents that need to read dense screenshots and for extracting data from complex diagrams. On the Document Reasoning benchmark (OfficeQA Pro), the company reports 80.6 percent accuracy, up from 57.1 percent with Opus 4.6. The benchmarks also show significant gains in biomolecular reasoning and visual navigation (ScreenSpot-Pro).
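As a rough sketch of that pre-processing step, the following Python snippet uses Pillow to cap an image's long edge at 2,576 pixels, the limit reported for Opus 4.7. The helper name and pass-through behavior are illustrative assumptions, not an official Anthropic recipe:

```python
from PIL import Image

MAX_LONG_EDGE = 2576  # Opus 4.7's reported long-edge limit

def downscale_for_upload(src: str, dst: str) -> None:
    """Shrink an image so its long edge fits within MAX_LONG_EDGE,
    preserving aspect ratio; smaller images pass through unchanged."""
    img = Image.open(src)
    long_edge = max(img.size)
    if long_edge > MAX_LONG_EDGE:
        scale = MAX_LONG_EDGE / long_edge
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.Resampling.LANCZOS)
    img.save(dst)

downscale_for_upload("screenshot.png", "screenshot_scaled.png")
```

Pre-scaling like this trades detail for a smaller token bill; agents that read dense screenshots will usually want the full resolution instead.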
Anthropic deliberately throttles cyber capabilities

One of the more notable aspects of this release is how Anthropic handles the model's cybersecurity capabilities. The company says it experimented with selectively reducing certain cyber capabilities during training. New safeguards are designed to automatically detect and block requests that suggest prohibited or high-risk cybersecurity use.

The background here is the recently announced Project Glasswing, in which Anthropic laid out the risks and benefits of AI models for cybersecurity. The company had explained that it would restrict the release of the more capable Mythos Preview and first test new safeguards on less capable models.
Opus 4.7 is the first test case for this strategy. Security researchers who want to use the model for penetration testing or red-teaming can sign up for a new Cyber Verification Program.

Hallucinations drop but don't disappear

According to the system card, Anthropic distinguishes between two types of hallucinations: factual hallucinations (wrong claims about the world, like fabricated quotes or incorrect data) and input hallucinations, where the model acts as if it has access to a tool or attachment that doesn't actually exist. For factual hallucinations, Opus 4.7 performs better than or on par with Opus 4.6 across four benchmarks but falls short of Mythos Preview.
Anthropic says the gap comes mainly from Mythos Preview's higher hit rate on obscure facts, not from a higher error rate in Opus 4.7.

For input hallucinations, Opus 4.7 achieves the lowest hallucination rate of all tested models when users request a tool that isn't available. When context information is missing, it comes close to Mythos Preview and sits well ahead of older models. Anthropic acknowledges, however, that the test cases for the tool set were tailored to Opus 4.6's weaknesses, which skews that model's results. When dealing with questions based on made-up facts, Opus 4.7 performs on par with Opus 4.6 and below Mythos Preview.
Under pressure, such as when users or system prompts push the model to contradict its own assessment, Opus 4.7 is more honest than Opus 4.6 but less firm than Mythos Preview.

Alignment results are a mixed bag

Overall, Anthropic describes Opus 4.7's safety profile as similar to Opus 4.6, with low rates of deception, sycophancy, and cooperation with misuse. The model is more resistant to prompt injection attacks. A known issue from earlier Claude models partially persists: refusing to help with legitimate AI safety research. According to the system card, Opus 4.7 still refuses to assist in 33 percent of simulated safety research tasks.
That's a significant drop from 88 percent with Opus 4.6, but still a substantial share.

Same per-token prices, potentially much higher real-world costs

Pricing stays at $5 per million input tokens and $25 per million output tokens. However, Opus 4.7 uses a new tokenizer that can map the same text to up to 1.35 times as many tokens. The model also generates more output tokens at higher effort levels. In practice, the cost per request can rise significantly even though the per-token prices remain unchanged.
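As a back-of-the-envelope illustration, here is what the 1.35x upper bound does to a request that previously used 10,000 input and 2,000 output tokens. The token counts are assumptions for illustration; actual expansion depends on the text, and Anthropic's token-counting endpoint can measure real counts:

```python
# Illustrative cost comparison under the new tokenizer.
# Worst-case assumption from the article: the same text maps to 1.35x
# as many tokens on Opus 4.7; per-token prices are unchanged.

PRICE_IN = 5 / 1_000_000    # dollars per input token
PRICE_OUT = 25 / 1_000_000  # dollars per output token
EXPANSION = 1.35            # reported upper bound, not a typical value

def request_cost(input_tokens: int, output_tokens: int, expansion: float = 1.0) -> float:
    return (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) * expansion

old_cost = request_cost(10_000, 2_000)             # old tokenizer: $0.100
new_cost = request_cost(10_000, 2_000, EXPANSION)  # same text, new tokenizer: $0.135
print(f"${old_cost:.3f} -> ${new_cost:.3f}")
```

The same per-token price thus buys up to 35 percent less text per dollar, before any extra output from higher effort levels is counted.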
A new effort level called "xhigh" slots in between "high" and "max." Claude Code also gets a new "/ultrareview" command for dedicated code reviews and an expanded "Auto Mode" for Max users, where Claude makes decisions on its own.

Opus 4.7 is available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. More details and tips are available in the migration guide for Opus 4.7.
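For developers getting started, a minimal request through the official Python SDK could look like the following. The model ID string is an assumption for illustration; the migration guide lists the actual identifier:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # assumed model ID; check the migration guide
    max_tokens=1024,
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)
print(response.content[0].text)
```

The platform-hosted versions on Bedrock, Vertex AI, and Foundry use their own client libraries and model identifier conventions, so this snippet applies to the first-party Claude API only.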