Anthropic evaluated the model’s programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new industry record with a 82% score. The next two highest scores were also ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results