| #1 | gemini-3.1-pro-preview | 27.610 ± 1.366 | 521 | 26.58 ± 2.44 (0.867) | 27.84 ± 2.41 (25.35) | 28.93 ± 1.81 (9.36) |
| #2 | qwen/qwen3.6-plus | 25.681 ± 1.401 | 357 | 26.60 ± 2.51 (0.877) | 23.40 ± 2.46 (23.20) | 27.41 ± 1.85 (9.49) |
| #3 | gpt-5.4-mini | 25.373 ± 1.351 | 548 | 23.17 ± 2.37 (0.802) | 23.21 ± 2.41 (26.61) | 31.93 ± 1.83 (5.88) |
| #4 | gemini-3-flash-preview | 25.321 ± 1.334 | 557 | 25.38 ± 2.38 (0.849) | 25.73 ± 2.35 (24.29) | 24.64 ± 1.79 (9.63) |
| #5 | deepseek/deepseek-v4-pro | 25.139 ± 1.407 | 364 | 25.77 ± 2.50 (0.880) | 27.57 ± 2.48 (24.04) | 20.72 ± 1.90 (16.01) |
| #6 | claude-sonnet-4-6 | 24.413 ± 1.340 | 541 | 24.71 ± 2.37 (0.863) | 27.30 ± 2.37 (25.61) | 19.89 ± 1.85 (14.60) |
| #7 | llama-4-maverick-17b-128e-instruct-maas | 23.885 ± 1.469 | 352 | 25.42 ± 2.61 (0.830) | 23.67 ± 2.61 (21.66) | 21.73 ± 1.94 (10.73) |