Claude Code reigns supreme
...and it's not close, not even a little bit. I spent money on all the subscription plans so you don't have to. Don't be me. Just get Claude Code Pro.
There are a lot of AI code assist tools that supposedly boost developer productivity and cut the time it takes to develop complex features. As a developer, I have to say there’s still a lot left to be desired, even from the most advanced Gemini Pro 3.1 Flash model. The same goes for Nvidia Nemotron 3 Super. These two are supposed to be the frontier models for code assist.
I’m here to tell you that, regardless of what the quantitative benchmarks say, these tools are light-years behind Claude Code. I compared these supposedly state-of-the-art code assist tools across a variety of data engineering tasks, including data analysis and data validation, against both Claude Opus 4.6 and Claude Sonnet 4.5.
None of the other models came close to being as useful as Claude Sonnet 4.5, let alone Claude Opus 4.6. The only drawback to using Claude Code is the recently introduced throttling. On average, I burn through my Pro session’s token allowance within the first fifth of the session window. For the last month or so, I’ve also been hitting the weekly cap over a single casual weekend of tinkering.
Honestly, a Claude Code token and a Gemini token aren’t even on the same continuous scale of value. There seems to be a step-function drop-off in Gemini’s utility compared to Claude Code. For context, Gemini Pro offers a generous 1 million tokens per session, roughly 5x what the Claude Code Pro subscription provides. Yet the useful work I get out of Gemini Pro is currently about 1/100th of what Claude Code achieves, which works out to roughly 1/500th the useful work per token.
Let’s start with a basic data analysis task: de-duplication. We often have to reconcile similar-looking records and settle on a strategy that maintains rigorous data quality without blindly including duplicates. When multiple overlapping data sources describe the same information, it can be quite an arduous task to get the most out of each source without sacrificing data quality. And if you’re trying to make any sort of meaningful predictions, making sure your data is unique and guarding against overfitting is a constant struggle.
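To make the de-duplication problem concrete, here is a minimal, stdlib-only Python sketch of one common strategy. It is my own illustration, not output from any of the tools compared here. The idea: normalize the fields that identify a record, treat records with the same normalized key as duplicates, and let a ranked list of sources decide which values win. The `MatchRecord`, `normalize_team`, and `dedupe` names, the source labels "scores" and "fotmob", and the team names are all hypothetical. Matching exactly on date plus normalized team names is a simplifying assumption; messier real-world data usually needs fuzzier matching.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional
import unicodedata


@dataclass
class MatchRecord:
    """One match as reported by one (hypothetical) data source."""
    source: str
    match_date: date
    home: str
    away: str
    score: Optional[tuple[int, int]] = None  # (home_goals, away_goals); None if unreported


def normalize_team(name: str) -> str:
    """Fold accents, case, punctuation, and extra whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch for ch in no_accents.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def dedupe(records: list[MatchRecord], source_priority: list[str]) -> list[MatchRecord]:
    """Collapse records that describe the same match, preferring higher-priority sources.

    Two records count as duplicates when they share a date and the same
    normalized home and away team names. The first record seen for a key
    (after sorting by source priority) is kept; a lower-priority duplicate
    may only fill in a missing score, never overwrite an existing one.
    """
    rank = {src: i for i, src in enumerate(source_priority)}
    merged: dict[tuple[date, str, str], MatchRecord] = {}
    # Unknown sources sort last; sorted() is stable, so input order breaks ties.
    for rec in sorted(records, key=lambda r: rank.get(r.source, len(rank))):
        key = (rec.match_date, normalize_team(rec.home), normalize_team(rec.away))
        kept = merged.get(key)
        if kept is None:
            merged[key] = replace(rec)  # copy so the caller's records are left untouched
        elif kept.score is None and rec.score is not None:
            # Take the whole score tuple from one source so home/away goals never mix.
            kept.score = rec.score
    return list(merged.values())


if __name__ == "__main__":
    records = [
        MatchRecord("scores", date(2024, 3, 2), "Atlético Norte", "Real Sur"),
        MatchRecord("fotmob", date(2024, 3, 2), "Atletico  Norte", "Real Sur", (2, 1)),
        MatchRecord("fotmob", date(2024, 3, 3), "Club Este", "FC Oeste", (1, 1)),
    ]
    for rec in dedupe(records, source_priority=["scores", "fotmob"]):
        print(rec)
```

Ranking sources explicitly keeps the merge deterministic: a higher-priority source always wins a conflict, and lower-priority sources can only fill gaps. Running the script prints two records, with the "scores" row carrying the score filled in from "fotmob".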
Claude Sonnet 4.5 excels, Gemini Pro 3.1 struggles mightily, and Nemotron 3 Super babbles along like Dory from Finding Nemo.
I’ve found that the docstrings each assist tool generates speak volumes about its level of sophistication.
Take a look at the examples below.
Claude Sonnet 4.5
"""validate_model.py — Out-of-sample validation for the prediction model.

Two modes:
  --holdout   Score the temporal test split from validation_set
              (matches on/after the cutoff used in compute.py --validate).
              Uses models/train_only.pkl so the model has never seen
              these matches.
  --new-only  Score any matches added to scores since the last
              validation run.
              Appends results to validation_results for rolling tracking.
  --fotmob    Score all finished matches in match_results
              using team-average features from scores — the same
              inference path used by predict_upcoming_matches.py. Adds rows to
              both validation_set (split='test') and
              validation_results.

In all modes the script:
  1. Loads the appropriate pickle
  2. Reconstructs the full feature vector (base 9 + transforms)
  3. Predicts result direction from score differential
  4. Computes accuracy by competition, confidence tier, and result class
  5. Writes output/validation_report.json
  6. Upserts rows to validation_results

Usage:
  python scripts/validate.py --holdout
  python scripts/validate.py --new-only
  python scripts/validate.py --fotmob
  python scripts/validate.py --holdout --no-db   # report only, no DB write
"""
Gemini Pro 3.1 Flash
"""
validate.py — Validation engine for the model.
Tests model performance against match results by constructing
inferred feature vectors and calculating accuracy against actual outcomes.
"""
Nvidia Nemotron 3 Super
""" Verify that main logic correctly validated set validation set validated set
"""
It’s almost a waste of time to use any tool but Claude Code. Gemini Pro 3.1, for example, continually excludes files that are already explicitly included in its context, and it will even insist that something I just did never happened. Even Nemotron 3 Super doesn’t lie like that.
Claude Code, on the other hand, doesn’t ask you to repeatedly reset your context window. When it does make mistakes, they’re easy to correct, and it does NOT make the same mistake twice in a row. Iteration is fast, and it seems to search more exhaustively when traversing a graph of information. Gemini Pro 3.1 and Nemotron 3 Super both get stuck flip-flopping between the same two options, even when neither one works. They seem to lack the comprehensive session memory that Claude Code demonstrates so well.
The toughest part of working with Gemini Pro 3.1 is its apparent “overconfidence.” It continually reminds me that it’s a “world class software engineer” while requiring increasingly specific instructions from me, and then it proceeds to completely misunderstand my intent.
I have a few side projects that have been collecting dust on my hard drive, things I always mean to get to. They largely consist of what I would consider boring, repetitive tasks. In other words, a volume problem: exactly the type of problem ML and AI models are supposed to excel at. That’s the real benefit these AI tools offer me. I can never clone myself, but I can set an agentic tool loose on the not-so-fun task of muddling through package dependency upgrades.
I’m going to leave Claude Opus 4.6 out of this part of the discussion and won’t compare it to Nvidia Nemotron 3 Super and Gemini Pro 3.1. It’s never nice when the 7th grade bully tries to beat up on kindergarteners.
Claude Opus 4.6 is a production-ready agentic code assistant that you can charge with multiple complex, ambitious tasks when you need results. I’ve been refining several side projects in parallel with its help, and I’ve seldom been disappointed. Now… if only Anthropic would stop throttling me. Well, I can hope, I guess.

