$8/100k tokens strikes me as potentially a TON if the idea is that we're going to be running this as part of the iterative local development cycle (or god forbid letting agents run it whenever they decide). As you mentioned, one of the issues with AI generated code is often that it writes too much and needs direction on shrinking down.
I could easily see hitting 10k+ LOC on routine tickets if this is being run on each checkpoint. I have some tickets that require moving some files around, am I being charged on LOC for those files? Deleted files? Newly created test files that have 1k+ lines?
> $8/100k tokens strikes me as potentially a TON
It's $8 per 100K lines of code, not tokens. Since we're using a mix of models across our main agent and sub-agents, this normalizes our cost.
> I could easily see hitting 10k+ LOC on routine tickets if this is being run on each checkpoint. I have some tickets that require moving some files around, am I being charged on LOC for those files? Deleted files? Newly created test files that have 1k+ lines?
We basically look at the files changed that need to be reviewed, plus the additional context required to make a decision for the review (which is cached internally, so you wouldn't be double-charged).
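To put rough numbers on that, here's a back-of-the-envelope sketch. The assumptions about what counts as "billed LOC" (changed files plus any context fetched for the first time, with cached context not billed again) are illustrative, not an exact billing formula:

```python
# Rough cost model for the $8 per 100k-LOC pricing.
# Assumption (illustrative): billed LOC = lines in the changed files under review
# plus any extra context fetched for the first time; cached context is not re-billed.

RATE_PER_LOC = 8.0 / 100_000  # $8 per 100,000 lines of code

def review_cost(changed_loc: int, context_loc: int = 0, cached_context_loc: int = 0) -> float:
    """Estimate the dollar cost of one review run."""
    billed = changed_loc + max(context_loc - cached_context_loc, 0)
    return billed * RATE_PER_LOC

# A 10k-LOC change plus 2k lines of fresh context:
print(f"${review_cost(10_000, context_loc=2_000):.2f}")   # $0.96
# Re-running the same checkpoint once that context is cached:
print(f"${review_cost(10_000, context_loc=2_000, cached_context_loc=2_000):.2f}")  # $0.80
```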
That said, we're of course open to revising the pricing based on feedback. But if it's helpful, when we ran the benchmarks on 165 pull requests [1], the cost was as follows:
- Autofix Bot: $21.24
- Claude Code: $48.86
- Cursor Bugbot: $40/mo (with a limit of 200 PRs per month)
We have several optimization ideas in mind, and we expect pricing to become more affordable in the future.
[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark
Ah sorry, you were very clear on the pricing page and I meant 100k LoC, not tokens.
In your explanation here, you mention running it per PR - does this mean running it once? Several times?
Congratulations!! Anchoring is important. What about other parts of the code review, like coding guidelines, perf issues, etc.?
We flag performance issues today alongside security and code quality. We're working on respecting AGENTS.md, detecting code complexity (AI-generated code tends toward verbose, tangled logic), and letting users/teams define custom coding guidelines.
How does this compare to gemini-code-assist? Rn it's one of the best imo
We haven't included Gemini Code Assist or Gemini CLI's code review mode in our benchmarks[1] (we should do that), but functionally, it'll do the same thing as any other AI reviewer. Our differentiator is that since we're using static analysis for grounding, you'll see more real issues with fewer false positives.
We also do secrets detection out of the box, and OSS scanning is coming soon.
[1] https://autofix.bot/benchmarks/
What is the difference between this and let's say Claude Code using something like semgrep as a tool?
Also, I don't think this tool should be in the developer flow, as in my experience it's unlikely to get run regularly. It should be something that is done as part of the QA process before PR acceptance.
I hope this helps and good luck.
On the OpenSSF CVE Benchmark[1], Semgrep CE hits 56.97% accuracy vs our 81.21%, and our recall is nearly 3x higher (75.61% vs 26.83%).
On when to run it, fair point. Autofix Bot is currently meant for local use (TUI, Claude Code plugin, MCP). We're integrating this pipeline into DeepSource[2], which will post inline comments on pull requests; that fits the QA/pre-merge flow you're describing.
That said, if you're using AI agents to write code, running it at checkpoints locally keeps feedback tight.
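For illustration, a local checkpoint gate could be as simple as the sketch below. The `review-tool` command and its arguments are placeholders for whatever CLI your reviewer exposes, not a documented interface:

```python
#!/usr/bin/env python3
"""Checkpoint gate: review the local diff before pushing.

Minimal sketch. `review-tool check` is a placeholder command; substitute the
actual CLI of whichever reviewer you run locally.
"""
import subprocess
import sys

def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "--"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.strip()]

def main() -> int:
    files = changed_files()
    if not files:
        print("No changes to review.")
        return 0
    # Placeholder invocation: replace with your reviewer's real command.
    result = subprocess.run(["review-tool", "check", *files])
    if result.returncode != 0:
        print("Review flagged issues; address them before pushing.")
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())
```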
Thanks for the feedback!
[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark
[2] https://deepsource.com/
"shifted bottleneck to code review"... understatement of decade.