Built this after spending too many hours manually debugging why our AI agent kept failing on real world examples.
evalfix runs your evals, identifies what's failing, and uses a multi-agent loop to diagnose and fix your prompt — then verifies the fix doesn't break anything that was passing.
It also ships with an SDK to capture production failures and automatically pull them into your eval suite.
Built this after spending too many hours manually debugging why our AI agent kept failing on real world examples.
evalfix runs your evals, identifies what's failing, and uses a multi-agent loop to diagnose and fix your prompt — then verifies the fix doesn't break anything that was passing.
It also ships with an SDK to capture production failures and automatically pull them into your eval suite.