Any AI is is non-deterministic by nature, this is more about statistics, but not dedicated test cases. So, even if you swich to the model which is better in metric results you still cannot guarantee it will be better in your specific use case or do the same. What you can probably do is to automate your important test scenarious and run each every time you change model or system prompt
Any AI is is non-deterministic by nature, this is more about statistics, but not dedicated test cases. So, even if you swich to the model which is better in metric results you still cannot guarantee it will be better in your specific use case or do the same. What you can probably do is to automate your important test scenarious and run each every time you change model or system prompt