A point worthy of much more discussion! However, this article is oversimplifying things.
We are going down tumultuous, uncharted river rapids. There are many models of varying sizes and types being trained on hardware large and small, with fine-tuned, bespoke weights competing alongside industrial, committee-driven inference backed by massive budgets.
There is a wide spectrum of sizes of models and the hardware/environments they run in. What works today in one place may not work tomorrow in another. What stopped working yesterday may start working better next month.
The only way to have enough control to be scientific about it is to run your own hardware and provision a large amount of it to R&D. For anything big, this is very expensive.
In summary: you cannot predict how role-playing-style prompts influence output until you thoroughly test them against a proper 'control group' on whatever stack you're currently running.
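To make the "control group" idea concrete, here is a minimal sketch of an A/B harness: run the same tasks with and without a role-play preamble and compare a quality metric. Everything here is hypothetical — `generate` is a deterministic stub standing in for your actual inference call, and `score` is a placeholder for whatever task-specific metric you actually care about.

```python
import random
from statistics import mean

def generate(prompt: str) -> str:
    # Hypothetical stand-in for your real model call; deterministic
    # per prompt so the sketch is reproducible. Replace with your stack.
    rng = random.Random(sum(map(ord, prompt)))
    return " ".join(rng.choice(["alpha", "beta", "gamma"]) for _ in range(20))

def score(output: str) -> float:
    # Placeholder metric in [0, 1]; swap in a real task-specific evaluator.
    return output.count("alpha") / 20

ROLE_PREAMBLE = "You are a world-class expert. "
tasks = ["Summarize X.", "Explain Y.", "List the steps for Z."]

control = [score(generate(t)) for t in tasks]                    # plain prompts
treatment = [score(generate(ROLE_PREAMBLE + t)) for t in tasks]  # role-play prompts

print(f"control mean:   {mean(control):.3f}")
print(f"treatment mean: {mean(treatment):.3f}")
```

With a real model you would want many more tasks, repeated samples per task, and a significance test before concluding the preamble helps or hurts; the point is only that the comparison has to be run on your own stack, not assumed.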
I must admit, all my prompts start with this. Have to do some testing now.