Replacing direct input interfaces with LLM chatbots is not unlike “upgrading” from a modern videogame (be that Call of Duty, Disco Elysium or Dwarf Fortress) to a retro text-based adventure. And if you have a direct input interface, why do we need an extra expensive layer of non-determinism?
I think text interfaces suck, but at the same time I like how Claude Code solves that with questionnaires. I think that's the most elegant solution for getting a lot of valuable context from users quickly.
You can still have a “chat interface,” but for specialized applications you can do better than that.
If I can perform some actions with the press of a button that runs code, or even some LLM interaction, without having to type, that's so much better.
A plain-text feedback interface is awful. It would be much better if anything I have to repeat or fix on my end stood out, and if any problem the LLM is looping over were quickly discoverable.
Unless I am wildly misreading this, this is actually worse than both GUIs and LLMs combined.
LLMs offer a level of flexibility and non-determinism that allows them to adapt to different situations.
GUIs offer precision and predictability - they are the same every time. Which means people can learn them and navigate them quickly. If you've ever seen a bank teller or rental car agent navigate a GUI or TUI, they tab through and type so quickly because they have expert familiarity.
But this - a non-deterministic user interface generated by AI - is different every time a user engages with it. So they get a more rigid UI, but also a non-deterministic set of options every time. Which means instead of memorising what is in every drop-down and tabbing through quickly, they need to re-learn the interface every time.
I don't think you have to use this if it's not working in your case. I think the idea is to anticipate the next few turns of the conversation, so you can quickly pick the branch you want to go down. If the prediction is accurate, I could see that being effective.
It’s intended for conversations that are probably different every time too. It’s like a more expressive form of what Claude Code already does with the “AskUserQuestion” interface.
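For readers who haven't used that kind of interface, a structured-question turn can be sketched as a plain payload the client renders as buttons instead of free text. The shape below is invented for illustration - it is not Anthropic's actual AskUserQuestion schema:

```python
# Hypothetical structured question a model emits instead of a prose reply.
# The client renders the options as buttons, so the user answers in one
# click rather than typing a paragraph.
question = {
    "header": "Database",
    "question": "Which database should the new service use?",
    "multi_select": False,
    "options": [
        {"label": "PostgreSQL", "description": "relational, already deployed"},
        {"label": "SQLite", "description": "embedded, zero ops"},
        {"label": "Redis", "description": "in-memory, caching only"},
    ],
}

def render(q):
    """Fallback rendering for a plain terminal: a numbered menu."""
    lines = [q["question"]]
    lines += [f"  {i + 1}. {opt['label']} - {opt['description']}"
              for i, opt in enumerate(q["options"])]
    return "\n".join(lines)

print(render(question))
```

The point is that the non-determinism stays in choosing the options; the interaction itself becomes a deterministic widget.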
> GUIs offer precision and predictability - they are the same every time.
Except after an update everything is in a different place.
Yep - I'm looking at you, MS Office ribbon. Just as I learnt where things are, some update decides to move stuff around.
The people responsible for stuff like this should be put in stocks in public squares and pelted with tomatoes ;-)
I get that you want to save the world by reducing processing, and I agree that using an LLM to develop deterministic and efficient code is just a better idea overall, but “stop using natural language interfaces” is overly restrictive.
Interactive fiction / text adventures written in the 20th century used a deterministic natural-language interface with low computational load, as an intentionally flexible puzzle to solve - so the problem today is efficiency.
You could just as well argue to stop using modern bloated operating systems, websites, and apps. I understand that the processing required for LLMs can be much higher. But the side-effect of additional power needs will be a global push for more energy, which will result in more power stations being available for future industries once LLMs become more efficient.
If you want to reduce complexity overall and have simple, flexible interfaces and applications that use fewer of the world's resources, I'm all for it. But don't single out LLMs assuming they will always be less efficient. Cost will drive them to be more efficient over time.
>As a clear obvious example: interactive fiction / text-adventures use a deterministic natural language interface with low load as an intentional flexible puzzle to solve.
Even though games can technically do this, should they? Do consumers actually find it fun and engaging? Considering there has never been an AAA game in that genre, I don't think there is true consumer demand for games with such an interface.
> never been a AAA game
Infocom sold 450k copies of Zork I and 250k copies of The Hitchhiker's Guide among their many other titles.
Beam Software sold over 1M copies of The Hobbit.
Sierra On-Line sold ~400k copies of King’s Quest VI in a week.
Indeed.
"Thorin sits down and starts singing about gold" or "You are in a maze of twisty little passages, all alike"
became early memes as a result.
However, those memes also come from players' frustration at being stuck in repeated patterns. The same can happen with chat interfaces to LLMs.
However I'm not sure whether that's a function of the chat interface or the nature of LLMs.
If you think the Infocom games were all like Zork I-III, you don't understand how the Z-Machine itself was improved over the years, enabling masterpieces such as Trinity or A Mind Forever Voyaging.
Then Curses! and Jigsaw are something else again, and Anchorhead, Spider and Web, Inside Woman, and All Things Devour are the kings of games, with themes you won't see in 3D AAA games for decades.
And over the years the parser from Zork was improved so much that, by the 90's, it could handle chained English phrases on a 16-bit machine running the Z5 version of the Z-Machine, with games designed for it. For Z8 games, the game sizes were even larger, with far more objects and puzzle interactivity, thanks to Inform6 and the Inform6 library, depending on the build target.
Not knocking them - I played The Hobbit on a machine with 48K of memory, and the game included graphics! It was a marvel of its day.
Just musing that some of the frustration I found playing them is reminiscent of trying to wrangle many billions of parameter models today.
I played Spanish IF games from the ZX too, though emulated, built with PAWS (the adventure system) adapted into Spanish. Since English grammar is simpler than Spanish and its words are shorter, you could pack in tons of in-game content and potential actions and effects; that's why The Hobbit shines. Although later Spanish games were much better, such as Aventura Espacial.
Those games did not have a high enough budget to be considered AAA.
And did the concept of "AAA" truly exist back then? Let alone "AAAA"? It's not really a good benchmark, imo.
Pulling up games from decades ago instead of the last few years isn't a good benchmark either, especially considering the technical limitations of that era.
I'm the last person to propose more LLM usage, but there is a reason D&D has exploded in popularity in recent years despite fancier games and graphics existing, and it's not because people find text/storytelling restrictive on an immersion or technical level. If a Zork were released today with a hypothetical adaptive parser and world-coherent output (big ifs), I think it'd be a huge hit, personally. Though to be clear, I'm not saying someone could build it on an LLM.
As long as people still enjoy books, I believe they will still want to interact with text this way when possible.
All of this subthread comes from an attempt to refute the statement, "Considering there has never been a AAA game of that genre...".
Never is a long time. However, now we're arguing the counter-examples aren't "AAA games".
>Never is a long time.
It is bounded by the time when AAA games became financially viable to create.
False. Tons of games in the 80's and 90's were stupidly cheap to create and still AAA games in their own right.
That's just their nature: they are very inexpensive to make. The original question was whether people find them fun and engaging. Clearly they did in the past, though nowadays players' standards have risen a lot. Even graphical adventure games (like Monkey Island) have long fallen out of favor due to a lack of action elements.
>never an AAA game
From the non-Infocom titles:
- Curses!
- Jigsaw
- Anchorhead
- Slouching towards Bedlam
- Spider and Web
and literally dozens more of outstanding quality.
From Infocom, most titles will qualify.
Are there any successful examples of LLM text adventures? Last time I heard someone here said it's hard to develop robust puzzles and interactions, because it's hard to control and predict what the LLM will do in a dialogue setting. E.g. the user can submit reasonable but unintended solutions to a puzzle, which breaks the game.
"Cost will drive them to be more efficient over time."
Why are you certain of this? It's just a database. Does this hold for e.g. Postgres?
"Stop using natural language interfaces" is just hyperbole.
Well, maybe an overreaching generalization. Not really hyperbole.
UIs also reduce human comprehension time. Give me a well-crafted UI and I can quickly scan it and comprehend the logic. Reading a long blob of text is less efficient and probably more error-prone. I like this approach.
Love this - this is what I have been envisioning as an LLM-first OS! It feels like truly organic computing. Maybe Minority Report figured it out way back then.
The idea of anticipating the elements and lowering the cognitive load of searching a giant drop-down list scratches a good place in my brain. I instantly recognize it as a much better experience than what we have on the web.
I think something like this is the long-term future of personal computing. Maybe I'm way off, but this is the type of computing I want to be doing: highly customized to my exact flow, highly malleable to improvement and feedback.
The post suggests how to optimize the LLM text with UI elements that reduce the usage of pure/direct prompts.
And that’s perfectly fine.
Though in that sense the title is more click-bait than anything.
Human abstract language, particularly the English language, is a pretty low-fidelity way to represent reality and in countless instances it can fail to represent the system to any useful or actionable degree.
Interfaces are hard, abstraction is hard. Computer science has been working on making these concerns easier to reason about, and the industry has put a lot of time and effort into building heuristics (software / dev mgmt / etc frameworks) to make achieving an appropriate abstraction (qua ontology) feasible to implement without a philosophy degree. We, like biological systems, have settled on certain useful abstraction layers (OOP, microservice arch, TDD, etc.) that have broad appeal for balancing ease of use with productivity.
So it should be with any generative system, particularly any that are tasked with being productive toward tangible goals. Often the right interface to the problem domain is not natural language. Constrain the "information channels" (concepts/entities and the related semantics, in the language of ontology) as best you can to align with the inherent degrees of freedom, disambiguated as far as possible into orthogonal dimensions (leaning too hard on the geometric analogy now). For generating code, that means interacting with tokens on ASTs, not 1D sequences of tokens. For comprehending 3D scenes, a crude text translation from an inherently 2D viewpoint will have no physics in mind, not even folk physics, beyond what it can infer from the dataset. For storing, recalling, and reciting facts per se, the architecture should not permit generating text from non-verifiable sources of information, such as the vector clouds we find between the layers of any NN.
These considerations early in the project massively reduce the resource requirements for training at the expense of SME time and wages to build a system that constrains where there are constraints and learns where there are variables.
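The "tokens on ASTs" point above can be made concrete with a minimal sketch using Python's standard ast module: a transform that edits the tree can only ever emit syntactically valid code, which is exactly the kind of structural constraint being argued for:

```python
import ast

source = "def fetch(url):\n    return get(url)\n"

class Rename(ast.NodeTransformer):
    """Rename a function at the node level, never by editing raw text."""
    def visit_FunctionDef(self, node):
        if node.name == "fetch":
            node.name = "fetch_page"
        self.generic_visit(node)  # keep walking into nested nodes
        return node

tree = Rename().visit(ast.parse(source))
print(ast.unparse(tree))  # the output is guaranteed to parse
```

A generator constrained to emit node edits like this cannot produce unbalanced brackets or half a statement, no matter how it is sampled.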
I think one of the issues with text-based interfaces, which is not often discussed, is that they are not good at expressing what they can and can't do.
Their very strength, of not being limited, is also a weakness - you only find the boundaries of what's possible by trial and error.
This isn't inherent, just a side effect of poorly designed text UI. Suggestions on the input, manual commands, or honest answers in response to the question "what can you do" all do as good a job as a GUI does, and sometimes a better job.
So many of the complaints I hear about TUIs just come down to bad design. Even one input and textual responses require thoughtful design.
That's design as in function, not color palette. Although... that too.
OK - so in the case of a text interface to a constrained tool, you are effectively mapping free text down to some underlying set of function calls and parameters, and you could ask the tool to describe those.
For more general AI tools, I guess it becomes harder to give a succinct description - so that's still a bit of trial and error (even if you have good feedback).
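The "free text down to function calls" mapping can be sketched in a few lines. The tool names and keyword matching here are invented for illustration - a real system would put an LLM or a grammar in the free-text layer - but everything still bottoms out in a small registry, which can also answer "what can you do?":

```python
# A tiny registry of the functions a text front end is allowed to reach.
TOOLS = {
    "set_volume": {"params": ["level (0-100)"], "help": "Set output volume."},
    "mute":       {"params": [],               "help": "Silence all output."},
}

def describe():
    """Answer 'what can you do?' directly from the registry."""
    return "\n".join(f"{name}({', '.join(t['params'])}): {t['help']}"
                     for name, t in TOOLS.items())

def dispatch(text):
    """Crude keyword mapping from free text to a (tool, args) call."""
    lowered = text.lower()
    if "mute" in lowered:
        return ("mute", {})
    if "volume" in lowered:
        digits = [int(w) for w in lowered.split() if w.isdigit()]
        return ("set_volume", {"level": digits[0] if digits else 50})
    return (None, {})

print(dispatch("please set the volume to 30"))
```

Swapping the keyword matcher for a small LLM changes the fuzziness of the mapping, not the describable surface underneath it.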
The term you're looking for is discoverability, and in my experience it's the most discussed concept when it comes to critiques of text based user interfaces.
Thinking about it - for traditional text-based interfaces like a Unix shell, I'd perhaps argue that with Stack Overflow and Google search they became more discoverable than GUIs.
And perhaps even more so with LLMs.
i.e. it's easier to find out how to do X in bash and cut and paste the solution than to watch a video showing which series of things to click.
Not sure how that extends to specific chat interfaces - can you ask the general models how best to use specific chat front ends for specific tools?
Of course not. Users love the chatbot. It's fast, and easier to use than manually searching for answers or stitching together reports and graphs.
There is no latency, because the inference is done locally - on a server at the customer's site with a big GPU.
> There is no latency
Every chatbot I was ever forced to use has built-in latency, together with an animated "…" to simulate a real user typing. It's the worst of all worlds.
> to simulate a real user typing
The models return a realtime stream of tokens.
This was already the case before LLMs became a thing. This is still the case for no-intelligence step by step bots.
Because they are all using some cloud service and an external LLM for that. We are not.
We sell our users a strong server, where they have all their data and all their services. The LLM is local, and trained by us.
With a full screen browser on a 14 inch laptop, the content takes up less than half the width of the browser window. The screenshots are slightly narrower. As a result I can barely make out the text in the dialogue box screenshots. Is it really that hard to format content well?
Anyway, interesting tool and nice that it is implemented in Rust. Where is the prompt that tells the agent when to call the popup tool?
And on mobile we can't even zoom. But how it's rendered in the browser isn't important - we're supposed to run the content through our LLM.
/s
My boss used to say: "there is an easy way and there is the cool way".
We no longer have StackOverflow. We no longer have Google, effectively.
I used to be able to copy pasta code with incredible speed - now all of that is gone.
Chatbots are all we have. And they are not that bad at search, with no sponsored results to weed through. For now.
> We no longer have Google, effectively
Veering off-topic a bit... Google lost its (search) way years ago. See "The Man Who Killed Google Search" [1], and the room they left for alternatives like DuckDuckGo.
At work, we have full access to Claude, and I find that I now use that instead of doing a search. Sure it's not 100% reliable, but neither is search anyhow, and at least I save time from sifting through a dozen crappy content farms.
[1] https://www.wheresyoured.at/the-men-who-killed-google/
When you use a search engine, you can evaluate the trustworthiness of the source (the webpage); this essentially disappears when using a chatbot.
Only if you're dumb about it. Asking for source links is one thing I do all the time, and ChatGPT gives citations by default.
What's the benefit of using a chatbot if you still have to go and read all of its sources on your own?
The same, I suppose, as using Wikipedia to get an overview of a topic, a surface understanding, before following the citations to dig deeper and fully validate the summary.
You can get precise citations supporting the facts of interest to you, so that you don't have to dig through the sources on your own.
At least for the time being the chatbot isn't optimised to deliver as many ads or sponsored results to your eyeballs as physically possible.
> I used to be able to copy pasta code with incredible speed - now all of that is gone.
What do you mean by that?
Meaning I would get an answer from SO pretty quickly and it would most often work.
What is SO?
SO is the website Stack Overflow.
The latency argument is terrible. Of course frontier LLMs are slow and costly. But you don't need Claude to drive a natural language interface, and an LLM with fewer than 5B parameters (or even <1B) is going to be much faster than this.
And it's highly circumstantial, as LLM efficiency keeps improving as the tech matures.
Is this a bad bait or is it a bad post? I can't decide.
Conversational UI + MCP + deterministic widget GUI = ChatGPT apps. These will become more prevalent.
And useless over time, because of the lack of both reproducibility in output and human-curated content.
This is something I agree with. It will be interesting to see if more and more people take this philosophy up.
Let's go further. Why not have a well specified prompt programming language for LLMs then?
Nah, natural language interfaces are great. What's shit is most implementations.
Natural language MUST be mixed with traditional UIs. Our world is filled with new software, new features, and new concepts every day, even for a regular person - and certainly far more for developers than for almost anyone else.
The thing I find most helpful with this sort of thing is "where the fuck is that setting" and "how do I get it to / I want to do X" - navigating complex UX so feature-filled that even the very best UX designers just can't hack it.
I feel like in many of these cases, sure, let me use the regular UI. But I also want to be able to ask "Hey, can I set my background to an image? Where do I do that?" and be presented with the dedicated UI, or with behind-the-scenes tool calls if no UI is available.
Anecdotally: things I use ALL the time are Help->Search in the macOS menu bar, the cmd+shift+P menu in VS Code, the search in Android settings, etc.
Ubuntu's Unity had that. I don't know about GNOME, but users say its search options are a joke. With Ubuntu's Dash you could even search the menu items of a running application.
I wonder if anyone can bring Unity back to Trisquel...
EDIT: not Dash, but HUD.
I'm a CWM (calm window manager) guy, but the Dash concept is not that far from how I use CWM:
win key+a = launch software with autocomplete
win key+s = search between the open windows
And so on - but searching in the menus (maybe even semantically, with synonyms) is superior to anything else, and no LLM is required.
> just because we suddenly can doesn't mean we always should
Author should take his own advice.
Yeah … no. It's a really nice interface. It's here to stay.