The problem is… when (if) we pick up the phone today, it's because we want to speak to a human.
Most people avoid phone calls if possible.
If I get a call and it's an AI, I, like everybody else, hang up.
If I'm picking up the phone to call a company, it's because I can't accomplish what I want on their website.
These AI phone calls are at least as limited as the website.
There is a use case for voice AI - but most of these demos really miss the mark with "we're going to replace your call center".
If founders had any idea how much performance matters in a call center, and how hard it is to achieve, they’d focus on a use case better served by voice AI.
> Existing voice AI solutions are a pain to set up for complex use cases. They require months of prompting all edge cases before going live, and then months of monitoring and improving prompting afterwards
I wonder why! Most (or all) customer support calls are recorded. Have you tried (or proposed) training on that corpus on your customers' premises? You can do multiple evals in that setting: replay user calls into a corpus-trained AI agent vs. a generic AI agent and see the difference. Agents can be run on a 24x7 self-test, analysis, adjustment, and reporting loop. Continuously run that loop and compare the prompts of your AI agent vs. human operators. (A rough sketch of the replay idea follows below.)
Edit: Grammar
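The replay-and-compare loop described above could be as simple as the following Python sketch. Everything here is hypothetical: `agent.respond` and `judge` are stand-ins for whatever agent runtime and scoring step (LLM-as-judge or human labels) are actually available, not a real API.

```python
# Hypothetical replay harness: feed the customer side of recorded calls
# into two agent configurations and compare the outcomes.

def replay(agent, recorded_call: dict) -> list:
    """Replay the customer's recorded turns into `agent` and log both sides."""
    transcript = []
    for customer_turn in recorded_call["customer_turns"]:
        transcript.append(("customer", customer_turn))
        transcript.append(("agent", agent.respond(transcript)))  # assumed interface
    return transcript

def compare(corpus_agent, generic_agent, recorded_calls: list, judge) -> dict:
    """Score both agents on the same calls; `judge` returns a 0..1 score per call."""
    scores = {"corpus-trained": [], "generic": []}
    for call in recorded_calls:
        scores["corpus-trained"].append(judge(replay(corpus_agent, call), call))
        scores["generic"].append(judge(replay(generic_agent, call), call))
    return {name: sum(s) / len(s) for name, s in scores.items()}
```

Run continuously on fresh recordings, the same harness would double as the 24x7 self-test loop the comment describes.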
This is on the roadmap. We haven’t yet been able to execute on this, since we are still laying the foundations, but this is a great idea!!
It's too funny. I tried the voice chat and it was the typical frustrating shit: misunderstanding words, then answering them slowly - it understood "das Ding" ("the thing") as "Singen" ("singing"), etc. You could film a comedy with that, but a company that runs something like this - I'd never call them.
It seems you tried the English demo. I'm wondering: was the website in English or German for you?
I want this as an option to handle all my personal calls.
I built a skeleton of an iOS app that managed my calls so that I could choose to answer, decline, or send to my chat bot.
So it gets real data from all my regular calls, and in my state (one-party consent) I don't need anyone's permission to record every call. That data then kicks off a fine-tuning run, overnight or locally, to improve my personal model.
My plan was to use Whisper and a local model with my voice clone (roughly the pipeline sketched below), and have it talk with everyone I didn't want to - eventually to the point where I never talk to any person I don't want to.
I would pay you for a local way to do that; however, I'd NEVER give you that data - but I'm sure plenty of people would.
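A minimal sketch of that local pipeline, assuming openai-whisper for transcription and llama-cpp-python for the local model; the model file name is a placeholder for the commenter's hypothetical fine-tuned model, and the voice-clone TTS step is left as a stub.

```python
# Sketch of a fully local answer-my-calls pipeline.
import whisper                 # openai-whisper: local speech-to-text
from llama_cpp import Llama    # llama-cpp-python: local LLM inference

stt = whisper.load_model("base")
llm = Llama(model_path="personal-model.gguf")  # placeholder fine-tuned model

def handle_turn(audio_path: str, history: list) -> str:
    """Transcribe one caller turn and draft a reply with the local model."""
    text = stt.transcribe(audio_path)["text"]
    history.append(f"Caller: {text}")
    prompt = "\n".join(history) + "\nMe:"
    out = llm(prompt, max_tokens=128, stop=["Caller:"])
    reply = out["choices"][0]["text"].strip()
    history.append(f"Me: {reply}")
    # A voice-clone TTS engine would speak `reply` back into the call here.
    return reply
```

The overnight fine-tuning run would then retrain the local model on the accumulated transcripts.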
Interesting, and it seems like a great use case. We currently focus on working with businesses, though :/
Super awesome demo! The contact center market, including inbound customer support, is incredibly ripe for disruption, and I'm sure you guys will be on the forefront of that.
Kinda funny how many amazing CX companies start in Germany!
I’m the CEO & founder of Rime, so I’ve been following your progress with real interest. Feel free to reach out and I’d love to explore ways we might collaborate. Until then, wishing you tons of success on this big milestone!
The German call center market is very large, established, and well organised. Pricing power here is also often higher, because call center work cannot be outsourced outside Germany (hardly anyone abroad speaks German).
Your demo is nice, but why don't you show a call? That would be a lot more convincing...
Only for data privacy reasons.
Weird, because it seems like the demo video uses pretend data anyway ("Mr. Smith", etc.). I agree - I would like to see a more fully baked demo where you connect it to a testing CRM and a toy order API and have it answer several customer queries using live information.
Ah, I misunderstood the question. Let me see if we can get something up.
Very impressive! How many jobs do you estimate this could displace?
It's a huge industry, so a lot. The job is really stressful and has a lot of employee churn, so it's not really something I feel bad about. Pressing elevator buttons was a job too, back in the day.
the demo is pretty impressive, ngl. knowing it's a bot, though, makes you want to phrase your questions a certain way, so i tried to just pretend i was talking to an actual support person.
i always feel these bots are way too "polished" in their responses and how they speak. maybe that's a good thing and we are just so used to hearing someone speaking more casually be less well-spoken lol. it makes it feel inauthentic, but perhaps that will change over time.
Essentially you are currently optimizing for the majority. Looking forward to seeing how that develops over time as conversations get more personalized.
Congrats on the launch! I work in this space, and FWIW I strongly agree with the idea of A/B testing + continuous improvement. I have found that it is relatively easy to set up A/B tests, but much harder for stakeholders to draw the right conclusions (a toy example of the comparison step is sketched below).
Also, every stakeholder might value different things: call deflection rate, CSAT, number of bookings, etc. It's important to align expectations upfront.
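As a toy illustration of why drawing conclusions is the hard part: even a single metric like call deflection rate needs a significance check before declaring a winner. A minimal sketch using statsmodels, with made-up counts:

```python
# Two-proportion z-test on call deflection rate for agent variants A and B.
from statsmodels.stats.proportion import proportions_ztest

deflected = [412, 455]   # calls fully handled by the bot (made-up numbers)
total = [1000, 1000]     # calls routed to each variant

stat, p_value = proportions_ztest(deflected, total)
print(f"z = {stat:.2f}, p = {p_value:.3f}")
# A large p-value would mean the observed lift may be noise - and this
# says nothing about CSAT or bookings, which stakeholders weigh differently.
```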
Impressive demo, just wish I didn't have to request a demo and could just sign up.
The "Request a demo" button also does nothing other than change its text on success - not sure if the request even went through...
I got the demo request :) Let me reply to you.
How well does this scale? Like how many simultaneous calls can a single voice agent handle through your platform?
It is very scalable. We currently handle >100k calls per day on our platform.
How do you compare to LiveKit? I don't see any docs on your website.
LiveKit started as infrastructure for real-time audio/video applications; we actually use them for WebRTC. They recently started growing into the voice AI space, but they are still more of an infra solution, while we are an end-to-end platform.
What sets us apart is multi-stage conversation modeling, out-of-the-box evals, and self-improvement!
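The thread doesn't spell out what "multi-stage conversation modeling" means internally. A common shape for this pattern is a small state machine where each stage has its own narrowly scoped prompt and explicit exit conditions - a hypothetical sketch, with all names invented:

```python
# Hypothetical multi-stage conversation model: each stage scopes the prompt,
# and a transition fires only when its condition holds on the call state.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Stage:
    name: str
    prompt: str                                        # instructions for this stage only
    transitions: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)

stages = {
    "greet": Stage("greet", "Greet the caller and ask for their booking number.",
                   {"lookup": lambda state: "booking_number" in state}),
    "lookup": Stage("lookup", "Confirm the booking and ask what change is needed.",
                    {"wrap_up": lambda state: state.get("request_confirmed", False)}),
    "wrap_up": Stage("wrap_up", "Summarize the outcome and say goodbye."),
}

def next_stage(current: str, state: dict) -> str:
    """Advance to the first stage whose transition condition is met."""
    for target, condition in stages[current].transitions.items():
        if condition(state):
            return target
    return current  # otherwise stay in the current stage
```

Scoping prompts per stage is also what would make the stage-level evals mentioned further down tractable.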
Congrats! Some time ago we tried client intake in legal with a voice AI product, but we were never able to get the success rate above very low numbers (especially with sensitive use cases like legal, where people reject the call instantly if it's a bot). Have you guys seen use cases like this? What ranges of success rates/engagement times have you seen?
Why do you think it didn't work out in legal? We currently don't focus on that domain.
In general, we currently have really high success rates with relatively constrained use cases, such as lead qualification and well scoped customer service use cases (e.g., appointment booking, travel cancellation).
In general, voice AI is hard because it's WYSIWYG (there is no human in the loop between what the bot says and what the person on the other side hears). Not sure about legal, but for more complex use cases (e.g., product refunds in retail), there are many permutations in how two different customers might frame the same issue, so it can be harder to instruct the AI agent accurately enough to guarantee high automation rates (given the plenitude of edge cases).
We therefore believe that voice AI works best when the bot leads the conversation and it is always very clear what the next steps are...
I think the problem relates to the core value proposition of automating an intake department with voice AI. The best voice AI customer is in an industry where there is a clear increase in value from being able to handle a larger mass of calls. That was not the case in the legal world, where one missed client might mean a loss of millions (and many firms live off fewer than 10 successful cases a year).
Therefore I think the verticals of customer service and lead pre-qualification make a lot more sense. Since you guys have the numbers, I am curious to learn more about how you define constraints for the bot and how often calls in these verticals deviate from those constraints.
I'm also curious about your opinions on (or whether you've seen) successful use cases where the bot has to be a bit more "creative" - either stringing together the information given to it or making reasonable extrapolations beyond it.
We see the main value prop of voice AI as enabling higher volumes of calls in a cost-efficient manner. There is clearly a slight trade-off on quality, because humans will do a better job on "high-stakes" calls and where more creativity is required.
It thus makes sense that it might not work for legal, since every call there might be high stakes.
Having the bot be "creative" is actually an interesting proposition. We currently do not focus on it, since the majority of our customers want the bot to be predictable and not hallucinate.
congrats on launching! how are y'all managing evals?
Thanks! We provide eval templates that can be applied to specific stages or to the whole conversation. Users can specify their own evals, which can be as granular as they'd like (a sketch of what one might look like is below). We're also working on a conversation simulation feature that lets users quickly iterate on evals by simulating previous real conversations and checking whether the eval output aligns with human judgement.
P.S. Arkadiy is locked out of his HN account due to the anti-procrastination settings. HN team, can you plz help? :)
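The product's actual eval format isn't shown in the thread. As a purely hypothetical illustration, a granular stage-level eval could be a named predicate over the transcript and tool-call log:

```python
# Hypothetical eval shape: a named check over one stage of a conversation.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Eval:
    name: str
    stage: Optional[str]              # None = applies to the whole conversation
    check: Callable[[dict], bool]     # receives transcript + tool-call log

# Granular example: the refund function must fire whenever the caller
# mentioned a refund during the "resolve" stage.
refund_triggered = Eval(
    name="refund_function_triggered",
    stage="resolve",
    check=lambda convo: (
        "refund" not in convo["customer_text"].lower()
        or "issue_refund" in convo["tool_calls"]
    ),
)

def run_eval(ev: Eval, conversation: dict) -> bool:
    return ev.check(conversation)
```

The simulation feature would then amount to replaying stored conversations through `run_eval` and comparing the results against human labels.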
Is this comparable to VAPI?
Comparable in some aspects. Their focus is dev tooling, while we are a mid-market and enterprise solution, geared towards enabling non-technical users at those companies to easily create voice AI agents for customer service, lead qualification, and ops use cases.
what does the feedback loop look like for your agents? wondering how hard it will be to generalize metrics across these agents!
Feedback is generated based on evals.
Example:
eval: function foo wasn't triggered even though [...]
feedback (exaggerated):
1. change stage prompt
2. change function description
3. add extra instructions to the end of the context
Metrics are easy to generalize (e.g. call transfer rate), but the baseline is different for each agent, so we're interpreting only the changes, not the absolute values (in the context of self-improvement).
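A hypothetical sketch of that eval-to-feedback step - the fix categories mirror the exaggerated example above, and `suggest_fixes` is an invented name, not the product's API:

```python
# Hypothetical self-improvement step: map a failed eval to candidate edits,
# then judge each edit by the *change* in a metric, never the absolute value.
FIX_CATEGORIES = [
    "change stage prompt",
    "change function description",
    "add extra instructions to the end of the context",
]

def suggest_fixes(failed_eval_name: str) -> list:
    """One candidate edit per category; an LLM would draft the actual text."""
    return [f"{category}, to address: {failed_eval_name}" for category in FIX_CATEGORIES]

def improved(metric_before: float, metric_after: float, min_delta: float = 0.02) -> bool:
    """Baselines differ per agent, so only the delta is meaningful."""
    return (metric_after - metric_before) > min_delta
```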
What framework did you use for flow building?
We are using Vercel
Have you tried your solution in noisy environments? Like a call to a person in a restaurant.
Noisy environments are OK, but it doesn't work that well when there are multiple clear speakers and not much noise. We are planning to add speaker diarization to address this.
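For reference, a minimal sketch of the kind of diarization step that could be bolted on, using pyannote.audio. The model name follows pyannote's docs; treating the longest-speaking person as the caller is my assumption, not necessarily how the product will do it:

```python
# Sketch: label speakers in a call recording with pyannote.audio, then keep
# only the dominant speaker's turns for the agent to respond to.
from collections import defaultdict
from pyannote.audio import Pipeline

# Gated model: requires accepting its terms and a Hugging Face token.
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="HF_TOKEN")
diarization = pipeline("call.wav")

talk_time = defaultdict(float)
for turn, _, speaker in diarization.itertracks(yield_label=True):
    talk_time[speaker] += turn.end - turn.start

caller = max(talk_time, key=talk_time.get)  # assume the caller talks the most
caller_segments = [
    (turn.start, turn.end)
    for turn, _, speaker in diarization.itertracks(yield_label=True)
    if speaker == caller
]
```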