Even being a Tauri app, this application takes around 120MB on my M3 Max just by doing these things. It's truly astonishing how modern desktop apps do essentially nothing and yet consume so many resources.
- it sets an icon in the menu bar
- it displays a window where I can choose which model to use
That's it. 120MB for doing nothing.
I feel the same astonishment! Our computers today are surely faster, stronger, and smaller than yesterday's, but did this really translate into something tangible for the user? I feel that, besides boot-up (thanks to SSDs rather than gigahertz), it's not any faster. It's like all this extra power is used to the maximum, for good and bad reasons, but not focused on making 'it' faster. I get a bit puzzled as to why my Mac can freeze for half a second when I press cmd+A in a folder full of 1000+ files.
Why doesn't Excel appear instantly, and why is it 2.29GB now when Excel 98 for Mac was... 154.31MB? Why is a LAN transfer between two computers still as slow as in 1999, around 10MB/s, when both can simultaneously download at over 100MB/s? And I'm not even starting on GB-memory-hoarding browser tabs; when you think about it, that's managed well as a whole, holding 700+ tabs without complaining.
And what about logs? This is a new branch of philosophy: open Console and witness the era of hyperreal siloxal, where computational potential expands asymptotically while user experience flatlines into philosophical absurdity.
Shameless plug: a brutally minimalist, Linux-only, whisper.cpp-only app: https://github.com/daaku/whispy
I wanted speech-to-text in arbitrary applications on my Linux laptop, and I realized that loading the model was one of the slowest parts. So a daemon process that toggles recording on/off via SIGUSR2, records using `pw-record`, passes the data to a loaded whisper model, and finally types the text using `ydotool` turned out to be a relatively simple application to build. ~200 lines in Go, or ~150 in Rust (check history for the Rust version).
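The toggle-record-transcribe-type pipeline described above can be sketched roughly like this in Go. This is a hedged sketch, not the actual whispy code: the `recorder` type, file paths, and `pw-record` invocation are illustrative, and the real daemon wires the toggle to SIGUSR2 with `signal.Notify` and feeds the WAV to a whisper.cpp model kept resident via cgo.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// recorder toggles a `pw-record` subprocess on and off. In the daemon
// described above, each SIGUSR2 delivery would call toggle().
type recorder struct {
	cmd *exec.Cmd // nil while idle
	n   int       // capture counter
}

// outPath returns the WAV file for the n-th capture (illustrative path).
func (r *recorder) outPath() string {
	return fmt.Sprintf("/tmp/dictate-%d.wav", r.n)
}

// toggle starts a capture when idle, or stops the running one and returns
// the finished WAV path; recording reports whether a capture is now active.
func (r *recorder) toggle() (wav string, recording bool) {
	if r.cmd == nil {
		r.n++
		r.cmd = exec.Command("pw-record", r.outPath())
		if err := r.cmd.Start(); err != nil {
			r.cmd = nil // pw-record not available; stay idle
			return "", false
		}
		return "", true
	}
	r.cmd.Process.Signal(os.Interrupt) // ask pw-record to finalize the file
	r.cmd.Wait()
	wav, r.cmd = r.outPath(), nil
	return wav, false
}

func main() {
	r := &recorder{}
	// Real flow, per pair of SIGUSR2 deliveries: start capture, stop it,
	// run the WAV through the resident whisper model, then type the text,
	// e.g. exec.Command("ydotool", "type", "--", text).Run().
	if _, active := r.toggle(); active {
		if wav, _ := r.toggle(); wav != "" {
			fmt.Println("captured:", wav)
		}
	}
}
```

Keeping the model loaded in the long-lived daemon is the point: the slow part (model load) happens once, and each toggle pair only pays for recording and inference.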
Why Linux only? Aren't Go and whisper.cpp cross-platform?
I'm very curious about the rewrite. Was Rust slowing you down too much?
Just for fun. I like both languages. I thought Rust would be a better fit on account of interop with whisper.cpp, but it turns out the use of cgo was straightforward in this case. I like that the Go version has minimal third-party dependencies compared to the Rust version.
Why does the title specify the language used when it's not even mentioned on the home page?
I just copied the title verbatim from the original Show HN: https://hw.leftium.com/#/item/44302416
In case you also have a problem with not using the original HN link: https://news.ycombinator.com/item?id=44302416
(I think the first link is easier to read (CSS/formatting/dark mode), slightly more compact, and contains a link to the original HN post. It's also simple to recreate the HN link manually by inspecting the ID.)
If it's Rust or Go it means I won't have to fuss with a runtime like Python or JS, nor a C++ build system
You don't have that with an Electron app either. The runtime is bundled with the binary.
Marketing. Honestly, it might not be effective here since it's not a library and not completely written in Rust.
Marketing for what exactly?
I mean... why would I want this app instead of some other app? Just because it's written in the language of the week? If it said "20% faster than xyz" it would be a much better marketing than saying it's written in rust, even though more than half the code is typescript.
I think there are tangible benefits to this being “not Java or JavaScript”. Or any language that brings a resource intensive runtime with it.
More than half is TypeScript to be fair.
I'm not sure if it's purely down to "hype".
For me, I do tend to prefer apps written in Rust/Go(/C/etc.-compiled) as they are usually less problematic to install (quite often a single binary; less headache compared to Python stuff, for example) and most of the time less resource-hungry (than anything JS/Electron-based)... in the end, a "convenient shortcut to convey aforementioned benefits" :)
The title also mentions that it’s open source, so it could be marketing for potential contributors.
It's targeting a very specific group of devs who like to follow trendy stuff.
To that group saying something is "made in rust" is equivalent to saying "it's modern, fast, secure, and made by an expert programmer not some plebe who can't keep up with the times"
> and made by an expert programmer
Quite the opposite. You have to be more of an expert programmer to achieve those same goals in C. Rust lowers the skill bar.
Anyways, I agree that the editorialization here is silly.
But also, I am unashamed that "in Rust" does increase my interest in a piece of software, for several of the reasons you mentioned.
Read the creator's description in the original Show HN: https://hw.leftium.com/#/item/44302416
I love it.
How do you clear the history of recordings?
I don't think it's possible (yet), but only the last five recordings are stored.
I built something similar for terminal lovers. It's a CLI tool written in Python called hns [1] that uses faster-whisper for completely local speech-to-text. It automatically copies the transcription to the clipboard as well as writing it to stdout, so you can seamlessly paste the transcription into any other application or pipe/redirect it to other programs/files.
[1]: https://github.com/primaprashant/hns
Nicely done! Seeing that it uses a port of Whisper, here's my shameless plug for a gnome extension I made using Whisper:
https://extensions.gnome.org/extension/8238/gnome-speech2tex...
Cool, you just might've saved me some carpal tunnel in the long run xD.
I guess there's no way for the AppImage to use GPU compute, right? Not that it matters much because parakeet is fast enough on CPU anyway.
I think the Whisper models will all use GPU. Only the Parakeet model is limited to CPU.
(I'm unfamiliar with AppImage. Was the model included in the app image, or was there a download after selecting the model?)
Not sure if this helps, but when you launch the .AppImage in a terminal, it shows you the command to extract the files it contains (to speed up loading); this might help you find the files you're searching for :)
Awesome. I was looking to build this on my own. Will look at the code and consider contributing, cheers.
Hey author of Handy here! Would absolutely love any help, please let me know if there's any way I can make contributing easier!
How good will this local model be compared to, say, your iphone builtin STT?
It’s way better. iPhone’s is awful. On macOS, interestingly, the built in dictation seems a bit better than on iOS, but still not as good as Whisper and Parakeet. Worth noting I have never used Whisper Small, only large and turbo. Another comment says Parakeet is the default now, though, despite what the site says.
Author here!
The default recommendation is Parakeet (mainly because it runs fast on a lot more hardware), but definitely think people should experiment with different models and see what is best for them. Personally I found Whisper Medium to be far better than Turbo and Large for my speech, and Parakeet is about on par with Medium, but each have their own quirks.
I'll update the site soon!
That's really interesting about medium being better than large. I never bothered trying the smaller models since the big ones were fast enough.
Amazing! I have been desperately wanting this. Livecaptions doesn't seem to be maintained super well.
Is it able to isolate the speaker from background noises / voices?
Very cool. Uses Whisper small under the hood.
https://github.com/openai/whisper
Nvidia Parakeet v3 was the default out of the box, and it works surprisingly well.
It offers all the different sizes of the OpenAI models too.
+1, happy user and a humble contributor.
How handy is this for coding? ;)
TypeScript 53.9% Rust 44.9%
FYI
The README is very clear about it:
Frontend: React + TypeScript with Tailwind CSS for the settings UI
Backend: Rust for system integration, audio processing, and ML inference
Lmao. At least it's typescript and not JavaScript!
Who’s gonna tell him?
Yeah. Rust compiles to machine code.
Don’t you dare!
I thought it was a clever joke
That's great, nice to see more and more machine learning projects being written in Rust.
It’s not really a machine learning project. It’s an application that calls existing models.
Repo says:
CPU-optimized speech recognition with Parakeet models
More than half the code is typescript.
It's typescript because it is a Tauri app which uses the system webview to render the UI.
Most of the audio code/inference code is Rust or bindings to libraries like whisper.cpp
This is a great landing page. I downloaded it.
Great onboarding too, using it now.
Very handy, thanks!
Landing page is indeed very refreshing
This is local, but I've found that external inference is fast enough, as long as you're okay with the possible lack of privacy. My PC isn't beefy enough to really run whisper locally without impacting my workflow, so I use Groq via a shell script. It records until I tell it to stop, then it either copies it to the clipboard or writes it into the last position the cursor was in.
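For anyone wanting to try the same approach, here is a rough sketch of such a script. Hedged: it assumes Groq's OpenAI-compatible `/audio/transcriptions` endpoint with the `whisper-large-v3` model, plus `arecord`, `jq`, and `xclip`; the commenter's actual script surely differs.

```shell
#!/usr/bin/env bash
# Record until Ctrl-C, transcribe via Groq's Whisper endpoint, copy to clipboard.
set -euo pipefail

GROQ_URL="https://api.groq.com/openai/v1/audio/transcriptions"

record() {
  # arecord captures until it receives SIGINT (press Ctrl-C to stop)
  arecord -f cd -t wav "$1" || true
}

transcribe() {
  # Requires GROQ_API_KEY in the environment; jq pulls out the .text field.
  curl -s "$GROQ_URL" \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "model=whisper-large-v3" \
    -F "file=@$1" | jq -r .text
}

main() {
  wav=$(mktemp --suffix=.wav)
  record "$wav"
  transcribe "$wav" | xclip -selection clipboard
  rm -f "$wav"
}

# Run only when invoked with "run", so sourcing the file is side-effect free.
if [ "${1:-}" = "run" ]; then main; fi
```

Swapping `xclip` for a typing tool (`xdotool type` on X11, `ydotool type` on Wayland) would write the text at the cursor position instead of the clipboard.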
Nice! There's also the VoiceInk open-source project https://github.com/Beingpax/VoiceInk/
macOS only.
Anyone know of the opposite? A really easy-to-use text-to-speech program that is cross-platform?
I've tried a lot of them, and the best I've found so far is the Edge browser's built-in Microsoft (natural) voices, which I call via JavaScript or the browser's read-aloud function.
Check out https://github.com/rany2/edge-tts , which exposes it as a Python library and a CLI tool.
I’ve been enjoying Kokoro
Amazing what it can do with only 82M parameters
https://www.kokorotts.io/
I've used Speech Note, which works well for STT and TTS.
Been having fun with this one
https://addons.mozilla.org/en-CA/firefox/addon/read-aloud/
Read Aloud allows you to select from a variety of text-to-speech voices, including those provided natively by the browser, as well as by text-to-speech cloud service providers such as Google Wavenet, Amazon Polly, IBM Watson, and Microsoft. Some of the cloud-based voices may require additional in-app purchase to enable.
...
the shortcut keys ALT-P, ALT-O, ALT-Comma, and ALT-Period can be used to Play/Pause, Stop, Rewind, and Forward, respectively.
Another day in HackerNews, another whatever written in Rust.
How can I call this library from C++?
How's it differ from macOS dictation?
I find state of the art speech to text models like Whisper and Nvidia Parakeet are a lot better than macOS dictation. I use them through MacWhisper, but this is basically the same.
It downloads the model on first execution and also checks versions on GitHub.
That is OK for what it brings. Nice program. Very "handy".
If you prefer a more stripped down version: the original releases (0.1.0 and 0.1.1) shipped with Whisper tiny included and no auto-update feature