Hi HN,
I started working on Alki because wrangling and deploying AI models on edge devices felt clunky. Going from a Hugging Face model to something production-ready required too many manual steps. Alki aims to simplify that by offering more conversion options, packaging models into reproducible bundles, and making them easy to publish or run on edge hardware.
Llama.cpp was a great starting point, but it led me down the rabbit hole of building a toolchain that could handle different quantization methods, export formats, and deployment bundles for orchestration.
It’s still early, and I’d love feedback from infra engineers and startups running LLMs outside the cloud, especially around packaging formats, performance, and what would make the biggest quality-of-life improvements.