Hey HN,

I’ve been working on EdgeFoundry, an open-source DevOps and observability toolkit that makes it easy to deploy, monitor, and manage local LLMs on your own machine or private server.
What it does
EdgeFoundry helps you:
• Run quantized LLMs locally (like TinyLlama or Phi-3) using llama.cpp
• Monitor telemetry such as latency, tokens per second, and memory usage
• Use a simple CLI to deploy, start, stop, and check the status of models
• Store and visualize metrics in a local SQLite database and React dashboard
• Keep everything offline-first and privacy-friendly
In short: Ollama runs your model — EdgeFoundry helps you deploy and observe it like a production system.
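Since the metrics land in a plain SQLite file, you can also query them outside the dashboard. Here's a rough sketch of what that looks like (the file path, table, and column names below are illustrative placeholders, not the exact schema):

import sqlite3
from pathlib import Path

# Illustrative path and schema; the real file name, table, and columns may differ.
db_path = Path.home() / ".edgefoundry" / "metrics.db"
conn = sqlite3.connect(db_path)

rows = conn.execute(
    "SELECT ts, latency_ms, tokens_per_sec "
    "FROM inference_metrics ORDER BY ts DESC LIMIT 10"
).fetchall()
conn.close()

# Print the ten most recent inference records.
for ts, latency_ms, tps in rows:
    print(f"{ts:.0f}  {latency_ms:.0f} ms  {tps:.1f} tok/s")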
Key Features (MVP)
• CLI: edgefoundry deploy/start/stop/status
• Local agent (FastAPI + llama.cpp) to run the model (rough sketch below)
• Telemetry logging for latency, memory, and token throughput
• Local dashboard (React) for visualizing metrics
• SQLite backend for offline data storage
• Support for TinyLlama and Phi-3 Mini out of the box
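To make the "local agent" part concrete, here's a simplified sketch of the pattern: FastAPI in front of llama-cpp-python, with per-request telemetry written to SQLite. This is illustrative only, not the actual EdgeFoundry internals; the model path, endpoint name, and table layout are assumptions:

import sqlite3
import time

from fastapi import FastAPI
from llama_cpp import Llama
from pydantic import BaseModel

# Assumptions: model path, endpoint, and telemetry schema are illustrative.
llm = Llama(model_path="tinyllama-1b-3bit.gguf")
db = sqlite3.connect("metrics.db", check_same_thread=False)
db.execute(
    "CREATE TABLE IF NOT EXISTS inference_metrics "
    "(ts REAL, latency_ms REAL, tokens INTEGER, tokens_per_sec REAL)"
)

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_tokens: int = 128

@app.post("/infer")
def infer(req: Prompt):
    start = time.perf_counter()
    out = llm(req.text, max_tokens=req.max_tokens)
    latency = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    # One telemetry row per request, which the dashboard can then chart.
    db.execute(
        "INSERT INTO inference_metrics VALUES (?, ?, ?, ?)",
        (time.time(), latency * 1000, tokens, tokens / latency if latency else 0),
    )
    db.commit()
    return {"completion": out["choices"][0]["text"], "latency_ms": latency * 1000}

Run it with uvicorn (uvicorn agent:app, assuming the file is saved as agent.py) and every request leaves a telemetry row behind.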
Why I built this
While building local AI projects like offline RAG assistants, I realized there was no easy way to deploy and track local models with the observability and lifecycle management we take for granted in the cloud.
Developers want control, privacy, and insight, but tools like Ollama don't offer monitoring, telemetry, or multi-device orchestration.
EdgeFoundry fills that gap by offering the DevOps and observability layer for edge AI.
Who it’s for
• Developers running quantized models locally
• Teams building offline-first AI apps
• Startups needing on-prem AI for compliance
• Anyone who wants visibility into local LLM performance
Quick Start
# 1. Install
pip install edgefoundry
# 2. Deploy a local model
edgefoundry deploy --model tinyllama-1b-3bit.gguf
# 3. Start the agent
edgefoundry start
# 4. Open the dashboard
edgefoundry dashboard
You’ll see live metrics like latency, memory usage, and tokens per second for each inference.
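If you want to hit the agent directly rather than go through the dashboard, it's a plain HTTP call. The route, port, and payload below are placeholders (check the README for the real endpoint); this just shows the shape of the interaction:

import requests

# Placeholder URL and payload; the actual endpoint may differ.
resp = requests.post(
    "http://localhost:8000/infer",
    json={"text": "Summarize the benefits of on-device inference.", "max_tokens": 64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())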
Future Plans
The next phase of EdgeFoundry is to enable mass deployment and testing of local AI models across devices.
The goal is to make it possible for companies to:
• Deploy local models at scale to phones, laptops, or IoT devices
• Collect telemetry and performance data from real devices or simulations (for example, using Android Studio or local emulators)
• Use this data to evaluate, tune, and monitor model performance before and after rollout
This would let teams building privacy-first or on-device AI systems manage fleets of local deployments with the same level of visibility and control they have in the cloud.
Feedback wanted
This is an early MVP. I’d love feedback on:
• What features you’d want for multi-device orchestration
• Whether cloud sync or over-the-air updates would be useful
• What matters most for large-scale local deployments on phones or computers
GitHub: https://github.com/TheDarkNight21/edge-foundry
If you try it, please share your experience or open an issue. I’m eager to hear from others building privacy-first AI tools or deploying LLMs locally.
Thanks for reading. I’ll be in the comments to answer questions and discuss next steps.