I built this "home rig" because I wanted a transparent way to see how LLMs actually perform on my own hardware.
Why I built this:
Idle Power: I wanted to utilize the idle processing power of my gaming rig for automated testing loops.
Layered Logic: It uses a 4-layer logic system (Layer 0-3) to iterate through ideas and evaluation results. front end controls and defines the process.
Weighted Grading: Instead of "black-box" scores, it uses attribute-weighted rubrics (like Logic vs. Tone) that are entirely user-defined.
Full Transparency: The frontend is designed to let you review logs and model comparisons in real-time with full control over the testing process.
I decided to release this today because I’m curious how others in the community are defining their evaluation rubrics or automating prompt improvements based on local testing and visualize the process.
I built this "home rig" because I wanted a transparent way to see how LLMs actually perform on my own hardware.
Why I built this:
Idle Power: I wanted to utilize the idle processing power of my gaming rig for automated testing loops.
Layered Logic: It uses a 4-layer logic system (Layer 0-3) to iterate through ideas and evaluation results. front end controls and defines the process.
Weighted Grading: Instead of "black-box" scores, it uses attribute-weighted rubrics (like Logic vs. Tone) that are entirely user-defined.
Full Transparency: The frontend is designed to let you review logs and model comparisons in real-time with full control over the testing process.
I decided to release this today because I’m curious how others in the community are defining their evaluation rubrics or automating prompt improvements based on local testing and visualize the process.
I'd love to hear your thoughts!