Interesting approach. Progressive disclosure helps with token limits, but I'm curious how you handle state across multi-step tasks where Layer 2/3 context from an earlier step becomes relevant again later? The "lost in the middle" problem is also about the model losing track of what happened ten steps ago, even if that context was loaded at the time.
We don't try to keep everything in context. Instead, we maintain a lightweight "memory index" of summaries that is always present, and use an LLM-as-a-judge step to decide when to reload full content. This mirrors how humans work: we remember the gist of a document and go back to re-read it when we need the details.
This trades one extra LLM call (for recall detection) for significant context-window savings, while preserving the ability to pull full details back into context when needed.
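A simplified sketch of the idea in Python (names and prompts here are illustrative, not our exact implementation, and `llm` stands in for any prompt-to-text completion call):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MemoryIndex:
    """Always-present index of summaries; full content lives outside the context."""
    llm: Callable[[str], str]               # any completion function: prompt -> text
    entries: dict[str, dict] = field(default_factory=dict)

    def store(self, key: str, full_text: str) -> None:
        # Summarize once at store time; only the summary stays in context.
        summary = self.llm(f"Summarize in one sentence:\n\n{full_text}")
        self.entries[key] = {"summary": summary, "full": full_text}

    def index_block(self) -> str:
        # The lightweight index injected into every step's prompt.
        return "\n".join(f"[{k}] {v['summary']}" for k, v in self.entries.items())

    def recall(self, task: str) -> list[str]:
        # The one extra LLM call: judge which entries (if any) the current
        # step needs in full, then return those full texts for reloading.
        verdict = self.llm(
            "Given this task and memory index, list the keys whose FULL "
            "content is needed, comma-separated, or NONE.\n\n"
            f"Task: {task}\n\nIndex:\n{self.index_block()}"
        )
        keys = [k.strip() for k in verdict.split(",") if k.strip() in self.entries]
        return [self.entries[k]["full"] for k in keys]
```

Summarizing once at store time keeps the index cheap to carry across steps; the only recurring cost is the single recall-detection call.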