Interesting approach. How are you handling the DOM processing inside the sandbox without spiking CPU usage? If it's not making constant API calls, is the vision model running locally (WASM/WebGPU), or are you using a clever way to diff the page state before sending it to the LLM?
Yes, it's essentially a clever way to diff the page state. The AI agent has some built-in tools for this. The user gives a prompt, say "Watch for X words". The LLM then calls the provided tool with the necessary args, and the tool runs a Python loop checking the DOM while the LLM sleeps. Once a match is found, the LLM is woken up. There's also a tool for watching for pixel changes in certain regions, which works the same way.
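A minimal sketch of what such a watch tool might look like, assuming a hypothetical `get_dom_text` callable that returns the current page text (e.g. `driver.page_source` in Selenium); the names and parameters here are illustrative, not the actual implementation:

```python
import time

def watch_for_text(get_dom_text, keywords, poll_interval=1.0, timeout=60.0):
    """Poll the page's DOM text until any keyword appears, then return it.

    The LLM stays idle while this loop runs; no per-poll API calls are
    made, only the final match (or a timeout) is reported back.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        text = get_dom_text()  # fetch the current DOM snapshot
        for word in keywords:
            if word in text:
                return word    # match found: wake the LLM with the result
        time.sleep(poll_interval)
    return None                # timed out: wake the LLM empty-handed

# Simulated DOM that changes after a couple of polls:
snapshots = iter(["loading...", "loading...", "price drop alert"])
found = watch_for_text(lambda: next(snapshots), ["alert"], poll_interval=0.01)
```

The pixel-region watcher presumably follows the same shape, with a screenshot diff of the region replacing the substring check.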