I also built a hosted option for the OmniParser network, since running it locally might be more setup than the average person wants to deal with. There's a simple interface on the website to create keys. Give it a shot if you wanna try the tool, but please don't abuse it too hard, I'm paying for the GPU out of pocket. This isn't some VC-funded project, it's literally just me.
And please let me know if/when you find bugs. I fully expect you'll hit some, and I'll do my best to fix them ASAP.
Tried it out and honestly it worked pretty well, but it’s way slower than just doing the task myself. Needs to be faster.
Yeah, speed is definitely a weakness right now. One option is to swap in a faster model, which will probably give you some speed-up at the potential expense of accuracy. You can do that in the .skipperrc file. There are most likely some gains to be made from prompting as well, and in the OmniParser model.
Longer term, I'd like to have a model specifically for the action selection task, which could be a much smaller network. That would really improve the latency.
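If you want to see where the time actually goes before swapping models, a rough per-stage timer along these lines can help. This is only a sketch under assumptions: `parse_screen` and `select_action` are hypothetical stand-ins, not functions from the tool, and the sleeps just simulate model latency.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def time_stages(stages: List[Stage], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run each stage in order; return the final output and seconds spent per stage."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)  # output of one stage feeds the next
        timings[name] = time.perf_counter() - start
    return payload, timings


# Hypothetical stand-in for the screen-parsing step (assumption, not the real API).
def parse_screen(screenshot: str) -> List[str]:
    time.sleep(0.02)  # simulated parsing latency
    return ["button:Submit", "input:Email"]


# Hypothetical stand-in for the action-selection step (assumption, not the real API).
def select_action(elements: List[str]) -> str:
    time.sleep(0.05)  # simulated model latency
    return f"click {elements[0]}"


if __name__ == "__main__":
    action, timings = time_stages(
        [("parse", parse_screen), ("select", select_action)],
        "screenshot.png",
    )
    slowest = max(timings, key=timings.get)
    print(f"action={action!r} slowest_stage={slowest} timings={timings}")
```

Whichever stage dominates is the one worth swapping for a faster model first.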