That's actually quite similar to the nearness library [1]. The main difference appears to be vicinity's focus on simplicity, while nearness tries to expose most of the functionality of the underlying backends.
[1] https://github.com/davnn/nearness
This is great.
I would think the next step would be to add some sugar that lets you run a random or fixed grid of hyper-parameters and get a report of accuracy and speed for your specific data set.
Thanks! This is actually something we have already been experimenting with (essentially auto-tuning on a specific dataset). It turned out to be quite complicated: a grid search over all the index and parameter combinations gets very costly on larger datasets. That's why we first opted for this approach, where you evaluate with a chosen index + parameter set, but it's definitely still something we plan to do.
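To make the "evaluate one index + parameter set" workflow concrete, here is a rough sketch of what a grid loop over it could look like. It assumes the Vicinity.from_vectors_and_items / query API shown in vicinity's README, assumes backend-specific parameters (hnswlib-style ef_construction and m here) pass through as keyword arguments, and assumes query returns (item, score) pairs; treat all three as assumptions, not confirmed API.

    import itertools
    import time

    import numpy as np
    from vicinity import Backend, Vicinity

    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((10_000, 64), dtype=np.float32)
    items = [str(i) for i in range(len(vectors))]
    queries = rng.standard_normal((100, 64), dtype=np.float32)

    # Exact 10-NN ground truth by brute force (squared euclidean).
    d2 = (queries**2).sum(1)[:, None] - 2 * queries @ vectors.T + (vectors**2).sum(1)[None, :]
    truth = [{str(j) for j in row} for row in np.argsort(d2, axis=1)[:, :10]]

    # Hypothetical HNSW grid; the parameter names follow hnswlib's
    # conventions and may not match vicinity's actual keywords.
    grid = {"ef_construction": [100, 200], "m": [8, 16]}

    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        index = Vicinity.from_vectors_and_items(
            vectors=vectors, items=items, backend_type=Backend.HNSW, **params
        )
        start = time.perf_counter()
        recall = 0.0
        for q, t in zip(queries, truth):
            res = index.query(q, k=10)
            # Handle either a single result list or a batch of lists.
            res = res[0] if res and isinstance(res[0], list) else res
            recall += len({item for item, _ in res} & t) / 10
        qps = len(queries) / (time.perf_counter() - start)
        print(f"{params}: recall@10={recall / len(queries):.3f}, ~{qps:.0f} qps")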
What does it mean that insertion is only supported for a few of the indexes? Also, will this allow hybrid search for the backends that support it?
Some backends/algorithms don't natively support dynamic inserts and require you to rebuild the index when you want to add vectors to it (of our backends, Annoy and PyNNDescent are the only ones that don't support it).
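Roughly, the difference looks like this. A minimal sketch assuming the from_vectors_and_items constructor from the README and an insert(items, vectors) method; both signatures are assumptions:

    import numpy as np
    from vicinity import Backend, Vicinity

    vectors = np.random.rand(1_000, 128).astype(np.float32)
    items = [f"item_{i}" for i in range(len(vectors))]
    new_vectors = np.random.rand(10, 128).astype(np.float32)
    new_items = [f"new_{i}" for i in range(len(new_vectors))]

    # HNSW supports dynamic inserts: add to the existing index in place.
    index = Vicinity.from_vectors_and_items(vectors=vectors, items=items, backend_type=Backend.HNSW)
    index.insert(new_items, new_vectors)

    # Annoy does not: adding vectors means rebuilding from the full set.
    index = Vicinity.from_vectors_and_items(
        vectors=np.vstack([vectors, new_vectors]),
        items=items + new_items,
        backend_type=Backend.ANNOY,
    )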
Hybrid search is a really cool idea though; it's not something we support at the moment, but it's definitely something we could investigate and add as an upcoming feature. Thanks for the suggestion!
Some questions:

1. When you say backends, do you plan to integrate, like a client, with some "vector" stores?
2. Also, any benchmarks?
3. Lastly, why Python?
1: That could be something for the future, but at the moment this is just meant as a way to quickly try out and evaluate various algorithms and libraries without having to learn the syntax for each of them (we call those backends).
2: We adopted the same methodology as ann-benchmarks for our evaluation, so technically the benchmarks there are valid for the backends we support. It's a good suggestion to add those explicitly to the repo though; I'll add a todo for that. (A sketch of that recall metric follows these answers.)
3: Mainly because (a) it's the language we are the most comfortable developing in, (b) it's the most widely used and adopted language for ML, and (c) (almost) all the algorithms we support are written in C/C++/Cython already.
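For context, ann-benchmarks scores an index by recall@k against exact brute-force neighbors, plotted against queries per second. A minimal, library-agnostic sketch of the recall side (numpy only; the exact search stands in for ground truth):

    import numpy as np

    def recall_at_k(approx, exact, k=10):
        # Fraction of the true k nearest neighbors that the ANN index
        # returned, averaged over queries (ann-benchmarks' accuracy metric).
        hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx, exact))
        return hits / (k * len(exact))

    rng = np.random.default_rng(0)
    base = rng.standard_normal((5_000, 32))
    queries = rng.standard_normal((50, 32))

    # Ground truth from exact brute-force (squared euclidean) search.
    d2 = (queries**2).sum(1)[:, None] - 2 * queries @ base.T + (base**2).sum(1)[None, :]
    exact = np.argsort(d2, axis=1)[:, :10]

    # `approx` would come from the index under test; reusing the exact
    # result here yields recall 1.0 by construction.
    print(recall_at_k(exact, exact, k=10))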
Not the author of the library, but the documentation lists the backends here: https://github.com/MinishLab/vicinity?tab=readme-ov-file#sup...
So these are nearest neighbor search implementations, not database backends.
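To illustrate, swapping the search implementation is meant to be a one-line change. A sketch assuming the README's constructor and Backend enum; the member names are taken from the documented backend list and may differ:

    import numpy as np
    from vicinity import Backend, Vicinity

    vectors = np.random.rand(1_000, 128).astype(np.float32)
    items = [str(i) for i in range(len(vectors))]
    query = np.random.rand(128).astype(np.float32)

    # Same calling code, different nearest neighbor implementation:
    # only backend_type changes.
    for backend in (Backend.BASIC, Backend.ANNOY, Backend.HNSW):
        index = Vicinity.from_vectors_and_items(vectors=vectors, items=items, backend_type=backend)
        print(backend, index.query(query, k=3))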