> Can a model trained on satellite data really find brambles on the ground?
No. As the researcher himself says, "However, it is obvious that most of the generated findings aren’t brambles", so the answer is clearly no.
All the model really learned was that brambles follow roads, all roads.
If this were oil and gas, where people put in the effort and results get checked, rather than universities, where meaningless citations matter and results are never confirmed, it would be more believable.
What they are asking for is impossible. Increasing the likelihood without silly hacks like "it's not in rivers or on top of buildings" is an interesting problem, but out of scope for academics.
https://gabrielmahler.org/environment/ai/ml/%F0%9F%A6%94/202...
For the "However, it is obvious that most of the generated findings aren’t brambles"
isn't this the same findings as the old "we trained to identify huskies, but instead we identified snow" problem?
So after transforming multispectral satellite data into a 128-dimensional embedding vector you can play "Where's Wally" to pinpoint blackberry bushes? I hope they tasted good! I'm guessing you can pretty much pinpoint any other kind of thing as well then?
Yes it's very good fun just exploring the embeddings! It's all wrapped by the geotessera Python library, so with uv and gdal installed just try this for your favourite region to get a false-colour map of the 128-dimensional embeddings:
# for Cambridge
# https://github.com/ucam-eo/geotessera/blob/main/example/CB.geojson
curl -OL https://raw.githubusercontent.com/ucam-eo/geotessera/refs/heads/main/example/CB.geojson
# download the embeddings as GeoTIFFs
uvx geotessera download --region-file CB.geojson -o cb2
# do a false-colour PCA down to 3 dimensions from 128
uvx geotessera visualize cb2 cb2.tif
# project onto Web Mercator and visualise using Leaflet.js over OpenStreetMap
uvx geotessera webmap cb2.tif --output cb2-map --serve
Because the embeddings are precomputed, the library just has to download the tiles from our server. More at: https://anil.recoil.org/notes/geotessera-python
Downstream classifiers are really fast to train (seconds for small regions). You can try out a notebook in VSCode to mess around with it graphically using https://github.com/ucam-eo/tessera-interactive-map
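To make "downstream classifier" concrete, here is a rough sketch in Python of what that looks like outside the notebook. It assumes the downloaded tiles are 128-band embedding GeoTIFFs; the tile path and the labelled coordinates are made-up placeholders, and it uses plain rasterio + scikit-learn rather than any geotessera-specific API.

# Sketch only: fit a tiny classifier on TESSERA embedding pixels.
# Assumes a 128-band embedding GeoTIFF from the download step above.
# The tile path and labelled coordinates are illustrative placeholders.
import numpy as np
import rasterio
from rasterio.transform import rowcol
from sklearn.linear_model import LogisticRegression

with rasterio.open("cb2/some_tile.tif") as src:   # placeholder tile name
    emb = src.read()                              # shape: (128, height, width)
    transform = src.transform

# A handful of hand-labelled points in the raster's CRS: 1 = bramble, 0 = not.
labelled = [((545000.0, 257500.0), 1), ((546200.0, 258300.0), 0)]  # placeholders

X, y = [], []
for (x_coord, y_coord), cls in labelled:
    row, col = rowcol(transform, x_coord, y_coord)
    X.append(emb[:, row, col])                    # the 128-dim embedding at that pixel
    y.append(cls)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))

# Score every pixel in the tile at once.
flat = emb.reshape(emb.shape[0], -1).T            # (height*width, 128)
bramble_prob = clf.predict_proba(flat)[:, 1].reshape(emb.shape[1:])

With only a couple of labels this obviously overfits; the point is just that the per-pixel features already exist, so the fit itself is the part that only takes seconds.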
The berries were a bit sour, summer is sadly over here!
This is all far outside of my wheelhouse, but I'm curious if there's any way to use this for rocks and geology? Identifying dikes and veins on cliff sides from satellites would be really cool.
A major limitation is that most rock types look essentially identical in the visible + NIR spectral range; things only separate once you get out to the SWIR bands. Sentinel-2 does have some SWIR bands, so it may work reasonably well with embeddings, but a lot of the signal the embeddings focus on encoding may not be the right features for distinguishing rock types. Methods focused specifically on the SWIR range are more likely to work reliably; for example, simple band ratios of SWIR bands may give a cleaner signal than general-purpose embeddings in this case.
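For what a SWIR band ratio looks like in practice, here is a minimal sketch assuming you already have Sentinel-2 B11 and B12 as single-band GeoTIFFs resampled to the same grid; the file names are placeholders.

# Sketch: simple Sentinel-2 SWIR band ratio (B11 / B12), a commonly used
# clay/alteration indicator. File names are placeholders and both bands
# are assumed to be on the same grid already.
import numpy as np
import rasterio

with rasterio.open("S2_B11.tif") as b11, rasterio.open("S2_B12.tif") as b12:
    swir1 = b11.read(1).astype("float32")
    swir2 = b12.read(1).astype("float32")
    profile = b11.profile

# Avoid dividing by zero on nodata pixels.
ratio = np.divide(swir1, swir2, out=np.zeros_like(swir1), where=swir2 > 0)

profile.update(dtype="float32", count=1)
with rasterio.open("b11_b12_ratio.tif", "w", **profile) as dst:
    dst.write(ratio, 1)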
Hyperspectral in the SWIR range is what you really want for this, but that's a whole different ball game.
> Hyperspectral in the SWIR range is what you really want for this, but that's a whole different ball game.
Are there any hyperspectral surveys with UAVs etc instead of satellites?
Usually airplanes rather than UAVs, because the instruments are heavy. But yes, aerial surveys are the most common case; hyperspectral satellites are much rarer than aerial hyperspectral.
It might work. TESSERA's embeddings are at a 10 metre resolution, so it might depend on the size of the features you are looking for. If those features have distinct changes in colour or texture over time or they scatter radar in different ways compared with their surroundings then you should be able to discriminate them.
The easiest way to test is to try out the interactive notebook and drop some labels in known areas.
Is there a way to cluster the embeddings spatially, or look for patterns isolated to some dimensions? (Again, way out of my wheelhouse.)
What I mean is that a vein is usually a few meters wide but can be hundreds of meters long, so ten-meter resolution is probably not very helpful unless the embeddings can encode some sort of pattern that stretches across many cells.
Almost definitely!
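One simple way to explore that is plain k-means over the per-pixel embeddings, which gives you an unsupervised segmentation to eyeball. A minimal sketch, again assuming a 128-band embedding GeoTIFF; the file name and cluster count are arbitrary placeholders, not part of the geotessera API:

# Sketch: unsupervised k-means over per-pixel embeddings, so similar
# land cover ends up with the same cluster id without any labels.
# The file name and number of clusters are placeholders.
import rasterio
from sklearn.cluster import KMeans

with rasterio.open("cb2/some_tile.tif") as src:
    emb = src.read()                              # (bands, height, width)

bands, height, width = emb.shape
pixels = emb.reshape(bands, -1).T                 # (height*width, bands)

clusters = KMeans(n_clusters=8, n_init=10).fit_predict(pixels)
cluster_map = clusters.reshape(height, width)     # a segmentation to eyeball

Note this clusters each 10 m pixel independently, so a metres-wide vein will not pop out on its own; capturing patterns that stretch across many cells needs some added spatial context, such as neighbouring pixels or coarser pooling.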
I haven't done this kind of thing since undergrad, but hyperspectral data is really frickin cool this way. Not only can you use spectral signatures to identify specific things, you can also figure out what those things are made of by unmixing the spectra (see the sketch below).
For example, figure out what crop someone’s growing and decide how healthy it is. With sufficient temporal resolution, you can understand when things are planted and how well they’re growing, how weedy or infiltrated they are by pest plants, how long the soil remains wet or if rainwater runs off and leaves the crop dry earlier than desired. Etc.
If you’re a good guy, you’d leverage this data to empower farmers. If you’re an asshole, you’re looking to see who has planted your crop illegally, or who is breaking your insurance fine print, etc.
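For anyone curious what the unmixing mentioned above involves: the usual starting point is the linear mixing model, where each pixel's spectrum is treated as a non-negative combination of known endmember spectra and you solve for the abundances. A minimal sketch with made-up numbers; the endmember spectra below are purely illustrative.

# Sketch of linear spectral unmixing: solve pixel ~ endmembers @ abundances
# with non-negative abundances. All numbers below are made-up placeholders.
import numpy as np
from scipy.optimize import nnls

# Rows are bands, columns are endmember spectra (e.g. soil, healthy crop, dry crop).
endmembers = np.array([
    [0.20, 0.05, 0.30],
    [0.25, 0.08, 0.35],
    [0.30, 0.45, 0.33],
    [0.35, 0.50, 0.32],
])

pixel = np.array([0.26, 0.27, 0.38, 0.42])        # an observed mixed spectrum

abundances, residual = nnls(endmembers, pixel)    # non-negative least squares
fractions = abundances / abundances.sum()         # rough per-material fractions
print(fractions, residual)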
Hyperspectral data is really neat, though it's worth pointing out that TESSERA is only trained on multispectral (optical + SAR) data.
You are very right about the temporal aspect, though: that's what makes the representation so powerful. Crops grow and change their colour and scattering patterns in distinct ways.
It's worth pointing out that the model and training code are under an Apache 2 license and the global embeddings are under a CC-BY-A license. We have a Python library that makes working with them pretty easy: https://github.com/ucam-eo/geotessera
> If you’re a good guy, you’d leverage this data to empower farmers. If you’re an asshole, you’re looking to see who has planted your crop illegally, or who is breaking your insurance fine print, etc.
How does using it to speculate on crop futures rank?
Every time someone explains the way short selling or speculative markets work, I have a “oh, I get it…” moment and then forget months later.
Same with insurance… socialized risk for our food supply is objectively good, and protecting the insurance mechanism from fraud is good. People can always bastardize these things.
It is good to enable people to hedge against bad harvests.
Yes! TESSERA is very new so we're still exploring how well it works for various things.
We're hoping to try it with a few different things for our next field trip, maybe some that are much harder to find than brambles.
I've wondered this about finding hot springs.
That should be a pretty good use case; if you manually label just a few known hot springs, you should be able to find others quite quickly using the TESSERA interactive notebook. The embeddings capture the annual spectral-temporal signature, so a hot spring should be fairly distinctive versus its surroundings.
Video of the notebook in action https://crank.recoil.org/w/mDzPQ8vW7mkLjdmWsW8vpQ and the source https://github.com/ucam-eo/tessera-interactive-map
Not much detail on the method? Like what data it takes from iNaturalist; for example, if it's taking in GPS coordinates of bramble observations, then it's not clear what there is for the ML model to do.
What detail was in the satellite images? Was it picking up signals about the kinds of spaces brambles grow in, or was it just visually identifying bramble patches?
In the UK you get brambles in pretty much every non-cultivated green space. I wonder how well the classifier did?
Interesting project.
Hi! You can find a bit more about Gabriel's model through some of his posts over the last few weeks: https://gabrielmahler.org/posts/
When it comes to the satellite images, the model actually used TESSERA (https://arxiv.org/abs/2506.20380), a model we trained to produce embeddings for every point on Earth that encode its temporal-spectral properties over a year.
Think of it as compressing potentially fifty or a hundred observations of a particular point on Earth down to a single 128-dimensional vector.
Happy to answer any other questions.
The in-person verification of hotspots was good, but in-person verification of non-hotspots was not done, and might be difficult.
Seems like it could be pretty useful for archaeology as well.
That's actually a great idea! I wonder what kind of feature size would be needed though - TESSERA's embeddings are at a 10 metre resolution so for larger structures you might need some kind of spatial aggregation.
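By spatial aggregation I mean something as simple as average-pooling the 10 m embedding pixels into coarser blocks before classifying, so a larger structure contributes one pooled feature vector. A rough sketch; the block size is arbitrary:

# Sketch: average-pool 10 m embedding pixels into coarser blocks
# (e.g. 8x8 pixels = 80 m) so larger structures are summarised by one vector.
# Assumes a (bands, height, width) embedding array as in the earlier examples.
import numpy as np

def block_average(emb: np.ndarray, block: int = 8) -> np.ndarray:
    bands, h, w = emb.shape
    h2, w2 = h - h % block, w - w % block                  # crop to a multiple of block
    trimmed = emb[:, :h2, :w2]
    pooled = trimmed.reshape(bands, h2 // block, block, w2 // block, block)
    return pooled.mean(axis=(2, 4))                        # (bands, h2/block, w2/block)

# Random data standing in for real embeddings.
coarse = block_average(np.random.rand(128, 500, 500), block=8)
print(coarse.shape)   # (128, 62, 62)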
As a hobby project, I was looking into using LiDAR data to view archeological points of interest in Switzerland: https://github.com/r-follador/delta-relief
It would be interesting to overlay TESSERA data there, although the resolution is of course very different.
Related, I think? Satellite + AI = finding things, not sure if similar beyond that
https://www.pnas.org/doi/10.1073/pnas.2407652121
> So it turns out that there's a lot of bramble between the community center and entrance to Milton Country Park.
> In every place we checked, we found pretty significant amounts of bramble.
[Shocked Pikachu face]
If it can find sloes it's going to make sloe gin foragers very very angry. Generally when they find a usable crop they don't share it.
The whole-earth embeddings are interesting. Wonder if it'd be any good for looking for fresh water sources in the desert.
> Stopping to take a photo of a very photogenic bee
Show us the bee!
FarmLogs (YC 12) did exactly this. We used sat imagery in the near-infrared spectrum to determine crop health remotely. Modern farming utilizes a practice called precision ag - where your machine essentially has a map of zones on the field for where treatments are or aren't needed and controllers that can turn spray nozzles on/off depending on boundaries. We used sat imagery as the base for an automated prescription system, too. So a farmer can reduce waste by only applying fertilizer or herbicide in specific areas that need it.
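For readers wondering how near-infrared imagery maps to crop health: the classic index is NDVI, a normalised difference of the NIR and red bands. A minimal sketch; the band file names are placeholders:

# Sketch: NDVI = (NIR - Red) / (NIR + Red), the classic vegetation-health index
# derived from near-infrared imagery. Band file names are placeholders.
import numpy as np
import rasterio

with rasterio.open("nir_band.tif") as nir_src, rasterio.open("red_band.tif") as red_src:
    nir = nir_src.read(1).astype("float32")
    red = red_src.read(1).astype("float32")

# Healthy, dense vegetation tends towards 1; bare soil sits near 0.
ndvi = np.divide(nir - red, nir + red, out=np.zeros_like(nir), where=(nir + red) > 0)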
can it find me truffles?
If you have some GPS locations of truffles, you could use the notebook Anil mentioned here https://news.ycombinator.com/item?id=45378855 and give it a go.
There is the issue of just how visible truffles are from space, though, if they grow under cover. That said, it may still work, because you can find habitats that are very likely to have truffles. We've had some promising results looking at fungal biomass.
A model I trained on ASTER and Landsat data has major difficulties identifying spots for agate hunting. Even after I gave it extra instructions, such as looking only in volcanic terrain (with a USGS map provided), focusing on mixed signals of hydrous silica and iron, and checking near known fault zones in those volcanic areas, it still gave me results everywhere, and almost none of them matched my criteria.
Plants are a very different and more difficult ballgame (they like to mess up my satellite data), so reading this I'm not surprised that it didn't really give proper results.