Exceeded their quota, probably, based on my recent experience with Docker Hub.
Tech support needs to log the server in to its account to get a bigger quota.
Anyone have recommendations for an image cache? Native kubernetes a plus.
What would be really nice is a system with mutating admission webhooks for pods which kicks off a job to mirror the image to a local registry and then replaces the image reference with the mirrored location.
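The registration half of that is only a little YAML; a rough sketch (names, service, and CA bundle are all hypothetical), with the webhook service returning a JSONPatch that rewrites spec.containers[*].image to the mirrored location:

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: image-mirror                    # hypothetical
    webhooks:
      - name: mirror.images.example.com     # hypothetical
        admissionReviewVersions: ["v1"]
        sideEffects: None
        failurePolicy: Ignore               # don't block pod creation if the webhook is down
        rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            operations: ["CREATE"]
            resources: ["pods"]
        clientConfig:
          service:
            name: image-mirror              # hypothetical in-cluster service
            namespace: image-mirror
            path: /mutate
          caBundle: <base64-encoded CA>     # placeholder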
We do a local (well, internal) mirror for "all" these things, so we're basically never stuck. It mirrors our CPAN, NPM, Composer, Docker and other such web repos. It helps with the CI tooling as well.
This is the way. At some point it's way too expensive to have a single repo in your supply chain go down, or even pull a package out from under you.
Not Google Artifact Registry... Our Docker Hub pull-through mirror went down with the Docker Hub outage. The images were still there, but all the image tags were gone.
I've been using https://github.com/enix/kube-image-keeper on some of my clusters - it is a local Docker registry running in-cluster, with a proxy and mutating webhooks. I also evaluated Spegel, but currently it isn't possible to set it up on GKE.
I've been using Amazon ECR as an alternative source.
https://gallery.ecr.aws/
Depending on what other (additional) features you're willing to accept, the GoHarbor[0] registry supports pull-through caching as well as mirroring and other features. It's a nice registry that also supports other OCI artifacts like Helm charts, and it does vulnerability scanning with "Interrogation Services" like Trivy.
I've been using it at home and at work for a few years now. It might be a bit overkill if you just want a simple registry, but it's a really nice tool for anyone who can benefit from the other features.
[0] https://goharbor.io/
I'm using a different approach for local testing where I don't want to redownload images over and over: https://github.com/stackabletech/k8s-local-dev
Basically it's k3s configured to use a local mirror, and that local mirror is running the Zot registry (https://zotregistry.dev/v2.1.8/). It is configured to automatically expire old images so my local hard drive doesn't fill up.
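For anyone curious, the k3s side is just a registries.yaml mirror entry, roughly this (the endpoint is wherever your local Zot instance listens):

    # /etc/rancher/k3s/registries.yaml
    mirrors:
      docker.io:
        endpoint:
          - "http://localhost:5000"   # local Zot registry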
https://github.com/spegel-org/spegel
That looks pretty close to what I want. Thanks!
I usually do upstream image mirroring as part of CI. Registries are built into GitLab, AWS (ECR), GitHub, etc
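e.g. a scheduled CI job that just runs something like skopeo (image names and destination registry are placeholders):

    # copy an upstream image into your own registry; skopeo handles auth and manifests
    skopeo copy \
      docker://docker.io/library/python:3.12 \
      docker://registry.example.com/mirror/python:3.12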
Quay.io
CNCF has Harbor [0], which I use at home and have deployed in a few clusters at work, and it works well as a pull-through cache. In /etc/containers/registries.conf it's just another line below any registry you want mirrored, along the lines of the snippet below.
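Something like this (Harbor hostname hypothetical):

    [[registry]]
    location = "docker.io"

    [[registry.mirror]]
    location = "harbor.example.com/hub"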
Where `hub` is the name of the proxy you configured for, in this case, docker.io. It's not quite what you're asking for, but it can definitely be transparent to users. I think the bonus is that if you look at a podspec it's obvious where the image originates and you can pull it yourself on your machine, whereas if you've mutated the podspec, you have to rely on convention.

[0] https://goharbor.io/
I would add, for anyone not familiar with it, that this (and more advanced mirroring, etc) is just as easily done from the really nice Web UI (if that's your cup of tea).
GitHub Actions buildx also going down is a really unintended consequence. It would be great if we could mirror away from Docker entirely at this point, but I digress.
There's a registry image for OCI containers that is pretty painless to set up and low maintenance, and it can use S3 as a storage backend.
https://hub.docker.com/_/registry
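A minimal config sketch for the S3 backend (bucket and region are placeholders; credentials can come from an IAM role or from accesskey/secretkey fields):

    # config.yml, mounted over /etc/docker/registry/config.yml in the registry image
    version: 0.1
    storage:
      s3:
        region: us-east-1
        bucket: my-registry-bucket
    http:
      addr: :5000

e.g. `docker run -d -p 5000:5000 -v $PWD/config.yml:/etc/docker/registry/config.yml registry:2`.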
Your git provider probably also has a container registry service built in.
The status report says the issue with authentication is fixed, but it's far worse than that: this incident also took down `docker pull` for public images.
Dupe https://news.ycombinator.com/item?id=45366942
I'll admit I hadn't checked before posting; perhaps an admin can merge both submissions and change the URL on the one you linked to the one in this submission.
Well to be fair: this doesn't happen very often. It's quite a stable service in my experience.
I didn't even really realize it was a SPOF in my deploy chain. I figured at least most of it would be cached locally. Nope, can't deploy.
I don't work on mission-critical software (nor do I have anyone to answer to) so it's not the end of the world, but has me wondering what my alternate deployment routes are. Is there a mirror registry with all the same basic images? (node/alpine)
I suppose the fact that I didn't notice before says wonderful things about its reliability.
> I don't work on mission-critical software
> wondering what my alternate deployment routes are
If the stakes are low and you don't have any specific need for a persistent registry then you could skip it entirely and push images to production from wherever they are built.
This could be as simple as `docker save`/`scp`/`docker load` (sketched below), or as fancy as running an ephemeral registry to get layer caching like you have with `docker push`/`docker pull`[1].
[1]: https://stackoverflow.com/a/79758446/3625
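Spelled out, the simple version is roughly this (host and image names are hypothetical):

    # build locally and stream the image straight to the prod host, no registry involved
    docker save myapp:latest | gzip | ssh prod-host 'gunzip | docker load'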
I guess the best approach would be a self-hosted pull-through registry with a cache. That way you'd have all the required images ready even when Docker Hub is offline.
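For reference, the stock registry image can do this out of the box; a minimal sketch (mirror hostname hypothetical):

    # registry config.yml: run the registry as a pull-through cache for Docker Hub
    version: 0.1
    proxy:
      remoteurl: https://registry-1.docker.io
    storage:
      filesystem:
        rootdirectory: /var/lib/registry
    http:
      addr: :5000

Then point the Docker daemons at it via "registry-mirrors" in /etc/docker/daemon.json, e.g. `{ "registry-mirrors": ["https://mirror.internal.example:5000"] }`.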
Unfortunately that does not help in an outage because you cannot fill the cache now.
In the case where you still have an image locally, trying to build will fail with an error complaining that it can't load metadata for the image because a HEAD request failed. So the real question is: why isn't there a way to disable the HEAD request for loading image metadata? Perhaps there's a way and I just don't know it.
Sure? `--pull=missing` should be the default.
While I haven't tried `--pull=missing`, I have tried `--pull=never`, which I assume is a stricter version, and it was still attempting the HEAD request.
Yeah, this is the actual error that I'm running into. Metadata pages are returning 401 and bailing out of the build.
You might still have it on your dev box or build box
Per sibling comment, public.ecr.aws/docker/library/.... works even better
This saved me. I was able to push image from one of my nodes. Thank you.
This is the way, though it can lead to fun moments: I was just setting up a new cluster and couldn't figure out why I was having problems pulling images when the other clusters were pulling just fine.
Took me a while to think of checking the docker hub status page.
> Is there a mirror registry with all the same basic images?
https://gallery.ecr.aws/
It's a bit stupid that I can't restart my container (on Coolify) because pulling the image fails, even though I am already running it. I do have the image; I just need to restart the Node.js process...
Never mind, I used the terminal: `docker ps` to find the container, then `docker restart <container_id>`, without going through Coolify.
What's the easiest way to cache registries like docker, pypi, and npm these days?
The images I use the most, we pull and push to our own internal registry, so we have full control.
There are still some we pull from Docker Hub, especially in the build process of our own images.
To work around that, on AWS, you can prefix the image with public.ecr.aws/docker/library/, for example `public.ecr.aws/docker/library/python:3.12`, and it will pull from AWS's mirror of Docker Hub.
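i.e. the tag scheme mirrors Docker Hub's library/ namespace, so it's a drop-in swap:

    # pull the same official image from AWS's public mirror instead of Docker Hub
    docker pull public.ecr.aws/docker/library/python:3.12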
> To work around that, on AWS, you can prefix the image with public.ecr.aws/docker/library/
I believe anyone can pull from the public ECR, not just clients in AWS.
Someone mentioned Artifactory, but it's honestly not needed. I would very highly recommend an architecture where you build everything into a Docker image and push it to an internal container registry (like ECR; all public clouds have one) for all production deployments. This way, outages only affect your build/deploy pipeline.
You pull the images you want to use, preferably with some automated process, then push them to your own repo. And in any case, use your own repo when pulling for dev/production. It saves you from images disappearing as well.
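In shell terms, the manual version is roughly this, usually wrapped in a scheduled job (the internal registry hostname is a placeholder):

    # mirror an upstream image into your own registry
    docker pull docker.io/library/python:3.12
    docker tag docker.io/library/python:3.12 registry.internal.example/mirror/python:3.12
    docker push registry.internal.example/mirror/python:3.12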
What do you like using for your own repo? Artifactory? Something else?
There is Sonatype Nexus. A bit annoying to administer (automated cleanup works every time, 60% of the time), but supports most package formats (Maven, npm, NuGet and so on) alongside offering Docker registries, both hosted and proxy ones. Also can be deployed as a container itself.
Note: Artifactory SaaS had downtime today as well.
I have experience with ECR. If you’re in the AWS ecosystem it does the job.
Another reply had some good insight: https://news.ycombinator.com/item?id=45368092
All I really need is for Debian to have their own OCI image registry I can pull from. :)
Not Debian itself, but Red Hat's registry has them: https://quay.io/organization/lib
I was hoping Google Cloud Artifact Registry pull-through caching would help. Alas, it does not.
I can see an image tag available in the cache in my project on cloud.google.com, but after attempting to pull from the cache (and failing) the image is deleted from GAR :(
I think it was likely caused by the cache trying to compare the tag with Docker Hub: https://docs.docker.com/docker-hub/image-library/mirror/#wha...
> "When a pull is attempted with a tag, the Registry checks the remote to ensure if it has the latest version of the requested content. Otherwise, it fetches and caches the latest content."
So if the authentication service is down, it might also affect the caching service.
Even cloud vendors can’t get distributed systems design right.
I’m able to pull by the digest, even images that are now missing a tag.
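If you still have the image somewhere, the digest is visible locally, e.g.:

    # show the repo digests of locally cached images
    docker images --digests
    # pull by digest even if the tag has vanished upstream (digest is a placeholder)
    docker pull docker.io/library/golang@sha256:<digest>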
In our CI, setting up the Docker buildx driver to use the Artifact Registry pull-through cache (apparently) involves an auth transaction to Docker Hub, which fails out.
So that's why. This gave me the kick I needed to finally switch over the remaining builds to the pull-through cache.
Yup, my Coolify deployments were failing and I didn't know why: https://softuts.com/docker-hub-is-down/
Also, isn't it weird that it takes so long to fix given the magnitude of the issue? Already down for 3 hours.
Therefore, keep a local registry mirror. You'll get images from the local cache every time.
This is one of the reasons I don't want to use docker on production machines and have started to use systemd again!!
Hard to tell if this is /s or not. Nobody is forcing you to run images straight from Docker Hub, lol. Every host already keeps the images it has pulled. Running an in-house registry is also a good idea.
At a reasonably modest scale, running an in-house registry is a polite thing to do for the rest of the internet.
Development environment won't boot. Guess I'll go home early.
Somewhat unrelated, but GitLab put out a blog post earlier this year warning users about Docker Hub's rate limiting: https://about.gitlab.com/blog/prepare-now-docker-hub-rate-li...
We chose to move to GitLab's container registry for all the images we use. It's pretty easy to do and I'm glad we did. We used to only use it for our own builds.
The package registry is also nice. I only wish they would get out of the "experimental" status for apt mirror support.
Is there a good alternative to Docker Hub these days? Besides Azure CR.
We use https://container-registry.com, which is a clean version of the open-source Harbor registry software (https://goharbor.io/) from one of the maintainers. It has worked well and reliably for years now and has no vendor lock-in thanks to Harbor.
Basically all my Docker images were being built from Github repos anyways, so I just switched to Github's container registry.
GHCR authentication is just broken. They still require the deprecated personal access tokens.
I was publishing public containers on Docker Hub, and I'm publishing public containers on GHCR.
Quay.io is nice (but you have to memorize the spelling of its name)
Or start a pronunciation revolution and say "kway". It's all made up anyway ;-)
It _is_ pronounced "kway", and it _is_ a real word: https://www.merriam-webster.com/dictionary/quay !
It's pronounced /keɪ/ (from your link: "The spelling quay, first appearing in the sixteenth century, follows modern French. As noted by the Oxford English Dictionary, third edition, the expected outcome of Middle English keye would be /keɪ/ in Modern English."). Or "key", with modern spelling.
We actually originally pronounced it as "kway" (the American pronunciation we had heard) but then had a saying we'd tell customers (when asked) of "pronounce it however you please, so long as you're happy using it!" :)
Source: I co-founded Quay.io
A tongue twister we accidentally invented: "quick Quay queue counter" :)
So far, spelling has been our worst issue with Quay!
Pronounced "key". The main ferry dock in Sydney is called Circular Quay.
The third pronunciation in the link is “kway”
I know quay is a real word - it's not normally pronounced like "kway" but like "key". But only because enough people agree on that - that's what I mean by made up. The rules are just a majority agreement for both meaning and pronunciation.
My French-speaking partner recently informed me that quay (pronounced "key") means something like 'dock' while we were discussing the Florida Keys, and suddenly everything fell into place!
Duplicate https://news.ycombinator.com/item?id=45366942
I have the same problem; visiting https://hub.docker.com/_/node returns an error.
Explains why my Watchtower container was exploding.
same
Was already struggling to do any work today and now my builds aren't working.
https://xkcd.com/303/
I had some images in cache, but not all of them, and pulls were failing.
For example, I have redis:7.2-alpine in cache, but not golang:1.24.5-alpine.
I needed the golang image to start my dev backend.
So I replaced `FROM golang:1.24.5-alpine` with `FROM redis:7.2-alpine` and manually installed Go with apk in the redis container :)
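i.e. roughly this kind of stopgap, assuming the Alpine `go` package is good enough to build with (not a long-term fix):

    # temporary workaround: use an image that is already in the local cache as the base
    FROM redis:7.2-alpine
    RUN apk add --no-cache go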
You changed your base image and docker build process for a temporary outage? Or do you mean that this in general will be better, as you avoid one in-between image?
It's up now, can pull images
Also, GCP K8s had a partial outage! Was this a vibe-coded release? Insane...