About six months ago Ahrefs recommended I remove some Unicode from the pathing on a personal project. Easy enough. Change the routing, set up permanent redirects for the old paths to the new paths.
I used to work for an SEO firm, I have a decent idea of best practices for this sort of thing.
BAM, I went from thousands of indexed pages to about 100
It's been six months and never recovered. If I were a business I would be absolutely furious. As it stands this is a tool I largely built for myself so I'm not too bothered but I don't know what's going on with Google being so fickle.
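For anyone doing the same kind of migration, the "permanent redirects for the old paths to the new paths" step can be as small as a lookup table plus a 301. A minimal sketch, with hypothetical Flask routes and placeholder paths rather than the actual project:

    from flask import Flask, redirect
    from markupsafe import escape

    app = Flask(__name__)

    # Placeholder mapping: old Unicode paths -> new ASCII slugs
    LEGACY_PATHS = {
        "/posts/świat": "/posts/swiat",
        "/posts/café-guide": "/posts/cafe-guide",
    }

    @app.route("/posts/<path:slug>")
    def post(slug):
        old = f"/posts/{slug}"
        if old in LEGACY_PATHS:
            # 301 marks the move as permanent so crawlers transfer signals to the new URL
            return redirect(LEGACY_PATHS[old], code=301)
        return f"<h1>{escape(slug)}</h1>"  # placeholder: render the post normally

The same mapping can just as well live in the web server config; the important part is that every old URL answers with a single 301 straight to its final destination.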
It’s probably also reflective of the fact that Google is throwing all their new resources at AI: as soon as you’ve hit cache invalidation you’re gone, and anything newly crawled is probably ranked differently in the post-LLM world.
Exactly my experience: suddenly thousands of non-indexed pages, and I never figured out why. I had to shut down the business, as it was a content website selling ads.
What I find strange about Google is that there's a lot of illegal advertising on Google Maps - things like accommodation and liquor sellers that don't have permits.
However, if they do it for the statutory term, they can then successfully apply for existing-use rights.
Yet I've seen expert witnesses bring up Google Maps pins during tribunal hearings over planning permits, and the tribunal sort of acts as if it's all legit.
I've even seen tribunal reports publish screenshots from Google Maps as part of the judgement.
I was a victim of this when I moved into my house. Being unfamiliar with the area, I googled for a locksmith near me. It returned a result in a shopping center just about a mile away from me. I'd driven past that center before, it seemed entirely plausible that there was a locksmith in there.
I called the locksmith and they came, but in an unmarked van; they spent over an hour changing 2 locks, damaged both locks, and then tried to charge me like $600 because the locks were "high security" (it's actually a deal for me, y'know, these locks usually go for much more). I just paid them and immediately contacted my credit card company to dispute the charge.
I called their office to complain and the lady answering the phone hung up on me multiple times. I drove to where the office was supposed to be, and there was no such office. I reported this to google maps and it did get removed very quickly, but this seems like something that should be checked or at least tied back to an actual person in some way for accountability.
Then I went to the hardware store and re-changed all the locks myself.
It definitely sounds like a hard problem. I'm not familiar with the current process, but based on what I found when I looked it up, it seems like there is a verification step already in place, but some of the methods of verification are tenuous. The method that seems the most secure to me is delivering a pin to the physical location that's being registered, but I feel like everything is exploitable.
Locksmiths and plumbers are especially one of those categories where they've figured out how to game the system: you get an extra-expensive service that they contract out, instead of a local company that is less expensive and doesn't have the middleman.
Reminds me of Trap streets or Trap towns that cartographers would use to watermark their maps and prove plagiarism. The trouble is reality would sometimes change to match the map.
Is it treated differently from other kinds of advertising? A lot of planning and permitting has a bit of a 'if it's known about and no-one's been complaining it's OK' kind of principle to it.
Google search results have gone to shit. I am facing some de-indexing issues where Google cites duplicate content and picks a canonical URL itself, despite there being no similar content.
Only the opening is similar, but the intent is totally different, and so is the focus keyword.
Not facing this issue in Bing and other search engines.
I've also noticed Google having indexing issues over the past ~year:
Some popular models on Hugging Face never appear in the results, but the sub-pages (discussion, files, quants, etc.) do.
Some Reddit pages show up only in their auto-translated form, and in a language Google has no reason to think I speak. (Maybe there's some deduplication to keep machine translations out of the results, but it's misfiring and discarding the original instead?)
Reddit auto translation is horrible. It’s an extremely frustrating feeling, starting to read something in your language believing it’s local, until you reach a weird phrase and realise it’s translated English.
It’s also clearly confusing users, as you get replies in a random language, obviously made by people who read an auto translation and thought they were continuing the conversation in their native language.
I will google for something in French when I don't find the results I want in English. Sometimes Google will return links to English threads (that I've already seen and decided were worthless!) auto-translated to French. As if that were any help at all...
The issues with auto-translated Reddit pages unfortunately also happens with Kagi. I am not sure if this is just because Kagi uses Google's search index or if Reddit publishes the translated title as metadata.
I think at least for Google there are some browser extensions that can remove these results.
The Reddit issue is also something that really annoys me, and I wish Kagi would find some way to counter it. Whenever I search for administrative things I do so in one of three languages, German, French or English, depending on the context in which the issue arises. And I would really prefer to only get answers that are relevant to that country. It's simply not useful for me to find answers about social security issues in the US when I'm searching for them in French.
Check that you're not routing unnamed SNI requests to your web content. If someone sets up a reverse proxy with a different domain name, google will index both domains and freak out when it sees duplicate content. Also make sure you're setting canonical tags properly.
Edit: I'd also consider using full URLs in links rather than relative paths.
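As a rough illustration of both suggestions (a sketch with hypothetical names, not anyone's real setup): refuse to serve content for hostnames you don't own, and emit an absolute canonical URL on every page. At the web-server level you would typically also add a catch-all/default server so unknown SNI/Host names never reach the application at all.

    from flask import Flask, request, abort
    from markupsafe import escape

    app = Flask(__name__)
    CANONICAL_HOST = "www.example.com"  # placeholder hostname

    @app.before_request
    def reject_unknown_hosts():
        # Requests arriving under a hostname you don't serve shouldn't get your content
        if request.host.split(":")[0] != CANONICAL_HOST:
            abort(404)

    @app.route("/post/<slug>")
    def post(slug):
        canonical = f"https://{CANONICAL_HOST}/post/{escape(slug)}"
        return (
            "<!doctype html><html><head>"
            f'<link rel="canonical" href="{canonical}">'  # absolute URL, not a relative path
            f"<title>{escape(slug)}</title></head>"
            "<body><!-- post body --></body></html>"
        )

With that in place, a mirror or misconfigured proxy on another domain either gets a 404 or, at worst, serves pages whose canonical tag still points back at the real host.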
Canonical Tags are done perfectly. Never changed them, and the blog is quite old too. I found a pattern where Google considers a page a duplicate because of the URL structure.
For example, a blog post, a tag page, and a service page on the same topic, each with the keyword in its URL.
So, for a topic, if I have two of those pages, Google will pick one of them as canonical despite the different keyword focus and intent. And the worst part is that it picks the worse page as canonical, i.e., the tag page over the blog post, or the blog page over the service page.
Because they shifted their internal KPIs around 2018, toward keeping users on Google rather than tuning for users finding what they are looking for, i.e. clicking off Google.
This is what has caused the degradation of search quality since then.
Bearblog.dev keeps subdomains out of search indexes until it approves them, as a measure against hosting the sort of things that would get the whole system de-indexed.
My guess is that they are more successful at suppressing subdomains than at getting them indexed. After all, they are not in control of what search engines do, they can only send signals.
For reference, I have a simple community event site on bearblog.dev which has been up for months and is not in any search index.
I encountered the same problem. I also use the Bear theme, specifically Hugo Bear. Recently, my blog was unindexed by Bing. Using `site:`, there are no links at all. My blog has been running normally for 17 years without any issues before.
It's entirely possible that the RSS feed failing validation triggered some spam flag that isn't documented, because documenting anti-spam rules lets spammers break the rules.
The amount of spam has increased enormously and I have no doubt there are a number of such anti-spam flags and a number of false positive casualties along the way.
If a page failing validation because of the RSS feed it points to triggers a spam flag and de-indexes all of the rest of the pages, that seems important to fix. By losing legit content because of such an error, they are lowering the legit:spam ratio, thus causing more harm than a spam page being indexed would. It might not appear so bad for one instance, but it is indicative of a larger problem.
In the past I've heard that TripAdvisor has 60% market share for local reviews in the UK. Did Google Maps really climb that quickly? Are Instagram and TikTok not shaping tastes in London too? I feel like she might be assigning too much power to it just because that's what she used.
That's not to say I don't have gripes with how Google Maps works, but I just don't know why the other factors were not considered.
I don’t think I’ve met anyone in the UK who routinely checked tripadvisor for anything!
I just checked a few local restaurants to me in London that opened in the last few years, and the ratio of reviews is about 16:1 for google maps. It looks like stuff that’s been around longer has a much better ratio towards trip advisor though.
Almost certainly Instagram/tiktok are though. I know a few places which have been ruined by becoming TikTok tourist hotspots.
Not in the UK, but from Romania: I last checked Tripadvisor back in 2012, and that was for a holiday stay in the Greek islands. Google Maps has eaten the lunch of almost all of the entrants in this space, and I say that having worked for a local/Romanian "Google Places"-type of company back in 2010-2012 (after which Google Places came in, ~~stole~~ scraped some of our data and some of our direct competitor's data, and put us both out of that business).
Google search also favors large, well-known sites over newcomers. For sites that have a lot of competition, this is a real problem and leads to asymmetry and a chicken-and-egg problem. You are small/new, but you can't really be found, which means you can't grow enough to be found. At the same time, you are also disadvantaged because Google displays your already large competitors without any problems!
Had the same issue - we have a massive register of regulated services and Google was a help for people finding those names easily.
But in August, "Page is not indexed: Crawled – currently not indexed" suddenly shot up massively. We've tried all sorts of things to get them back into the index, but with no luck. It would be helpful if Google explained why they aren't indexed or have been removed. As with the blog post, every other search engine is fine.
Breaking News: Google de-indexes random sites all of the time and there is often no obvious reason why. They also penalize sites in a way where pages are indexed but so deep-down that no one will ever find them. Again, there is often no obvious reason.
Do you have any resources here? The /r/seo subreddit seems very superficial coming from a web agency background, so it's hard to find legit cases versus obvious oversights. Often people make a post there describing a legit-sounding issue, just to let it shine through that they are essentially doing SEO spam.
It's something you'll experience if you publish many sites over time.
Can't point to any definitive sources, many of the reputable search related blogs are now just Google shills.
Or if you search for content which you know exists on the web and it suddenly takes an unusual amount of coaxing (e.g. half a sentence in quotes, if you remember it correctly word for word) before it brings up the page you're looking for
Like, isn't this a well-known thing that happens constantly, no matter if you're a user or run any websites? Relying on search engine ranking algorithms is Russian roulette for businesses, sadly, at least unless you outbid the competition to show your own page as an advertisement when someone searches your business's name.
I have a page that ranks well worldwide, but is completely missing in Canada. Not just poorly ranked, gone. It shows up #1 for keyword in the US, but won't show up with precise unique quotes in Canada.
I have the same issue with DollarDeploy and Bing (and consequently with DuckDuckGo which uses bing)
The primary domain cannot be found via search: Bing knows about the brand, the LinkedIn page, and the YouTube channel, but refuses to show search results for the primary domain.
Bing's search console does not give any clue, and forcing reindexing does not help. Google search works fine.
Google is blackboxy about this and I understand why. SEO is an arms race and there's no advantage to them advertising what they use as signals of "this is a good guy". My blog (on Mediawiki) was deranked to oblivion. Exactly zero of my pages would index on Google. Some of it is that my most read content is about pregnancy and IVF and those are sensitive subjects that google requires some authorship credibility on. That's fair.
But there were other posts that I thought were just normal blog posts of the form that you'd expect to be all right. But none of the search engines wanted anything to do with me. I talked to a friend[0] who told me it was probably something to do with the way MediaWiki was generating certain pages and so on, and I did all the things he recommended:
* edit the sitemap
* switch from the default site.tld/index.php/Page to site.tld/fixed-slug/Page
* put in json+ld info on the page
* put in meta tags
The symptoms were exactly as described here. All pages crawled, zero indexed. The wiki is open to anonymous users, but there's no spam on it (I once had some for an hour before I installed RequestAccount). Finally, my buddy told me that maybe I just need to dump this CMS and use something else. I wondered if perhaps they need you to spend on their ads platform to get it to work so I ran some ads too as an experiment. Some $300 or so. Didn't change a thing.
I really wanted things to be wiki-like so I figured I'd just write and no one would find anything and that's life. But one day I was bored enough that I wrote a wiki bot that reads each recently published page and adds a meta description tag to it.
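For the curious, a bot like that can be pretty small against the MediaWiki action API. A rough sketch, assuming the WikiSEO extension (for {{#seo:}}) and the TextExtracts extension are installed; the endpoint is a placeholder, and a real bot would also skip pages that already have a description:

    import requests

    API = "https://wiki.example.org/api.php"  # placeholder wiki endpoint
    S = requests.Session()
    # Log in first with a bot password (action=login) if the wiki requires it.

    def csrf_token():
        r = S.get(API, params={"action": "query", "meta": "tokens", "format": "json"})
        return r.json()["query"]["tokens"]["csrftoken"]

    def recent_titles(limit=20):
        r = S.get(API, params={
            "action": "query", "list": "recentchanges", "rcnamespace": 0,
            "rctype": "new|edit", "rclimit": limit, "rcprop": "title", "format": "json"})
        return {c["title"] for c in r.json()["query"]["recentchanges"]}

    def first_sentence(title):
        # Plain-text extract of the page, used as a crude description (needs TextExtracts)
        r = S.get(API, params={
            "action": "query", "prop": "extracts", "titles": title,
            "exsentences": 1, "explaintext": 1, "format": "json"})
        page = next(iter(r.json()["query"]["pages"].values()))
        return page.get("extract", "").strip()

    def add_description(title):
        desc = first_sentence(title).replace("|", " ")
        if desc:
            S.post(API, data={
                "action": "edit", "title": title, "format": "json",
                "prependtext": "{{#seo: |description=%s }}\n" % desc,
                "summary": "Bot: add meta description", "token": csrf_token()})

    for title in recent_titles():
        add_description(title)

The shape is the whole trick: list recently changed pages, derive a one-line description, and edit it into the wikitext so the extension renders it as a meta description tag.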
Now, to be clear, Google does delay reinstatement so that it's not obvious what 'solved' the problem (part of the arms race), but a couple of days later I was back in Google and now I get a small but steady stream of visits from there (I use self-hosted Plausible in cookie-free mode so it's just the Referer [sic] header).
Overall, I get why they're what they are. And maybe there's a bunch of spammy Mediawiki sites out there or something. But I was surprised that a completely legitimate blog would be deranked so aggressively unless a bunch of SEO measures were taken. Fascinating stuff the modern world is.
I suspect it has to do with the Mediawiki because the top-level of the domain was a static site and indexed right away!
From the post, while it is hard to completely rule out the possibility that the author did something wrong, they likely did everything they could to remove the suspicion. I assume they consulted all the documentation and other resources.
Someone else's fault? It is unlikely, since there isn't (obviously) another party involved here.
Which leaves us to Google's fault.
Also, I mean, if a user can't figure out what's wrong, the blame should just go to the vendor by default for poor user experience and documentation.
Well, they are calling out for help and seeing if others have had the same problem. And you can see from the responses that there are people having the same issue. So either we are all messing up despite not doing anything different, or Google has messed up something and we are all suffering because of it.
You're absolutely right, we should give google the opportunity to defend itself. What's the phone number google provides so their victims can speak directly with an informed googler to discuss it?
What's that? Google doesn't publish a phone number for their victims to do that? They just victimize and hide?
Okay then google, here's your chance: please reply to this-here post of mine, with a plausible explanation that convincingly exonerates you in light of the evidence against you.
I'll check back in a few hours to see whether google has done so. Until they do, we can continue blaming them.
Without going into details: the company I work for has potentially millions of pages indexed. Despite new content being published every day, since around those same October dates we have been seeing a decrease in the number of indexed pages.
We have a consultant for the topic, but I am not sure how much of that conversation I could share publicly, so I will refrain from doing so.
But I think I can say that it is not only about data structure or quality. The changes in methodology applied by Google in September might be playing a stronger role than people initially thought.
What "changes in methodology applied by Google in September" are you referring to? There surely is a public announcement that can be shared? Most curious to hear as a shop I built is experiencing massive issues since august / september 2025
Gone through everything I can find, but nothing has made a difference for months now. Would love to hear any thoughts people have that aren't the usual checklist items.
Good news is I still get some visitors through Kagi, DDG, and Bing.
I noticed google not being able to find smaller blogs a few years ago. The sort of blogs I used to like and read a lot - small irregular blog of an expert in something like cryptography, sociology etc kind of disappeared from the search. Then they disappeared for real.
Even when I knew the exact name of the article I was looking for, Google was unable to find it. And yes, it still existed.
Marginalia search is good for this. But generally it’s becoming a real problem due to Google’s dominance. Blog authors ought to look into web rings and crosslinks again to help each other out with discovery.
I have noticed the same thing. And the blogs are still there, I checked, and marginalia returns them as top results when I search the relevant keywords. Google just really doesn't care.
Probably an intern (oh, it's 2025, maybe an LLM?) messed up some spaghetti part, the async job for reindexing your site has been failing since then, and the on-call is busy sipping a mojito / the alert is silenced :)
Parts of Google are just black-boxy. You never know when the computer says no. And if they had usable ways to contact a human, they'd just tell you they don't know either.
A weird thing: on the hacker news page, in firefox mobile, all the visited links are grey, but the link to this blog post won't turn grey even when visited.
I ran into the same thing! My site still isn't indexed, and I would REALLY like to not change the URL (it's a shop and the URL is printed on stuff) - redirects are my last resort.
But basically what happened: in August 2025 we finished the first working version of our shop. After some weeks I wanted to accelerate indexing, because only ~50 of our pages were indexed, so I submitted the sitemap, and everything got de-indexed within days. I thought for the longest time that it was content quality, because we sell niche trading cards and the descriptions are all one-liners I made in Excel ("This is $cardname from $set for your collection or deck!"). And because it's single trading cards, we have 7000+ products that are very similar. (We did do all product images ourselves; I thought Google would like this, but alas.)
But later we added binders and whole sets and took a lot of care with their product data. The frontpage also got a massive overhaul - no shot. Not one page in the index. We still get traffic from marketplaces and our older non-shop site. The shop itself lives on a subdomain (shop.myoldsite.com). The normal site also has a sitemap, but that one was submitted in 2022. I later rewrote how my sitemaps were generated and deleted the old ones in Search Console, hoping this would help. It did not. (The old sitemap was generated by the shop system and was very large. Some forums mentioned that it's better to create a chunked sitemap, so I made a script that creates lists with 1000 products at a time, as well as an index for them.)
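Roughly what that chunking script looks like, as a sketch with a placeholder domain and a placeholder product source rather than the real shop code:

    from datetime import date
    from pathlib import Path
    from xml.sax.saxutils import escape

    BASE = "https://shop.example.com"  # placeholder shop domain
    OUT = Path("public")               # wherever the shop serves static files from
    CHUNK = 1000

    def product_paths():
        # Placeholder: yield product URL paths from the shop's database or an export
        yield from (f"/products/card-{i}" for i in range(7000))

    def write_sitemap(path, urls):
        rows = "\n".join(
            f"  <url><loc>{escape(BASE + u)}</loc><lastmod>{date.today()}</lastmod></url>"
            for u in urls)
        path.write_text('<?xml version="1.0" encoding="UTF-8"?>\n'
                        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                        f"{rows}\n</urlset>\n")

    OUT.mkdir(exist_ok=True)
    urls = list(product_paths())
    names = []
    for i in range(0, len(urls), CHUNK):
        name = f"sitemap-products-{i // CHUNK + 1}.xml"
        write_sitemap(OUT / name, urls[i:i + CHUNK])
        names.append(name)

    index_rows = "\n".join(f"  <sitemap><loc>{BASE}/{n}</loc></sitemap>" for n in names)
    (OUT / "sitemap_index.xml").write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{index_rows}\n</sitemapindex>\n")

Only sitemap_index.xml then needs to be submitted in Search Console; it points at the per-chunk files.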
Later observations are:
- Both sitemaps I deleted in GSC are still getting crawled and are STILL THERE. You can't see them in the overview, but if you have the old links they still appear as normal.
- We eventually started submitting product data to Google Merchant Center as well. It works 100% fine and our products are getting found and bought. The clicks even still show up in Search Console! So I have a shop with 0 indexed pages in GSC that gets clicks every day. WTHeck?
So like... I don't even know anymore. Maybe we also have to restart like the person in the blog did, move the shop to a new domain, and NEVER give Google a sitemap. If I really go that route I will probably delete the cronjob that creates the sitemap, in case Google finds it by itself. But also, like, what the heck? I have worked in a web agency for 5 years and created a new webpage about every 2-8 weeks, so I launched roughly 50-70 webpages and shops, and I NEVER saw this happen. Is it an AI hallucinating? Is it anti-spam gone too far? Is it a straight-up bug that they don't see? Who knows. I don't.
(Good article though and I hope maybe some other people chime in and googlers browsing HN see this stuff).
At the risk of sounding crazy, I de-indexed my blog myself and rely on the mailing list (which is now approaching 5000 subscribers) + reprints in several other online media to get traffic to me. On a good day, I get 5000 hits, which is quite a lot by Czech language community standards.
Together with deleting my Facebook and Twitter accounts, this removed a lot of pressure to conform to their unclear policies. Especially around 2019-21, it was completely unclear how to escape their digital guillotine which seemed to hit various people randomly.
The deliverability problem still stands, though. You cannot be completely independent nowadays. Fortunately my domain is 9 years old.
Similar here. I've removed my websites from Google so people would find that other engines can return more complete results. People can find the content using any other engine (or an LLM) or via links in relevant places. Using anything but the monopolist should pay off until it's no longer a monopoly, or at least no longer one abused to gain power in other markets, like advertising their browser product on the homepage while not allowing anyone else to advertise there (much less for free).
It’s weird that the number one search engine of modern times is so finicky; perhaps it has just become way over-engineered and over-rigged. Just index the web. At a certain point they went from search engine to arbiter of what people can find.
No more Google. No more websites. A distributed swarm of ephemeral signed posts. Shared, rebroadcasted.
When you find someone like James and you like them, you follow them. Your local algorithm then prioritizes finding new content from them. You bookmark their author signature.
Like RSS but better. Fully distributed.
Your own local interest graph, but also the power of your peers' interest graphs.
Content is ephemeral but can also live forever if any nodes keep rebroadcasting it. Every post has a unique ID, so you can search for it later in the swarm or some persistent index utility.
The Internet should have become fully p2p. That would have been magical. But platforms stole the limelight just as the majority of the rest of the world got online.
There is a technological feudalism being built in an ongoing manner, and you and I cannot do anything about it.
On the other side of the same coin, there are already governments that will make you legally responsible for what your page's visitors write in comments. This renders any p2p internet legally untenable (i.e. someone goes to your page, posts some bad word, and you get jailed). So far they say "it's only for big companies", but it's a lie; just boiling frogs.
Depends what your times scale is for "being built". 50 years ago the centralization and government control were much stronger. 20 years ago probably less.
"cannot do anything" is relative. Google did something about it (at least for the first 10-15 years) but I am sure that was not their primary intention nor they were sure it will work. So "we have no clue what will work to reduce it" is more appropriate.
Now I think everybody has tools to build stuff easier (you could not make a television or a newspaper 50 years ago). That is just an observation of possibility, not a guarantee of success.
That literally already exists and nobody uses it. Gnutella. Jabber. Tor. IPFS. Mastodon. The Entire Fucking IPv4/IPv6 Address Space And Every Layer Built On Top Of It. (if you don't think the internet is p2p, you don't understand how it works)
You know what else we need? We need food to be free. We need medicine to be free, especially medicines which end epidemics and transmissible disease. We need education to be free. We need to end homelessness. We need to end pollution. We need to end nationalism, racism, xenophobia, sexism. We need freedom of speech, religion, print, association. We need to end war.
There are a lot of things we as a society need. But we can't even make "p2p internet" work, and we already have it. (And please just forget the word 'distributed', because it's misleading you into thinking it's a transformative idea, when it's not)
I don't think we need for food to be free, we just need it to be accessible to everyone.
Every family should be provided with a UBI that covers food and rent (not in the city). That is a more attainable goal and would solve the same problems (better, in fact).
(Not saying that UBI is a panacea, but I've lived in countries that have experimented with such and it seems the best of the alternatives)
I do not think free is attainable for everything due to thermodynamics constraints. Imagine "free energy". Everybody uses as much as they want, Earth heats up, things go bad (not far from what is actually happening!).
I would settle for simpler, attainable things. Equal opportunity for next generation. Quality education for everybody. Focus on merit not other characteristics. Personal freedom if it does not infringe on the freedom of people around you (ex: there can't be such thing as a "freedom to pollute").
In my view, the Internet as p2p worked pretty well to improve on the previous status quo in many areas (not all). But there will never be a "stable solution"; life and humans are dynamic. We do have some good and free stuff on the Internet today because of the groundwork laid out 30 years ago by the open source movement. Any plan started today will only have a noticeable effect in many years. So "we can't even make" sounds more like an excuse not to start than an honest take.
We already have excesses of everything needed to provide for people's basic needs for no extra cost. We have excess food, excess land for housing, and we already pay for free emergency services, which actually costs us much more than if we fixed problems before they became emergencies. (And if there were a need for extra cost, we have massive wealth inequality that can be adjusted, not to mention things like massive military budgets and unfair medical pricing)
What does this mean? I suppose it can't literally mean equal opportunity, because people aren't equal, and their circumstances aren't equal; but then, what does this mean?
A clear definition is definitely hard to come by, but I will share what I see as rather large issues that impact society: a minimum level of spending per child on education, to allow a good service for most (this implies that smart kids are selected and become productive, as opposed to dropping out because they had nobody to learn from); reasonable health care availability for children, such that they can develop rather than being sick; and sufficient food for children to support the first two (you can't learn or be healthy if you are hungry).
Currently I know of multiple measures/rules/policies in many countries that affect these three things in ways I find damaging for society overall in the long term. Companies complain they don't have a workforce, and governments complain the birth rate is low, but there are many issues with raising a child. Financial incentives for parents do not seem to work (for example: https://www.bbc.com/news/world-europe-47192612)
Centralization is simply more efficient. Redundancy is a cost and network effects make it even worse. You’d have to go the authoritarian route - effectively and/or outright ban Google and build alternatives, like Yandex or Baidu.
It's not even that. It's that "centralization is more efficient" is a big fat lie. If you look at the "centralized systems" they're... not actually technologically centralized, they're really just a monopolist that internally implements a distributed system.
How do you think Google or Cloudflare actually work? One big server in San Francisco that runs the whole world, or lots of servers distributed all over?
I know exactly how they work, but they have a single entry point, as a customer you don't really care that the system is global, and they also have a single control plane, etc. Decisions are efficient if they need to be taken only once. The underlying architecture is irrelevant for the end user.
Why do you think they're a monopoly in the first place? Obviously because they were more efficient than the competition and network effects took care of the rest. Having to make choices is a cost for the consumer - IOW consumers are lazy - so winners have staying power, too. It's a perfect storm for a winner-takes-all centralization since a good centralized service is the most efficient utility-wise ('I know I'm getting what I need') and decision-cost-wise ('I don't need to search for alternatives') for consumers until it switches to rent seeking, which is where the anti-monopoly laws should kick in.
> Decisions are efficient if they need to be taken only once.
In other words, open source decentralized systems are the most efficient because you don't have to reduplicate a competitor's effort when you can just use the same code.
> Obviously because they were more efficient than the competition and network effects took care of the rest.
In most cases it's just the network effect, and whether it was a proprietary or open system in any given case is no more than the historical accident of which one happened to gain traction first.
> Having to make choices is a cost for the consumer
If you want an email address you can choose between a few huge providers and a thousand smaller ones, but that doesn't seem to prevent anyone from using it.
> until it switches to rent seeking
If it wasn't an open system from the beginning then that was always the end state and there is no point in waiting for someone to lock the door before trying to remove yourself from the cage.
This is the great lie. Approximately zero end consumers care about code, the product they consume is the service, and if the marginal cost of switching the service provider is zero, it's enough to be 1% better to take 99% of the market.
> Approximately zero end consumers care about code
Most people don't care about reading it. They very much care about what it does.
Also, it's not "approximately zero" at all. It's millions or tens of millions of people out of billions, and when a small minority of people improve the code -- because they have the ability to -- it improves the code for all the rest too. Which is why they should have a preference for the ability to do it even if they're not going to be the one to exercise it themselves.
> if the marginal cost of switching the service provider is zero, it's enough to be 1% better to take 99% of the market.
Except that you'd then need to be 1% better across all dimensions for different people to not have different preferences, and everyone else is trying to carve out a share of the market too. Meanwhile if you were doing something that actually did cause 99% of people to prefer a service that does that then everybody else would start doing it.
There are two main things that cause monopolies. The first is a network effect, which is why those things all need to be open systems. The second is that one company gets a monopoly somewhere (often because of the first, sometimes through anti-competitive practices like dumping) and then leverages it in order to monopolize the supply chain before and after that thing, so that competing with them now requires not just competing with the original monopoly but also reproducing the rest of the supply chain which is now no longer available as independent commodities.
This is why we need antitrust laws, but also why we need to recognize that antitrust laws are never perfect and do everything possible to stamp out anything that starts to look like one of those markets through development of open systems and promoting consumer aversion to products that are inevitably going to ensnare them.
"People don't want X" as an observed behavior is a bunch of nonsense. People's preferences depend on their level of information. If they don't realize they're walking into a trap then they're going to step right into it. That isn't the same thing as "people prefer walking into a trap". They need to be educated about what a trap looks like so they don't keep ending up hanging upside down by their ankles as all the money gets shaken out of their pockets.
> This is not a problem you solve with code. This is a problem you solve with law.
When the DMCA was a bill, people were saying that the anti-circumvention provision was going to be used to monopolize playback devices. They were ignored, it was passed, and now it's being used to monopolize not just playback devices but also phones.
Here's the test for "can you rely on the government here": Have they repealed it yet? The answer is still no, so how can you expect them to do something about it when they're still actively making it worse?
Now try to imagine the world where the Free Software Foundation never existed, Berkeley never released the source code to BSD and Netscape was bought by Oracle instead of being forked into Firefox. As if the code doesn't matter.
Yes. It's a political problem and a very old one. That's why we also already have solutions for it, antitrust laws and other regulations to ensure competition and fairness in the market, to keep it free. Governments just have to keep funding and enabling these institutions.
So maybe it's the status code? Shouldn't that page return a 200 ok?
When I go to blog.james..., I first get a 301 Moved Permanently, and then journal.james... loads, but it returns a 304 Not Modified, even if I then reload the page.
Only when I fully submit the URL again in the URL bar does it respond with a 200.
Maybe crawling also returns a 304, and Google won't index that?
Maybe prompt: "why would a 301 redirect lead to a 304 not modified instead of a 200 ok?", "would this 'break' Google's crawler?"
> When Google's crawler follows the 301 to the new URL and receives a 304, it gets no content body. The 304 response basically says "use what you cached"—but the crawler's cache might be empty or stale for that specific URL location, leaving Google with nothing to index.
You get a 304 because your browser tells the server what it has cached, and the server says "nothing changed, use that". In browsers you can bypass the cache by using Ctrl-F5, or in the developer tools you can usually disable caching while they're open. Doing so shows that the server is doing the right thing.
That's a different situation. The browser decides what to do depending on the situation and what was communicated about caching. Sometimes it sends a request to the server along with information about what it already has. Then it can get back a 304. Other times it already knows the cached data is fine, so it doesn't send a request to the server in the first place. The developer tools show this as a cached 200.
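If anyone wants to reproduce that: a cache-less fetch (which is effectively what a crawler without a stored copy does) should come back as 200 with a body, while a conditional request may legitimately come back as 304. A quick check with Python's requests library, where the URL is a placeholder for the blog's real address:

    import requests

    URL = "https://journal.example.com/"  # placeholder for the blog's real address

    # Fresh fetch with no cache validators: a healthy page returns 200 plus the full body
    fresh = requests.get(URL)
    print(fresh.status_code, len(fresh.content),
          fresh.headers.get("ETag"), fresh.headers.get("Last-Modified"))

    # Conditional fetch, like a client that already has the page cached:
    # the server is allowed to answer 304 Not Modified with an empty body
    validator = fresh.headers.get("ETag")
    if validator:
        conditional = requests.get(URL, headers={"If-None-Match": validator})
        print(conditional.status_code)  # 304 here is fine

If the first, unconditional request ever returns 304 or an empty body, that would be worth fixing; a 304 only in response to If-None-Match / If-Modified-Since is normal.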
Has anyone noticed that the response for the blog page has a header: "x-robots-tag: noindex, nofollow"? What's the purpose of this header on a content page?
UPD: Sorry, never mind, I inspected a wrong response.
> Second, the issue couldn’t be the quality or the quantity of the content. I came across some other pretty barebones Bear blogs that don’t have much content, and looked them up on Google, and they showed up in the results just fine. An example:
Suggestion: Remember that many large companies are emergently shitty, with shitty processes, and individuals motivated to act in shitty ways.
When a company is so powerful, this might be a time to think about solidarity.
When you're feeling an unexplained injustice from them, sometimes saying "but you let X do it" could just throw X under the bus.
Whether because a fickle process or petty actor simply missed X before, or because now they have new reason to double down and also punish the others (to CYA consistency, or, if petty, to assert their power now that you've questioned it).
Traffic to my blog plummeted this year, and you can never be entirely sure why it happened. But here are two culprits I identified.
1. AI Overviews: my page impressions were high, my ranking was high, but click through took a dive. People read the generated text and move along without ever clicking.
2. You are now a spammer. Around August, traffic took a second plunge. In my logs, I noticed weird queries on my search page: basically, people were searching for crypto and scammy websites on my blog. Odd, but it's not like they were finding anything. Turns out, their search query was displayed as an h1 on the page and crawled by Google. I was basically displaying spam.
I don't have much control over AI Overviews, because disabling them means I don't appear in search at all. But for the spam, I could do something: I added a robots noindex on the search page. A week later, both impressions and clicks recovered.
Edit: adding a write-up I did a couple of weeks ago: https://idiallo.com/blog/how-i-became-a-spammer
Sounds like point 2 was a negative SEO attack. It could be that your /?s page is being cached and getting picked up via crawlers.
You can avoid this by not caching search pages and applying noindex via the X-Robots-Tag header: https://developers.google.com/search/docs/crawling-indexing/...
Cache has nothing to do with this
But yes just noindex search pages like they already said they did
I think the question is “how is the behavior of random spammers on your search page getting picked up by the crawler?” The assumption with cache is that one user's searches were being cached so that the crawler saw them. Other alternatives I can imagine are that your search page is powered by Google, so it gets the search terms and indexes the results, or that you show popular queries somewhere. But you have to admit that the crawler seeing user-generated search terms points to some deeper issue.
You just link to that page from a page that Google crawls. Cache isn't involved unless you call links caching
Not sure how search result pages can be crawled unless they are cached somewhere?
If I'm reading correctly, it's not that your search results would be crawled, it's that if you created a link to www.theirwebsite.com/search/?q=yourspamlinkhere.com or otherwise submitted that link to google for crawling, then the google crawler makes the same search and sees the spam link prominently displayed.
Yikes.
What could Google do to mitigate?
You noindex search pages or anything user generated, it's really that simple
Not enough. According to this article (https://www.dr.dk/nyheder/penge/pludselig-dukkede-nyhed-op-d... you probably need to translate it), it's enough to link to an authoritative site that accepts a query parameter. Google's AI picks up the query parameter as a fact. The article is about a Danish company probably circumventing sanctions, and how Russian actors manipulate that fact and turn it around via Google's AI.
In this case, all I had to do was let the crawler know not to index the search page. I used the robots noindex meta tag on the search page.
I don't know what you mean by cache but you aren't using it correctly...
I posted some details in the main thread but I think you might need to check the change in methodology of counting impressions and clicks Google did around September this year.
They say the data before and after is not comparable anymore, as they are not counting certain events below a threshold anymore. You might need to have your own analytics to understand your traffic from now on.
This affected only reporting of placement and impressions; basically you don’t get counts for placements below the first 10 or 20 results (can’t remember which). It did not affect clicks which are measured directly regardless of how deep in the SERP they happen.
WRT AI overviews/summaries, was Google smart enough to set up for this long ago?
1) encourage SEO sites to proliferate and dominate search results, pushing useful content down on the page.
2) sell ad placement directly on the search results page, further pushing useful content down on the page
3) introduce AI summaries, making it unnecessary to even look for the useful content pushed down on the page.
Now, people only see the summaries and the paid-for ad placements. No need to ever leave the search page.
Question: If I do a search for, say, crypto on your blog, how does Google get to index the resulting page?
I’m imagining something like “blog.example/?s=crypto” which only I should see, not Google.
Edit: Were they linking to your website from their own? (In that case the link with the bad search keywords can be crawled.)
They are spamming other websites with links to my website, like in your example. Google crawls those other websites, follows the spammy link to mine, and I get penalized for having a page with spam content.
The solution is to tell the crawler that my search page shouldn't be indexed. This can be done with the robots meta tags.
I see, thanks for helping me understand the issue (and also the solution)
Link.com?search=spam from external page
AI overviews likely aren't going anywhere. Techies complain about it, but from seeing average people use Google, everyone just reads the overview. Hell, I even saw a screenshot of an AI overview in a PowerPoint this week...
Anyway, I'd really like to at least see Google make the overview text itself clickable, and link to the source of the given sentence or paragraph. I think that a lot of people would instinctively click through just to quickly spot-check, if it was made as easy as possible.
That is how DuckDuckGo has implemented it; I also find it to be a nicer middle ground.
Kagi too (and before DDG).
Citations got worse with AI Overviews or AI Mode over the past couple of months, right?
IIRC, they used to take you to the cited links; now they launch a sidebar of supposed sources, which are un-numbered and disconnected from any specific claims from the bot.
Sorry but how did 2 work before you fixed it? You saved the queries people did and displayed them?
So the spammer would link to my search page with their query param:
example.com/search?q=text+scam.com+text
On my website, I'll display "text scam.com text - search result", and now Google will see that link in my h1 tag and page title and say I am probably promoting scams.
Also, the reason this appeared suddenly is that I added support for Unicode in search. Before that, the page would fail if you added Unicode. So the moment I fixed it, I allowed spammers to have their links displayed on my page.
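To make the mechanism concrete, here is a minimal sketch of the pattern being described (a hypothetical Flask view, not the blog's actual code): the q parameter is echoed into the title and h1, which is exactly what the crawler picked up, and the robots noindex meta tag is the fix.

    from flask import Flask, request
    from markupsafe import escape

    app = Flask(__name__)

    @app.route("/search")
    def search():
        q = request.args.get("q", "")
        return f"""<!doctype html>
    <html>
    <head>
      <meta name="robots" content="noindex">      <!-- the fix: keep search pages out of the index -->
      <title>{escape(q)} - search result</title>  <!-- attacker-controlled text ends up here -->
    </head>
    <body>
      <h1>{escape(q)} - search result</h1>
      <!-- ... actual search results ... -->
    </body>
    </html>"""

Escaping keeps the echoed text from turning into markup, but escaping alone doesn't stop the page from being indexed with the spammer's keywords in the title and h1; the noindex meta tag (or the X-Robots-Tag response header mentioned earlier) is what keeps it out of the index.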
Reminds me of a recent story on scammers using search queries to inject their scam phone numbers into the h1 header on legitimate sites [1]
[1] https://cyberinsider.com/threat-actors-inject-fake-support-n...
Interesting - surely you'd have to trick Google into visiting the /search? url in order to get it indexed? I wonder if them listing all these URLs somewhere or requesting that the page be crawled is enough.
Since these are very low quality results surely one of Google's 10000 engineers can tweak this away.
> surely you'd have to trick Google into visiting the /search? url in order to get it indexed
That's trivially easy. Imagine a spammer creating some random page which links to your website with that made-up query parameter. Once Google indexes their page and sees the link to your page, Google's Search Console complains to you, the victim, that this page doesn't exist. You as the victim have no insight into where Google even found that non-existent path.
> Since these are very low quality results surely one of Google's 10000 engineers can tweak this away.
You're assuming there are still people at Google who are tasked with improving actual search results and not just the AI overview at the top. I have my doubts Google still has such people.
I messed around with our website trying URL-encoded hyperlinks etc., but it was all escaped pretty well. I bet there's a lot of tricks out there for those with time on their hands.
Why anyone would bother creating content when Google's AI summary is effectively going to steal it to intercept your click is beyond me. So the whole issue will solve itself when Google has nothing to index except endless regurgitated slop and everyone finally logs off and goes outside.
Great blog post. You typically think of people linking to your website as a good thing. This is a good counterexample.
What does Unicode have to do with links?
Lot of spam uses unicode, either for non-English languages or just to swap in lookalike characters to try and dodge keyword filters.
This has been a trick used by "reputation management" people for years.
I imagine the search page echoed the search query. Then an SEO bot automated searches on the site with crypto and spam keywords, which were echoed in the search results. Said bot may have a site/page full of links to these search results, to create fake pages for those keywords for SEO purposes (essentially, an exploit).
Google got smart, found out about such exploits, and penalized sites that do this.
> my page impressions were high, my ranking was high, but click through took a dive. People read the generated text and move along without ever clicking.
This has been our experience with our content-driven marketing pages in 2025. SERP results constant, but clicks down 90%.
This is not good for our marketing efforts, and terrible for ad-supported public websites, but I also don't understand how Google is not terribly impacted by the zero-click Internet. If content clicks are down 90%, aren't ad clicks down by a similar number?
They moved from clicks to pageviews which gives them cover until AI ads make up the difference.
Whether or not this specific author’s blog was de-indexed or de-prioritized, the issue this surfaces is real and genuine.
The real issue at hand here is that it’s difficult to impossible to discover why, or raise an effective appeal, when one runs afoul of Google, or suspects they have.
I shudder to use this word, as I do think it's being overused in some contexts, but I think it's the best word to use here: the issue is really that Google is a Gatekeeper.
As the search engine with the largest global market share, whether or not Google has a commercial relationship with a site is irrelevant. Google has decided to let their product become a Utility. As a Utility, Google has a responsibility to provide effective tools and effective support for situations like this. Yes it will absolutely add cost for Google. It’s a cost of doing business as a Gatekeeper, as a Utility.
My second shudder in this comment - regulation is not always the answer. Maybe even it’s rarely the answer. But I do think when it comes to enterprises that have products that intentionally or unintentionally become Gatekeepers and/or Utilities, there should be a regulated mandate that they provide an acceptable level of support and access to the marketplaces they serve. The absence of that is what enables and causes this to perpetuate, and it will continue to do so until an entity with leverage over them can put them in check.
The situation reads more like a monopoly issue rather than a gatekeeper issue. Because google owns the indexer and the search tool most used, they're really only gate keeping their own sandbox.
It's entirely possible to have utility-importance non-monopoly gatekeepers, which is part of the legal issue.
The US regulates monopolies.
The US regulates utilities, defined by ~1910 industries.
It doesn't generally regulate non-monopoly companies that are gatekeepers.
Hence, Apple/Google/Facebook et al. have been able to avoid regulation by avoiding being classed as monopolies.
Imho, the EU is taking the right approach: also classify sub-monopoly entities with large market shares, and apply regulatory requirements on them.
I'd expect the US to use a lighter touch, and I'm fine with that, but regulations need to be more than 'no touch'. It'd also be nice if they were bucketed and scaled (e.g. minimal requirements for 10-20% market share, more for 21-50%, max for 50%+).
Sure, we agree there, though I'd add that while the US regulates monopolies, we don't always enforce that, and we also allow state-sponsored monopolies for many regional utilities.
With Google and SEO I see it more in the monopoly camp though. The existence of other big tech companies doesn't break the monopoly Google has by owning search, ads, analytics, et al under the same umbrella.
I’m really hoping the pendulum swings back to sanity in the US rather than becoming a Russia-like mafia business state.
It’s possible the only hope is a painful one: a major market crash caused by greed and excessive consolidation, the kind of crash that would trigger a 21st century new deal.
If they considered themselves to have some ethical responsibility, they would at least tame the bidding war that lets well-paid ads for an existing, unrelated business show up before the legitimate link, or at least limit it so that the legitimate link still appears on the first page of results.
For certain popular sites, it doesn't. Those businesses have to pay the shelf tax if they want their published piece to ever be - not just seen, but reasonably - found when someone is looking specifically for it.
About six months ago Ahrefs recommended I remove some Unicode from the pathing on a personal project. Easy enough. Change the routing, set up permanent redirects for the old paths to the new paths.
I used to work for an SEO firm, I have a decent idea of best practices for this sort of thing.
BAM, I went from thousands of indexed pages to about 100
See screenshot:
https://x.com/donatj/status/1937600287826460852
It's been six months and it has never recovered. If I were a business I would be absolutely furious. As it stands, this is a tool I largely built for myself, so I'm not too bothered, but I don't know what's going on with Google being so fickle.
Updated screenshots;
https://x.com/donatj/status/1999451442739019895
It's probably also reflective of the fact that Google is throwing all its new resources at AI: as soon as you've hit cache invalidation you're gone, and anything newly crawled is probably ranked differently in the post-LLM world.
They already scouted all the content they needed. Sites are now competition for their AI systems.
Lesson: if it's working, don't fix it.
Exactly my experience: suddenly thousands of non-indexed pages, never figured out why, had to disband the business as it was a content website selling ads.
What I find strange about Google is that there's a lot of illegal advertising on Google Maps - things like accommodation and liquor sellers that don't have permits.
However, if they do it for the statutory term, they can then successfully apply for existing-use rights.
Yet I've seen expert witnesses bring up Google pins on Maps during tribunal over planning permits and the tribunal sort of acts as if it's all legit.
I've even seen the tribunals report publish screenshots from Google maps as part of their judgement.
I was a victim of this when I moved into my house. Being unfamiliar with the area, I googled for a locksmith near me. It returned a result in a shopping center just about a mile away from me. I'd driven past that center before, it seemed entirely plausible that there was a locksmith in there.
I called the locksmith and they came, but in an unmarked van, spent over an hour to change 2 locks, damaged both locks, and then tried to charge me like $600 because the locks were "high security" - it's actually a deal, y'know, these locks usually go for much more. I just paid them and immediately contacted my credit card company to dispute the charge.
I called their office to complain and the lady answering the phone hung up on me multiple times. I drove to where the office was supposed to be, and there was no such office. I reported this to google maps and it did get removed very quickly, but this seems like something that should be checked or at least tied back to an actual person in some way for accountability.
Then I went to the hardware store and re-changed all the locks myself.
Just curious, if you were Google, how would you fix this? And take the question seriously, because it's harder than it sounds.
They are certainly trying. It's not good for them to have fake listings.
https://podcast.rainmakerreputation.com/2412354/episodes/169...
(just googled that, didn't listen, was looking for a much older podcast from I think Reply All from like 10yrs ago)
It definitely sounds like a hard problem. I'm not familiar with the current process, but based on what I found when I looked it up, it seems like there is a verification step already in place, but some of the methods of verification are tenuous. The method that seems the most secure to me is delivering a pin to the physical location that's being registered, but I feel like everything is exploitable.
They could send real mail to the address with an activation code in it.
Locksmiths and plumbers are especially one of those areas where scammers have figured out how to game the system: you get routed to an extra-expensive service they contract with instead of a local company that is less expensive and doesn't have the middleman.
Reminds me of Trap streets or Trap towns that cartographers would use to watermark their maps and prove plagiarism. The trouble is reality would sometimes change to match the map.
Is it treated differently from other kinds of advertising? A lot of planning and permitting has a bit of a 'if it's known about and no-one's been complaining it's OK' kind of principle to it.
legal citogenesis?
Clan justice, google is the clan.
Reality is just tug of war and weight is all that matters at the limit
Google search results have gone to shit. I am facing de-indexing issues where Google cites duplicate content and picks a canonical URL itself, despite there being no similar content.
Just the opening is similar, but the intent is totally different, and so is the focus keyword.
Not facing this issue in Bing and other search engines.
Totally. Bing works like a charm for all my sites, whereas Google fails them all, and they couldn't be more diverse.
I've also noticed Google having indexing issues over the past ~year:
Some popular models on Hugging Face never appear in the results, but the sub-pages (discussion, files, quants, etc.) do.
Some Reddit pages show up only in their auto-translated form, and in a language Google has no reason to think I speak. (Maybe there's some deduplication to keep machine translations out of the results, but it's misfiring and discarding the original instead?)
Reddit auto translation is horrible. It’s an extremely frustrating feeling, starting to read something in your language believing it’s local, until you reach a weird phrase and realise it’s translated English.
It’s also clearly confusing users, as you get replies in a random language, obviously made by people who read an auto translation and thought they were continuing the conversation in their native language.
I will google for something in French when I don't find the results I want in English. Sometimes Google will return links to English threads (that I've already seen and decided were worthless!) auto-translated to French. As if that were any help at all...
The issues with auto-translated Reddit pages unfortunately also happens with Kagi. I am not sure if this is just because Kagi uses Google's search index or if Reddit publishes the translated title as metadata.
I think at least for Google there are some browser extensions that can remove these results.
The Reddit issue is also something that really annoys me, and I wish Kagi would find some way to counter it. Whenever I search for administrative things I do so in one of three languages - German, French or English - depending on which context the issue arises in. And I would really prefer to only get answers that are relevant to that country. It's simply not useful for me to find answers about social security issues in the US when I'm searching for them in French.
Check that you're not routing unnamed SNI requests to your web content (a rough check is sketched below). If someone sets up a reverse proxy with a different domain name, Google will index both domains and freak out when it sees duplicate content. Also make sure you're setting canonical tags properly. Edit: I'd also consider using full URLs in links rather than relative paths.
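A hedged sketch of that check (the IP, hostname, and marker string are all hypothetical - substitute your own): hit the server by IP with a hostname you don't serve and see whether your real content comes back from the default vhost.

    import requests
    import urllib3

    urllib3.disable_warnings()  # the cert won't match a bare IP; expected here

    SERVER_IP = "203.0.113.10"          # hypothetical: your server's public IP
    BOGUS_HOST = "not-my-site.invalid"  # a name you definitely don't serve

    # No SNI is sent for an IP literal, and the Host header is deliberately
    # wrong, so this request should land on the default/catch-all vhost.
    resp = requests.get(
        f"https://{SERVER_IP}/",
        headers={"Host": BOGUS_HOST},
        verify=False,
        timeout=10,
    )
    print(resp.status_code)
    if "My Blog Title" in resp.text:  # any string unique to your site
        print("Default vhost serves your content - add a catch-all that 404s")

If the real pages come back, a crawler that reaches you via some other hostname (or a stray reverse proxy) will see a duplicate copy of the site.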
Canonical tags are done perfectly. Never changed them, and the blog is quite old too. I found a pattern where Google considers a page a duplicate because of the URL structure. For example:
www.xyz.com/blog/keyword-term/
www.xyz.com/services/keyword-term/
www.xyz.com/tag/keyword-term/
So, for a topic, if I have two of the above pages, Google will pick one of them as canonical despite the different keyword focus and intent. And the worst part is that it picks the worse page as canonical, i.e. the tag page over the blog post, or the blog post over the service page.
Amazon, Google - it's all the same. Fake products, fake results, scammers left and right.
Yeah, Google search results are almost useless. How could they have neglected their core competence so badly?
Because they shifted their internal KPIs, around 2018, to keeping users on Google rather than tuning towards users finding what they are looking for, i.e. clicking off Google.
This is what has caused the degradation of search quality since then.
Their core competency is ads, not search.
Bearblog.dev keeps subdomains out of search indexes until it approves them, as a measure against hosting the sort of things that would get the whole system de-indexed.
My guess is that they are more successful at suppressing subdomains than at getting them indexed. After all, they are not in control of what search engines do; they can only send signals.
For reference, I have a simple community event site on bearblog.dev which has been up for months and is not in any search index.
I encountered the same problem. I also use the Bear theme, specifically Hugo Bear. Recently, my blog was unindexed by Bing. Using `site:`, there are no links at all. My blog has been running normally for 17 years without any issues before.
Don't take it personally. Google lost control of its algo a long time ago.
Entirely possible the failed RSS validation triggered some spam flag that isn't documented, because documenting anti-spam rules lets spammers break the rules.
The amount of spam has increased enormously and I have no doubt there are a number of such anti-spam flags and a number of false positive casualties along the way.
If failing to validate a page because it is pointing to an RSS feed triggers a spam flag and de-indexes all of the rest of the pages, that seems important to fix. By losing legit content because of such an error they are lowering the legit:spam ratio thus causing more harm than a spam page being indexed. It might not appear so bad for one instance, but it is indicative of a larger problem.
I'll be honest, I read "Google de-indexed my Bear Blog" and was looking forward to discovering an interesting blog about bears.
You may find rather unexpected results if you look for blogs with an interest in bears.
Same. I still don’t know why the word “Bear” was used in the title.
I guess they use this blogging platform: https://bearblog.dev/
Makes sense. Hadn't heard of that platform before but looks really nice.
Coming from a quietfox, it is OK. It is important to preserve oneself ^^.
Sounds similar to https://news.ycombinator.com/item?id=46203343 in the sense that Google decides who survives and who does not in business.
Also: https://news.ycombinator.com/item?id=40970987
https://gehrcke.de/2023/09/google-changes-recently-i-see-mor...
The wrong RSS thing may have just tipped the scales over to Google not caring.
In the past I've heard that TripAdvisor has 60% market share for local reviews in the UK. Did Google Maps really climb that quickly? Are Instagram and TikTok not shaping tastes in London too? I feel like she might be assigning too much power to it just because that's what she used.
That's not to say I don't have gripes with how Google Maps works, but I just don't know why the other factors were not considered.
I don’t think I’ve met anyone in the UK who routinely checked tripadvisor for anything!
I just checked a few local restaurants to me in London that opened in the last few years, and the ratio of reviews is about 16:1 for google maps. It looks like stuff that’s been around longer has a much better ratio towards trip advisor though.
Almost certainly Instagram/tiktok are though. I know a few places which have been ruined by becoming TikTok tourist hotspots.
'I don’t think I’ve met anyone in the UK who routinely checked tripadvisor for anything!'
Counterpoint: I have met people in the UK whose lives revolve around doing nothing but.
Not in the UK, but from Romania: I last checked Tripadvisor back in 2012, and that was for a holiday stay in the Greek islands. Google Maps has eaten the lunch of almost all of the entrants in this space, and I say that having worked for a local/Romanian "Google Places"-type company back in 2010-2012 (after which Google Places came in, ~~stole~~ scraped some of our data and some of our direct competitor's data, and put us both out of that business).
Google search also favors large, well-known sites over newcomers. For sites that have a lot of competition, this is a real problem and leads to asymmetry and a chicken-and-egg problem. You are small/new, but you can't really be found, which means you can't grow enough to be found. At the same time, you are also disadvantaged because Google displays your already large competitors without any problems!
Had the same issue - we have a massive register of regulated services and Google was a help for people finding those names easily.
But in August, "Page is not indexed: Crawled – currently not indexed" suddenly shot up massively. We've tried all sorts of things to get the pages back into the index, but with no luck. It would be helpful if Google explained why they aren't indexed or have been removed. As with the blog post, every other search engine is fine.
What URL is your website, if you don't mind sharing?
This submission title is wrong. The title should be what the post title is:
"Google De-Indexed My Entire Bear Blog and I Don’t Know Why"
Bearblog.dev is not de-indexed from Google. I can pull up results fine.
Breaking News: Google de-indexes random sites all of the time and there is often no obvious reason why. They also penalize sites in a way where pages are indexed but so deep-down that no one will ever find them. Again, there is often no obvious reason.
Totally. They've completely lost the plot.
Do you have any resources here? The /r/seo subreddit seems very superficial; coming from a web agency background, it's hard to find legit cases versus obvious oversights. Often people make a post describing a legit-sounding issue there just to let it shine through that they are essentially doing SEO spam.
It's something you'll experience if you publish many sites over time. Can't point to any definitive sources, many of the reputable search related blogs are now just Google shills.
Or if you search for content which you know exists on the web and it suddenly takes an unusual amount of coaxing (e.g. half a sentence in quotes, if you remember it correctly word for word) before it brings up the page you're looking for.
Like, isn't this a well-known thing that happens constantly whether you're a user or run any websites? Relying on search engine ranking algorithms is Russian roulette for businesses, sadly, at least unless you outbid the competition to show your own page as an advertisement when someone searches your business's name.
I bet Google doesn't know why either...
Yeah. And they've now milked the whole web for AI and couldn't care less about the mess they made.
And it's not like they actually care enough to know why.
How does one debug issues like this?
I have a page that ranks well worldwide but is completely missing in Canada. Not just poorly ranked - gone. It shows up #1 for its keyword in the US, but won't show up even for precise unique quotes in Canada.
Buy a .ca domain for SEO results and point it to the existing content.
I have the same issue with DollarDeploy and Bing (and consequently with DuckDuckGo, which uses Bing).
The primary domain cannot be found via search - Bing knows about the brand, the LinkedIn page, and the YouTube channel, but refuses to show search results for the primary domain.
Bing's search console gives no clue, and forcing reindexing does not help. Google search works fine.
They probably don't know why it was de-indexed either, likely a bunch of unexplainable ML models flagged it.
In 2025, is it still prohibitively expensive to run a community-supported crawler & search engine? Without Google's censorship, ads, and AI.
Based on Marginalia's work I would say no, but maybe you're thinking of a different scale than they work at.
Google is blackboxy about this and I understand why. SEO is an arms race and there's no advantage to them advertising what they use as signals of "this is a good guy". My blog (on Mediawiki) was deranked to oblivion. Exactly zero of my pages would index on Google. Some of it is that my most read content is about pregnancy and IVF and those are sensitive subjects that google requires some authorship credibility on. That's fair.
But there were other posts that I thought were just normal blog posts of the form that you'd expect to be all right. But none of the search engines wanted anything to do with me. I talked to a friend[0] who told me it was probably something to do with the way MediaWiki was generating certain pages and so on, and I did all the things he recommended:
* edit the sitemap
* switch from the default site.tld/index.php/Page to site.tld/fixed-slug/Page
* put in json+ld info on the page
* put in meta tags
The symptoms were exactly as described here. All pages crawled, zero indexed. The wiki is open to anonymous users, but there's no spam on it (I once had some for an hour before I installed RequestAccount). Finally, my buddy told me that maybe I just need to dump this CMS and use something else. I wondered if perhaps they need you to spend on their ads platform to get it to work so I ran some ads too as an experiment. Some $300 or so. Didn't change a thing.
I really wanted things to be wiki-like so I figured I'd just write and no one would find anything and that's life. But one day I was bored enough that I wrote a wiki bot that reads each recently published page and adds a meta description tag to it.
Now, to be clear, Google does delay reinstatement so that it's not obvious what 'solved' the problem (part of the arms race), but a couple of days later I was back in Google and now I get a small but steady stream of visits from there (I use self-hosted Plausible in cookie-free mode so it's just the Referer [sic] header).
Overall, I get why they're what they are. And maybe there's a bunch of spammy Mediawiki sites out there or something. But I was surprised that a completely legitimate blog would be deranked so aggressively unless a bunch of SEO measures were taken. Fascinating stuff the modern world is.
I suspect it has to do with the Mediawiki because the top-level of the domain was a static site and indexed right away!
0: https://news.ycombinator.com/user?id=jrhizor
The author doesn't know the cause but states "The whole affair is Google’s fault"?
There are three possibilities:
Author's fault, Google's fault, someone else's fault.
From the post, while it is hard to completely rule out the possibility that author did something wrong, they likely did everything they could to remove the suspicion. I assume they consulted all documentation or other resources.
Someone else's fault? It is unlikely, since there isn't (obviously) another party involved here.
Which leaves us to Google's fault.
Also, I mean, if a user can't figure out what's wrong, the blame should just go to the vendor by default for poor user experience and documentation.
Well, they are calling out for help and seeing if others have had the same problem. And you can see from the responses that there are people having the same issue. So either we are all messing up despite not doing anything different, or Google has messed up something and we are all suffering because of it.
Surely whether Google indexes a page or not is Google's decision?
You're absolutely right, we should give google the opportunity to defend itself. What's the phone number google provides so their victims can speak directly with an informed googler to discuss it?
What's that? Google doesn't publish a phone number for their victims to do that? They just victimize and hide?
Okay then google, here's your chance: please reply to this-here post of mine, with a plausible explanation that convincingly exonerates you in light of the evidence against you.
I'll check back in a few hours to see whether google has done so. Until they do, we can continue blaming them.
Without going into details: the company I work for has potentially millions of pages indexed. Despite new content being published every day, since around the same October dates we have been seeing a decrease in the number of indexed pages.
We have a consultant for the topic, but I am not sure how much of that conversation I can share publicly, so I will refrain from doing so.
But I think I can say that it is not only about data structure or quality. The changes in methodology applied by Google in September might be playing a stronger role than people initially thought.
What "changes in methodology applied by Google in September" are you referring to? There surely is a public announcement that can be shared? Most curious to hear as a shop I built is experiencing massive issues since august / september 2025
Adding another data point from my own experience. I had a very similar case with my own blog which I detailed here: https://blog.matthewbrunelle.com/i-dont-want-to-play-the-seo...
Gone through everything I can find, but nothing has made a difference for months now. Would love to hear any thoughts people have that aren't the usual checklist items.
Good news is I still get some visitors through Kagi, DDG, and Bing.
I never really use it, but there is a lot in the Yahoo index that Google refuses to index.
https://search.yahoo.com/search?p=blog.james-zhan.com&fr=yfp...
I thought Yahoo! was just Bing now. The real Yahoo! died ages ago.
I noticed Google not being able to find smaller blogs a few years ago. The sort of blogs I used to like and read a lot - the small, irregular blog of an expert in something like cryptography, sociology, etc. - kind of disappeared from the search. Then they disappeared for real.
Even when I knew the exact name of the article I was looking for, Google was unable to find it. And yes, it still existed.
Marginalia search is good for this. But generally it’s becoming a real problem due to Google’s dominance. Blog authors ought to look into web rings and crosslinks again to help each other out with discovery.
I have noticed the same thing. And the blogs are still there, I checked, and marginalia returns them as top results when I search the relevant keywords. Google just really doesn't care.
Probably an intern (oh, it's 2025 - maybe an LLM?) messed up some spaghetti part, the async job for reindexing your site has been failing since then, and the on-call is busy sipping a mojito / the alert is silenced :)
Parts of Google are all blackbox-y. You never know when the computer says no. And if they had usable ways to contact a human, they'd just tell you they don't know either.
A weird thing: on the Hacker News page, in Firefox mobile, all the visited links are grey, but the link to this blog post won't turn grey even when visited.
I find this thing sometimes works itself out. Just submit sitemaps and the usual stuff and be careful with your HTML.
I ran into the same thing! My site still isn't indexed and I would REALLY like to not change the URL (it's a shop and the URL is printed on stuff) - redirects are my last resort.
But basically what happened: in August 2025 we finished the first working version of our shop. After some weeks I wanted to accelerate indexing, because only ~50 of our pages were indexed, so I submitted the sitemap - and everything got de-indexed within days. I thought for the longest time that it was content quality, because we sell niche trading cards and the descriptions are all one-liners I made in Excel ("This is $cardname from $set for your collection or deck!"). And because it's single trading cards, we have 7000+ products that are very similar. (We did do all product images ourselves; I thought Google would like this, but alas.)
But later we added binders and whole sets and took a lot of care with their product data. The frontpage also got a massive overhaul - no shot. Not one page in the index. We still get traffic from marketplaces and our older non-shop site. The shop itself lives on a subdomain (shop.myoldsite.com). The normal site also has a sitemap, but that one was submitted in 2022. I later rewrote how my sitemaps were generated and deleted the old ones in Search Console, hoping this would help. It did not. (The old sitemap was generated by the shop system and was very large. Some forums mentioned that it's better to create a chunked sitemap, so I made a script that creates lists of 1000 products at a time, as well as an index for them.)
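For reference, the chunking script is roughly this kind of thing - a minimal sketch, with the domain and product list as stand-ins (a real shop would pull the URLs from its database):

    import math
    from pathlib import Path
    from xml.sax.saxutils import escape

    SITE = "https://shop.example.com"   # hypothetical shop domain
    CHUNK = 1000                        # products per sitemap file
    OUT = Path("sitemaps")
    OUT.mkdir(exist_ok=True)

    # Stub list standing in for the real product catalogue.
    product_urls = [f"{SITE}/product/{i}" for i in range(1, 7001)]

    chunk_files = []
    for n in range(math.ceil(len(product_urls) / CHUNK)):
        urls = product_urls[n * CHUNK:(n + 1) * CHUNK]
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
        name = f"sitemap-products-{n + 1}.xml"
        (OUT / name).write_text(
            '<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{body}</urlset>"
        )
        chunk_files.append(name)

    # Sitemap index pointing at the chunks; this single URL is what gets
    # submitted to Search Console.
    index_body = "".join(
        f"<sitemap><loc>{SITE}/sitemaps/{name}</loc></sitemap>"
        for name in chunk_files
    )
    (OUT / "sitemap-index.xml").write_text(
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{index_body}</sitemapindex>"
    )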
Later observations are:
- Both sitemaps I deleted in GSC are still getting crawled and are STILL THERE. You can't see them in the overview, but if you have the old links they still appear as normal.
- We eventually started submitting product data to Google Merchant Center as well. It works 100% fine and our products are getting found and bought. The clicks even still show up in Search Console!!!! So I have a shop with 0 indexed pages in GSC that gets clicks every day. WTHeck?
So like... I don't even know anymore. Maybe we also have to restart like the person in the blog did, move the shop to a new domain, and NEVER give Google a sitemap. If I really go that route I will probably delete the cronjob that creates the sitemap, in case Google finds it by itself. But also, like, what the heck? I have worked in a web agency for 5 years and created a new webpage about every 2-8 weeks, so I launched roughly 50-70 webpages and shops, and I NEVER saw this happen. Is it an AI hallucinating? Is it anti-spam gone too far? Is it a straight-up bug that they don't see? Who knows. I don't.
(Good article though and I hope maybe some other people chime in and googlers browsing HN see this stuff).
At the risk of sounding crazy, I de-indexed my blog myself and rely on the mailing list (which is now approaching 5000 subscribers) + reprints in several other online media to get traffic to me. On a good day, I get 5000 hits, which is quite a lot by Czech language community standards.
Together with deleting my Facebook and Twitter accounts, this removed a lot of pressure to conform to their unclear policies. Especially around 2019-21, it was completely unclear how to escape their digital guillotine which seemed to hit various people randomly.
The deliverability problem still stands, though. You cannot be completely independent nowadays. Fortunately my domain is 9 years old.
Similar here. I've removed my websites from Google so people discover that other engines can return more complete results. People can find the content using any other engine (or LLM), or via links in relevant places. Using anything but the monopolist should pay off until it's no longer a monopoly, or at least no longer one abused to gain power in other markets, like advertising their browser product on the homepage while not allowing anyone else to advertise there (much less for free).
This is not about bears at all. Very disappointed.
Bear witness that Google have bear away from bearing Bear blog.
It's weird that the number one search engine of modern times is so finicky; perhaps it has just become way over-engineered and over-rigged. Just index the web. At a certain point they went from search engine to arbiter of what people can find.
We depend too much on Google.
Why do you care if Google indexes your site or not?
I'm annoyed that mine even shows up on Google.
We need a P2P internet.
No more Google. No more websites. A distributed swarm of ephemeral signed posts. Shared, rebroadcasted.
When you find someone like James and you like them, you follow them. Your local algorithm then prioritizes finding new content from them. You bookmark their author signature.
Like RSS but better. Fully distributed.
Your own local interest graph, but also the power of your peers' interest graphs.
Content is ephemeral but can also live forever if any nodes keep rebroadcasting it. Every post has a unique ID, so you can search for it later in the swarm or some persistent index utility.
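To make that concrete, a toy sketch of a signed, content-addressed post (hypothetical field names, Ed25519 via the `cryptography` package; a real protocol would also need key distribution, verification on receipt, and rebroadcast rules):

    import hashlib
    import json
    import time
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Hypothetical author key; followers bookmark only the public half
    # (the "author signature" mentioned above).
    author_key = Ed25519PrivateKey.generate()
    author_pub = author_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw
    )

    post = {
        "author": author_pub.hex(),
        "created": int(time.time()),
        "body": "hello from the swarm",
    }
    payload = json.dumps(post, sort_keys=True).encode()

    # Content-addressed ID: any node can recompute it, so the post can be
    # re-requested from the swarm long after the original node is gone.
    post_id = hashlib.sha256(payload).hexdigest()
    signature = author_key.sign(payload).hex()

    envelope = {"id": post_id, "sig": signature, "post": post}
    print(json.dumps(envelope, indent=2))

Receivers would verify the signature against the bookmarked public key before rebroadcasting, so impersonating an author you follow is infeasible even though there's no central server.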
The Internet should have become fully p2p. That would have been magical. But platforms stole the limelight just as the majority of the rest of the world got online.
If we nerds had but a few more years...
Websites are p2p by default actually. It's just discovery that goes through google.
Isn't what you're describing something like mastodon or usenet?
There is a technological feudalism being built in an ongoing manner, and you and I cannot do anything about it.
On the other side of the same coin, there are already governments that will make you legally responsible for what your page's visitors write in comments. This renders any p2p internet legally unbearable (i.e. someone goes to your page, posts some bad word, and you get jailed). So far they say "it's only for big companies", but that's a lie - just boiling frogs.
Depends what your time scale is for "being built". 50 years ago, centralization and government control were much stronger; 20 years ago, probably less.
"Cannot do anything" is relative. Google did something about it (at least for the first 10-15 years), but I am sure that was not their primary intention, nor were they sure it would work. So "we have no clue what will work to reduce it" is more appropriate.
Now I think everybody has tools to build stuff more easily (you could not make a television channel or a newspaper 50 years ago). That is just an observation of possibility, not a guarantee of success.
That literally already exists and nobody uses it. Gnutella. Jabber. Tor. IPFS. Mastodon. The Entire Fucking IPv4/IPv6 Address Space And Every Layer Built On Top Of It. (if you don't think the internet is p2p, you don't understand how it works)
You know what else we need? We need food to be free. We need medicine to be free, especially medicines which end epidemics and transmissible disease. We need education to be free. We need to end homelessness. We need to end pollution. We need to end nationalism, racism, xenophobia, sexism. We need freedom of speech, religion, print, association. We need to end war.
There are a lot of things we as a society need. But we can't even make "p2p internet" work, and we already have it. (And please just forget the word 'distributed', because it's misleading you into thinking it's a transformative idea, when it's not)
I don't think we need for food to be free, we just need it to be accessible to everyone.
Every family should be provided with a UBI that covers food and rent (not in the city). That is a more attainable goal and would solve the same problems (better, in fact).
(Not saying that UBI is a panacea, but I've lived in countries that have experimented with such and it seems the best of the alternatives)
I do not think free is attainable for everything, due to thermodynamic constraints. Imagine "free energy": everybody uses as much as they want, Earth heats up, things go bad (not far from what is actually happening!).
I would settle for simpler, attainable things. Equal opportunity for next generation. Quality education for everybody. Focus on merit not other characteristics. Personal freedom if it does not infringe on the freedom of people around you (ex: there can't be such thing as a "freedom to pollute").
In my view the Internet as p2p worked pretty well to improve on the previous status quo in many areas (not all). But there will never be a "stable solution"; life and humans are dynamic. We do have some good and free stuff on the Internet today because of the groundwork laid 30 years ago by the open source movement. Any plan started today will only have a noticeable effect many years from now. So "we can't even make it work" sounds more like an excuse not to start than an honest take.
We already have excesses of everything needed to provide for people's basic needs for no extra cost. We have excess food, excess land for housing, and we already pay for free emergency services, which actually costs us much more than if we fixed problems before they became emergencies. (And if there were a need for extra cost, we have massive wealth inequality that can be adjusted, not to mention things like massive military budgets and unfair medical pricing)
> Equal opportunity for next generation.
What does this mean? I suppose it can't literally mean equal opportunity, because people aren't equal, and their circumstances aren't equal; but then, what does this mean?
A clear definition is admittedly hard to come by, but I will share what I see as rather large issues that impact society: minimal spending per child on education to provide a good service for most (this implies that smart kids are selected and become productive, as opposed to dropping out because they had nobody to learn from); reasonable healthcare availability for children so that they can develop rather than being sick; and sufficient food for children to support the first two (you can't learn or be healthy if you are hungry).
Currently I know of multiple measures/rules/policies in many countries that affect these 3 things in ways I find damaging for society overall in the long term. Companies complain they don't have a workforce, governments complain that the birth rate is low, yet there are many issues with raising a child. Financial incentives to parents do not seem to work (for example: https://www.bbc.com/news/world-europe-47192612).
Centralization is simply more efficient. Redundancy is a cost and network effects make it even worse. You’d have to go the authoritarian route - effectively and/or outright ban Google and build alternatives, like Yandex or Baidu.
For a lot of things we don't opt for the cheapest solutions that also lack redundancy. Why not for the "information highway"?
Most efficient = cheaper. A lot of the time, cheaper sacrifices quality, and sometimes safety.
It's not even that. It's that "centralization is more efficient" is a big fat lie. If you look at the "centralized systems" they're... not actually technologically centralized, they're really just a monopolist that internally implements a distributed system.
How do you think Google or Cloudflare actually work? One big server in San Francisco that runs the whole world, or lots of servers distributed all over?
I know exactly how they work, but they have a single entry point, as a customer you don't really care that the system is global, and they also have a single control plane, etc. Decisions are efficient if they need to be taken only once. The underlying architecture is irrelevant for the end user.
Why do you think they're a monopoly in the first place? Obviously because they were more efficient than the competition and network effects took care of the rest. Having to make choices is a cost for the consumer - IOW consumers are lazy - so winners have staying power, too. It's a perfect storm for a winner-takes-all centralization since a good centralized service is the most efficient utility-wise ('I know I'm getting what I need') and decision-cost-wise ('I don't need to search for alternatives') for consumers until it switches to rent seeking, which is where the anti-monopoly laws should kick in.
> Decisions are efficient if they need to be taken only once.
In other words, open source decentralized systems are the most efficient because you don't have to reduplicate a competitor's effort when you can just use the same code.
> Obviously because they were more efficient than the competition and network effects took care of the rest.
In most cases it's just the network effect, and whether it was a proprietary or open system in any given case is no more than the historical accident of which one happened to gain traction first.
> Having to make choices is a cost for the consumer
If you want an email address you can choose between a few huge providers and a thousand smaller ones, but that doesn't seem to prevent anyone from using it.
> until it switches to rent seeking
If it wasn't an open system from the beginning then that was always the end state and there is no point in waiting for someone to lock the door before trying to remove yourself from the cage.
> just use the same code
This is the great lie. Approximately zero end consumers care about code, the product they consume is the service, and if the marginal cost of switching the service provider is zero, it's enough to be 1% better to take 99% of the market.
> Approximately zero end consumers care about code
Most people don't care about reading it. They very much care about what it does.
Also, it's not "approximately zero" at all. It's millions or tens of millions of people out of billions, and when a small minority of people improve the code -- because they have the ability to -- it improves the code for all the rest too. Which is why they should have a preference for the ability to do it even if they're not going to be the one to exercise it themselves.
> if the marginal cost of switching the service provider is zero, it's enough to be 1% better to take 99% of the market.
Except that you'd then need to be 1% better across all dimensions for different people to not have different preferences, and everyone else is trying to carve out a share of the market too. Meanwhile if you were doing something that actually did cause 99% of people to prefer a service that does that then everybody else would start doing it.
There are two main things that cause monopolies. The first is a network effect, which is why those things all need to be open systems. The second is that one company gets a monopoly somewhere (often because of the first, sometimes through anti-competitive practices like dumping) and then leverages it in order to monopolize the supply chain before and after that thing, so that competing with them now requires not just competing with the original monopoly but also reproducing the rest of the supply chain which is now no longer available as independent commodities.
This is why we need antitrust laws, but also why we need to recognize that antitrust laws are never perfect and do everything possible to stamp out anything that starts to look like one of those markets through development of open systems and promoting consumer aversion to products that are inevitably going to ensnare them.
"People don't want X" as an observed behavior is a bunch of nonsense. People's preferences depend on their level of information. If they don't realize they're walking into a trap then they're going to step right into it. That isn't the same thing as "people prefer walking into a trap". They need to be educated about what a trap looks like so they don't keep ending up hanging upside down by their ankles as all the money gets shaken out of their pockets.
No, you need to bust up Google as the monopolist it is.
YouTube should get split out and then broken up. Google Search should get split out and broken up. etc.
This is not a problem you solve with code. This is a problem you solve with law.
> This is not a problem you solve with code. This is a problem you solve with law.
When the DMCA was a bill, people were saying that the anti-circumvention provision was going to be used to monopolize playback devices. They were ignored, it was passed, and now it's being used to monopolize not just playback devices but also phones.
Here's the test for "can you rely on the government here": Have they repealed it yet? The answer is still no, so how can you expect them to do something about it when they're still actively making it worse?
Now try to imagine the world where the Free Software Foundation never existed, Berkeley never released the source code to BSD and Netscape was bought by Oracle instead of being forked into Firefox. As if the code doesn't matter.
Yes. It's a political problem and a very old one. That's why we also already have solutions for it, antitrust laws and other regulations to ensure competition and fairness in the market, to keep it free. Governments just have to keep funding and enabling these institutions.
There are very few things I'd consider a silver bullet for a lot of problems, but antitrust enforcement to break up near-monopolies is one of them.
Why do you need Google at all?
From what you've described, you've just re-invented webrings.
When I reload the page "https://journal.james-zhan.com/google-de-indexed-my-entire-b...", I get
Request URL: https://journal.james-zhan.com/google-de-indexed-my-entire-b...
Request Method: GET
Status Code: 304 Not Modified
So maybe it's the status code? Shouldn't that page return a 200 ok?
When I go to blog.james..., I first get a 301 Moved Permanently, and then journal.james... loads, but it returns a 304 Not Modified, even if I then reload the page.
Only when I fully submit the URL again in the URL bar does it respond with a 200.
Maybe crawling also returns a 304, and Google won't index that?
Maybe prompt: "why would a 301 redirect lead to a 304 not modified instead of a 200 ok?", "would this 'break' Google's crawler?"
> When Google's crawler follows the 301 to the new URL and receives a 304, it gets no content body. The 304 response basically says "use what you cached"—but the crawler's cache might be empty or stale for that specific URL location, leaving Google with nothing to index.
You get a 304 because your browser tells the server what it has cached, and the server says "nothing changed, use that". In browsers you can bypass the cache by using Ctrl-F5, or in the developer tools you can usually disable caching while they're open. Doing so shows that the server is doing the right thing.
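You can see this from outside the browser too. A hedged sketch (it assumes the server emits an ETag or Last-Modified header, which it may not):

    import requests

    URL = "https://journal.james-zhan.com/"  # any page on the blog

    # A fresh request with no cache validators: the server returns 200
    # plus whatever validators it supports.
    first = requests.get(URL)
    print(first.status_code,
          first.headers.get("ETag"),
          first.headers.get("Last-Modified"))

    # Replaying the request with those validators is what the browser does;
    # only then does the server answer 304 Not Modified with an empty body.
    headers = {}
    if "ETag" in first.headers:
        headers["If-None-Match"] = first.headers["ETag"]
    if "Last-Modified" in first.headers:
        headers["If-Modified-Since"] = first.headers["Last-Modified"]

    second = requests.get(URL, headers=headers)
    print(second.status_code)  # typically 304, because we sent validators

A crawler making a cold request (no validators) gets the 200 with the full body, so the 304 you see in devtools says nothing about what Googlebot receives.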
Your LLM prompt and response are worthless.
When Chrome serves a cached page, like when you click a link on this page and then navigate back or hit F5, it shows it like this:
Request URL: https://news.ycombinator.com/item?id=46196076
Request Method: GET
Status Code: 200 OK (from disk cache)
I just thought that it would be worthwhile investigating in that direction.
That's a different situation. The browser decides what to do depending on the situation and what was communicated about caching. Sometimes it sends a request to the server along with information about what it already has. Then it can get back a 304. Other times it already knows the cached data is fine, so it doesn't send a request to the server in the first place. The developer tools show this as a cached 200.
Got it, thanks for explaining.
Has anyone noticed that the response for the blog page has a header: "x-robots-tag: noindex, nofollow"? What's the purpose of this header on a content page?
UPD: Sorry, never mind, I inspected the wrong response.
I don't see it. With Chrome devtools, for the posted URL I see X-Clacks-Overhead, X-Content-Type-Options, and X-Frame-Options. No X-Robots-Tag.
And no <meta name="robots"> in the HTML either.
What URL are you seeing that on? And what tool are you using to detect that?
Edit: cURL similarly shows no such header for me:
Sorry. I am an idiot. Checked the wrong url. Please ignore.
> Second, the issue couldn’t be the quality or the quantity of the content. I came across some other pretty barebones Bear blogs that don’t have much content, and looked them up on Google, and they showed up in the results just fine. An example:
Suggestion: Remember that many large companies are emergently shitty, with shitty processes, and individuals motivated to act in shitty ways.
When a company is so powerful, this might be a time to think about solidarity.
When you're feeling an unexplained injustice from them, sometimes saying "but you let X do it" could just throw X under the bus.
Whether that's because a fickle process or petty actor simply missed X before, or because now they have new reason to double down and also punish the others (for CYA consistency, or, if petty, to assert their power now that you've questioned it).