They were pretty explicit that this was only the cost in GPU hours to USD for the final run. Journalists and Twitter tech bros just saw an easy headline there. It's the same with Sandfall, the developer of Clair Obscur: people say the game was made by 30 people when there were 200 people involved.
These "200 people" were counted from credits which list pretty much everyone who even sniffed at the general direction of the studio's direction. The studio itself is ~30 people (just went and check on their website, they have a team list with photos for everyone). The rest are contractors whose contributions usually vary wildly. Besides, credits are free so unless the the company are petty (see Rockstar not crediting people on their games if they leave before the game is released even if they worked on it for years) people err on the site on crediting everyone. Personally i've been credited on a game that used a library i wrote once and i learned about it years after the release.
Most importantly those who mention that the game was made by 30 people do it to compare it with other much larger teams with hundreds if not thousands of people and those teams use contractors too!
> They were pretty explicit that this was only the cost in GPU hours to USD for the final run.
The researchers? Yes.
What followed afterwards, I'm not so sure. There were clearly some "cheap headlines" in the media, but there was also some weird coverage being pushed everywhere, from weird TLDs, all pushing "NVDA is dead, DeepSeek is cheap, you can run it on a Raspberry Pi", etc. That might have been a campaign designed to help short the stocks.
Actually the majority of Google models are open source, and they were also pretty fundamental in pushing a lot of training techniques forward. Working in the AI space, I've read quite a few of their research papers, and I really appreciate what they've done to share their work and release their models under licenses that allow commercial use.
"Actually the majority of Google models are open source"
That's not accurate. The Gemini family of models are all proprietary.
Google's Gemma models (which are some of the best available local models) are open weights but not technically OSI-compatible open source - they come with usage restrictions: https://ai.google.dev/gemma/terms
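For what that distinction means in practice, here is a minimal sketch (not from the thread; the model id and settings are illustrative assumptions) of pulling the open Gemma weights and running them locally with the Hugging Face transformers library, subject to Google's usage terms:

    # Minimal local-inference sketch; assumes transformers, torch and accelerate
    # are installed and that you have accepted Google's Gemma terms on Hugging Face.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-2b-it"  # illustrative small instruction-tuned variant
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "In one sentence: what is the difference between open weights and open source?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))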
You're ignoring the T5 series of models, which were incredibly influential. The T5 models and their derivatives (FLAN-T5, Long-T5, ByT5, etc.) have been downloaded millions of times on Hugging Face and are real workhorses. There are even variants still being produced within the last year or so.
And yeah, the Gemma series is incredible, and while it may not meet the OSI standard, I consider the models pretty open as far as local models go. And it's not just the standard Gemma variants: Google is releasing other incredible Gemma models that I don't think people have really caught wind of yet, like MedGemma, whose 4B variant has vision capability.
I really enjoy their contributions to the open source AI community and think it’s pretty substantial.
Exactly. Not to minimize DeepSeek's tremendous achievement, but that $5 million was just for the training run, not the GPUs they had purchased beforehand, nor all the OpenAI API calls they likely used to assist with synthetic data generation.
Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you consider months to be 30 days, that's around $5.7M, which lines up. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc. which will dominate costs, I'd expect. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.
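For reference, the arithmetic behind that headline figure is easy to reproduce from the numbers quoted above; the point is what it excludes, not the result:

    # Back-of-envelope reproduction of the widely quoted DeepSeek training cost,
    # using only the figures from the comment above (rounded; final run only).
    gpus = 2048             # H800s
    hours = 2 * 30 * 24     # ~2 months of wall-clock time
    usd_per_gpu_hour = 2.0
    final_run = gpus * hours * usd_per_gpu_hour
    print(f"~${final_run / 1e6:.1f}M")  # ~$5.9M at this rounding, close to the quoted figure,
                                        # and it excludes failed runs, salaries and hardware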
>ignoring the costs of facilities, salaries, non-cloud hardware, etc.
If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe Deepseek was trained at one (perhaps two) orders of magnitude lower cost than other competing models.
That is also just the final production run. How many experimental runs were performed before starting the final batch? It could be some ratio like 10 hours of research to every one hour of final training.
Parent is referencing the recent court case with Anthropic and the legal requirement of not copying books but consuming them, which translates to Anthropic having to destroy every book it uses as input data in order to comply.
Deepseek R1 was trained at least partially on the output of other LLMs. So, it might have been much more expensive if they needed to do it themselves from scratch.
Probably the results were worse than the K2 model released today. No serious engineer would say it's for "safety" reasons, given that ablation nullifies any safety post-training.
I'm expecting (and indeed hoping) that the open weights OpenAI model is a lot smaller than K2. K2 is 1 trillion parameters and almost a terabyte to download! There's no way I'm running that on my laptop.
I think the sweet spot for local models may be around the 20B size - that's Mistral Small 3.x and some of the Gemma 3 models. They're very capable and run in less than 32GB of RAM.
I really hope OpenAI put one out in that weight class, personally.
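A rough back-of-envelope for why ~20B parameters is the sweet spot for a 32 GB machine (an added sketch; real memory use also depends on KV cache, context length and runtime overhead):

    # Weight memory only: parameters * bits per weight. Quantization is what
    # makes 20B-class models comfortable on 32 GB of RAM.
    params = 20e9
    for bits in (16, 8, 4):
        gib = params * bits / 8 / 2**30
        print(f"{bits}-bit weights: ~{gib:.0f} GiB")
    # 16-bit: ~37 GiB, 8-bit: ~19 GiB, 4-bit: ~9 GiB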
Early rumours (from a hosting company that apparently got early access) were that you'd need "multiple H100s to run it", so I doubt it's a Gemma / Mistral Small tier model.
You will get a 20 GB model. Distillation is so compute-efficient that it's all but inevitable that, if not OpenAI, numerous other companies will do it.
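For anyone unfamiliar with why distillation is cheap, here is a minimal sketch of the classic logit-distillation loss (names and temperature are illustrative assumptions; production pipelines more often fine-tune a small model on sampled teacher outputs rather than live logits):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions, then pull the student toward the teacher.
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

    # Example: batch of 4 positions over a 32k-token vocabulary.
    student = torch.randn(4, 32000, requires_grad=True)
    teacher = torch.randn(4, 32000)
    loss = distillation_loss(student, teacher)
    loss.backward()  # only the small student gets gradient updates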
I would rather have an open weights model that’s the best possible one I can run and fine tune myself, allowing me to exceed SOTA models on the narrower domain my customers care about.
What is their business purpose for releasing an open-weights model? How does it help them? I asked an LLM but it just said vague unconvincing things about ecosystem plays and fights for talent.
Wow. Twitter is not a serious website anymore. Why are companies and professionals still using it? Is it really like that now, with all that noise from grok floating to the top?
My pet theory is that they delayed this because Grok 4 released: they explicitly don't want to be seen as competing with it by pulling their usual trick of releasing right around when a rival (usually Google) does. Feels like a very Sam Altman move in my model of his mind.
https://nitter.space/sama/status/1943837550369812814
Not sure if it's coincidental that OpenAI's open weights release got delayed right after an ostensibly excellent open weights model (Kimi K2) got released today.
https://moonshotai.github.io/Kimi-K2/
OpenAI know they need to raise the bar with their release. It can't be a middle-of-the-pack open weights model.
They might also be focusing all their work on beating Grok 4 now, since xAI has a significant edge in accumulating computing power and has opened a considerable gap on raw intelligence tests like ARC and HLE. OpenAI is in this to win the competitive race, not the open one.
> They might also be focusing all their work on beating Grok 4 now,
With half the key team members they had a month prior
I'm starting to think talent is way less concentrated in these individuals than execs would have investors believe. While all those people who left OpenAI certainly have the ability to raise ridiculous sums of venture capital in all sorts of companies, Anthropic remains the only offspring that has actually reached a level where they can go head-to-head with OpenAI. Zuck now spending billions on snatching those people seems more like a move out of desperation than a real plan.
They've kind of played themselves making "genius engineers" their competitive advantage. Anyone can hire those engineers!
At this point, it seems to be engineering throughput, more than anything, that will decide short- to medium-term outcomes. I've yet to see a case where an IC who took a position only because of an (in fact outrageous) compensation package, especially one not directly tied to long-term company performance through equity, was ever productive again. Meta certainly doesn't strike me as a company that attracts talent for their "mission."
TLDR Zuck’s recent actions definitely smell like a predictable failure driven by desperation to me.
Yet it suspiciously can't draw a pelican?
simonw is going to force every competitive LLM to over-ingest cartoon svg pelicans before this is over!
Btw, why is there no K2 discussion on HN? Isn't it pretty huge news?
There is, but it’s not on the front page so you don’t find it unless you go through multiple pages or manually search it up.
Moonshot AI has released banger models without much noise about them. Kimi K1.5, for example, was quite impressive at the time.
Why don’t you start one?
Had to search for the discussion; it's here. Seems like nobody noticed it, and it only has a couple hundred upvotes.
Here: https://news.ycombinator.com/item?id=44533403
You've been shadowbanned for saying some things that go against the prevailing groupthink, so all your comments within the last couple of months are invisible for most users.
I really think it's disrespectful towards honest users (excluding spammers and obvious trolls), but I don't pay HN's moderation bills...
How can you tell the user has been shadowbanned?
check his comment history. it's all [flagged]
This could be it, especially since they announced last week that it would be the best open-source model.
Technically they were right when they said it, in their minds. Things are moving so fast that in a week, it will be true again.
Every OpenAI model since GPT-4 has been behind the curve by miles.
Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme? Cars, planes and elevators have safety tests. LLMs don’t. Nobody is going to die if a LLM gives an output that its creators do not like, yet when they say “safety tests”, they mean that they are checking to what extent the LLM will say things they do not like.
An LLM can trivially instruct someone to take medications with adverse interactions, steer a mental health crisis toward suicide, or make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated. Words can't kill people, but words can definitely lead to deaths.
That's not even considering tool use!
Part of the problem is due to the marketing of LLMs as more capable and trustworthy than they really are.
And the safety testing actually makes this worse, because it leads people to trust that LLMs are less likely to give dangerous advice, when they could still do so.
Spend 15 minutes talking to a person in their 20's about how they use ChatGPT to work through issues in their personal lives and you'll see how much they already trust the "advice" and other information produced by LLMs.
Manipulation is a genuine concern!
Netflix needs to do a Black Mirror episode where a sentient AI pretends it's "dumber" than it is while secretly plotting to overthrow humanity. Either that, or one where an LLM is hacked by deep-state actors and provides similarly manipulated advice.
One of the story arcs in "The Phoenix" by Osamu Tezuka is on a similar topic.
It's not just young people. My boss (originally a programmer) agreed with me that there are lots of problems using ChatGPT for our products and programs, as it gives wrong answers too often, but then 30 seconds later told me that it was apparently great at giving medical advice.
...later someone higher-up decided that it's actually great at programming as well, and so now we all believe it's incredibly useful and necessary for us to be able to do our daily work
Most doctors will prescribe antibiotics for viral infections just to get you out and the next guy in; they have zero interest in sitting there troubleshooting with you.
For this reason o3 is way better than most of the doctors I've had access to, to the point where my PCP just writes whatever I brought in because she can't follow 3/4 of it.
Yes, the answers are often wrong and incomplete, and it's up to you to guide the model to sort it out, but it's just like vibe coding: if you put in the steering effort, you can get a decent output.
Would it be better if you could hire an actual professional to do it? Of course. But most of us are priced out of that level of care.
> Most doctors will prescribe antibiotics for viral infections just to get you out and the next guy in
Where do you get this data from?
Family, in my case. There are two reasons they do this. One is that a lot of people like walking out with medicine: they think it justifies the cost of the visit, and there's a real placebo effect (which is not an oxymoron, as many might think).
The second is that many viral infections can, in rare scenarios, lead to bacterial infections. For instance, a random flu can leave one more susceptible to developing pneumonia. Throwing antibiotics at everything is a defensive measure to help ward off malpractice lawsuits. Even if frivolous, a lawsuit is something no doctor wants to deal with, but some absurd number of them (something like 1 in 15 per year) will.
Lived experience. I'm not in the US and neither are most doctors.
I can co-sign this, being bi-coastal. In the US, not once have I or my 12-year-old kid been prescribed antibiotics. On three occasions in Europe I had to take my kid to the doctor, and each time antibiotics were prescribed (never consumed).
Your claim of most here is not only unsupported, it's completely wrong.
I'd like to see your support for that very confident take.
In my experience it's not only correct, but so common that it's hard not to get a round of antibiotics to go.
The only caveat is that I'm in the EU, not the US.
LLMs are really good at medical diagnostics, though…
Can you point to a specific bit of marketing that says to take whatever medications a LLM suggests, or other similar overreach?
People keep talking about this “marketing”, and I have yet to see a single example.
This is analogous to saying a computer can be used to do bad things if it is loaded with the right software. Coincidentally, people do load computers with the right software to do bad things, yet people are overwhelmingly opposed to measures that would stifle such things.
If you hook up a chat bot to a chat interface, or add tool use, it is probable that it will eventually output something that it should not and that output will cause a problem. Preventing that is an unsolved problem, just as preventing people from abusing computers is an unsolved problem.
> This is analogous to saying a computer can be used to do bad things if it is loaded with the right software.
It's really not. Parent's examples are all out-of-the-box behavior.
As the runtime of any program approaches infinity, the probability of the program behaving in an undesired manner approaches 1.
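Stated a bit more carefully (an added gloss, not from the thread): if each step of a program independently has some probability p of behaving undesirably, then

    P(undesired behavior within n steps) = 1 - (1 - p)^n  ->  1  as n -> infinity, for any p > 0

The claim silently assumes p > 0 for every program, which is exactly what the reply below disputes.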
That is not universally true. The yes program is a counter example:
https://www.man7.org/linux/man-pages/man1/yes.1.html
Devil's advocate:
(1) Execute yes (with or without arguments, whatever you desire).
(2) Let the program run as long as you desire.
(3) When you stop desiring the program to spit out your argument,
(4) Stop the program.
Between (3) and (4) some time must pass. During this time the program is behaving in an undesired way. Ergo, yes is not a counter example of the GP's claim.
I upvoted your reply for its clever (ab)use of ambiguity to say otherwise to a fairly open and shut case.
That said, I suspect the other person was actually agreeing with me, and tried to state that software incorporating LLMs would eventually malfunction by stating that this is true for all software. The yes program was an obvious counter example. It is almost certain that all LLMs will eventually generate some output that is undesired given that it is determining the next token to output based on probabilities. I say almost only because I do not know how to prove the conjecture. There is also some ambiguity in what is a LLM, as the first L means large and nobody has made a precise definition of what is large. If you look at literature from several years ago, you will find people saying 100 million parameters is large, while some people these days will refuse to use the term LLM to describe a model of that size.
Thanks, it was definitely tongue-in-cheek. I agree with you on both counts.
Society has accepted that computers bring more benefit than harm, but LLMs could still get pushback due to bad PR.
PDFs can do this too.
In such a case, the author of the PDF can be held responsible.
Radical idea: let’s hold the reader responsible for the actions they take from the material.
So we should hold my grandmother responsible for the phishing emails she gets? Hmm.
Radical rebuttal of this idea: if you hire an assassin then you are responsible too (even more so, actually), even if you only told them stuff over the phone.
I don’t see the connection. Publishing != hiring.
Then I don't see the connection in your idea. Answering questions != publishing.
Twitter does it at scale.
> An LLM can trivially make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated
This is an extraordinary claim.
I trust that the vast majority of people are good and would ignore such garbage.
Even assuming that an LLM can trivially build a compelling case that convinces someone who is not already murderous to go on a killing spree against a large group of people, one killer has a limited impact radius.
For contrast, many books and religious texts have vastly more influence and convincing power over huge groups of people, and they have demonstrably caused widespread death or other harm. And yet we don't censor or ban them.
> An LLM can trivially instruct someone to take medications with adverse interactions,
What’s an example of such a medication that does not require a prescription?
How about just telling people that drinking grapefruit juice with their liver medicine is a good idea and to ignore their doctor.
Tylenol.
Oil of wintergreen?
Yeah, give it access to some bitcoin and the internet, and it can definitely cause deaths.
Yes, and a table saw can take your hand. As can a whole variety of power tools. That does not render them illegal to sell to adults.
No but they have guards on them.
It does render them illegal to sell without studying their safety.
An interesting comparison.
Table saws sold all over the world are inspected and certified by trusted third parties to ensure they operate safely. They are illegal to sell without the approval seal.
Moreover, table saws sold in the United States & EU (at least) have at least 3 safety features (riving knife, blade guard, antikickback device) designed to prevent personal injury while operating the machine. They are illegal to sell without these features.
Then of course there are additional devices like sawstop, but it is not mandatory yet as far as I'm aware. Should be in a few years though.
LLMs have none of those certification labels or safety features, so I'm not sure what your point was, exactly?
An LLM is not gonna chop off your limb. You can't use it to attack someone.
They are somewhat self-regulated, as they can cause permanent damage to the company that releases them, and they are meant for general consumers without any training, unlike table saws, which are meant for trained people.
An example is the early Microsoft bot Tay, which started to go extreme right-wing when people realized how to push it in that direction. Grok had a similar issue recently.
Google had racial issues with its image generation (and earlier with image detection). Again something that people don't forget.
Also, an OpenAI 4o release was encouraging people's stupid ideas when they asked stupid questions, and they had to roll it back recently.
Of course, I'm not saying that's the real reason (somehow they never admit that performance is the problem when they don't release something), but safety matters for consumer products.
> They are somewhat self-regulated, as they can cause permanent damage to the company that releases them
And then you proceed to give a number of examples of that not happening. Most people already forgot those.
The problem is “safety” prevents users from using LLMs to meet their requirements.
We typically don't critique users' requirements, at least not when it comes to functionality.
The marketing angle is that this measure is needed because LLMs are “so powerful it would be unethical not to!”
AI marketers are continually emphasizing how powerful their software is. “Safety” reinforces this.
“Safety” also brings up many of the debates “mis/disinformation” brings up. Misinformation concerns consistently overestimate the power of social media.
I’d feel much better if “safety” focused on preventing unexpected behavior, rather than evaluating the motives of users.
The closed weights models from OpenAI already do these things though
Books can do this too.
There's a reason the inheritors of the copyright* refused to allow more copies of Mein Kampf to be produced until that copyright expired.
* the federal state of Bavaria
Was there? It seems like that was the perfect natural experiment then. So what was the outcome? Was there a sudden rash of holocausts the year that publishing started again?
Major book publishers have sensitivity readers that evaluate whether or not a book can be "safely" published nowadays. And even historically there have always been at least a few things publishers would refuse to print.
All it means is that the Overton window on "should we censor speech" has shifted in the direction of less freedom.
GP said major publishers. There's nothing stopping you from printing out your book and spiral binding it by hand, if that's what it takes to get your ideas into the world. Companies having standards for what they publish isn't censorship.
At the end of the day an LM is just a machine that talks. It might say silly things, bad things, nonsensical things, or even crazy, insane things. But at the end of the day it just talks. Words don't kill.
LM safety is just a marketing gimmick.
We absolutely regulate which words you can use in certain areas. Take instructions on medicine for one example
does your CPU, your OS, your web browser come with ~~built-in censorship~~ safety filters too?
AI 'safety' is one of the most neurotic twitter-era nanny bullshit things in existence, blatantly obviously invented to regulate small competitors out of existence.
It isn't. This is dismissive without first thinking through the difference in application.
AI safety is about proactive safety. One example: if an AI model is going to be used to screen hiring applications, making sure it doesn't have any weighted racial biases.
The difference here is that it's not reactive. Reading a book with a racial bias would be the inverse, where you would be reacting to that information.
That’s the basis of proper AI safety in a nutshell
As someone who has reviewed people's résumés submitted with job applications in the past, I find it difficult to imagine this. The résumés that I saw had no racial information. I suppose the names might have some correlation to such information, but anyone feeding these things into a LLM for evaluation would likely censor the name to avoid bias. I do not see an opportunity for proactive safety in the LLM design here. It is not even clear that they are evaluating whether there is bias in a scenario where someone did not properly sanitize inputs.
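As an illustration of what "censor the name" could look like before résumés ever reach a model (a hypothetical sketch; the field names are made up, not any real pipeline):

    import re

    def redact(resume: dict) -> dict:
        """Drop direct identifiers from a parsed resume before LLM screening."""
        scrubbed = {k: v for k, v in resume.items()
                    if k not in ("name", "email", "phone", "address")}
        # Also scrub the name if it leaks into free text.
        name = resume.get("name")
        if name and "summary" in scrubbed:
            scrubbed["summary"] = re.sub(re.escape(name), "[CANDIDATE]",
                                         scrubbed["summary"], flags=re.IGNORECASE)
        return scrubbed

    print(redact({
        "name": "Jamal Washington",
        "email": "jamal@example.com",
        "summary": "Jamal Washington has 6 years of backend experience.",
        "skills": ["Python", "Kubernetes"],
    }))
    # {'summary': '[CANDIDATE] has 6 years of backend experience.',
    #  'skills': ['Python', 'Kubernetes']}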
> I find it difficult to imagine this
Luckily, this is something that can be studied and has been. Sticking a stereotypically Black name on a resume on average substantially decreases the likelihood that the applicant will get past a resume screen, compared to the same resume with a generic or stereotypically White name:
https://www.npr.org/2024/04/11/1243713272/resume-bias-study-...
That is a terrible study. The stereotypically black names are not just stereotypically black, they are stereotypical for the underclass of trashy people. You would also see much higher rejection rates if you slapped stereotypical white underclass names like "Bubba" or "Cleetus" on resumes. As is almost always the case, this claim of racism in America is really classism and has little to do with race.
"Names from N.C. speeding tickets were selected from the most common names where at least 90% of individuals are reported to belong to the relevant race and gender group."
Got a better suggestion?
> but anyone feeding these things into a LLM for evaluation would likely censor the name to avoid bias
That should really be done for humans reviewing the resumes as well, but in practice that isn't done as much as it should be
If you're deploying LLM-based decision making that affects lives, you should be the one held responsible for the results. If you don't want to do due diligence on automation, you can screen manually instead.
Social media does. Even person to person communication has laws that apply to it. And the normal self-censorship a normal person will engage in.
okay. and? there are no AI 'safety' laws in the US.
without OpenAI, Anthropic and Google's fearmongering, AI 'safety' would exist only in the delusional minds of people who take sci-fi way too seriously.
https://en.wikipedia.org/wiki/Regulatory_capture
for fuck's sake, how more obvious could they be? sama himself went on a world tour begging for laws and regulations, only to purge safetyists a year later. if you believe that he and the rest of his ilk are motivated by anything other than profit, smh tbh fam.
it's all deceit and delusion. China will crush them all, inshallah.
iOS certainly does, by limiting you to the App Store and restricting what apps are available there.
They have been forced to open up to alternative stores in the EU. This is unequivocally a good thing, and a victory for consumer rights.
Especially since "safety" in this context often just means making sure the model doesn't say things that might offend someone or create PR headaches.
Don’t draw pictures of celebrities.
Don’t discuss making drugs or bombs.
Don't call yourself MechaHitler… though I don't care about that one; the whole scenario was objectively funny in its sheer ridiculousness.
Sure it’s funny until some mentally unstable Nazi sympathizer goes and shoots up another synagogue. So funny.
I also think it's marketing but kind of for the opposite reason. Basically I don't think any of the current technology can be made safe.
Yes, perfection is difficult, but it's relative. It can definitely be made much safer. Looking at the analysis of pre vs post alignment makes this obvious, including when the raw unaligned models are compared to "uncensored" models.
It’s about safety for the LLM provider, not necessarily the user.
At my company (which produces models), almost all the responsible-AI jazz is about DEI and banning naughty words. Little action on preventing bad outcomes.
> Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme?
It is. It is also part of Sam Altman’s whole thing about being the guy capable of harnessing the theurgical magicks of his chat bot without shattering the earth. He periodically goes on Twitter or a podcast or whatever and reminds everybody that he will yet again single-handedly save mankind. Dude acts like he’s Buffy the Vampire Slayer
I hope the same people questioning AI safety (which is reasonable) aren't also concerned about Grok due to the recent incident.
You have to understand that a lot of people do care about these kind of things.
Why is your definition of safety so limited? Death isn't the only type of harm...
There are other forms of safety, but whether a digital parrot says something that people do not like is not a form of safety. They are abusing the term safety for marketing purposes.
You're abusing the terms by picking either the overly limited ("death") or overly expansive ("not like") definition to fit your conclusion. Unless you reject the fact that harm can come from words/images, a parrot can parrot harmful words/images, and so be unsafe.
it's like complaining about bad words in the dictionary
the bot has no agency, the bot isn't doing anything, people talk to themselves, augmenting their chain of thought with an automated process. If the automated process is acting in an undesirable manner, the human that started the process can close the tab.
Which part of this is dangerous or harmful?
The maxim “sticks and stones can break my bones, but words can never hurt me” comes to mind here. That said, I think this misses the point that the LLM is not a gatekeeper to any of this.
I find it particularly irritating that the models are so overly puritan that they refuse to translate subtitles because they mention violence.
Don't let your mind potential be limited by such primitive slogans!
You could be right about this being an excuse for some other reason, but lots of software has “safety tests” beyond life or death situations.
Most companies, for better or worse (I say for better) don’t want their new chatbot to be a RoboHitler, for example.
It is possible to turn any open weight model into that with fine tuning. It is likely possible to do that with closed weight models, even when there is no creator provided sandbox for fine tuning them, through clever prompting and trying over and over again. It is unfortunate, but there really is no avoiding that.
That said, I am happy to accept the term safety used in other places, but here it just seems like a marketing term. From my recollection, OpenAI had made a push to get regulation that would stifle competition by talking about these things as dangerous and needing safety. Then they backtracked somewhat when they found the proposed regulations would restrict themselves rather than just their competitors. However, they are still pushing this safety narrative that was never really appropriate. They have a term for this called alignment and what they are doing are tests to verify alignment in areas that they deem sensitive so that they have a rough idea to what extent the outputs might contain things that they do not like in those areas.
> Nobody is going to die
Callous. Software does have real impact on real people.
Ex: https://news.ycombinator.com/item?id=44531120
It's overblown. Elon shipped Hitler grok straight to prod
Nobody died
Playing devil's advocate, what if it was more subtle?
Prolonged use of conversational programs does reliably induce certain mental states in vulnerable populations. When ChatGPT got a bit too agreeable, that was enough for a man to kill himself in a psychotic episode [1]. I don't think this magnitude of delusion was possible with ELIZA, even if the fundamental effect remains the same.
Could this psychosis be politically weaponized by biasing the model to include certain elements in its responses? We know this rhetoric works: cults have been using love-bombing, apocalypticism, us-vs-them dynamics, assigned special missions, and isolation from external support systems to great success. What we haven't seen is what happens when everyone has a cult recruiter in their pocket, waiting for a critical moment to offer support.
ChatGPT has an estimated 800 million weekly active users [2]. How many of them would be vulnerable to indoctrination? About 3% of the general population has been involved in a cult [3], but that might be a reflection of conversion efficiency, not vulnerability. Even assuming 5% are vulnerable, that's still 40 million people ready to sacrifice their time, possessions, or even their lives in their delusion.
[1] https://www.rollingstone.com/culture/culture-features/chatgp...
[2] https://www.forbes.com/sites/martineparis/2025/04/12/chatgpt...
[3] https://www.peopleleavecults.com/post/statistics-on-cults
You're worried about indoctrination by an LLM, but it starts much earlier than that. The school system is indoctrination of our youngest minds, both today in the West and in its Prussian origins.
https://today.ucsd.edu/story/education-systems-were-first-de...
We should fix both systems. I don't want Altman's or Musk's opinions doing the indoctrinating either.
My hobby: monetizing cynicism.
I go on Polymarket and find things that would make me happy or optimistic about society and tech, and then bet a couple of dollars (of some shitcoin) against them.
e.g. OpenAI releasing an open weights model before September is trading at 81% at time of writing - https://polymarket.com/event/will-openai-release-an-open-sou...
Last month I was up about ten bucks because OpenAI wasn't open, the ceasefire wasn't a ceasefire, and the climate metrics got worse. You can't hedge away all the existential despair, but you can take the sting out of it.
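The payoff math of that hedge, as a small sketch (prices are illustrative, using the 81% market linked above; a "No" share costs roughly 1 - p and pays $1 if the event does not happen):

    stake = 5.00          # dollars put on "No"
    p_yes = 0.81          # market-implied probability of the outcome I'd like to see
    no_price = 1 - p_yes  # ~$0.19 per "No" share
    shares = stake / no_price

    print(f"Outcome I wanted doesn't happen: collect ${shares:.2f} "
          f"(profit ${shares - stake:.2f})")
    print(f"Outcome I wanted happens: lose ${stake:.2f}, but something good happened")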
> go on Polymarket and find things that would make me happy or optimistic about society and tech, and then bet a couple of dollars (of some shitcoin) against them.
Classic win win bet. Your bet wins -> you make money (win). Your bet loses -> something good happened for society (win).
My friend does this and calls it “hedging humanity”. Every time some big political event has happened that bums me out, he’s made a few hundred.
people still use crypto? I thought the hype died around the time when AI boomed.
People use crypto on Polymarket because it doesn't comply with gambling regulations, so in theory isn't allowed to have US customers. Using crypto as an intermediary lets Polymarket pretend not to know where the money is coming from. Though I think a more robust regulator would call them out on the large volume of betting on US politics on their platform...
> a more robust regulator would call them out
Calling them out is one thing, but do you think the US could realistically stop them?
I don't know much about Polymarket's governance structure. If it's a decentralized smart contract, the US is DOA. Even if it's not... The Pirate Bay wasn't, the US really tried to stop them, and it basically didn't get anywhere.
Looking it up, it seems like the CEO actually got raided by the FBI last year: https://www.reuters.com/world/us/fbi-raids-polymarket-ceos-h... So maybe the wheels of justice are just grinding a bit slowly. The terms of use want to have Panamanian law apply https://polymarket.com/tos but that doesn't provide much protection when the company is physically operating in the US.
Bitcoin is higher than ever. People can't wait until it gets high enough that they can sell it for dollars, and use those dollars to buy things and make investments in things that are valuable.
> Bitcoin is higher than ever
That's just speculation though. I saw a cynical comment on reddit yesterday that unfortunately made a lot of sense. Many people now are just so certain that the future of work is not going to include many humans, so they're throwing everything into stocks and crypto, which is why they remain so high even in the face of so much political uncertainty. It's not that people are investing because they have hope. People are just betting everything as a last ditch survival attempt before the robots take over.
Of course this is hyperbolic - market forces are never that simple. But I think there might be some truth to it.
What does 'just' mean here? The monetary value of a thing is what people will pay you for it. Full stop.
The question was about people using crypto to buy things. The person above me was implying that because it's going up in value, people are using it that way. I replied to say that it's (mostly) just speculation. Which is a kind of use, but not the one being implied.
The irony is that these people think their rights as shareholders will be respected in this future world.
It's a bit like when peasants had to sell their land for cash and ended up enslaved working their own land.
It'll work at first but it's just what the parent poster said: a last ditch effort of the desperate and soon to be desperater.
Wait until you hear about this thing called gold and how its price behaves during periods of uncertainty.
Gold (and land, jewels, antiques etc.) are bought and held during times of uncertainty because people believe they will retain their value through virtually anything. Stocks don't work that way. In times of uncertainty, gold should increase in value, stocks should decrease.
Crypto is high because people keep believing some sucker in the future will buy it from them even higher. So far, they've been right all along. You really think crypto is ever going to pay some kind of dividend?
Isn't that equivalent to saying the USD won't pay dividends? Correct, but also not the point. I say this as someone with no crypto ownership.
There is always demand for USD, it is the only way to pay taxes in the US.
Does gold pay dividends? Would you say it's a bad investment?
I'm also someone who owns zero crypto.
people use crypto for speculation, and for (semi)illegal purposes
only a small percentage of use is for actual legitimate money transfers
Unfortunately crypto hype is still high, and I think still on the up, but that's vibes, not market analysis.
"Gambling can be addictive. Please gamble responsibly. You must be 18 years or older to gamble. If you need help, please contact your local gambling advice group or your doctor"
To be completely and utterly fair, I trust Deepseek and Qwen (Alibaba) more than American AI companies.
American AI companies have shown they are money and compute eaters, and massively so at that. Billions later, and well, not much to show.
But Deepseek cost $5M to develop, and they came up with multiple novel ways to train.
Oh, and their models and code are all FLOSS. The US companies are closed. Basically, the US AI companies are too busy treating each other as vultures.
> But Deepseek cost $5M to develop, and they came up with multiple novel ways to train
This is highly contested: depending on who you ask, it was either a big misunderstanding by everyone reporting it, or maliciously placed there (by a quant company, right before NVDA and the rest fell a lot).
If we're being generous and assume no malicious intent (big if), anyone who has trained a big model can tell you that the cost of 1 run is useless in the big scheme of things. There is a lot of cost in getting there, in the failed runs, in the subsequent runs, and so on. The fact that R2 isn't there after ~6 months should say a lot. Sometimes you get a great training run, but no-one is looking at the failed ones and adding up that cost...
They were pretty explicit that this was only the cost, in GPU hours converted to USD, of the final run. Journalists and Twitter tech bros just saw an easy headline there. It's the same with Clair Obscur developer Sandfall, where people say the game was made by 30 people when there were 200 people involved.
These "200 people" were counted from credits which list pretty much everyone who even sniffed at the general direction of the studio's direction. The studio itself is ~30 people (just went and check on their website, they have a team list with photos for everyone). The rest are contractors whose contributions usually vary wildly. Besides, credits are free so unless the the company are petty (see Rockstar not crediting people on their games if they leave before the game is released even if they worked on it for years) people err on the site on crediting everyone. Personally i've been credited on a game that used a library i wrote once and i learned about it years after the release.
Most importantly, those who mention that the game was made by 30 people do it to compare it with much larger teams of hundreds if not thousands of people, and those teams use contractors too!
> They were pretty explicit that this was only the cost in GPU hours to USD for the final run.
The researchers? Yes.
What followed afterwards, I'm not so sure. There were clearly some "cheap headlines" in the media, but there was also some weird coverage being pushed everywhere, from weird TLDs, all pushing "NVDA is dead, DeepSeek is cheap, you can run it on a Raspberry Pi", etc. That might have been a campaign designed to help short the stocks.
Actually the majority of Google models are open source, and they were also pretty fundamental in pushing a lot of training techniques forward. Working in the AI space, I've read quite a few of their research papers, and I really appreciate what they've done to share their work and release their models under licenses that allow you to use them commercially.
"Actually the majority of Google models are open source"
That's not accurate. The Gemini family of models are all proprietary.
Google's Gemma models (which are some of the best available local models) are open weights but not technically OSI-compatible open source - they come with usage restrictions: https://ai.google.dev/gemma/terms
You're ignoring the T5 series of models, which were incredibly influential: the T5 models and their derivatives (FLAN-T5, LongT5, ByT5, etc.) have been downloaded millions of times on Hugging Face and are real workhorses. New variants have even been produced within the last year or so.
And yeah, the Gemma series is incredible, and while it maybe doesn't meet the OSI standard, I consider them pretty open as far as local models go. It's not just the standard Gemma variants either: Google is releasing other incredible Gemma models that I don't think people have really caught wind of yet, like MedGemma, of which the 4B variant has vision capability.
I really enjoy their contributions to the open source AI community and think they're pretty substantial.
$5 million was the gpu hour cost of a single training run.
Exactly. Not to minimize DeepSeek's tremendous achievement, but that $5 million was just for the training run, not the GPUs they purchased beforehand, or all the OpenAI API calls they likely used to assist in synthetic data generation.
> But Deepseek cost $5M to develop
Not true. It was $5M to train - it was many more millions in R&D.
Wasn’t that figure just the cost of the GPUs and nothing else?
Yeah, I hate that this figure keeps getting thrown around. IIRC, it's the price of 2048 H800s for 2 months at $2/hour/GPU. If you consider months to be 30 days, that's around $5.9M, which roughly lines up. What doesn't line up is ignoring the costs of facilities, salaries, non-cloud hardware, etc., which I'd expect to dominate. $100M seems like a fairer estimate, TBH. The original paper had more than a dozen authors, and DeepSeek had about 150 researchers working on R1, which supports the notion that personnel costs would likely dominate.
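A quick back-of-envelope check of that arithmetic, using the GPU count, duration, and hourly rate from the comment above (these are recollections, not official DeepSeek accounting):

    # Back-of-envelope check of the "~$5-6M training run" figure:
    # 2048 H800s for two 30-day months at $2/GPU-hour, per the comment above.
    gpus = 2048
    hours = 2 * 30 * 24           # 1440 hours
    rate_usd_per_gpu_hour = 2.0

    gpu_hours = gpus * hours      # ~2.95M GPU-hours
    cost = gpu_hours * rate_usd_per_gpu_hour
    print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")  # ~$5.9M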
>ignoring the costs of facilities, salaries, non-cloud hardware, etc.
If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe Deepseek was trained at one (perhaps two) orders of magnitude lower cost than other competing models.
Perhaps. Do you think DeepSeek made use of those competing models at all in order to train theirs?
I believe so, but have no proof obviously.
That is also just the final production run. How many experimental runs were performed before starting the final batch? It could be some ratio like 10 hours of research to every one hour of final training.
It was more than $5m
https://interestingengineering.com/culture/deepseeks-ai-trai...
> American AI companies have shown they are money and compute eaters
Don't forget they also quite literally eat books
Who is literally eating books?
Parent is referencing the recent court case with Anthropic and the legal requirement of not copying books but consuming them, which translates to Anthropic having to destroy every book it uses as input data in order to comply with said requirements.
Deepseek R1 was trained at least partially on the output of other LLMs. So, it might have been much more expensive if they needed to do it themselves from scratch.
Lawsuit, since it was against OpenAI TOS: https://hls.harvard.edu/today/deepseek-chatgpt-and-the-globa...
> Billions later, and well, not much to show.
This is obviously false; I'm curious why you included it.
> Oh, and their models and code are all FLOSS.
No?
Deepseek is far more worthy of the name OpenAI than Sam Altman's ClosedAI.
Probably the results were worse than the K2 model released today. No serious engineer would say it's for "safety" reasons, given that ablation nullifies any safety post-training.
I'm expecting (and indeed hoping) that the open weights OpenAI model is a lot smaller than K2. K2 is 1 trillion parameters and almost a terabyte to download! There's no way I'm running that on my laptop.
I think the sweet spot for local models may be around the 20B size - that's Mistral Small 3.x and some of the Gemma 3 models. They're very capable and run in less than 32GB of RAM.
I really hope OpenAI put one out in that weight class, personally.
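As a rough sanity check on that "under 32GB" claim, the usual rule of thumb is about one byte per parameter at 8-bit quantization and half that at 4-bit, plus a few GB of overhead for the KV cache and runtime. The sketch below is an estimate under those assumptions, not a measurement:

    # Rough RAM estimate for running a local model, assuming ~bits/8 bytes per
    # parameter for the weights plus a fixed overhead guess for KV cache/runtime.
    def estimate_ram_gb(params_billion: float, bits: int, overhead_gb: float = 2.0) -> float:
        weights_gb = params_billion * bits / 8  # e.g. 20B at 8-bit ~= 20 GB
        return weights_gb + overhead_gb

    for params in (20, 27, 1000):   # ~20B class, Gemma 3 27B, K2-scale 1T
        for bits in (4, 8):
            print(f"{params}B @ {bits}-bit: ~{estimate_ram_gb(params, bits):.0f} GB")
    # A ~20B model at 8-bit lands around 22 GB, comfortably inside 32 GB;
    # a 1T-parameter model needs hundreds of GB even at 4-bit.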
Early rumours (from a hosting company that apparently got early access) were that you'd need "multiple H100s to run it", so I doubt it's a Gemma / Mistral Small tier model...
You will get a 20GB model. Distillation is so compute-efficient that it's all but inevitable that, if not OpenAI, numerous other companies will do it.
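For anyone unfamiliar with what distillation involves here, a minimal sketch of the standard soft-label setup: a smaller student model is trained to match a larger teacher's output distribution. Everything below (temperature, vocab size, the random logits) is an illustrative placeholder, not anything OpenAI has announced.

    # Minimal soft-label knowledge distillation sketch (Hinton-style):
    # the student minimizes KL divergence to the teacher's softened outputs.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions, then compute KL(teacher || student).
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * temperature ** 2

    # Toy usage: a batch of 4 token positions over a 32k vocabulary.
    student_logits = torch.randn(4, 32000, requires_grad=True)
    teacher_logits = torch.randn(4, 32000)  # would come from the frozen teacher
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()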
I would rather have an open weights model that’s the best possible one I can run and fine tune myself, allowing me to exceed SOTA models on the narrower domain my customers care about.
What is their business purpose for releasing an open-weights model? How does it help them? I asked an LLM but it just said vague unconvincing things about ecosystem plays and fights for talent.
PR
Pointless security theatre. The community worked out long ago how to strip away any safeguards.
Whenever I read something similar, I immediately remember how "Open"AI refused to release GPT-2 XL at the time because it was "too powerful".
It's worth remembering that the safety constraints can be successfully removed, as demonstrated by uncensored fine-tunes of Llama.
> this is new for us
So much for the company for which this, of all things, should never be new.
Wow. Twitter is not a serious website anymore. Why are companies and professionals still using it? Is it really like that now, with all that noise from Grok floating to the top?
My pet theory is that they delayed this because Grok 4 released, and they explicitly don't want to be seen as competing with xAI by pulling the usual trick of releasing right around when Google does. Feels like a very Sam Altman move, in my model of his mind.
Delays aside, I wonder what kind of license they're planning to use for their weights.
Will it be restricted like Llama, or fully open like Whisper or Granite?
Probably ClosedAI's model was not as good as some of the models being released now. They are delaying it to do some last minute benchmark hacking.
Honestly, they're distancing themselves optically/temporally from the HerrGrokler headlines.
Maybe they’re making last minute changes to compete with Grok 4?
we'll never hear about this again