Astro/Solid - Hacker News

$phantompeace 2 hours ago

Considering the very small difference between SFT alone on the student model and SFT + DPO on a proxy, doesn't it make sense to concentrate on making the SFT dataset as good as possible rather than worrying about DPO etc.? And just train directly on the student model?

$StreamCtx an hour ago

“Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”

$Alifatisk 12 hours ago

Why is this published again? Is this a reference to recent events?


$babelfish 11 hours ago

I just saw some post about it on Threads and found it interesting so decided to share!


$tough 7 hours ago

My best guess is this is a reference to the recent accusations from Anthropic of Chinese labs "distilling" their models.


$swingboy an hour ago

And it’s a paper from Alibaba researchers, the company/lab that Anthropic called out by name.

$dmezzetti 11 hours ago

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Related paper that's a good read: https://arxiv.org/abs/1908.08962

$duendefm 13 hours ago

The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro USA and anti China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers encouraged coders everywhere to skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.


$laichzeit0 an hour ago

The US government will do the job of destroying the American AI economy through their export controls.

$anax32 3 hours ago

The US "product machine" is so strong. They really know how to do frictionless signup and vendor lock-in on the corporate side.

$nozzlegear 12 hours ago

> skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

I'm doing my part!

$linolevan 12 hours ago

Can we note that this is a 2024 paper in the title?


Knowledge Distillation of Black-Box Large Language Models (2024)