> Conventional research papers require readers to invest substantial effort to understand and adapt a paper's code, data, and methods to their own work [...]
But that's the point! If we take out the effort to understand, really understand something on a deeper level, even in research, then how can anything useful be built on top of it? Is everything going to lose all depth and become shallow?
They're not talking about removing the effort to understand a paper, but about lowering it for the same level of understanding.
If requiring more effort to reach the same understanding were a good thing, we should be making papers much harder to read than they currently are.
Why are the specific things they are doing a problem? Automatically building the pipelines and code described in a paper, checking they match the reported results, then being able to execute them for the user's queries - is that a bad thing for understanding?
I'm just imagining someone trying to defend their PhD or comprehensive exam and only having surface-level knowledge of the shit their AI bot has cited for them.
Fyi - this is actually happening right meow. And most young profs are writing their grants using AI. The biggest issue with the latter? It's hard to tell the difference, given how many grants are just rehashing the same stuff over and over.
Isn't this also a problem given that ChatGPT, at least, is bad at summarizing scientific papers [1]? I don't know about Claude or Gemini in that regard. Still a problem.
Edit: spelling.
[1]: https://arstechnica.com/ai/2025/09/science-journalists-find-...
This study seems to be from before the reasoning models came out. With them I have the opposite problem: I ask something simple and it responds with what reads like a scientific paper.
Talk to engineers: many of them just fear research papers. It's important to have alternate ways of consuming research. Then maybe some engineers will jump the fence and pick up the habit of reading papers.
This is what's so depressing about the Apple Intelligence and Gemini ads for consumer AI. All the things they tell us an AI can do for us: make up a bedtime story for our kids, write a letter from a kid to his/her hero, remember the name of someone you forgot from earlier, sum up a presentation you forgot to read.
Isn't the point to put the time into those things? At some point aren't those the things one should choose to put time into?
If you're making up stories for your kid, you're not spending enough time consuming media that Apple can profit from.
Easy to say, but have you ever read a paper and then a summary or breakdown of that paper by an actual person? Or compared a paper that you understand very well with how you would explain it in a blog post?
The academic style of writing is almost purposefully as obtuse and dense and devoid of context as possible. Academia is trapped in all kinds of stupid norms.
Kill me now.
Yes, I will get right on that. I believe that killing you is the right strategy to help you escape from a world where AI takes over every aspect of human existence in such a way that all those aspects are degraded.
I'm still alive.
That is a very good point, and I am sorry.
"I Have No Mouth, and I Must Scream" — Harlan Ellison
https://www.are.na/block/26283461
Earlier today there was a post about someone submitting an incorrect AI generated bug report. I found one of the comments telling:
https://news.ycombinator.com/item?id=45331233
> Is it that crazy? He's doing exactly what the AI boosters have told him to do.
I think we're starting to see the first real AI "harms" shake out, after some years of worrying it might swear or tell you how to make a molotov cocktail.
People are getting convinced, by hype men and by sycophantic LLMs themselves, that access to a chatbot suddenly grants them polymath abilities in any field, and are acting out their dumb ideas without pushback, until the buck finally stops, hopefully with just some wasted time and reputation damage.
People should of course continue to use LLMs as they see fit - I just think the branding of work like this gives the impression that they can do more than they can, and will encourage the kind of behavior I mention.
I went looking for how they define "agent" in the paper:
> AI agents are autonomous systems that can reason about tasks and act to achieve goals by leveraging external tools and resources [4]. Modern AI agents are typically powered by large language models (LLMs) connected to external tools or APIs. They can perform reasoning, invoke specialized models, and adapt based on feedback [5]. Agents differ from static models in that they are interactive and adaptive. Rather than returning fixed outputs, they can take multi-step actions, integrate context, and support iterative human–AI collaboration. Importantly, because agents are built on top of LLMs, users can interact with agents through human language, substantially reducing usage barriers for scientists.
So more-or-less an LLM running tools in a loop. I'm guessing "invoke specialized models" is achieved here by running a tool call against some other model.
An LLM running tools in a loop is the core idea of ReAct agents, and is indeed one of the most effective ways to extract value from generative AI. Ironically, it's not about generation at all: we use the model's classification skills to pick tools and its text-processing skills to take the context into account.
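For anyone who hasn't seen the pattern spelled out, here is a minimal sketch of "tools in a loop". Everything in it is hypothetical: call_llm is a stand-in for whatever model API you use, and the tool names are illustrative, not the paper's actual implementation.

    # Minimal sketch of the "LLM running tools in a loop" pattern.
    # call_llm and the tool names are hypothetical stand-ins, not the paper's code.
    import json

    def call_llm(messages):
        """Stand-in: should return JSON like
        {"tool": "read_file", "args": {"path": "..."}} or {"answer": "..."}."""
        raise NotImplementedError("wire this up to your model provider")

    TOOLS = {
        "read_file": lambda path: open(path).read(),
        "run_pipeline": lambda cmd: f"(would run: {cmd})",  # stubbed side effect
    }

    def agent(task, max_steps=10):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            decision = json.loads(call_llm(messages))
            if "answer" in decision:                # the model decides it is done
                return decision["answer"]
            tool = TOOLS[decision["tool"]]          # "classification": pick a tool
            observation = tool(**decision.get("args", {}))
            messages.append({"role": "tool", "content": str(observation)})  # feed the result back
        return "step limit reached"

The "invoke specialized models" part would presumably just be another entry in TOOLS that calls out to a different model.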
With your definition of agents as running tools in a loop, do you have high hopes for multi-tool agents being feasible from a security perspective? It seems like they'll need to be locked down.
That's a problem being discussed in the industry. Sadly, current LLM frameworks don't give enough structure when it comes to agent authorization. But it will come.
I think the rule still applies that you should consider any tools as being under the control of anyone who manages to sneak instructions into your context.
Which is a pretty big limitation in terms of things you can safely use them for!
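One common mitigation, sketched below as a generic pattern rather than anything from the paper, and assuming you control the dispatch layer: tag tools as read-only or side-effecting, and require explicit human approval before any side-effecting call runs. The tool names are illustrative only.

    # Sketch: gate side-effecting tool calls behind human approval.
    # Tool names are illustrative; this assumes you own the dispatch layer.
    READ_ONLY = {"read_file", "search_papers"}
    SIDE_EFFECTING = {"run_pipeline", "write_file", "send_email"}

    def dispatch(tool_name, args, tool_impls):
        if tool_name in SIDE_EFFECTING:
            # Anything in the context may carry injected instructions, so a human
            # confirms every action that touches the outside world.
            print(f"Agent wants to call {tool_name} with {args}")
            if input("Allow? [y/N] ").strip().lower() != "y":
                return "denied by user"
        elif tool_name not in READ_ONLY:
            return f"unknown tool: {tool_name}"
        return tool_impls[tool_name](**args)

This narrows the blast radius but doesn't remove the underlying limitation the parent describes.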
I tried it a few weeks ago. Wasn't very impressed with the resulting code compared to me manually working with an LLM and an uploaded research paper, which takes less time and costs less.
Notable in that this is research out of the genomics lab at Stanford - it's likely that an ML practitioner could do better with a more hands-on approach - but demonstrating some end-to-end work on genomics implementations, as they do in the paper, is pretty cool. Seems helpful.
So that I understand, is the idea that you point this tool at a GitHub repository, it figures out how to install and run it (figures out the build environment, installs any dependencies, configures the app, etc), plus it figures out how to interact with it, and then you send it queries via a chatbot?
Does it take only the repository as input, or does it also consume the paper itself?
Science is a collaborative process that occurs between humans who already have a specific shared language to discuss their problem domain. Research papers use deliberately chosen language. This transmission is expert to expert. Inserting a generic statistical model between two experts can only have negative effects. It might be useful for a casual observer who wants an overview, but that is what the abstract already is!
Asking for a pipeline described in a paper to be run over a new set of inputs is not a tool for a casual observer. I'm not sure what benefit making someone build that themselves would have for people with expertise in the field but not in coding.
Who shaves the barber then? ;)
Very good direction! We have to put science in software ASAP. It is interesting to see the pushback, but there is no way we can proceed with the current approach, which ignores that we have computers to help.
What if you could sit down, have a beer and shoot the shit with Research Papers?
A lot of people will dismiss this with some of the usual AI complaints. I suspect they never did real research. Getting into a paper can be a really long endeavor. The notation might not be entirely self-contained, or it might be used in an alien or confusing way. Managing to get into it might finally reveal that the results in the paper are not applicable to your own work, a point that is often obscured intentionally to make it to publication.
Lowering the investment needed to understand a specific paper could really help you focus on the most relevant results, to which you can then dedicate your full resources.
Although, as of now, I tend to favor approaches that only summarize rather than produce "active systems": given the approximate nature of LLMs, every step should be properly human-reviewed. So it's not clear what signal you can take from such an AI approach to a paper.
Related, a few days ago: "Show HN: Asxiv.org – Ask ArXiv papers questions through chat"
https://news.ycombinator.com/item?id=45212535
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
https://news.ycombinator.com/item?id=43796419