Hack and Grow

Google Built an AI That Writes Research Papers. Here’s What It Can Actually Do


Every researcher knows this feeling. The experiments are done. The data is sitting right there. But weeks of work still remain, turning messy notes, result tables, and half-formed ideas into a clean, properly cited, conference-ready manuscript. That’s the part where a lot of papers quietly stall.

Google Cloud AI Research just released a new AI tool for researchers called PaperOrchestra. Give it your raw experiment logs and a rough idea summary. It gives you back a complete research paper in LaTeX, with citations, figures, an abstract, methodology, and everything else, in about 40 minutes.

Why Existing AI Writing Tools Weren’t Enough


This isn’t the first attempt at AI research paper writing. But previous systems had a pretty significant limitation. Tools like AI Scientist were built to run their own experiments and then write about them. You couldn’t hand them your data from outside their pipeline and expect a paper. Other tools focused on just one piece, like generating literature reviews or helping with citations, but couldn’t produce a full manuscript on their own.

The result was a clear gap. No automated research paper generator could take the kind of unstructured, real-world materials a researcher actually has after finishing experiments and produce a complete, rigorous paper independently. Google PaperOrchestra is built specifically to fill that gap.

Five Agents, One Complete Paper

What makes PaperOrchestra different from a single-prompt AI writing tool is how it’s structured. It’s a multi-agent AI writing system, meaning five specialised agents each handle a specific part of the process. Two of them run in parallel to keep things moving faster.

Here’s what each agent actually does:

  • Outline Agent reads your idea summary, experiment log, and the conference template, then builds a structured writing plan with a visualisation strategy and section-level citation hints
  • Plotting Agent generates all figures and diagrams using a Vision-Language Model that reviews and revises images until they meet the design objectives
  • Literature Review Agent finds relevant papers, verifies every single one through the Semantic Scholar API, discards hallucinated references, and drafts the Introduction and Related Work sections with a hard rule that 90% of verified citations must appear in the paper
  • Section Writing Agent writes the abstract, methodology, experiments, and conclusion using everything generated so far, pulling numbers directly from your experiment logs to build tables
  • Content Refinement Agent runs a simulated peer-review loop, revising the manuscript iteratively and only accepting changes that actually improve the quality score

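Google hasn't released the orchestration code, but the flow described above (outline first, plotting and literature review in parallel, then section writing and refinement) can be sketched in a few lines. Everything here, from function names to return shapes, is illustrative and not the actual PaperOrchestra API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the five agents; in the real system each
# would wrap one or more LLM calls.
def outline_agent(idea, log, template):
    return {"sections": ["Introduction", "Method", "Experiments"]}

def plotting_agent(outline, log):
    return ["fig1.pdf", "fig2.pdf"]

def literature_review_agent(outline, idea):
    return {"introduction": "...", "related_work": "..."}

def section_writing_agent(outline, figures, review, log):
    return {"abstract": "...", "method": "...",
            "experiments": "...", "conclusion": "..."}

def content_refinement_agent(draft):
    return draft  # simulated peer-review loop

def run_pipeline(idea, log, template):
    outline = outline_agent(idea, log, template)
    # Plotting and literature review are independent of each other,
    # so they run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        figures_future = pool.submit(plotting_agent, outline, log)
        review_future = pool.submit(literature_review_agent, outline, idea)
        figures = figures_future.result()
        review = review_future.result()
    draft = section_writing_agent(outline, figures, review, log)
    return content_refinement_agent(draft)
```

Running the two independent agents concurrently is where the pipeline saves wall-clock time; everything else is sequential because each stage consumes the previous stage's output.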
The whole pipeline makes around 60 to 70 LLM API calls and finishes in a mean of 39.6 minutes for a complete, formatted manuscript. That’s genuinely fast for what it’s doing.
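The Content Refinement Agent’s accept-only-if-better rule is essentially hill climbing on a quality score. A minimal sketch, with the reviewer, reviser, and scorer passed in as plain functions (in the real system each would be an LLM call):

```python
def refine(manuscript, critique, revise, score, rounds=5):
    """Simulated peer-review loop: keep a revision only if it scores higher."""
    best_score = score(manuscript)
    for _ in range(rounds):
        feedback = critique(manuscript)           # LLM acting as reviewer
        candidate = revise(manuscript, feedback)  # LLM acting as author
        candidate_score = score(candidate)
        if candidate_score > best_score:          # accept only improvements
            manuscript, best_score = candidate, candidate_score
    return manuscript
```

The key property is that the manuscript’s score never decreases: a revision that makes things worse is simply thrown away, which is what keeps an iterative LLM loop from drifting.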

Does the Output Actually Hold Up?

Google tested PaperOrchestra against 200 accepted papers from CVPR 2025 and ICLR 2025. The system was given the same raw inputs a real researcher would have, and its output was compared directly to what the actual human authors wrote.

In automated side-by-side comparisons, it outperformed the strongest AI baseline by 39% to 86% on overall paper quality. On literature review quality, the win margins were 88% to 99%. Human evaluators (11 AI researchers performing 180 paired comparisons) confirmed the same pattern: PaperOrchestra beat AI baselines by 14% to 38% on overall manuscript quality.

The citation numbers tell an interesting story too. Competing AI systems averaged 9 to 14 citations per paper. This automated research paper generator averaged 45 to 48, much closer to the roughly 59 citations found in human-written papers. And it wasn’t just about volume. It identified the broader references that signal genuine scholarly depth, not just the obvious ones every system picks up.
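The hallucination filter behind those numbers, described above for the Literature Review Agent, can be sketched like this. The lookup function is injected so the example runs offline; the real system queries the Semantic Scholar API instead, and every name here is illustrative:

```python
def verify_citations(candidates, lookup):
    """Keep only references that resolve in a scholarly index.

    A hallucinated reference fails the lookup (or resolves to a
    different title) and is discarded before it can be cited.
    """
    verified = []
    for ref in candidates:
        record = lookup(ref["title"])  # e.g. a title search against the index
        if record is not None and record["title"].lower() == ref["title"].lower():
            verified.append(record)
    return verified

# A tiny in-memory stand-in for the index:
index = {"attention is all you need": {"title": "Attention Is All You Need"}}
lookup = lambda title: index.get(title.lower())
```

Any fabricated reference simply never makes it into the manuscript, which is why the system can afford its rule that 90% of verified citations must actually appear in the paper.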

In simulated acceptance rate tests, it hit 84% on CVPR and 81% on ICLR. Human-written papers sat at 86% and 94% respectively. Close enough to be taken seriously as a real AI research paper writing tool.

What Researchers Should Actually Know Before Using It

A few honest things worth knowing before getting too excited. PaperOrchestra cannot fabricate experimental results. It works only with what you give it. If your experiment logs don’t contain certain data, the system is specifically instructed not to invent it. The richer and more detailed your idea summary, the better the output. Sparse inputs produce noticeably weaker papers than detailed methodology descriptions do.

The human researcher also stays fully responsible for accuracy, originality, and validity. This multi-agent AI writing system is an assistive tool, not a replacement for doing the actual research. The thinking still happens on your side. What gets easier is the translation of that thinking into a structured, submission-ready manuscript.

For researchers who currently spend weeks on the writing phase after experiments are done, that’s still a meaningful shift. The part that stays hard is the research itself. The part that gets faster is getting it on paper.

Google PaperOrchestra is available on arXiv with full technical details. The team also introduced PaperWritingBench alongside it, the first standardised benchmark specifically for evaluating AI research paper writing tools, which is useful well beyond just this one system.
