Hack and Grow

Google Built an AI That Writes Research Papers. Here’s What It Can Actually Do


Every researcher knows this feeling. The experiments are done. The data is sitting right there. But weeks of work still remain, turning messy notes, result tables, and half-formed ideas into a clean, properly cited, conference-ready manuscript. That’s the part where a lot of papers quietly stall.

Google Cloud AI Research just released a new AI tool for researchers called PaperOrchestra. Give it your raw experiment logs and a rough idea summary. It gives you back a complete research paper in LaTeX, with citations, figures, an abstract, methodology, and everything else, in about 40 minutes.

Why Existing AI Writing Tools Weren’t Enough


This isn’t the first attempt at AI research paper writing. But previous systems had a pretty significant limitation. Tools like AI Scientist were built to run their own experiments and then write about them. You couldn’t hand them your data from outside their pipeline and expect a paper. Other tools focused on just one piece, like generating literature reviews or helping with citations, but couldn’t produce a full manuscript on their own.

The result was a clear gap. No automated research paper generator could take the kind of unstructured, real-world materials a researcher actually has after finishing experiments and produce a complete, rigorous paper independently. Google PaperOrchestra is built specifically to fill that gap.

Five Agents, One Complete Paper

What makes PaperOrchestra different from a single-prompt AI writing tool is how it’s structured. It’s a multi-agent AI writing system, meaning five specialised agents each handle a specific part of the process. Two of them run in parallel to keep things moving faster.

Here’s what each agent actually does:

  • Outline Agent reads your idea summary, experiment log, and the conference template, then builds a structured writing plan with a visualisation strategy and section-level citation hints
  • Plotting Agent generates all figures and diagrams using a Vision-Language Model that reviews and revises images until they meet the design objectives
  • Literature Review Agent finds relevant papers, verifies every single one through the Semantic Scholar API, discards hallucinated references, and drafts the Introduction and Related Work sections with a hard rule that 90% of verified citations must appear in the paper
  • Section Writing Agent writes the abstract, methodology, experiments, and conclusion using everything generated so far, pulling numbers directly from your experiment logs to build tables
  • Content Refinement Agent runs a simulated peer-review loop, revising the manuscript iteratively and only accepting changes that actually improve the quality score

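Google hasn't released the orchestration code, but the flow described above (outline first, plotting and literature review in parallel, then section writing and refinement) can be sketched in a few lines. Everything here, from function names to return shapes, is illustrative and not the actual PaperOrchestra API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the five agents; in the real system each
# would wrap one or more LLM calls.
def outline_agent(idea, log, template):
    return {"sections": ["Introduction", "Method", "Experiments"]}

def plotting_agent(outline, log):
    return ["fig1.pdf", "fig2.pdf"]

def literature_review_agent(outline, idea):
    return {"introduction": "...", "related_work": "..."}

def section_writing_agent(outline, figures, review, log):
    return {"abstract": "...", "method": "...",
            "experiments": "...", "conclusion": "..."}

def content_refinement_agent(draft):
    return draft  # simulated peer-review loop

def run_pipeline(idea, log, template):
    outline = outline_agent(idea, log, template)
    # Plotting and literature review are independent of each other,
    # so they run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        figures_future = pool.submit(plotting_agent, outline, log)
        review_future = pool.submit(literature_review_agent, outline, idea)
        figures = figures_future.result()
        review = review_future.result()
    draft = section_writing_agent(outline, figures, review, log)
    return content_refinement_agent(draft)
```

Running the two independent agents concurrently is where the pipeline saves wall-clock time; everything else is sequential because each stage consumes the previous stage's output.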
The whole pipeline makes around 60 to 70 LLM API calls and finishes in a mean of 39.6 minutes for a complete, formatted manuscript. That’s genuinely fast for what it’s doing.
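The Content Refinement Agent’s accept-only-if-better rule is essentially hill climbing on a quality score. A minimal sketch, with the reviewer, reviser, and scorer passed in as plain functions (in the real system each would be an LLM call):

```python
def refine(manuscript, critique, revise, score, rounds=5):
    """Simulated peer-review loop: keep a revision only if it scores higher."""
    best_score = score(manuscript)
    for _ in range(rounds):
        feedback = critique(manuscript)           # LLM acting as reviewer
        candidate = revise(manuscript, feedback)  # LLM acting as author
        candidate_score = score(candidate)
        if candidate_score > best_score:          # accept only improvements
            manuscript, best_score = candidate, candidate_score
    return manuscript
```

The key property is that the manuscript’s score never decreases: a revision that makes things worse is simply thrown away, which is what keeps an iterative LLM loop from drifting.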

Does the Output Actually Hold Up?

Google tested PaperOrchestra against 200 accepted papers from CVPR 2025 and ICLR 2025. The system was given the same raw inputs a real researcher would have, and its output was compared directly to what the actual human authors wrote.

In automated side-by-side comparisons, it outperformed the strongest AI baseline by 39% to 86% on overall paper quality. On literature review quality, the win margins were 88% to 99%. Human evaluators (11 AI researchers performing 180 paired comparisons) confirmed the same pattern: PaperOrchestra beat AI baselines by 14% to 38% on overall manuscript quality.

The citation numbers tell an interesting story too. Competing AI systems averaged 9 to 14 citations per paper. This automated research paper generator averaged 45 to 48, much closer to the roughly 59 citations found in human-written papers. And it wasn’t just about volume. It identified the broader references that signal genuine scholarly depth, not just the obvious ones every system picks up.
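The hallucination filter behind those numbers, described above for the Literature Review Agent, can be sketched like this. The lookup function is injected so the example runs offline; the real system queries the Semantic Scholar API instead, and every name here is illustrative:

```python
def verify_citations(candidates, lookup):
    """Keep only references that resolve in a scholarly index.

    A hallucinated reference fails the lookup (or resolves to a
    different title) and is discarded before it can be cited.
    """
    verified = []
    for ref in candidates:
        record = lookup(ref["title"])  # e.g. a title search against the index
        if record is not None and record["title"].lower() == ref["title"].lower():
            verified.append(record)
    return verified

# A tiny in-memory stand-in for the index:
index = {"attention is all you need": {"title": "Attention Is All You Need"}}
lookup = lambda title: index.get(title.lower())
```

Any fabricated reference simply never makes it into the manuscript, which is why the system can afford its rule that 90% of verified citations must actually appear in the paper.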

In simulated acceptance rate tests, it hit 84% on CVPR and 81% on ICLR. Human-written papers sat at 86% and 94% respectively. Close enough to be taken seriously as a real AI research paper writing tool.

What Researchers Should Actually Know Before Using It

A few honest things worth knowing before getting too excited. PaperOrchestra cannot fabricate experimental results. It works only with what you give it. If your experiment logs don’t contain certain data, the system is specifically instructed not to invent it. The richer and more detailed your idea summary, the better the output. Sparse inputs produce noticeably weaker papers than detailed methodology descriptions do.

The human researcher also stays fully responsible for accuracy, originality, and validity. This multi-agent AI writing system is an assistive tool, not a replacement for doing the actual research. The thinking still happens on your side. What gets easier is the translation of that thinking into a structured, submission-ready manuscript.

For researchers who currently spend weeks on the writing phase after experiments are done, that’s still a meaningful shift. The part that stays hard is the research itself. The part that gets faster is getting it on paper.

Google PaperOrchestra is available on arXiv with full technical details. The team also introduced PaperWritingBench alongside it, the first standardised benchmark specifically for evaluating AI research paper writing tools, which is useful well beyond just this one system.
