AI for Academic & STM Publishing: Citation-Grounded Workflows

Scholarly and STM publishing has a relationship to AI that other industries do not share. Citation is not a feature of the work; it is the work. The whole epistemic system of science runs on “you can verify by following the source.” AI tools that operate over scholarly content without strong citation provenance do more than produce worse outputs: they undermine the value proposition of the content itself.

This guide is for publishing decision-makers thinking about where AI fits and where it does not. We build Edtek Chat and Edtek Cite, and we have deployed the same architectural pattern that powers the AAAi Chat Book for reference publishers as well. The framing reflects that operating experience.

Three publishing workflows AI is changing in 2026

The publishing-AI conversation in 2026 has moved past “AI will disrupt publishing” generalities and into specific workflow-level questions. Most of the change is happening in three areas.

Reader-facing AI over the publisher’s content. Journal-level chatbots, book-level Q&A interfaces, encyclopedia-style conversational access to reference works. The promise is that readers get faster access to specific answers from authoritative sources without losing the citation chain that makes the sources authoritative. The risk is that badly-built AI undermines reader trust in the publisher’s content by attaching the publisher’s brand to hallucinated outputs.

Editorial AI for integrity and process. Reference verification, citation accuracy checking, screening for AI-generated content in submissions, peer-review support. These workflows have grown rapidly between 2024 and 2026 in response to the well-documented increase in AI-generated submissions across major journals.

Reference work monetisation through licensed AI access. Publishers with valuable reference catalogues (legal treatises, clinical guidelines, technical handbooks, encyclopedic content) are licensing AI-mediated access as a distinct product alongside traditional subscriptions. The Chat Book pattern is the most concrete example.

The publishers making real progress in 2026 are the ones treating these as engineering and product problems with publishing context, not as publishing problems with an AI plug-in. The architectural choices matter as much as the editorial ones.

The hallucinated-citation problem

The single most-documented failure mode of generic AI in scholarly contexts is fabricated citations. A general-purpose LLM, asked to support a claim, produces a citation in the right format (author, year, journal, volume, page) that looks correct but corresponds to no real paper. Or the paper exists but does not support the claim. Or the paper exists and supports the claim, but the citation details (author order, year) are wrong.

Peer-reviewed audits document the scale. Walters and Wilder (Scientific Reports, 2023) examined 636 ChatGPT citations across 42 topics and found 55% of GPT-3.5 citations and 18% of GPT-4 citations to be fabricated, with 43% and 24% respectively of the real citations containing substantive errors. Bhattacharyya et al. (Cureus, 2023) tested 115 ChatGPT-3.5-generated medical references and found 47% fabricated, 46% authentic but inaccurate, and only 7% both authentic and accurate. More recently, Linardon et al. (JMIR Mental Health, 12 November 2025) tested GPT-4o on six mental-health literature reviews: 19.9% of citations were entirely fabricated, with 56.2% either fabricated or containing material errors in publication date, page numbers, or DOI.
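To put the Walters and Wilder figures on the same footing as the combined 56.2% from Linardon et al., the two rates can be composed. The sketch below is a back-of-envelope calculation, not a number reported in either paper: it assumes the substantive-error rate applies only to the non-fabricated citations, and it works from the rounded percentages quoted above. The function name is illustrative.

```python
def fabricated_or_erroneous(fabricated: float, errors_among_real: float) -> float:
    """Share of citations that are either fabricated or real but erroneous."""
    real = 1.0 - fabricated
    return fabricated + real * errors_among_real


# Rounded rates quoted above from Walters and Wilder (2023).
gpt35 = fabricated_or_erroneous(0.55, 0.43)
gpt4 = fabricated_or_erroneous(0.18, 0.24)

print(f"GPT-3.5: {gpt35:.0%} fabricated or erroneous")
print(f"GPT-4:   {gpt4:.0%} fabricated or erroneous")
```

Under those assumptions, roughly three in four GPT-3.5 citations and a little over a third of GPT-4 citations could not be used as given.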

Two patterns matter for publishers. First, fabrication rates depend heavily on the model — GPT-4 and GPT-4o have cut fabrication substantially from the GPT-3.5 levels that drove early scholarly alarm, so quoting GPT-3.5 numbers as representative of “ChatGPT today” overstates the current state. Second, all three studies measure citations generated without retrieval augmentation. The fabrication problem is a property of how generic LLMs construct plausible-looking citation strings — not a property of “AI” as a category. Citation-grounded RAG architectures, where every output ties to a retrieved passage from a real corpus, are exactly the response to this failure mode.

For publishers, the problem cuts two ways. Reader-facing AI tools that hallucinate citations to the publisher’s content damage the publisher’s brand and editorial credibility. Submission workflows that admit AI-generated content with fabricated citations pollute the literature. Both demand citation-grounded AI architecture — see Citation-Grounded LLMs for the category-level framing.

The fix is structural, not cosmetic. A tool that adds citations as a post-hoc UI element to an unconstrained generator does not solve the problem. A tool that enforces citation at the architectural level — every claim ties to a retrieved passage, the model refuses when no passage supports a claim — does.

Reader-facing AI: chat over the journal, book, or encyclopedia

The reader-facing pattern is the most visible publishing-AI workflow in 2026. A reader (researcher, clinician, lawyer, student) wants a specific answer from a body of authoritative content. The traditional access pattern — read linearly, search by keyword, follow an index — is not optimised for the question-answering use case. Conversational access is.

The architectural pattern that works is straightforward in principle and exacting in execution:

Curated corpus. The AI’s source material is the publisher’s authoritative content, not the open web. The model can only draw from what is in the corpus.
Source-cited retrieval. Every answer cites the specific source location (page, section, paragraph) it derives from. The reader can verify in seconds.
Refusal on out-of-corpus questions. If the question is not addressed by the corpus, the system says so. It does not paper over the gap with general knowledge.
Brand-aligned voice. The reader-facing tone matches the publisher’s voice. This is where light fine-tuning or careful prompt engineering earns its keep — see RAG vs Fine-Tuning for when each is appropriate.
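The first three properties above can be sketched in a few lines. This is a minimal illustration, not a production design: simple word overlap stands in for a real retriever, the passage is returned verbatim instead of being passed to a generator, and the `Passage` type, the `answer` function, the 0.5 threshold, and the corpus text are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # title of the work in the curated corpus
    page: int
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,;:?!()\"'").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def overlap(question: str, passage: Passage) -> float:
    """Fraction of the question's words found in the passage (toy retriever score)."""
    q = tokens(question)
    return len(q & tokens(passage.text)) / len(q) if q else 0.0


def answer(question: str, corpus: list[Passage], threshold: float = 0.5) -> dict:
    best = max(corpus, key=lambda p: overlap(question, p), default=None)
    if best is None or overlap(question, best) < threshold:
        # Refuse rather than fall back on general knowledge.
        return {"status": "refused", "citations": []}
    # A real system would generate from the passage; here it is returned verbatim.
    return {"status": "answered", "text": best.text,
            "citations": [(best.source, best.page)]}


# Invented corpus text, for illustration only.
corpus = [
    Passage("Example Handbook", 12,
            "Exhibits should be exchanged between parties before the hearing date."),
    Passage("Example Handbook", 40,
            "Witness statements are submitted in writing to the tribunal."),
]
in_corpus = answer("How should exhibits be exchanged before the hearing?", corpus)
out_of_corpus = answer("What is the capital of France?", corpus)
```

The point of the shape is that the refusal branch and the citation are part of the return value, so a caller cannot obtain answer text without a source location attached.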

The deployments that fail visibly are the ones that skip one or more of these. A reader-facing chatbot built on a generic model with shallow grounding produces plausible answers that may or may not reflect the publisher’s actual content. The reader cannot tell. The publisher’s brand suffers.

The AAAi Chat Book is the most public example of the pattern done strictly. The American Arbitration Association’s case preparation and presentation materials power a reader-facing AI that arbitrators, advocates, and students use to query AAA’s actual guidance. Citations point to specific pages of the handbook. Out-of-corpus questions get an explicit refusal. The system makes the publisher’s content more useful without diluting its authority.

The same pattern works for journal back-catalogues, reference encyclopedias, clinical guideline collections, legal treatises, technical handbooks, and educational reference works. The architecture is reusable; the content and the calibration are publisher-specific.

Editorial AI: integrity checks, reference verification, peer-review support

The editorial side of publishing has adopted AI more quietly than the reader-facing side, but the workflows are deeper into operations.

Reference and citation verification. AI tools that read submitted manuscripts, extract every citation, and verify each one against the underlying source — does the paper exist, does it support the claim cited, is the citation format correct. The volume of citations in a typical journal makes manual verification impractical at scale; AI assistance makes it tractable.

AI-generated content screening. Identifying submissions that include AI-generated text without disclosure. The signals are subtle and the field is moving fast — model outputs become harder to detect as models improve — but the workflow is now standard at major journals.

Peer-review support. Suggesting potential reviewers, summarising review responses for editors, drafting initial review-decision letters. The human reviewer’s role is unchanged; the AI handles the surrounding logistics.

Editorial style and consistency. Identifying terminology inconsistencies, citation-format mismatches, structural issues in manuscripts. The kind of work copy-editors do at scale, accelerated by AI.

In each workflow, the AI is an assistant, not an editor. The publisher’s editorial standards remain the editor’s responsibility. AI tools that try to replace editorial judgement rather than support it tend to fail in ways that hurt the publication’s reputation.

The architectural requirements are the same as for reader-facing tools: citation provenance, refusal where the AI is not confident, audit logging for editorial decisions. A tool that flags a reference as unverified should be able to show why; a tool that suggests an editorial decision should be able to show its reasoning. Black-box tools do not survive editorial scrutiny.
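As a rough illustration of a check that "shows why", the sketch below verifies a submitted reference against a bibliographic index and returns its reasons alongside the verdict. Everything here is an assumption made for the example: the `Reference` and `Verdict` types, the in-memory `index` keyed by DOI, and the placeholder DOI. It covers existence and metadata only; checking whether the cited paper supports the claim needs the full text and is out of scope for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    authors: str
    year: int
    title: str
    doi: str


@dataclass
class Verdict:
    verified: bool
    reasons: list[str] = field(default_factory=list)  # shown to the editor


def normalise(title: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not flag."""
    return " ".join(title.lower().split())


def verify(ref: Reference, index: dict[str, Reference]) -> Verdict:
    record = index.get(ref.doi.lower())
    if record is None:
        return Verdict(False, [f"DOI {ref.doi} not found in bibliographic index"])
    reasons = []
    if record.year != ref.year:
        reasons.append(f"year mismatch: cited {ref.year}, index has {record.year}")
    if normalise(record.title) != normalise(ref.title):
        reasons.append("title does not match the indexed record")
    return Verdict(not reasons, reasons)


# Hypothetical index entry with a placeholder DOI.
index = {
    "10.0000/example.1": Reference("Doe, J.", 2021, "An Example Study", "10.0000/example.1"),
}
cited = Reference("Doe, J.", 2019, "An example  study", "10.0000/example.1")
verdict = verify(cited, index)
```

Returning the reasons with the verdict means the audit log can record what was checked, not only the outcome.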

Reference work monetisation through licensed AI access

For publishers with valuable reference catalogues, AI-mediated access has become a distinct revenue product. The economic logic is straightforward: a reference work that was generating declining returns through traditional subscription access can generate substantially higher revenue per user when accessed conversationally, because the conversational interface is meaningfully more useful for the professional users who rely on the content.

The pattern in production:

The publisher selects content suitable for AI-mediated access — typically reference works, professional treatises, clinical guidelines, technical handbooks. Content where users have specific questions and want specific authoritative answers.
The content is indexed into a hallucination-proof RAG deployment, with chunking calibrated for the corpus and citation provenance enforced.
The deployment is offered as a subscription or per-use product, often at a higher price point than the underlying publication, reflecting the higher utility.
Citations and references back to the underlying content support sales of the underlying publication too — the AI does not replace the source, it points to it.

The economic case works because the content was already produced. The marginal cost of the AI deployment is modest relative to the original cost of producing the content. The revenue from professional users paying for conversational access often exceeds the revenue from the same content’s traditional subscription.

The Chat Book pattern is one variant; subscription chat interfaces over journal back-catalogues are another; per-use API access for downstream applications is a third. The right model depends on the publisher’s content, audience, and existing business architecture.

Paperpal, reader-facing RAG, and publisher infrastructure — different layers, different jobs

Three different AI tooling approaches sit alongside each other in academic and STM publishing in 2026, each serving a different user. Paperpal targets the author at the manuscript stage. Reader-facing RAG over a journal or book serves the subscriber at the consumption stage. Edtek-style infrastructure powers the publisher’s own AI products. Treating these as substitutes leads to bad procurement decisions; treating them as complementary layers leads to a working AI strategy.

Capability	Paperpal	Reader-facing RAG over a journal/book	Edtek-style infrastructure
Primary user	Author / researcher	Reader / subscriber	Publisher
Primary job	Improve manuscript before submission	Answer questions from published content	Infrastructure to power either of the other two
Source grounding	Suggestions from proprietary scholarly database (10,000+ citation styles, 250M+ papers)	Retrieval over the publisher’s licensed corpus	Per-collection retrieval with configurable thresholds
Generation scope	Language polishing, paraphrasing, copilot — does not draft full text	Question-answering grounded in corpus	Configurable: chat, draft, cite
Citation guarantee	Sources come from Paperpal’s database; not verified against cited papers’ content	Page-level citations to corpus	Page-level citations with refusal on out-of-corpus
Deployment	SaaS (web + Word add-in)	Typically SaaS via publisher portal	SaaS / private cloud / on-premise
Training on user data	“Never” (per Paperpal)	Depends on vendor; Edtek does not	No
Pricing model	$25/mo, $139/yr, Teams from $107 (as of 2025/2026)	Subscription / per-use over existing content	Custom per-deployment

Paperpal (built by Cactus Communications, also the parent of Editage) is a language-polishing and submission-readiness tool with a “Research feature” that surfaces citations from its proprietary scholarly database. The Research feature finds citations; it does not validate cited papers against their actual content. That distinction matters for publishers thinking about editorial verification — Paperpal addresses the author-side surface area, not the editorial-verification surface area.

Reader-facing RAG and publisher infrastructure are the layers where editorial control sits with the publisher.

STM AI policy compliance

The STM industry has been active in setting policy expectations for AI use in scholarly publishing. In September 2025, STM published Recommendations for a Classification of AI Use in Academic Manuscript Preparation — the final output of a 2024 Task and Finish Group on AI Labelling Terminology. The framework offers nine recommended classifications of AI activities in manuscript preparation and is designed as a basis for individual publishers to build disclosure policies on top of STM’s 2023 AI guidelines. It is a framework for publishers’ own policies rather than a binding standard.

The recurring themes across STM industry policy work:

Disclosure of AI in authorship. Authors should disclose use of AI in producing manuscripts; publishers should require this disclosure as part of submission. The STM 2025 classification is the practical schema most publishers are aligning with.
AI as author limitations. AI tools are generally not appropriate as listed authors; the responsible human authors remain accountable for the work.
Reader-facing AI transparency. Where publishers deploy reader-facing AI over their content, users should understand the AI’s scope, limitations, and source corpus.
Editorial integrity protections. AI use in editorial workflows should support, not replace, human editorial judgement.

Specific positions are published and updated by industry bodies (STM, COPE, individual publisher associations), and the published policy is the authoritative source. In 2026, alignment with industry policy is table stakes for publishers operating AI tooling at scale; off-policy deployments generate reputational risk and slow adoption.

How Edtek powers academic publisher AI without hallucinations

Academic publishing forces choices that a uniform RAG pipeline cannot make. The catalogue is heterogeneous in ways most enterprise corpora are not — and the techniques that work are the ones that take that heterogeneity seriously rather than averaging over it.

A heterogeneous catalogue forces per-collection calibration. A scholarly publisher’s holdings might include a 1980s monograph digitised through OCR with imperfect structure, a 2024 journal volume tagged to JATS schema, and a reference handbook whose two co-authors write in very different registers. A single similarity threshold across that mix either over-includes from the badly-tagged old material or under-includes from the rigorously-structured new material. Calibration runs per collection, against a labelled test set drawn from the actual content the deployment will index.
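One way to picture per-collection calibration is a threshold sweep over each collection's labelled test set. The sketch below is an assumption-laden illustration, not Edtek's method: it picks the similarity threshold that maximises F1, the collection names are invented, and the (score, is_relevant) pairs are synthetic.

```python
def f1_at(threshold: float, labelled: list[tuple[float, bool]]) -> float:
    """F1 of 'retrieve if score >= threshold' on (score, is_relevant) pairs."""
    tp = sum(1 for s, rel in labelled if s >= threshold and rel)
    fp = sum(1 for s, rel in labelled if s >= threshold and not rel)
    fn = sum(1 for s, rel in labelled if s < threshold and rel)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate(labelled: list[tuple[float, bool]]) -> float:
    """Try every observed score as a threshold and keep the best one."""
    candidates = sorted({s for s, _ in labelled})
    return max(candidates, key=lambda t: f1_at(t, labelled))


# Synthetic labelled sets: OCR material scores lower overall than JATS-tagged material.
labelled_sets = {
    "ocr_monographs_1980s": [(0.42, True), (0.38, True), (0.35, False),
                             (0.30, False), (0.45, True)],
    "jats_journal_2024": [(0.81, True), (0.78, True), (0.70, False),
                          (0.66, False), (0.85, True)],
}
thresholds = {name: calibrate(pairs) for name, pairs in labelled_sets.items()}
```

On this toy data the two collections end up with very different thresholds, which is the situation the paragraph above describes: a single global cut-off would be wrong for at least one of them.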

No single relevance metric captures the right idea on academic content. Citation matching, full-text similarity, and metadata richness all carry information, and which one matters most varies by collection. Modern reference works are dense in metadata; legacy texts are dense in prose. A linear combination of metrics with weights derived per collection is the operational answer; the weighting itself is a deployment-time decision, not a default.
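A linear combination with per-collection weights might look like the sketch below. The three signal names follow the paragraph above; the collection names and the weight values are placeholders chosen for illustration, not recommended settings.

```python
# Placeholder weights per collection; each row sums to 1.0.
WEIGHTS = {
    "modern_reference": {"citation_match": 0.2, "text_similarity": 0.3,
                         "metadata_richness": 0.5},
    "legacy_texts": {"citation_match": 0.2, "text_similarity": 0.7,
                     "metadata_richness": 0.1},
}


def relevance(signals: dict[str, float], collection: str) -> float:
    """Weighted sum of normalised signals (each assumed to lie in [0, 1])."""
    weights = WEIGHTS[collection]
    return sum(weights[name] * signals[name] for name in weights)


# A prose-heavy passage with sparse metadata.
signals = {"citation_match": 0.5, "text_similarity": 0.9, "metadata_richness": 0.1}
as_modern = relevance(signals, "modern_reference")
as_legacy = relevance(signals, "legacy_texts")
```

The same passage scores higher under the legacy weighting because that collection leans on full-text similarity, which is why the weights are set at deployment time for each collection.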

Citation granularity follows the readership. Document-level citation suffices for exploratory browsing. Page-level is the working floor for most reader-facing access. For collections where readers will challenge claims — regulatory references, clinical guidelines, legal treatises — claim-level provenance is the standard, with each generated assertion anchored to the specific source span. The granularity is a corpus-level decision, not a product default.
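The three granularity levels can be expressed as a small data structure plus a check that a citation meets the level a collection requires. The type and field names below are hypothetical, and character offsets are only one possible way to record a source span.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Granularity(Enum):
    DOCUMENT = 1
    PAGE = 2
    CLAIM = 3


@dataclass(frozen=True)
class Citation:
    document_id: str
    page: Optional[int] = None
    span: Optional[tuple[int, int]] = None  # character offsets within the page


def meets(citation: Citation, required: Granularity) -> bool:
    """True if the citation carries enough detail for the required level."""
    if required is Granularity.DOCUMENT:
        return True
    if required is Granularity.PAGE:
        return citation.page is not None
    return citation.page is not None and citation.span is not None


page_level = Citation("handbook-001", page=57)
claim_level = Citation("handbook-001", page=57, span=(120, 284))
```

A deployment over clinical guidelines could then reject any generated assertion whose citation fails `meets(citation, Granularity.CLAIM)`, while an exploratory collection accepts document-level citations.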

Self-evaluation framed for editorial stakes, not latency. Pre-generation evaluation catches retrievals from the wrong subdiscipline: a clinical question pulling a historical-perspective article from the same journal, or a chemistry question matching the right keyword in the wrong methodology section. Post-generation evaluation catches paraphrases that drift outside the cited passage. That drift is the kind of error a reader who knows the field will spot in the first paragraph; catching it before the answer reaches the reader preserves the publisher’s editorial credibility.
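A crude version of the post-generation check flags any sentence in a draft answer whose content words are mostly absent from the cited passage. Lexical overlap is a stand-in chosen to keep the sketch self-contained; a real deployment would more plausibly use an entailment or similarity model, and the 0.6 support threshold and function names are assumptions.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric words longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def unsupported_sentences(answer: str, cited_passage: str,
                          min_support: float = 0.6) -> list[str]:
    """Return sentences whose words are mostly missing from the cited passage."""
    passage_words = content_words(cited_passage)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & passage_words) / len(words)
        if support < min_support:
            flagged.append(sentence)
    return flagged


# Invented passage and draft, for illustration only.
passage = "The guideline recommends annual screening for adults over fifty."
draft = ("The guideline recommends annual screening for adults over fifty. "
         "Screening also prevents most cardiac events.")
flagged = unsupported_sentences(draft, passage)
```

Here the second sentence is flagged because it introduces a claim the passage never makes; the system can then drop the sentence, regenerate, or refuse.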

On-premise deployment as a rights-management posture. Licensed third-party content carries restrictions that publishers cannot delegate to a SaaS vendor’s terms of service. On-prem is not primarily a security preference for academic publishing — it is a rights-management requirement. See On-Premise RAG: Deployment Guide for Regulated Sectors for the deployment patterns.

The audit log as editorial intelligence. Beyond defensibility, the log tells the publisher which queries are answered well, which collections are under-served, which subject areas attract readers the catalogue cannot yet satisfy. For acquisitions and editorial planning, a year of audit-log telemetry is more useful than any subscriber survey.
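As a small illustration of the log used as telemetry, the sketch below computes a refusal rate per collection and ranks collections by it. The log schema (`collection`, `outcome`) and the entries are hypothetical; a real audit log would carry far more fields.

```python
from collections import Counter


def refusal_rates(log: list[dict]) -> dict[str, float]:
    """Share of queries per collection that ended in a refusal."""
    totals, refused = Counter(), Counter()
    for entry in log:
        totals[entry["collection"]] += 1
        if entry["outcome"] == "refused":
            refused[entry["collection"]] += 1
    return {name: refused[name] / totals[name] for name in totals}


# Hypothetical log entries.
audit_log = [
    {"collection": "clinical_guidelines", "outcome": "answered"},
    {"collection": "clinical_guidelines", "outcome": "answered"},
    {"collection": "legacy_monographs", "outcome": "refused"},
    {"collection": "legacy_monographs", "outcome": "answered"},
    {"collection": "legacy_monographs", "outcome": "refused"},
]
rates = refusal_rates(audit_log)
# Collections with the highest refusal rate come first.
under_served = sorted(rates, key=rates.get, reverse=True)
```

A high refusal rate points to questions readers are asking that the collection cannot answer, which is the acquisitions signal described above.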

The combination is what allows publishers to deploy AI over their authoritative content without inheriting the hallucination problems that generic AI brings to scholarly contexts.

Convert a Book to an AI Chatbot — the pattern at the centre of publisher AI deployments.
Citation-Grounded LLMs — the category framing for why source-verified tools are the only ones that work for scholarly content.
Chunking Strategies for Legal & Reference RAG Systems — the retrieval-side configuration this article references.
On-Premise RAG: Deployment Guide for Regulated Sectors — for licensed content and confidentiality-sensitive deployments.

Frequently asked questions

What is a citation-grounded LLM for publishers?

An AI system over the publisher’s content where every output cites specific source locations (page, section, paragraph) in the publisher’s catalogue. If the catalogue does not address the question, the system says so. Citation is architecturally enforced rather than added as a UI element. This is the only AI architecture that works for scholarly and reference publishing without undermining the publisher’s authority.

Will reader-facing AI cannibalise our traditional subscriptions?

Usually not, and often the opposite. Reader-facing AI is typically priced higher per user than a traditional subscription, because the utility is higher. The AI also surfaces citations back to the underlying content, which can drive sales of the source publications. Publishers who have deployed AI-mediated access at scale generally report it adds revenue rather than replacing existing revenue. The risk is in poorly-built deployments that damage the brand; the upside is in well-built ones that monetise existing content more effectively.

How do we prevent AI from hallucinating citations to our content?

Architecturally, not through prompt instructions. The AI must retrieve real passages from the actual corpus and be constrained to answer only from them, with citations to the retrieved source locations. Refusal behaviour catches cases where retrieval is insufficient. Self-evaluation catches cases where the answer drifts from the retrieved context. Prompt-only enforcement of “do not hallucinate” is not reliable; architectural enforcement is.

What about reader-facing AI over licensed third-party content?

Deployable, with careful attention to licensing terms. Many third-party content licences pre-date AI and may not contemplate AI-mediated access; the licence terms should be reviewed before deployment. Where licensed content is in scope, in-perimeter or dedicated-tenancy deployment is often required by the licence. The architecture supports this; the legal review is the gating step.

How do we screen submissions for AI-generated content?

Through a combination of disclosure-by-submission policy (require authors to disclose AI use) and AI-detection tooling (which is imperfect and evolving). The detection tools are useful as a screening signal but not as a final determination — false positives and false negatives are real. The policy framework is the load-bearing piece; technical detection is a supporting tool.

Can editorial AI replace human peer reviewers?

No — and major publishers have explicitly held this line through their published AI policies. AI can support peer review (summarise responses for editors, suggest potential reviewers, flag formatting issues) but cannot replace the substantive scientific judgement that reviewers provide. Publishers deploying AI in ways that erode peer review face both reputational and integrity risks.

What does AI-mediated reference work monetisation look like in practice?

A subscription or per-use product for AI-mediated access to a curated reference catalogue, typically priced above the underlying publication’s traditional subscription. Professional users (clinicians, lawyers, researchers, technical specialists) pay for conversational access to authoritative content with citation provenance. The Chat Book pattern is one variant; subscription chat over a journal back-catalogue is another; per-use API for downstream applications is a third. The revenue depends on the value of the content to professional users.

How does on-premise deployment fit publisher AI?

For some deployments it is required (licensed third-party content with strict data-handling terms, sensitive corporate clients, jurisdictions with strong residency rules). For others it is optional but preferred (publishers who want full control over the content and audit trail). The architectural pattern supports SaaS, private cloud, customer-VPC, and full on-premise. The choice follows the content sensitivity and the licensing posture, not vendor preference.