Research Article

Leveraging LLMs in Library Publishing: JATS XML Encoding with ChatGPT

Authors
  • Matthew Vaughn (Indiana University Bloomington)
  • Richard Higgins (Indiana University Bloomington)

Abstract

Introduction: Reliable and lightweight conversions of Microsoft Word documents to HTML have long eluded library publishers. We demonstrate how off-the-shelf large language models (LLMs) like ChatGPT offer a lean pathway forward for generating JATS XML, which current platforms are equipped to render into user-friendly HTML publications.

Methods: With careful prompting, ChatGPT can turn a plain text typescript into valid JATS. For the <front> part of an XML file, a one- or few-shot approach ensures that the boilerplate data included in the example(s) prompts the LLM to populate the correct data in its output. For the <body> and <back/references> parts, zero-shot prompts that give only the name and version of our JATS specification produce valid XML with ChatGPT 4.0.
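As a minimal sketch of the two prompting styles described above, the following Python builds a one-shot prompt for <front> and a zero-shot prompt for <body>. The function names, the prompt wording, and the specification string in the usage lines are illustrative assumptions, not the authors' actual prompts; no LLM API is called here.

```python
def build_front_prompt(spec: str, example_text: str, example_xml: str, new_text: str) -> str:
    """One-shot prompt: a worked example (typescript plus its <front> XML)
    precedes the new typescript so the model can copy the boilerplate structure."""
    return (
        f"Encode article metadata as a <front> element in {spec}.\n\n"
        f"Example typescript:\n{example_text}\n\n"
        f"Example XML:\n{example_xml}\n\n"
        f"Typescript to encode:\n{new_text}\n\n"
        "Return only the XML."
    )


def build_body_prompt(spec: str, section_text: str) -> str:
    """Zero-shot prompt: only the specification name and version are supplied,
    with no worked example."""
    return (
        f"Encode the following article text as {spec} XML "
        f"for the <body> element. Return only the XML.\n\n{section_text}"
    )


# Hypothetical specification string; substitute the name and version actually in use.
SPEC = "JATS Journal Publishing 1.3"

front_prompt = build_front_prompt(
    SPEC,
    example_text="A Sample Title\nJane Doe (Example University)",
    example_xml="<front><article-meta>...</article-meta></front>",
    new_text="Another Title\nJohn Roe (Example College)",
)
body_prompt = build_body_prompt(SPEC, "Introduction\nThis section opens the article.")
```

A few-shot variant would simply repeat the example pair several times before the typescript to encode.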

Results: One- and few-shot prompting proved effective in directing ChatGPT 3.5 to consistently encode discrete, sequential sections of article typescripts. Retesting with ChatGPT 4.0 showed that, with a zero-shot approach, the <body> and <back/references> parts need only the JATS specification name and version to convert typescript into valid XML. The <front> parts still benefit from a one- or few-shot approach.

Discussion: The primary bottleneck is the limit on token count or source size. Content must be broken up into separate sections for input, and the output must then be manually “stitched” together to form a complete XML file.
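The split-and-stitch step can be illustrated with a short Python sketch. It assumes, hypothetically, that the typescript is split on blank lines under a character budget, and that each body chunk comes back from the LLM as a standalone <sec> element; the function names and the size limit are illustrative, not part of the workflow reported here.

```python
import xml.etree.ElementTree as ET


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars
    (a single paragraph longer than the limit is kept whole)."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stitch_article(front_xml: str, sec_fragments: list[str], back_xml: str) -> str:
    """Combine separately generated fragments into one <article> element.
    ET.fromstring raises ParseError if a fragment is not well-formed."""
    article = ET.Element("article")
    article.append(ET.fromstring(front_xml))
    body = ET.SubElement(article, "body")
    for fragment in sec_fragments:
        body.append(ET.fromstring(fragment))
    article.append(ET.fromstring(back_xml))
    return ET.tostring(article, encoding="unicode")


chunks = split_into_chunks("aaa\n\nbbb\n\nccc", max_chars=8)
stitched = stitch_article(
    "<front><article-meta/></front>",
    ["<sec><title>Introduction</title></sec>", "<sec><title>Methods</title></sec>"],
    "<back><ref-list/></back>",
)
```

Parsing each fragment only checks well-formedness; validating the stitched file against the JATS schema is a separate step. Fragments that use namespace prefixes such as `xlink:` without declaring them will fail to parse on their own and would need the declarations added first.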

Conclusion: LLMs may offer a solution for publishers without the resources to encode JATS files by other means. As LLMs increase in scale, we expect workflows for encoding research articles in JATS to become even more accurate, with fewer restrictions on capacity.

Keywords: library publishing, JATS XML, ChatGPT, prompt engineering

How to Cite:

Vaughn, M. & Higgins, R., (2025) “Leveraging LLMs in Library Publishing: JATS XML Encoding with ChatGPT”, Journal of Librarianship and Scholarly Communication 13(1). doi: https://doi.org/10.31274/jlsc.18048

Rights:

© 2025 The Author(s). License: CC BY 4.0

Published on
2025-01-16

Peer Reviewed