Leveraging LLMs in Library Publishing: JATS XML Encoding with ChatGPT
Abstract
Introduction: Reliable and lightweight conversions of Microsoft Word documents to HTML have long eluded library publishers. We demonstrate how off-the-shelf large language models (LLMs) like ChatGPT offer a lean pathway forward for generating JATS XML, which current platforms are equipped to render into user-friendly HTML publications.
Methods: With careful prompting, ChatGPT can turn a plain text typescript into valid JATS. For the <front> part of an XML file, a one- or few-shot approach supplies examples whose boilerplate data prompts the LLM to populate the correct data in its output. For the <body> and <back> (references) parts, zero-shot prompts containing only the name and version of our JATS specification produce valid XML in ChatGPT 4.0.
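The two prompting styles described in Methods might be sketched as follows. This is a minimal illustration, not the authors' actual prompts: the function names, the prompt wording, and the specification name and version are all assumptions, since the abstract does not state which JATS tag set or version was used.

```python
from typing import Sequence, Tuple

# Assumed values for illustration; the abstract does not name the tag set or version.
SPEC_NAME = "JATS Journal Publishing Tag Set"
SPEC_VERSION = "1.3"


def zero_shot_prompt(part: str, typescript: str) -> str:
    """Build a prompt that gives only the specification name, version, and target part."""
    return (
        f"Encode the following typescript as the <{part}> part of an article "
        f"in {SPEC_NAME} version {SPEC_VERSION}. Return only XML.\n\n"
        f"Typescript:\n{typescript}"
    )


def few_shot_prompt(typescript: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Build a <front> prompt that includes one or more (typescript, XML) examples."""
    shots = "\n\n".join(
        f"Example typescript:\n{source}\n\nExample <front>:\n{xml}"
        for source, xml in examples
    )
    return (
        f"Encode the final typescript as the <front> part of an article in "
        f"{SPEC_NAME} version {SPEC_VERSION}, reusing the journal boilerplate "
        f"shown in the examples. Return only XML.\n\n{shots}\n\n"
        f"Typescript:\n{typescript}"
    )
```

Passing a single example yields a one-shot prompt; passing several yields a few-shot prompt. The resulting string would then be submitted to the model through whatever interface is in use.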
Results: One- and few-shot prompting proved effective in directing ChatGPT 3.5 to consistently encode discrete, sequential sections of article typescripts. In retesting with ChatGPT 4.0, zero-shot prompts showed that the <body> and <back> (references) parts need only the JATS specification name and version to convert typescript into valid XML. The <front> part still benefits from a one- or few-shot approach.
Discussion: The primary bottleneck is the limit on token count or source size. Content must be broken up into separate sections for input, and the output must be manually “stitched” together to form a complete XML file.
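The stitching step described in the Discussion is manual in the authors' workflow; a scripted version might look like the sketch below. The function name is hypothetical, and the check covers only XML well-formedness, not validity against a JATS DTD or schema. It also assumes the fragments carry no undeclared namespace prefixes.

```python
import xml.etree.ElementTree as ET


def stitch_article(front_xml: str, body_xml: str, back_xml: str) -> str:
    """Combine separately generated <front>, <body>, and <back> fragments."""
    article = ET.Element("article")
    parts = (("front", front_xml), ("body", body_xml), ("back", back_xml))
    for expected_tag, fragment in parts:
        # ET.fromstring raises ParseError if the fragment is not well-formed XML.
        element = ET.fromstring(fragment)
        if element.tag != expected_tag:
            raise ValueError(f"expected <{expected_tag}>, got <{element.tag}>")
        article.append(element)
    return ET.tostring(article, encoding="unicode")
```

A fragment that fails to parse or that has the wrong root element stops the process early, so a malformed section can be regenerated before the file is assembled.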
Conclusion: LLMs may offer a solution for publishers without the resources to encode JATS files by other means. As LLMs increase in scale, we expect workflows for encoding research articles in JATS to become even more accurate, with fewer restrictions on capacity.
Keywords: library publishing, JATS XML, ChatGPT, prompt engineering
How to Cite:
Vaughn, M. & Higgins, R., (2025) “Leveraging LLMs in Library Publishing: JATS XML Encoding with ChatGPT”, Journal of Librarianship and Scholarly Communication 13(1). doi: https://doi.org/10.31274/jlsc.18048
Rights:
© 2025 The Author(s). License: CC BY 4.0