Journal article
IEEE Transactions on Big Data, 2026, pp. 1-14
APA
Liu, S., & Healey, C. G. (2026). Abstractive Summarization of Large Document Collections Using GPT. IEEE Transactions on Big Data, 1–14. https://doi.org/10.1109/TBDATA.2026.3668604
Chicago/Turabian
Liu, S., and C. G. Healey. “Abstractive Summarization of Large Document Collections Using GPT.” IEEE Transactions on Big Data (2026): 1–14.
MLA
Liu, S., and C. G. Healey. “Abstractive Summarization of Large Document Collections Using GPT.” IEEE Transactions on Big Data, 2026, pp. 1–14, doi:10.1109/TBDATA.2026.3668604.
BibTeX
@article{s2026a,
title = {Abstractive Summarization of Large Document Collections Using GPT},
year = {2026},
journal = {IEEE Transactions on Big Data},
pages = {1-14},
doi = {10.1109/TBDATA.2026.3668604},
author = {Liu, S. and Healey, C. G.}
}
This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach combines semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster’s documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. A statistical comparison of our results with those of existing state-of-the-art systems, including BART, BRIO, PEGASUS, and MoCa, using ROUGE and METEOR summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset and with BART on the Gigaword test dataset. This finding is promising, since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are being addressed in GPT.
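As a rough illustration of the kind of overlap score the abstract refers to, the sketch below computes a simplified ROUGE-1 F1 between a candidate summary and a reference. It is not the paper's evaluation code: the function name `rouge1_f1` is hypothetical, and the lowercase whitespace tokenization (no stemming or stopword handling) is an assumption, so its values will differ from those of standard ROUGE toolkits.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("the cat", "the cat sat on the mat"))
```

A candidate that copies only part of the reference scores high on precision but low on recall, which is why the F1 combination is commonly reported.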