Abstractive summarization of large document collections using GPT


Journal article


S. Liu, C. G. Healey
arXiv:2310.05690, 2023


APA
Liu, S., & Healey, C. G. (2023). Abstractive summarization of large document collections using GPT. ArXiv:2310.05690.


Chicago/Turabian
Liu, S., and C. G. Healey. “Abstractive Summarization of Large Document Collections Using GPT.” arXiv:2310.05690 (2023).


MLA
Liu, S., and C. G. Healey. “Abstractive Summarization of Large Document Collections Using GPT.” ArXiv:2310.05690, https://arxiv.org/abs/2310.05690, 2023.


BibTeX

@article{s2023a,
  title = {Abstractive summarization of large document collections using GPT},
  year = {2023},
  journal = {arXiv:2310.05690},
  author = {Liu, S. and Healey, C. G.},
  howpublished = {https://arxiv.org/abs/2310.05690}
}

This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster’s documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROUGE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are being addressed in the GPT large language model, then suggest potential areas for future work.
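The overall shape of the pipeline named in the abstract (cluster, chunk, summarize each chunk, concatenate per topic) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity, the greedy clustering, the word-count chunking, and every function name and threshold below are assumptions chosen to keep the example self-contained. The `summarize` argument is a hypothetical stand-in for a call to a GPT model, and the document size reduction and visualization steps are omitted.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List


def bow(text: str) -> Counter:
    """Bag-of-words vector (assumed stand-in for a semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering: a document joins the first cluster
    whose first (seed) document is at least `threshold` similar to it."""
    clusters: List[List[str]] = []
    for doc in docs:
        vec = bow(doc)
        for members in clusters:
            if cosine(vec, bow(members[0])) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def chunk(docs: List[str], max_words: int = 50) -> List[str]:
    """Pack whole documents into chunks of at most `max_words` words.
    A single document longer than the limit becomes its own chunk."""
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for doc in docs:
        n = len(doc.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(doc)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_collection(
    docs: List[str],
    summarize: Callable[[str], str],
    threshold: float = 0.3,
    max_words: int = 50,
) -> List[str]:
    """Return one summary per cluster: summarize each chunk of the cluster,
    then concatenate the chunk summaries."""
    return [
        " ".join(summarize(c) for c in chunk(members, max_words))
        for members in cluster(docs, threshold)
    ]


def first_sentence(text: str) -> str:
    """Placeholder summarizer; a real system would call a language model."""
    return text.split(".")[0].strip() + "."
```

Calling `summarize_collection(docs, first_sentence)` on a small list of strings returns one concatenated summary per topic cluster; swapping `first_sentence` for a function that sends each chunk to a language model would give the abstractive variant, with the chunk limit set to fit that model's context window.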
