Journal article
International Journal of Data Science and Analytics, vol. 15, 2023, pp. 105-118
APA
Villanes, A., & Healey, C. G. (2023). Domain-specific text dictionaries for text analytics. International Journal of Data Science and Analytics, 15, 105–118.
Chicago/Turabian
Villanes, A., and C. G. Healey. “Domain-Specific Text Dictionaries for Text Analytics.” International Journal of Data Science and Analytics 15 (2023): 105–118.
MLA
Villanes, A., and C. G. Healey. “Domain-Specific Text Dictionaries for Text Analytics.” International Journal of Data Science and Analytics, vol. 15, 2023, pp. 105–18.
BibTeX
@article{a2023a,
title = {Domain-specific text dictionaries for text analytics},
year = {2023},
journal = {International Journal of Data Science and Analytics},
pages = {105--118},
volume = {15},
author = {Villanes, A. and Healey, C. G.}
}
We investigate the use of sentiment dictionaries to estimate sentiment for large document collections. Our goal in this paper is a semiautomatic method for extending a general sentiment dictionary for a specific target domain in a way that minimizes manual effort. General sentiment dictionaries may not contain terms important to the target domain or may score terms in ways that are inappropriate for the target domain. We combine statistical term identification and term evaluation using Amazon Mechanical Turk to extend the EmoLex sentiment dictionary to a domain-specific study of dengue fever. The same approach can be applied to any term-based sentiment dictionary or target domain. We explain how terms are identified for inclusion or re-evaluation and how Mechanical Turk generates scores for the identified terms. Examples are provided that compare EmoLex sentiment estimates before and after it is extended. We conclude by describing how our sentiment estimates can be integrated into an epidemiology surveillance system that includes sentiment visualization and discussing the strengths and limitations of our work.
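The idea of extending a general term-based dictionary for a target domain can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout (term mapped to a single score in [-1, 1]), the scores themselves, the mean-of-matched-terms estimate, and the names `GENERAL`, `DOMAIN`, `extend`, and `estimate` are all assumptions made for illustration. In particular, the values are not taken from EmoLex or from Mechanical Turk results.

```python
import re

# Hypothetical toy scores in [-1, 1]; not taken from EmoLex or the paper.
GENERAL = {"good": 0.8, "fear": -0.7, "positive": 0.6}

# Hypothetical domain extension: adds a term the general dictionary lacks
# ("dengue") and re-scores one whose general score is inappropriate for the
# domain ("positive", as in a positive test result).
DOMAIN = {"dengue": -0.6, "positive": -0.4}


def extend(general, domain):
    """Return a new dictionary where domain scores add to or override general ones."""
    merged = dict(general)
    merged.update(domain)
    return merged


def estimate(text, dictionary):
    """Mean score of dictionary terms found in text, or None if no term matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [dictionary[t] for t in tokens if t in dictionary]
    return sum(scores) / len(scores) if scores else None


text = "Patient tested positive for dengue"
before = estimate(text, GENERAL)
after = estimate(text, extend(GENERAL, DOMAIN))
print(before, after)
```

With these made-up scores, the general dictionary reads the sentence as pleasant because it only recognizes "positive", while the extended dictionary scores both "positive" and "dengue" in the sense they carry in the target domain.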