Semiautomated text analytics for qualitative data synthesis

Emily Haynes; Ruth Garside; Judith Green; Michael P Kelly; James Thomas; Cornelia Guell

doi:10.1002/jrsm.1361

Res Synth Methods. 2019 Sep;10(3):452-464. doi: 10.1002/jrsm.1361. Epub 2019 Jul 9.

Authors

Emily Haynes¹, Ruth Garside¹, Judith Green², Michael P Kelly³, James Thomas⁴, Cornelia Guell¹

Affiliations

¹ European Centre for Environment & Human Health, University of Exeter, Truro, UK.
² School of Population Health & Environmental Sciences, King's College London, London, UK.
³ Primary Care Unit, Cambridge Institute of Public Health, University of Cambridge, Cambridge, UK.
⁴ EPPI-Centre, Department of Social Science, University College London, London, UK.

Abstract

Approaches to synthesizing qualitative data have, to date, largely focused on integrating the findings from published reports. However, developments in text mining software offer the potential for efficient analysis of large pooled primary qualitative datasets. This case study aimed to (a) provide a step-by-step guide to using one software application, Leximancer, and (b) interrogate opportunities and limitations of the software for qualitative data synthesis. We applied Leximancer v4.5 to a pool of five qualitative, UK-based studies on modes of transportation such as walking, cycling, and driving, and displayed the findings of the automated content analysis as intertopic distance maps. Leximancer enabled us to "zoom out" to familiarize ourselves with, and gain a broad perspective of, the pooled data. It indicated which studies clustered around dominant topics such as "people." The software also enabled us to "zoom in" to narrow the perspective to specific subgroups and lines of inquiry. For example, "people" featured in both men's and women's narratives but were talked about differently, with men mentioning "kids" and "old," whereas women mentioned "things" and "stuff." The approach provided us with a fresh lens for the initial inductive step in the analysis process and could guide further exploration. The limitations of using Leximancer were the substantial data preparation time involved and the contextual knowledge required from the researcher to turn lines of inquiry into meaningful insights. In summary, Leximancer is a useful tool for contributing to qualitative data synthesis, facilitating comprehensive and transparent data coding, but it can only inform, not replace, researcher-led interpretive work.

Keywords: data pooling; machine learning; qualitative data synthesis; secondary analysis; social practice; text analytics; text mining.

MeSH terms

Algorithms
Data Accuracy
Data Mining / methods*
Data Science / methods*
Databases, Factual
Female
Humans
Machine Learning
Male
Normal Distribution
Pattern Recognition, Automated*
Qualitative Research*
Software
United Kingdom
