Towards User-centered Corpus Development: Lessons Learnt from Designing and Developing MedTator

AMIA Annu Symp Proc. 2023 Apr 29:2022:532-541. eCollection 2022.

Abstract

A gold standard annotated corpus is usually indispensable when developing natural language processing (NLP) systems. Building a high-quality annotated corpus for clinical NLP requires considerable time and domain expertise during the annotation process. Existing annotation tools may offer powerful features that cover the varied needs of text annotation tasks, but their target end users tend to be trained annotators. Clinical research teams find it challenging to use these tools in their projects for several reasons, such as the complexity of advanced features and data security concerns. To address these challenges, we developed MedTator, a serverless web-based annotation tool with an intuitive, user-centered interface that aims to provide a lightweight solution for the core tasks of corpus development. Moreover, we present three lessons learned from designing and developing MedTator, which will contribute to the research community's knowledge for future open-source tool development.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Humans
  • Natural Language Processing*