From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool

BMC Res Notes. 2024 Mar 29;17(1):95. doi: 10.1186/s13104-024-06749-0.

Abstract

Background: Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative insights. In recent years, software programs have presented a promising mechanism to accelerate transcription, but their broad application has been constrained by expensive licensing or "per-minute" fees, data protection concerns, and limited availability in many languages. In this article, we outline our process of adapting a free, open-source speech-to-text algorithm (Whisper by OpenAI) into a usable and accessible tool for qualitative transcription. Our program, which we have dubbed "Vink" for voice to ink, is available under a permissive open-source license (and thus free of cost).

Results: We conducted a proof-of-principle assessment of Vink's performance in transcribing authentic interview audio data in 14 languages. A majority of pilot-testers evaluated the software's performance positively and indicated that they were likely to use the tool in their future research. Our usability assessment indicates that Vink is easy to use, and we made further refinements based on pilot-tester feedback to increase user-friendliness.

Conclusion: With Vink, we hope to help facilitate rigorous qualitative research processes globally by reducing the time and costs associated with transcription and by expanding the availability of free-of-cost transcription software to more languages. Because Vink runs on standalone computers, the data privacy issues that arise with many other solutions do not apply.

Keywords: Automated speech recognition; Interview; Qualitative research; Speech-to-text algorithm; Transcription; Vink; Whisper.

MeSH terms

  • Ink*
  • Software
  • Speech
  • User-Computer Interface*