Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery

Szu-Wei Fu; Pei-Chun Li; Ying-Hui Lai; Cheng-Chien Yang; Li-Chun Hsieh; Yu Tsao

doi:10.1109/TBME.2016.2644258

Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery

IEEE Trans Biomed Eng. 2017 Nov;64(11):2584-2594. doi: 10.1109/TBME.2016.2644258.

Authors

Szu-Wei Fu¹, Pei-Chun Li², Ying-Hui Lai³, Cheng-Chien Yang⁴, Li-Chun Hsieh⁴, Yu Tsao⁵

Affiliations

¹ Department of Computer Science and Information EngineeringNational Taiwan University.
² Department of Audiology and Speech Language PathologyMackay Medical College.
³ Department of Electrical EngineeringYuan Ze University.
⁴ Mackay Memorial Hospital.
⁵ Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, R.O.C.

PMID: 28026747
DOI: 10.1109/TBME.2016.2644258

Abstract

Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.

Keywords: Data models; Dictionaries; Spectrogram; Speech; Surgery; Training; Training data.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Communication Aids for Disabled*
Humans
Machine Learning*
Male
Oral Surgical Procedures / adverse effects*
Speech Intelligibility / physiology*
Speech Recognition Software*