Classification of forensic autopsy reports through conceptual graph-based document representation model

Ghulam Mujtaba; Liyana Shuib; Ram Gopal Raj; Retnagowri Rajandram; Khairunisa Shaikh; Mohammed Ali Al-Garadi

doi:10.1016/j.jbi.2018.04.013

J Biomed Inform. 2018 Jun:82:88-105. doi: 10.1016/j.jbi.2018.04.013. Epub 2018 May 5.

Authors

Ghulam Mujtaba¹, Liyana Shuib², Ram Gopal Raj³, Retnagowri Rajandram⁴, Khairunisa Shaikh⁵, Mohammed Ali Al-Garadi⁶

Affiliations

¹ Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; Department of Computer Science, Sukkur IBA University, Sukkur, Sind, Pakistan. Electronic address: mujtaba@siswa.um.edu.my.
² Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. Electronic address: liyanashuib@um.edu.my.
³ Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. Electronic address: ramdr@um.edu.my.
⁴ Department of Surgery, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia. Electronic address: rretnagowri@um.edu.my.
⁵ Department of Social and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia. Electronic address: khairunisashaikh@siswa.um.edu.my.
⁶ Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia. Electronic address: mohammedali@siswa.um.edu.my.

PMID: 29738820
DOI: 10.1016/j.jbi.2018.04.013

Abstract

Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study applies text categorization techniques to the classification of open-narrative forensic autopsy reports. A key step in text classification is document representation, in which a clinical report is transformed into a format suitable for classification. The traditional document representation technique for text categorization is bag-of-words (BoW). In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but discriminative features from clinical reports. Moreover, it fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome these limitations, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports covering four manners of death (MoD) and sixteen causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier: the first-level classifier predicted MoD, and the second-level classifier predicted CoD using the proposed CGDR technique. To demonstrate the significance of the proposed technique, its results were compared with those of six state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level and two-level classification on the experimental results.
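The word-inversion limitation of BoW mentioned above can be illustrated with a minimal sketch. It is not the CGDR technique from the paper: the two example phrases are invented, and the directed adjacent-term graph is only an assumed, simplified stand-in for a graph-based representation.

```python
from collections import Counter

def bag_of_words(text):
    """Unordered term counts: the classic BoW representation."""
    return Counter(text.lower().split())

def term_graph(text):
    """Directed edges between adjacent terms, so word order is preserved.
    A simplified, assumed stand-in for a graph-based representation."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))

# Two invented phrases with the same words but inverted meaning.
a = "laceration of liver without fracture of rib"
b = "fracture of rib without laceration of liver"

same_bow = bag_of_words(a) == bag_of_words(b)   # BoW cannot tell them apart
same_graph = term_graph(a) == term_graph(b)     # the edge sets differ

print(same_bow, same_graph)
```

Both phrases yield identical term counts, whereas their edge sets differ because edges such as ("liver", "without") appear in only one of them.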
The experimental results indicated that the CGDR technique improved accuracy by 12% to 15% compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results than one-level classification. These promising results suggest that pathologists can adopt the proposed system as a basis for a second opinion, thereby supporting them in effectively determining CoD.
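The two-level arrangement can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' classifier: the reports and labels are invented, the term-overlap scorer is a placeholder for a real supervised model, and training one second-level classifier per MoD is an assumed design that the abstract does not specify.

```python
from collections import Counter, defaultdict

def train(examples):
    """Build one term-count profile per label from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.lower().split())
    return dict(profiles)

def predict(profiles, text):
    """Return the label whose profile shares the most distinct terms with the text."""
    tokens = set(text.lower().split())
    return max(profiles, key=lambda label: len(tokens & set(profiles[label])))

# Invented (text, MoD, CoD) examples; labels are hypothetical.
reports = [
    ("road traffic collision skull fracture", "accident", "head injury"),
    ("river water in lungs", "accident", "drowning"),
    ("knife stab wound chest", "homicide", "stab wound"),
    ("bullet entry wound chest", "homicide", "gunshot wound"),
]

# Level 1 predicts MoD from all reports.
level1 = train([(text, mod) for text, mod, _ in reports])
# Level 2 (assumed design): one CoD classifier per MoD.
level2 = {
    mod: train([(text, cod) for text, m, cod in reports if m == mod])
    for mod in level1
}

def classify(text):
    """Predict MoD first, then CoD among the causes seen under that MoD."""
    mod = predict(level1, text)
    cod = predict(level2[mod], text)
    return mod, cod

print(classify("bullet wound chest"))
```

A one-level classifier would instead choose among all CoD labels at once; in this sketch, the first level narrows the second level to the causes observed under the predicted MoD.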

Keywords: Forensic autopsy reports; Graph-based text classification; SNOMED CT concepts and descriptors; Supervised machine learning; Text classification.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Automation
Autopsy / methods*
Cause of Death*
Computer Graphics
Forensic Medicine / methods*
Humans
Information Storage and Retrieval
Machine Learning
Medical Informatics / methods*
Software
Systematized Nomenclature of Medicine*