Explainable deep learning ensemble for food image analysis on edge devices

Comput Biol Med. 2021 Dec;139:104972. doi: 10.1016/j.compbiomed.2021.104972. Epub 2021 Oct 27.

Abstract

Food recognition systems have recently garnered much research attention due to their ability to obtain objective measurements of dietary intake, a feature that contributes to the management of various chronic conditions. Challenges such as inter- and intra-class variations, alongside the practical constraints of smart glasses, wearable cameras, and mobile devices, require resource-efficient food recognition models with high classification performance. Furthermore, explainable AI is crucial in health-related domains, as it characterizes model behaviour and enhances its transparency and objectivity. Our proposed architecture addresses these challenges by drawing on the strengths of transfer learning, initializing MobileNetV3 with weights from a model pre-trained on ImageNet. MobileNetV3 achieves superior performance through its squeeze-and-excitation strategy, which assigns unequal weights to different input channels, in contrast to the equal weighting used in other variants. Despite being fast and efficient, it can, like other deep neural networks, become stuck in local optima, reducing the classification performance of the model. We overcome this issue by applying the snapshot ensemble approach, which produces an ensemble of M model snapshots within a single training run without increasing the required training time. Each snapshot in the ensemble visits a different local minimum before converging to its final solution, which enhances recognition performance. To address the challenge of explainability, we argue that explanations cannot be monolithic, since each stakeholder perceives the results and their explanations according to different objectives and aims. We therefore propose a user-centered explainable artificial intelligence (AI) framework that increases the trust of the involved parties by inferring and rationalizing the results according to the needs and profile of each user. Our framework is comprehensive in the context of a dietary assessment app, as it detects Food/Non-Food, food categories, and ingredients. Experimental results on standard food benchmarks and a newly contributed Malaysian food dataset for ingredient detection demonstrate superior performance over other methodologies on an integrated set of measures.
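
The sketch below illustrates the two mechanisms named in the abstract, assuming a TensorFlow/Keras implementation: a MobileNetV3 backbone initialized with ImageNet weights (transfer learning), and a snapshot ensemble driven by a cyclic cosine-annealed learning rate that saves one snapshot per cycle and averages their softmax outputs at inference. The class count, epoch budget, and dummy data are illustrative placeholders, not the configuration used in the paper.

```python
# Minimal sketch, not the authors' implementation: snapshot ensembling
# (Huang et al.) on top of an ImageNet-pretrained MobileNetV3.
import math
import numpy as np
import tensorflow as tf

NUM_CLASSES = 10          # placeholder; the paper's food categories differ
EPOCHS, CYCLES = 10, 5    # M = CYCLES snapshots in a single training run
BASE_LR = 0.01
EPOCHS_PER_CYCLE = math.ceil(EPOCHS / CYCLES)

def build_model():
    # Transfer learning: backbone initialized from ImageNet weights.
    backbone = tf.keras.applications.MobileNetV3Small(
        input_shape=(224, 224, 3), include_top=False,
        weights="imagenet", pooling="avg")
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(backbone.output)
    return tf.keras.Model(backbone.input, outputs)

def snapshot_lr(epoch, lr=None):
    # Cyclic cosine annealing: the learning rate restarts at the top of each
    # cycle, so the model escapes the local minimum it has just settled into.
    t = epoch % EPOCHS_PER_CYCLE
    return 0.5 * BASE_LR * (math.cos(math.pi * t / EPOCHS_PER_CYCLE) + 1.0)

class SnapshotSaver(tf.keras.callbacks.Callback):
    # Store the weights at the end of every cycle; each snapshot sits near a
    # different local minimum visited during training.
    def __init__(self):
        super().__init__()
        self.snapshots = []
    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % EPOCHS_PER_CYCLE == 0:
            self.snapshots.append(self.model.get_weights())

# Dummy data so the sketch runs end to end; replace with the food dataset.
x = np.random.rand(32, 224, 224, 3).astype("float32") * 255.0
y = np.random.randint(0, NUM_CLASSES, 32)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.SGD(BASE_LR, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
saver = SnapshotSaver()
model.fit(x, y, batch_size=8, epochs=EPOCHS, verbose=0,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(snapshot_lr), saver])

# Ensemble prediction: average the softmax outputs of all M snapshots.
probs = np.zeros((len(x), NUM_CLASSES))
for weights in saver.snapshots:
    model.set_weights(weights)
    probs += model.predict(x, verbose=0)
predictions = (probs / len(saver.snapshots)).argmax(axis=1)
```

Because every snapshot comes from the same training run, the ensemble adds no training cost; only inference requires M forward passes, which can be reduced on edge devices by keeping the last few snapshots.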

Keywords: Data augmentation; Deep learning; Ensemble learning; Explainable AI; Food recognition; Mobile application; Neural network; User-centred explainable AI.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Deep Learning*
  • Food
  • Image Processing, Computer-Assisted
  • Neural Networks, Computer