POCASUM: Policy Categorizer and Summarizer Based on Text Mining and Machine Learning

Soft Comput. 2021 Jul;25(14):9365-9375. doi: 10.1007/s00500-021-05916-w. Epub 2021 Jun 11.

Abstract

Having control over one's data is both a right and a duty of every citizen in our digital society. Users often skip the policies of applications or websites entirely to save time and energy, without realizing the potential sticking points in these policies. Because of obscure language and verbose explanations, the majority of hypermedia users do not bother to read them. Further, digital media companies sometimes do not put enough effort into stating their policies clearly, and the policies are often incomplete as well. A summarized version of these privacy policies, categorized into useful information, can help users. To address this problem, we propose a machine learning based policy categorizer that classifies policy paragraphs under proposed attributes such as security and contact. By benchmarking different machine learning based classifier models, we show that an artificial neural network model achieves higher accuracy than the other models on a challenging dataset of textual privacy policies. We thus show that machine learning can help summarize the relevant paragraphs under the various attributes, so that the user can get the gist of each topic within a few lines.

Keywords: Text classification; artificial neural network; machine learning; privacy policy; text mining; text summarization.
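
The categorize-then-summarize idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy training paragraphs, the two attribute labels, the function names, the single-layer softmax network standing in for the artificial neural network, and the word-frequency sentence scoring standing in for the summarizer.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: (policy paragraph, attribute label). Not from the paper.
TRAIN = [
    ("We encrypt your data and protect our servers against unauthorized access.", "security"),
    ("Passwords are stored using encryption and access is restricted.", "security"),
    ("You can contact us by email or phone with any questions.", "contact"),
    ("To reach our support team, email us or write to our postal address.", "contact"),
]
LABELS = ["security", "contact"]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map every distinct training word to a feature index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}

def vectorize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def scores_for(x, weights, bias):
    """Linear score of feature vector x for each label."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, vocab, labels, epochs=200, lr=0.1):
    """Single-layer softmax network trained by per-sample gradient descent."""
    weights = [[0.0] * len(vocab) for _ in labels]
    bias = [0.0] * len(labels)
    for _ in range(epochs):
        for text, label in samples:
            x = vectorize(text, vocab)
            probs = softmax(scores_for(x, weights, bias))
            for k, name in enumerate(labels):
                # Cross-entropy gradient w.r.t. the score of label k.
                grad = probs[k] - (1.0 if name == label else 0.0)
                bias[k] -= lr * grad
                for i, xi in enumerate(x):
                    if xi:
                        weights[k][i] -= lr * grad * xi
    return weights, bias

def classify(text, vocab, labels, weights, bias):
    """Return the label with the highest score for a paragraph."""
    scores = scores_for(vectorize(text, vocab), weights, bias)
    return labels[scores.index(max(scores))]

def summarize(paragraph, n=1):
    """Keep the n sentences with the highest summed word frequency,
    in their original order. Unnormalized, so longer sentences are favored."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    freq = Counter(tokenize(paragraph))
    ranked = sorted(sentences, key=lambda s: -sum(freq[w] for w in tokenize(s)))
    top = set(ranked[:n])
    return " ".join(s for s in sentences if s in top)

vocab = build_vocab([text for text, _ in TRAIN])
weights, bias = train(TRAIN, vocab, LABELS)

if __name__ == "__main__":
    paragraph = ("We encrypt your data. Data is stored on secure servers "
                 "and data access is restricted. Thank you.")
    print(classify(paragraph, vocab, LABELS, weights, bias))
    print(summarize(paragraph, n=1))
```

In this sketch each paragraph is first assigned to an attribute and then reduced to its highest-scoring sentence, which mirrors the two stages named in the title. A realistic system would need a labeled corpus of policy paragraphs, a richer feature representation, and a multi-layer network, none of which are specified here.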