Privacy Policies of IoT Devices: Collection and Analysis

Sensors (Basel). 2022 Feb 25;22(5):1838. doi: 10.3390/s22051838.

Abstract

Personal data collection and processing are now widely used in digital services within mobile sensing networks to support their operation, personalization, and improvement. Personal data are any data that identifiably describe a person. Legislative and regulatory documents adopted in recent years define the key requirements for processing personal data; these requirements are based on the principles of lawfulness, fairness, and transparency. Privacy policies are the only legitimate way to inform users of services and devices about how their personal data are collected, processed, and stored. Making privacy policies clear and transparent is therefore extremely important, because it would allow end users to comprehend the risks associated with personal data processing. A number of approaches for analyzing privacy policies written in natural language have been proposed, and most of them require a large training dataset of privacy policies. In this paper, we examine the existing corpora of privacy policies available for training, discuss their features, and conclude that a new dataset of privacy policies for Internet of Things devices and services, as part of mobile sensing networks, is needed. We develop a new technique for collecting and cleaning such privacy policies. The proposed technique differs from existing ones in that it uses e-commerce platforms as the starting point for document search, which enables more targeted collection of URLs to IoT device manufacturers' privacy policies. The software tool implementing this technique was used to collect a new corpus of English-language documents containing 592 unique privacy policies. The collected corpus consists mainly of privacy policies that were developed for the Internet of Things and reflect the latest legislative requirements. The paper also presents the results of statistical and semantic analysis of the collected privacy policies. Researchers could use these results when developing techniques for analyzing privacy policies written in natural language, with the aim of making them more transparent to end users.
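
The abstract refers to a corpus of unique privacy policies and to a statistical analysis of it, without giving implementation details. As a rough illustration only, the following Python sketch shows one way duplicate policy texts might be filtered and a simple statistic computed. The function names, the normalization rule, the use of SHA-256 hashing, and the example URLs are all assumptions made for this sketch; they are not taken from the authors' tool.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unique_policies(policies: dict[str, str]) -> dict[str, str]:
    """Keep the first URL seen for each distinct normalized policy text.

    `policies` maps a hypothetical source URL to the extracted policy text.
    """
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for url, text in policies.items():
        # Hash the normalized text so long documents compare cheaply.
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[url] = text
    return unique


def mean_word_count(policies: dict[str, str]) -> float:
    """Average number of whitespace-separated tokens per policy."""
    if not policies:
        return 0.0
    return sum(len(text.split()) for text in policies.values()) / len(policies)


if __name__ == "__main__":
    # Made-up example documents, not data from the paper.
    collected = {
        "https://a.example/privacy": "We collect  your data.",
        "https://b.example/privacy": "we collect your data.",
        "https://c.example/privacy": "We never share location data.",
    }
    corpus = unique_policies(collected)
    print(len(corpus), mean_word_count(corpus))
```

Exact-match hashing only removes copies that are identical after normalization; near-duplicates, such as the same policy with a different company name, would need a similarity measure instead.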

Keywords: IoT; data collection; dataset; latent Dirichlet allocation; natural language processing; privacy policies; privacy policy corpus.

MeSH terms

  • Data Collection
  • Humans
  • Policy*
  • Privacy*