Automating the interpretation of PM2.5 time-resolved measurements using a data-driven approach

Hao Tang; Wanyu Rengie Chan; Michael D Sohn

doi:10.1111/ina.12780

Indoor Air. 2021 May;31(3):860-871. doi: 10.1111/ina.12780. Epub 2020 Dec 28.

Authors

Hao Tang¹, Wanyu Rengie Chan², Michael D Sohn²

Affiliations

¹ Joint International Research Laboratory of Green Buildings and Built Environments, Chongqing University, Chongqing, China.
² Indoor Environment Group, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

PMID: 33369785

Abstract

The rapid development of automated measurement equipment enables researchers to collect ever larger quantities of time-resolved data from indoor and outdoor environments. Although these data are valuable, interpreting them can be time-consuming. This paper introduces an automated process for interpreting time-resolved PM2.5 data and distinguishing between PM2.5 emissions from indoor and outdoor sources. We use Random Forest (RF), a machine learning method, to study a dataset of 836 indoor emission events that occurred over a 2-week period in 18 apartments in California. We describe model development and evaluate how the model's performance changes as the sample size and source vary. We discuss which characteristics of the dataset tend to aid source identification, and why. For example, we show that data from many events and from different apartments are essential for the model to generalize to a new, separate dataset. We also show that longitudinal data appear to be more helpful than a higher measurement frequency within a given apartment. We apply the resulting RF model to PM2.5 data from an entirely separate dataset collected in 65 new homes in California. The RF model identifies 442 indoor emission events, with only a few misidentifications.
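The abstract does not list the model's inputs, so the following is only a minimal sketch of the general technique, written in Python with the standard library alone. It shows a random forest (decision trees grown on bootstrap samples, each split considering a random subset of features, combined by majority vote) classifying emission events as indoor-source (1) or outdoor-source (0). It is evaluated by holding out one apartment at a time, which mirrors the abstract's point that training data from different apartments matter when applying the model to new homes. The feature names (io_ratio_at_peak, rise_rate, decay_rate), the synthetic data generator, and all hyperparameters are hypothetical illustrations, not the authors' features, data, or settings.

```python
import random
from collections import Counter

# HYPOTHETICAL per-event features (not the paper's feature set):
#   0: indoor/outdoor PM2.5 ratio at the event peak
#   1: rise rate of indoor PM2.5
#   2: decay rate after the peak
FEATURES = ["io_ratio_at_peak", "rise_rate", "decay_rate"]


def gini(p):
    """Gini impurity of a binary node whose fraction of label-1 events is p."""
    return 2 * p * (1 - p)


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a classification tree; each split tries a random subset of features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority                                  # leaf: majority label
    best, total_pos, n = None, sum(y), len(y)
    for f in rng.sample(range(len(X[0])), n_feats):
        order = sorted(range(n), key=lambda i: X[i][f])  # sweep sorted values once
        left_pos = 0
        for k in range(1, n):
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue                                 # no threshold between equal values
            score = k * gini(left_pos / k) + (n - k) * gini((total_pos - left_pos) / (n - k))
            if best is None or score < best[0]:
                best = (score, f, (lo + hi) / 2)
    if best is None:
        return majority
    _, f, t = best
    left = [i for i in range(n) if X[i][f] <= t]
    right = [i for i in range(n) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_feats, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_feats, rng))


def predict_tree(node, x):
    """Walk internal nodes (f, threshold, left, right) until reaching a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, n_feats=2, seed=0):
    """Bagging plus random feature subsets: the two ingredients of a random forest."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]   # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return trees


def predict_forest(trees, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def synthetic_events(n_apartments=6, events_per_apt=20, seed=1):
    """Toy data only: indoor events get a higher I/O ratio and a faster rise."""
    rng = random.Random(seed)
    rows = []
    for apt in range(n_apartments):
        offset = rng.gauss(0, 0.2)                       # apartment-specific shift
        for _ in range(events_per_apt):
            label = int(rng.random() < 0.5)
            io = rng.gauss(3.0 if label else 0.8, 0.5) + offset
            rise = rng.gauss(2.0 if label else 0.3, 0.4)
            decay = rng.gauss(1.0, 0.3)                  # uninformative by construction
            rows.append((apt, [io, rise, decay], label))
    return rows


def leave_one_apartment_out(rows, **forest_kw):
    """Train on all other apartments, test on the held-out one; return overall accuracy."""
    correct = 0
    for apt in sorted({a for a, _, _ in rows}):
        train = [(x, lab) for a, x, lab in rows if a != apt]
        test = [(x, lab) for a, x, lab in rows if a == apt]
        trees = fit_forest([x for x, _ in train], [lab for _, lab in train], **forest_kw)
        correct += sum(predict_forest(trees, x) == lab for x, lab in test)
    return correct / len(rows)


if __name__ == "__main__":
    events = synthetic_events()
    print(f"Leave-one-apartment-out accuracy: {leave_one_apartment_out(events):.2f}")
```

Grouping the split by apartment, rather than shuffling individual events, keeps events from the same home out of both training and test sets at once, so the score reflects performance on an unseen home. A library implementation such as scikit-learn's RandomForestClassifier would normally replace the hand-written trees.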

Keywords: PM2.5; indoor emission; machine learning; random forest; residential; time-resolved measurement.

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Air Pollutants
Air Pollution, Indoor* / statistics & numerical data
Environmental Monitoring*
Humans
Machine Learning
Particle Size*
Particulate Matter

Substances

Air Pollutants
Particulate Matter