Pointwise probability reinforcements for robust statistical inference

Benoît Frénay; Michel Verleysen

doi:10.1016/j.neunet.2013.11.012


Neural Netw. 2014 Feb;50:124-41. doi: 10.1016/j.neunet.2013.11.012. Epub 2013 Nov 21.

Authors

Benoît Frénay¹, Michel Verleysen²

Affiliations

¹ Machine Learning Group - ICTEAM, Université catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium. Electronic address: benoit.frenay@uclouvain.be.
² Machine Learning Group - ICTEAM, Université catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium. Electronic address: michel.verleysen@uclouvain.be.

PMID: 24300550

Abstract

Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be with respect to their theoretical probability; they include, e.g., outliers. Parameter estimates tend to be biased towards models that support such data. This paper introduces pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation term controls the amount of reinforcement used to compensate for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method that can be formulated as a likelihood maximisation. Experiments show that PPRs can easily be used to tackle regression, classification and projection: the resulting models are freed from the influence of outliers. Moreover, outliers can be filtered manually, since an abnormality degree is obtained for each observation.
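To make the idea concrete, the sketch below fits a one-dimensional Gaussian by maximising a reinforced log-likelihood, sum_i log(p(x_i | theta) + r_i) - alpha * sum_i r_i with r_i >= 0. The abstract does not specify the regulariser, so the L1 penalty alpha * sum_i r_i used here is an assumption chosen for illustration. Under that assumption, the optimal reinforcement for fixed parameters has the closed form r_i = max(0, 1/alpha - p_i). For fixed r_i, the parameters are updated by an EM-style weighted maximum-likelihood step with weights w_i = p_i / (p_i + r_i). The function names (`gauss_pdf`, `ppr_gaussian_fit`) and the data are hypothetical, and this is not presented as the authors' exact algorithm.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def ppr_gaussian_fit(xs, alpha, n_iter=50):
    """Hypothetical sketch: fit a 1-D Gaussian by maximising
        sum_i log(p(x_i | mu, sigma) + r_i) - alpha * sum_i r_i,  r_i >= 0
    (assumed L1 penalty, chosen for illustration only).
    Returns (mu, sigma, r), where r[i] is the reinforcement of xs[i]."""
    n = len(xs)
    # Start from the ordinary (non-robust) maximum-likelihood estimate.
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    r = [0.0] * n
    for _ in range(n_iter):
        p = [gauss_pdf(x, mu, sigma) for x in xs]
        # Exact maximiser over r for fixed (mu, sigma): the objective is concave
        # in r_i, with stationary point 1/(p_i + r_i) = alpha, clipped at r_i = 0.
        r = [max(0.0, 1.0 / alpha - pi) for pi in p]
        # EM-style weights: the share of p_i + r_i explained by the model.
        # Since p_i + r_i >= 1/alpha > 0, the division is safe.
        w = [pi / (pi + ri) for pi, ri in zip(p, r)]
        sw = sum(w)
        # Weighted maximum-likelihood update of the Gaussian parameters.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sw
        sigma = math.sqrt(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / sw)
    return mu, sigma, r


# Ten inliers near 0 plus two gross outliers (made-up illustration data).
data = [-1.0, -0.5, 0.0, 0.5, 1.0, -0.8, 0.3, 0.7, -0.2, 0.1, 20.0, 25.0]
mu, sigma, r = ppr_gaussian_fit(data, alpha=10.0)
naive_mu = sum(data) / len(data)
print(f"naive mean = {naive_mu:.3f}, PPR mean = {mu:.3f}, PPR sigma = {sigma:.3f}")
print("reinforcements:", [round(ri, 3) for ri in r])
```

In this toy run the two outliers end up with positive reinforcements and near-zero weights, while the inliers receive no reinforcement. The estimates should therefore stay close to those computed from the inliers alone. The reinforcements r_i can serve as the per-observation abnormality degree mentioned in the abstract. A very large alpha drives every r_i to zero and recovers ordinary maximum likelihood, and a smaller alpha reinforces more points. Because p_i is a density, the useful range of alpha depends on the scale of the data, so alpha has to be tuned for each problem.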

Keywords: Cleansing; Filtering; Maximum likelihood; Outliers; Probability reinforcements; Robust inference.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Interpretation, Statistical*
Databases, Factual
Humans
Probability*
Regression Analysis
Reinforcement, Psychology*
Statistics, Nonparametric