Case study data for joint modeling of insurance claims and lapsation

Data Brief. 2021 Nov 26;39:107639. doi: 10.1016/j.dib.2021.107639. eCollection 2021 Dec.

Abstract

The dataset tracks 40,284 insurance clients who subscribed to both automobile and homeowners insurance over five years, between 2010 and 2015. We have combined information on these customers from three sources. First, customer characteristics, including age, gender and driving experience, among others, are recorded together with the dates of renewal for the two types of policies considered here. Note that we have only considered clients who are individuals, not commercial firms, which can also take out home and motor insurance policies. Second, the policy data file for motor vehicle insurance describes the vehicle insurance coverage, including power, driving area and whether a second driver occasionally drives the car. Third, the policy data file for homeowners insurance has information on the property, such as the value of the building (essentially the value of the home without any furniture, apparel and personal items), location and type of dwelling. Besides these three sources, we have access to data on the number of claims and the total cost of those claims per year and per policy type. Thus, for all policies that are in force, we have up to a five-year record of the yearly cost of claims in motor insurance and in home coverage. If a customer does not renew one or both of those policies, we have no further information after the lapse occurs. After summarizing the data, we provide the usual marginal analysis, in which we fit regression models using Tweedie distributions for claims and a logistic model for lapse. The data can be used for joint analysis of insurance policyholders with more than one product.
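The record structure described above (up to five yearly observations per policy, ending when a lapse occurs) can be illustrated with a minimal Python sketch. The class `PolicyYear`, its field names, the function `lapse_table` and the sample values are all hypothetical and are not taken from the published data files. The sketch also assumes that every policy is observed from year 1, so that a history shorter than five years indicates a lapse.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout: one row per customer, product and year.
# Field names are illustrative only, not the dataset's column names.
@dataclass(frozen=True)
class PolicyYear:
    customer_id: int
    product: str        # "motor" or "home"
    year: int           # observation year, 1..5
    n_claims: int       # number of claims in that year
    claim_cost: float   # total cost of those claims in that year


def lapse_table(records, max_years=5):
    """Return {(customer_id, product): (years_observed, lapsed)}.

    Assumption: every policy starts in year 1, so a policy whose last
    observed year is below max_years is treated as lapsed.
    """
    last_year = defaultdict(int)
    for r in records:
        key = (r.customer_id, r.product)
        last_year[key] = max(last_year[key], r.year)
    return {key: (y, y < max_years) for key, y in last_year.items()}


# Made-up example: customer 1 keeps the motor policy for all five years
# but stops renewing the home policy after year 2.
records = [PolicyYear(1, "motor", y, 0, 0.0) for y in range(1, 6)]
records += [PolicyYear(1, "home", 1, 1, 350.0), PolicyYear(1, "home", 2, 0, 0.0)]

table = lapse_table(records)
```

A binary lapse indicator of this kind could serve as the response in a logistic model, while the yearly `claim_cost` values could serve as the response in a Tweedie regression.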

Keywords: Customer retention; Dependence; Heavy tails; Homeowners insurance; Loss data; Loyalty; Motor insurance; Premium; Ratemaking.