Cut points and contexts

Cancer. 2021 Dec 1;127(23):4348-4355. doi: 10.1002/cncr.33838. Epub 2021 Aug 23.

Abstract

In research, policy, and practice, continuous variables are often categorized. Statisticians have generally advised against categorization for many reasons, such as loss of information and precision as well as distortion of estimated statistics. Here, a different kind of problem with categorization is considered: the idea that, for a given continuous variable, there is a unique set of cut points that is the objectively correct or best categorization. It is shown that this is unlikely to be the case because categorized variables typically exist in webs of statistical relationships with other variables. The choice of cut points for a categorized variable can influence the values of many statistics relating that variable to others. This essay explores the substantive trade-offs that can arise between different possible cut points to categorize a continuous variable, making it difficult to say that any particular categorization is objectively best. Limitations of different approaches to selecting cut points are discussed. Contextual trade-offs may often be an argument against categorization. At the very least, such trade-offs mean that research inferences, or decisions about policy or practice, that involve categorized variables should be framed and acted upon with flexibility and humility.

LAY SUMMARY: In research, policy, and practice, continuous variables are often turned into categorical variables with cut points that define the boundaries between categories. This involves choices about how many categories to create and what cut-point values to use. This commentary shows that different choices about which cut points to use can lead to different sets of trade-offs across multiple statistical relationships between the categorized variable and other variables. These trade-offs mean that no single categorization is objectively best or correct. This context is critical when one is deciding whether and how to categorize a continuous variable.
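The trade-off described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the essay: the data are synthetic, the names `outcome_a`, `outcome_b`, and `youden_j` are hypothetical, and Youden's J (sensitivity + specificity - 1) is just one of many statistics that could relate a dichotomized variable to another variable.

```python
def youden_j(x, y, cut):
    """Sensitivity + specificity - 1 when 'x >= cut' is used to predict y == 1."""
    tp = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 1)
    fn = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 1)
    tn = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 0)
    fp = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 0)
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Synthetic continuous variable (assumption: 20 evenly spaced values).
x = list(range(20))

# Two hypothetical binary outcomes that depend on x at different thresholds.
outcome_a = [1 if xi >= 6 else 0 for xi in x]
outcome_b = [1 if xi >= 14 else 0 for xi in x]

# Candidate cut points for dichotomizing x.
candidate_cuts = [6, 10, 14]

# For each cut, compute the statistic against both outcomes.
scores = {
    cut: (youden_j(x, outcome_a, cut), youden_j(x, outcome_b, cut))
    for cut in candidate_cuts
}

best_for_a = max(candidate_cuts, key=lambda c: scores[c][0])
best_for_b = max(candidate_cuts, key=lambda c: scores[c][1])

for cut in candidate_cuts:
    j_a, j_b = scores[cut]
    print(f"cut={cut:2d}  J vs outcome_a={j_a:.3f}  J vs outcome_b={j_b:.3f}")
print("best cut for outcome_a:", best_for_a)
print("best cut for outcome_b:", best_for_b)
```

In this toy setup the cut point that is best for one outcome is the worst of the three for the other, and the middle cut is a compromise that is best for neither. Which cut counts as "best" therefore depends on which relationship is being prioritized.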

Keywords: data analysis; statistical data interpretation; statistics; translational medical research; translational medical science.

Publication types

  • Research Support, N.I.H., Extramural