Seriation Using Tree-penalized Path Length

Eur J Oper Res. 2023 Mar 1;305(2):617-629. doi: 10.1016/j.ejor.2022.06.026. Epub 2022 Jun 17.

Abstract

Given a sample of n data points and an n by n dissimilarity matrix, data seriation methods produce a linear ordering of the objects, putting similar objects nearby in the ordering. One may visualize the reordered dissimilarity matrix with a heat map and thus understand the structure of the data, while still displaying the full matrix of dissimilarities. Good orderings produce heat maps that are easy to read and allow for clear interpretation. We consider two popular seriation methods, minimizing path length by solving the Traveling Salesman Problem (TSP), and Optimal Leaf Ordering (OLO), which minimizes path length among all orderings consistent with a given tree structure. Learning from the strengths and weaknesses of the two methods, we introduce a new hybrid seriation method, tree-penalized Path Length (tpPL). The objective is a linear combination of path length and the extent of violations of the tree structure, with a parameter that transitions the optimal paths smoothly from TSP to OLO. We present a detailed study over 44 synthetic datasets which are designed to bring out the strengths and weaknesses of the three methods, finding that the hybrid nature of tpPL enables it to overcome the weaknesses of TSP and OLO.

Keywords: Optimal Leaf Ordering; TSP; combinatorial optimization; linear orderings; tree penalized distance.