Directional Association Measurement in Contingency Tables: Genomic Case

J Comput Biol. 2019 Mar;26(3):235-240. doi: 10.1089/cmb.2018.0202. Epub 2018 Dec 18.

Abstract

Analysis of large data sets is currently a major challenge, and considerable effort is going into developing new methods or modifying existing ones to address it. The Z association method is a new method for describing directional association in contingency tables. It allows the categories of each of the two variables in the contingency table to be grouped arbitrarily. The Z coefficient was calculated on a sample data set of gene mutations in different cancer types. The results showed some association between cancer types and both gene mutations and annotation groups. Detailed results for particular cancer types versus particular genes and annotation groups were in line with well-known facts in cancer genomics. The "MEUSassociation" R library uses the Z association coefficient to analyze the directional association between two categorical variables whose mutual relationship is summarized in a contingency table. The library computes the standard Z coefficient and can also compute all possible singular coefficients Z(A:B) at once, which gives information on the association between particular rows and columns. Inspecting the ranked list of the highest singular coefficients helps reduce the complexity of a large-scale data set. Both the Z coefficient and its R implementation are important tools in categorical data analysis.

Keywords: association coefficient; associations in contingency tables.

MeSH terms

  • Big Data*
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / standards
  • Genomics / methods*
  • Genomics / standards
  • Humans
  • Neoplasms / genetics*
  • Software*