Statistical integration of multiple ‘omics data sets - ASN Events

Statistical integration of multiple ‘omics data sets (#3)

Kim-Anh Le Cao 1
  1. University of Queensland Diamantina Institute, Woolloongabba, QUEENSLAND, Australia

The advent of high-throughput technologies has led to a wealth of biological data coming from different sources, the so-called ‘omics data. To understand biological mechanisms and uncover important biological insights, we need to adopt a holistic, systems biology approach to analyse these complex data.

Univariate statistical approaches consider each biological variable independently when explaining or modelling biological conditions or phenotypes, and so ignore the relationships between variables. To move beyond this univariate paradigm, we have developed several multivariate methods that identify a subset of variables, a ‘molecular or microbial signature’.

I will introduce several multivariate methods we have developed to statistically integrate several ‘omics data sets at once. Here I refer to data integration in a broad sense: either the same individuals are profiled using different ‘omics platforms, or independent studies of different individuals are conducted under similar biological conditions using the same ‘omics platform. Both types of methods attempt to address data heterogeneity, whether caused by inherent platform-specific artefacts or by systematic differences that arise when samples are assayed at different geographical sites or at different times.
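
To make the idea of systematic between-study differences concrete, here is a minimal Python sketch. It is not the method implemented in mixOmics: the data, the variable names and the helper `center_within_study` are all hypothetical, and per-study centring is used only as one simple, commonly used way of removing a constant site offset.

```python
from statistics import mean


def center_within_study(values, study):
    """Subtract each study's own mean from the samples of that study."""
    study_means = {
        s: mean(v for v, t in zip(values, study) if t == s)
        for s in set(study)
    }
    return [v - study_means[s] for v, s in zip(values, study)]


# Hypothetical measurements of one variable in two independent studies.
# Study "B" is assumed to carry a constant offset of +10 (a site effect).
values = [1.0, 3.0, 11.0, 13.0]
study = ["A", "A", "B", "B"]
condition = ["ctrl", "treated", "ctrl", "treated"]

centered = center_within_study(values, study)

# Group the centred values by biological condition.
by_condition = {}
for value, cond in zip(centered, condition):
    by_condition.setdefault(cond, []).append(value)
```

In the raw values the study offset dominates the treatment effect; after centring, samples from the same condition line up across the two studies. The sketch assumes that conditions are balanced across studies. If they are not, centring within each study would also remove part of the biological signal.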

All our methods are available in our R package mixOmics (www.mixOmics.org), which is dedicated to the exploration and integration of biological data sets.