Advances in experimental methods in biology and the reduced cost of high-throughput experiments have produced a vast pool of datasets spanning many types of measurements. These datasets provide insight into different dimensions of a biological system, including the genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. To capture the complexity of biological systems, and to compensate for the shortcomings of individual datasets, it is important to study these modalities in combination. To this end, finding an appropriate joint representation of multi-omic measurements is indispensable for any secondary analysis. Integrating different data types can provide a more complete representation of the cell and the biological processes under study, help disentangle the causal relationships between the different omic layers, and shape new research questions in biology.
Machine learning methods have been employed extensively to integrate and analyze these data types. However, existing multi-omics integration frameworks are limited in their scalability, their performance, and the variety of data types they support. In this project, we aim to leverage recent advances in representation learning and deep learning to tackle these challenges. In particular, we aim to employ implicit generative models and latent variable models to learn meaningful representations of multi-omics datasets and to analyze them for secondary biological applications. We focus on tailoring our solution to cope with the limitations of biological datasets while preserving interpretability of the results, model performance, and reusability.
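As a toy illustration of the kind of joint latent representation targeted here (not the project's actual model), a simple linear latent-variable baseline can be sketched by z-scoring each omic block, concatenating them, and taking a truncated SVD of the stacked matrix. The data, dimensions, and function name below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 100 samples measured in two omic layers
# (e.g. 500 transcripts and 200 methylation probes) that share a
# 5-dimensional latent structure plus layer-specific noise.
n_samples, k_true = 100, 5
Z = rng.normal(size=(n_samples, k_true))  # shared latent factors
X_rna = Z @ rng.normal(size=(k_true, 500)) + 0.1 * rng.normal(size=(n_samples, 500))
X_meth = Z @ rng.normal(size=(k_true, 200)) + 0.1 * rng.normal(size=(n_samples, 200))

def joint_embedding(blocks, k):
    """Z-score each omic block, concatenate along features, and
    return the top-k left singular vectors (scaled by the singular
    values) as a shared per-sample embedding -- i.e. a joint PCA."""
    scaled = [(X - X.mean(axis=0)) / X.std(axis=0) for X in blocks]
    stacked = np.hstack(scaled)
    U, S, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :k] * S[:k]

emb = joint_embedding([X_rna, X_meth], k=5)
print(emb.shape)  # -> (100, 5): one 5-dimensional embedding per sample
```

Nonlinear latent variable models (e.g. variational autoencoders) replace the linear map above with learned encoders and decoders per modality, but the interface is the same: multiple omic matrices in, one shared low-dimensional sample representation out.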