Feature identification for single-cell omics data (2018 - 2023)

Cells are the building blocks of all multicellular organisms. Generally speaking, the DNA in each cell in a single organism is identical. Yet each different type of cell has its specialized function. These functional differences occur because cells of a particular identity transcribe a distinct set of genes into RNA molecules, many of which the cell then translates into proteins that determine cell structure, function, and identity. We do not yet fully understand the mechanisms that determine which genes and proteins a given cell produces. What we do know, however, is that the packing of DNA into a structure called chromatin plays a role. It is this packing that permits a 2-meter-long strand of DNA to fit into a cell nucleus with a diameter of no more than roughly 6 micrometers. If a gene lies in a region of the DNA that is tightly packed, the gene is not accessible for binding by the molecules that govern its transcription into RNA molecules. Thus, genes in inaccessible chromatin regions are not transcribed into RNA. However, protein-coding regions make up just 2% of the human genome, and the accessibility of these regions alone does not explain cell-to-cell differences: non-protein-coding regions of the DNA, e.g. cis-regulatory regions, also regulate gene expression. These regions, too, cannot exert their function if they are not accessible. Ultimately, the abundance of particular RNAs and the accessibility of chromatin together provide a starting point for unravelling the processes underlying cell identity acquisition and cell function.
Recently, researchers have begun measuring RNA abundance, chromatin accessibility, and more, in individual cells using so-called single-cell omics assays. Analysis of the data obtained from these single-cell omics assays may provide novel insights into how cells acquire their identity. However, analysis of these data is complicated by their high-dimensional, sparse, and noisy nature. High dimensionality refers to the fact that tens of thousands of genes or hundreds of thousands of DNA regions are measured in thousands to millions of cells. Sparsity occurs because most genes are not expressed in any given cell, and most regions of chromatin are not accessible. In addition, due to technical limitations, not all genes that are expressed or chromatin regions that are accessible in a given cell are captured. The combination of inherent sparsity and further technical limitations results in noisy data with a poor signal-to-noise ratio. Taken together, these data characteristics complicate the identification of biologically meaningful patterns from the data, especially for genes that are expressed at very low levels or in only a few cells. This is of particular concern when considering cells at different stages of development, since differences between cells may be restricted to the expression of only a few genes or subtle changes in chromatin accessibility.
In this project, we aim to develop methods to identify RNA molecules and cis-regulatory regions that characterize cell types and regulate the acquisition of cell identity. For this, we will adapt existing analytical approaches for the analysis of data representing continuous differentiation processes, without discretizing cell identities into distinct cell states. This criterion is essential if we hope to identify genes and cis-regulatory regions that govern the development of cells in health and disease, where disease occurs due to aberrant cell functions induced by dysregulation of gene expression.
Peer-reviewed Publications (journal or conference)
- P. Rautenstrauch, A.H.C. Vlot, S. Saran, and U. Ohler (2021). Intricacies of single-cell multi-omics data integration. Trends in Genetics. https://doi.org/10.1016/j.tig.2021.08.012
- R. Shahan, CW. Hsu, T.M. Nolan, B.J. Cole, I.W. Taylor, A.H.C. Vlot, P.N. Benfey, and U. Ohler (2022). A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants. Developmental Cell 57(4), 543-560.e9. https://doi.org/10.1016/j.devcel.2022.01.008
- A.H.C. Vlot, S. Maghsudi, and U. Ohler (2022). Cluster-independent marker feature identification from single-cell omics data using SEMITONES. Nucleic Acids Research, gkac639. https://doi.org/10.1093/nar/gkac639
Other (presentations at conferences or preprints)
- R. Shahan, CW. Hsu, T.M. Nolan, B.J. Cole, I.W. Taylor, A.H.C. Vlot, P.N. Benfey, and U. Ohler (2020). A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants. bioRxiv 2020.06.29.178863. https://doi.org/10.1101/2020.06.29.178863
- A.H.C. Vlot, S. Maghsudi, and U. Ohler. Identification of marker genes and cis-regulatory regions using Single-cEll Marker IdentificaTiON by Enrichment Scoring (SEMITONES). (Poster presentation), 13th annual RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges, Online, 16-19 November 2020.
- A.H.C. Vlot, S. Maghsudi, and U. Ohler. Single-cEll Marker IdentificaTiON by Enrichment Scoring. (Poster and oral presentation), ISMB/ECCB 2021, Online, 25-30 July 2021.
- A.H.C. Vlot, S. Maghsudi, and U. Ohler. Identification of cis-regulatory regions using Single-cEll Marker IdentificaTiON by Enrichment Scoring (SEMITONES). (Poster presentation), EMBO Workshop Enhanceropathies: Understanding enhancer function to understand human disease, 6-9 October 2021.