Multi-resolution models for single-cell genomics data (2022 - )
Single-cell genomics can obtain molecular data for tens of thousands of cells simultaneously. A typical experiment is carried out on a complex sample that contains different cell types, and can measure different cellular properties, such as the number of messenger RNA molecules per cell. Typical tasks include identifying distinct cell types (e.g. via unsupervised embeddings) or inferring a pseudo-temporal ordering of cells along developmental stages. A particular opportunity arises from single-cell genome accessibility data, which provide information about which of several million gene switches, so-called regulatory regions, are accessible (on) or inaccessible (off). These data can be analyzed at multiple resolutions: at the level of whole regions, to identify where active switch regions are and to infer which genes they may regulate, or at the level of short DNA sequence patterns within the regions, which are recognized by proteins that specifically activate the switches, e.g. in different cell types. Models that utilize the power of single-cell genomics data, and accessibility data in particular, are still in their infancy. The main challenge is that the high number of cells (i.e. samples) is accompanied by high dropout: the readout covers only a few percent of all variables, and the resulting discrete count data are sparse. Additionally, ground-truth experimental data exist for only a handful of scenarios, making it hard to develop practically useful methods that work beyond simulated data.
The project will utilize data from the Schuelke lab to develop deep neural network approaches in the Ohler lab that enable flexible multi-resolution analyses: the goal is to devise models that are able to infer both active regulatory regions and the functional sequence patterns in them, while (a) leveraging data from smaller or larger cell neighborhoods as needed; (b) accounting for confounders such as variable dropout and cell type mixtures; and (c) utilizing auxiliary data from other single-cell experiments.
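As a rough illustration of why pooling data across cell neighborhoods can help with dropout, the following minimal sketch simulates binary accessibility readouts for a single homogeneous cell population and compares how many truly open regions are detected per cell versus per pooled group of cells. Everything here is an assumption made for illustration: the function names, the 5% capture rate, the 20% fraction of open regions, and the use of consecutive cells as a stand-in for a neighborhood (in real data, neighborhoods would be derived from an embedding). It is not the project's method.

```python
import numpy as np


def simulate_accessibility(n_cells, n_regions, open_frac=0.2,
                           capture_rate=0.05, seed=0):
    """Simulate sparse binary accessibility data for one cell type.

    Hypothetical toy model: a region is truly open with probability
    `open_frac`, and each open region is observed in a given cell with
    probability `capture_rate` (everything else is dropout).
    """
    rng = np.random.default_rng(seed)
    truth = rng.random(n_regions) < open_frac            # shape (n_regions,)
    captured = rng.random((n_cells, n_regions)) < capture_rate
    observed = (truth[None, :] & captured).astype(int)   # shape (n_cells, n_regions)
    return truth, observed


def pool_cells(observed, group_size):
    """Sum counts over consecutive groups of `group_size` cells.

    Consecutive cells stand in for a neighborhood; leftover cells that
    do not fill a complete group are dropped.
    """
    n_used = observed.shape[0] - observed.shape[0] % group_size
    grouped = observed[:n_used].reshape(-1, group_size, observed.shape[1])
    return grouped.sum(axis=1)


def detected_fraction(counts, truth):
    """Fraction of (row, truly open region) pairs with at least one read."""
    return float((counts[:, truth] > 0).mean())


truth, observed = simulate_accessibility(n_cells=200, n_regions=1000)
pooled_counts = pool_cells(observed, group_size=20)

per_cell = detected_fraction(observed, truth)
pooled = detected_fraction(pooled_counts, truth)
print(f"detected per cell:          {per_cell:.2f}")
print(f"detected per pooled group:  {pooled:.2f}")
```

In this toy setting, larger groups recover more open regions but would blur differences between cell types if the pooled cells were not homogeneous, which is the trade-off behind choosing smaller or larger neighborhoods as needed.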
Peer-reviewed Publications (journal or conference)
Other (presentations at conferences or preprints)