Lecture Series:

Cross-species integration of data and text

Wednesday, 19.06.2019 · 16:00

Speaker: Lars Juhl Jensen, University of Copenhagen, Denmark

Methodological advances have in recent years given us unprecedented information on the molecular details of living cells. However, it remains a challenge to collect all the available data on individual genes and to integrate it with what is described in the scientific literature. The latest version of the STRING database aims to address this by consolidating known and predicted protein–protein association data across more than 5000 organisms. I will give an overview of the general approach we use to unify heterogeneous data, provide comparable quality scores for all evidence types, automatically mine associations from the biomedical literature, and transfer data by orthology to predict both intra- and inter-species associations. I will also cover the new statistical network analysis methods introduced in the latest version of STRING. Finally, I will briefly talk about how the same techniques used in STRING are also used to create a suite of related database resources that link proteins to ontology terms for cell parts, tissues, and diseases.