Alumni

Siddhant Agarwal
DLR - TU Berlin

Unravelling the Interior Evolution of Terrestrial Planets Through Machine Learning

Supervisors:

Doris Breuer (DLR)

Nicola Tosi (TU)

Klaus-Robert Müller (TU)

 

Studying how rocky planets like Mercury, Venus, the Earth and Mars evolve over billions of years requires detailed modelling of mantle convection, the main driver of planetary evolution. The mantle, sandwiched between the crust and the core, behaves like a highly viscous fluid over geological time scales and can hence be described by equations for the conservation of mass, momentum and energy. These non-linear partial differential equations are typically solved numerically using fluid dynamics codes. However, the parameters and initial conditions of these equations are poorly known. Certain outputs of the simulations (the numerically solved equations) can be "observed" via spacecraft missions and used to constrain key parameters and initial conditions, thus elucidating the basic physics and evolution of planets. Since each simulation can take from several hours to weeks to run, varying parameters extensively and repeatedly is often impractical. We aim to overcome this computational bottleneck by learning the mapping between parameters and observables through a combination of state-of-the-art geodynamic modelling, machine learning and high-performance computing.

 

Doctoral thesis

S. Agarwal (2022). Unraveling the interior evolution of terrestrial planets through machine learning. Technische Universität Berlin.

doi:10.14279/depositonce-15926

 

Full-length publications

  1. S. Agarwal, N. Tosi, D. Breuer, S. Padovan, P. Kessel, and G. Montavon (2020). A machine-learning-based surrogate model of Mars’ thermal evolution. Geophysical Journal International, 222(3), 1656-1670. https://doi.org/10.1093/gji/ggaa234
  2. S. Agarwal, N. Tosi, P. Kessel, S. Padovan, D. Breuer, and G. Montavon (2021). Towards constraining Mars’ thermal evolution using Machine Learning. Earth and Space Science. https://doi.org/10.1029/2020EA001484
  3. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, and G. Montavon (2021). Deep learning for surrogate modeling of two-dimensional mantle convection. Physical Review Fluids, 6, 113801. https://doi.org/10.1103/PhysRevFluids.6.113801

 

Conference presentations

  1. S. Agarwal, N. Tosi, D. Breuer, S. Padovan, P. Kessel, and G. Montavon. Unravelling interior evolution of terrestrial planets using Machine Learning. (Oral presentation), Artificial Intelligence in Astronomy at ESO, Garching, Germany, 22-26 July 2019.
  2. S. Agarwal, N. Tosi, D. Breuer, P. Kessel, and G. Montavon. Using machine learning to predict 1D steady-state temperature profiles from compressible mantle convection simulations. (Oral presentation), 72nd Annual Meeting of the APS Division of Fluid Dynamics, Seattle, USA, 23-26 November 2019.
  3. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, S. Padovan, and G. Montavon. Mars’ thermal evolution from machine-learning-based 1D surrogate modelling. (Oral presentation), EGU General Assembly, Online, 7 May 2020.
  4. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, S. Padovan, and G. Montavon. Learning high dimensional surrogates from mantle convection simulations. (Oral presentation), 73rd Annual Meeting of the APS Division of Fluid Dynamics, Online, 23 November 2020.
  5. S. Agarwal, N. Tosi, P. Kessel, S. Padovan, D. Breuer, and G. Montavon. Towards constraining Mars' thermal evolution using machine learning. (PICO presentation), EGU General Assembly, Online, 19-30 Apr 2021. https://doi.org/10.5194/egusphere-egu21-4044
  6. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, and G. Montavon. Deep learning for surrogate modelling of 2D mantle convection. (Oral presentation), German-Swiss Geodynamics Workshop 2021, Bad Belzig, 29 Aug–1 Sep 2021.
  7. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, and G. Montavon. Deep learning for surrogate modelling of 2D mantle convection. (Oral presentation), European Planetary Science Congress 2021, Online, 13–24 Sep 2021. https://doi.org/10.5194/epsc2021-218
  8. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, and G. Montavon. Deep learning for surrogate modelling of 2D mantle convection. (Oral presentation), 74th Annual Meeting of the APS Division of Fluid Dynamics, Online, 21-23 Nov 2021.
  9. S. Agarwal, N. Tosi, P. Kessel, D. Breuer, and G. Montavon. A machine learning framework for constraining mantle convection parameters. (Oral presentation), American Geophysical Union Fall Meeting, New Orleans, 13-17 Dec 2021.

Philipp Baumeister
DLR - TU Berlin

Multi-satellite Approach of Monitoring Atmosphere/Magnetosphere Space Weather Interactions

Associated Doctoral Researcher

Supervisors:

Nicola Tosi (DLR)

Grégoire Montavon (TU)

 

Since the first discoveries of extrasolar planets in the 1990s, more than 4000 exoplanets have been discovered, and the number is growing rapidly with new dedicated space and ground-based surveys. From radius measurements via transit observations and mass estimates via radial velocity measurements, the interior structure of planets can be modeled numerically. This characterization is crucial for our understanding of the diversity of the observed planets, their formation processes, and the question of whether they can support life. However, even with accurate radius and mass measurements, many different solutions for the internal structure can be found, since the relative proportions of iron, silicates, water ice, and volatile elements are not known.

The goal of this project is to implement machine-learning-based approaches to infer planetary interiors from observational data, and to use those to identify potentially observable parameters that can better constrain the range of possible interior structures. Machine learning can avoid the need for extensive interior modeling of each individual exoplanet by learning from large sets of precalculated data generated with suitable forward models. We aim to develop such an inference framework for the fast characterization of planetary interiors. For a comprehensive view of a planet's evolution, we will link thermal evolution models of the interior to models of atmospheric evolution, and aim to include results from population synthesis modeling. These models contain essential information on the structure, composition and evolution history of planets, linking planetary interiors to the star systems they reside in, and supply us with a large data set of synthetic planets that have formed under the physical constraints of their formation model.

The result of this project will be a comprehensive inference model capable of rapidly determining the range of physically meaningful interiors of observed exoplanets, which will open up new possibilities for finding observable parameters that are particularly important in constraining possible internal structures.

 

Doctoral thesis

P. Baumeister (2023). Interior structure, mantle-atmosphere co-evolution, and habitability of low-mass exoplanets. Technische Universität Berlin. doi:10.14279/depositonce-19452

 

Full-length publications

  1. P. Baumeister, S. Padovan, N. Tosi, G. Montavon, N. Nettelmann, J. MacKenzie, and M. Godolt (2020). Machine-learning Inference of the Interior Structure of Low-mass Exoplanets. Astrophysical Journal, 889, 42. https://doi.org/10.3847/1538-4357/ab5d32
  2. S. Padovan, T. Spohn, P. Baumeister, N. Tosi, D. Breuer, S. Csizmadia, H. Hellard, and F. Sohl (2018). Matrix-propagator approach to compute fluid Love numbers and applicability to extrasolar planets. Astronomy & Astrophysics, 620, A178. https://doi.org/10.1051/0004-6361/201834181

 

Conference presentations

  1. P. Baumeister, S. Padovan, N. Tosi, G. Montavon, N. Nettelmann, J. MacKenzie, and M. Godolt. Machine-learning inference of the interior structure of low-mass exoplanets. (Oral presentation), EGU General Assembly 2020, Vienna, Austria, 4 - 8 May 2020.
  2. P. Baumeister, S. Padovan, N. Tosi, G. Montavon, N. Nettelmann, J. MacKenzie and M. Godolt. Using machine learning to infer the interior structure of exoplanets. (Oral presentation), EPSC-DPS Joint Meeting 2019, Geneva, Switzerland, 15 - 20 September 2019.
  3. P. Baumeister, S. Padovan, N. Tosi, G. Montavon, J. MacKenzie and M. Godolt. Using mixture density networks to infer the interior structure of exoplanets. (Poster presentation), Artificial Intelligence in Astronomy Workshop, ESO, Garching, Germany,  22 - 26 July 2019.
  4. P. Baumeister, S. Padovan, N. Tosi, G. Montavon. Using deep learning neural networks to predict the interior composition of exoplanets. (Poster presentation), PLATO Theory Workshop 2018, Cambridge, UK, 3 - 5 December 2018.
  5. P. Baumeister, J. MacKenzie, N. Tosi, and M. Godolt. Effects of different equations of state on interior structure models of exoplanets. (Oral presentation), 7th Joint Workshop on High Pressure, Planetary and Plasma Physics (HP4), Berlin, Germany, 10 - 12 October 2018.
  6. P. Baumeister, J. MacKenzie, N. Tosi, and M. Godolt. Effects of different equations of state on interior structure models of exoplanets. (Oral presentation), European Planetary Science Congress 2018, Berlin, Germany, 16 - 21 September 2018.

Ivo Daniel
TU Berlin

Data-driven methods for anomaly detection in Water Distribution Networks

Associated Doctoral Researcher

Supervisors:

Andrea Cominola (TU)

 

Water losses are one of the main consequences of infrastructure failures in water distribution networks. While background leakages and pipe bursts in well-maintained systems generally amount to only 3-7% of the total water supplied, they can account for more than 50% in poorly maintained networks worldwide. Methods that support prompt detection and accurate localization of leakages are crucial to help water utilities implement timely mitigation measures and avoid unnecessary loss of water.

Leakages can be classified as one type of anomaly occurring in water distribution networks. Broadly speaking, methods for their detection are referred to as anomaly detection methods. Anomaly detection methods have been studied extensively in the context of intrusion into information networks, and applied to water distribution networks in the similar context of cyber-attacks on SCADA systems. However, most current approaches for leakage detection rely on in-situ, engineering-based technology, while the development and application of data-driven approaches still pose several research challenges.

The goal of this project is to develop data-driven methods that are capable of detecting leakages in water distribution networks in real time. This research originated in an international competition, the BattLeDIM - Battle on Leakage Detection and Isolation Methods (http://battledim.ucy.ac.cy), and is built upon the BattLeDIM dataset. The focus is therefore on the analysis of high-resolution pressure data provided by a network of sensors located throughout the system. Data mining and machine learning frameworks offer a wide range of opportunities for analysing this data, and are compared with one another for identifying and localizing leakages as the primary type of anomaly.
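
As a rough illustration of pressure-based detection (not the method developed in this project or used in BattLeDIM), the sketch below flags readings that fall far below the recent average at a single sensor. The function name, window length, threshold and synthetic readings are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_pressure_drops(pressures, window=24, threshold=3.0):
    """Return indices of readings far below the preceding rolling baseline.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations below the mean of the previous `window` readings.
    All parameter values here are illustrative assumptions.
    """
    flagged = []
    for i in range(window, len(pressures)):
        history = pressures[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against.
            continue
        z_score = (pressures[i] - mu) / sigma
        if z_score < -threshold:
            flagged.append(i)
    return flagged


# Synthetic pressure series (arbitrary units) with small periodic variation
# and one sudden drop injected at index 40.
readings = [50.0 + 0.1 * ((i % 5) - 2) for i in range(48)]
readings[40] = 47.0

print(detect_pressure_drops(readings))
```

Localizing a leak would additionally require comparing sensors across the network, which this single-sensor sketch does not attempt.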

A data-driven methodology for leakage detection could be extended to other applications in water distribution systems, including real-world systems, and its transferability assessed for other problems where anomaly detection may be beneficial. Developing such an effective, decentralized framework also creates opportunities for additional research on IoT sensors, their communication interfaces, and their placement. Further research may target wastewater systems to evaluate whether the developed methods can be cost-effectively transferred or adapted.

 

Doctoral thesis

I. Daniel (2023). Physics-Informed Anomaly Detection in Water Distribution Systems - Advancing Digital Transformation of Urban Water Management. Technische Universität Berlin. doi:10.14279/depositonce-19551

 

Full-length publications

  1. I. Daniel, J. Pesantez, S. Letzgus, M.A. Khaksar Fasee, F. Alghamdi, E. Berglund, G. Mahinthakumar, and A. Cominola (2022). A sequential pressure-based algorithm for data-driven leakage identification and model-based localization in water distribution networks. Journal of Water Resources Planning and Management, 148, 6. DOI:10.1061/(ASCE)WR.1943-5452.0001535
  2. I. Daniel, N.K. Ajami, A. Castelletti, D. Savic, R.A. Stewart, and A. Cominola (2023). A survey of water utilities’ digital transformation: drivers, impacts, and enabling technologies. npj Clean Water, 6, 51. https://doi.org/10.1038/s41545-023-00265-7
  3. I. Daniel, G.R. Abhijith, L. Kadinski, A. Ostfeld, and A. Cominola (2023). A Machine Learning-Based Surrogate Model for Coupled Hydraulic and Water Quality Simulation in Water Distribution Networks. Proceedings of the World Environmental and Water Resources Congress, 817–830. https://ascelibrary.org/doi/10.1061/9780784484852.077
  4. I. Daniel, and A. Cominola (2023). Estimating irregular water demands with physics-informed machine learning to inform leakage detection. arXiv. https://doi.org/10.48550/ARXIV.2309.02935 [Preprint]

 

Conference presentations

  1. I. Daniel, N. Ajami, A. Castelletti, D. Savic, R. Stewart, M. Becker, and A. Cominola. How is digital transformation impacting the water utility sector? - Insights from a worldwide online utility survey. (Oral presentation), EGU General Assembly 2021, Online, 19–30 Apr 2021. https://doi.org/10.5194/egusphere-egu21-12540
  2. I. Daniel, J. Pesantez, S. Letzgus, M.A. Khaksar Fasee, F. Alghamdi, E. Berglund, G. Mahinthakumar, and A. Cominola. Leakage identification and localization on the BattLeDIM dataset: testing and performance evaluation of a high-resolution pressure-driven method. (Oral presentation), World Environmental & Water Resources Congress, Online, 7-11 Jun 2021.
  3. G. Pedron, I. Daniel, D. Tilcher, A. Cominola, and A. Crescenti. Gaza H2.0: promoting sustainable water supply and demand and knowledge transfer to enhance water infrastructure resilience in the Gaza Strip. (Oral presentation), 42nd WEDC International Conference, Online, 13-15 September 2021. https://hdl.handle.net/2134/16903540.v1
  4. I. Daniel, N. Ajami, A. Castelletti, D. Savic, R. Stewart, and A. Cominola. How Is Digital Transformation Impacting The Water Utility Sector? Insights From A Worldwide Online Utility Survey. (Oral presentation), IWA World Water Congress & Exhibition, Copenhagen, 11-15 September 2022.
  5. I. Daniel, and A. Cominola. A calibration-free pressure-driven approach to leak detection and localization in water distribution networks. (Oral presentation), World Environmental & Water Resources Congress, Henderson, NV, USA, 21-24 May 2023.
  6. I. Daniel, and A. Cominola. Physics-Informed Neural Networks to enhance leakage detection in drinking water distribution systems. (Oral presentation), EGU General Assembly, Vienna, Austria, 24-28 April 2023. https://doi.org/10.5194/egusphere-egu23-12186
  7. A. Cominola, I. Daniel, D. Tilcher, A.J.S. Alasmar, R.M.M. Ziara, and G. Pedron. Enhancing the resilience of intermittent water supply systems in Khan Younis, Gaza Strip. Knowledge transfer and lessons learned from the Gaza H2.0 project. (Oral presentation), EGU General Assembly, Vienna, Austria, 24-28 April 2023. https://doi.org/10.5194/egusphere-egu23-13100

Veronika Doepper
AWI - TU Berlin

Tracing 3-D high latitude environmental change with billions of remotely sensed points

Supervisors:

Ulrike Herzschuh (AWI)

Guido Grosse (AWI)

Birgit Kleinschmit (TU)

 

Our goal is to employ computer vision and data science methods to advance the handling, analysis and interpretation of the wealth of 3-D remotely sensed environmental Big Data acquired by the AWI on polar expeditions.

Objective: Data acquisitions from drone-borne and airborne passive and active optical imaging sensors over large areas in Siberia and Alaska have resulted in multi-temporal datasets of billions of remotely sensed points in spatially explicit point clouds. Computer vision already enables notable advancements in 2-D Big Data environmental data science. In addition, a wide range of 3-D sensors is employed to investigate polar terrestrial environments and permafrost landscapes: ground-based, drone-borne, and airborne active optical laser scanning devices, together with passive optical imaging with dense overlap from different view angles, provide 3-D point cloud data consisting of billions of individual measurements of surface structures, including vegetation and permafrost landscape topography in the circumpolar region.

Approach: 3-D point clouds containing billions of remotely sensed points over large areas are the products of drone-borne and airborne LiDAR as well as high-resolution Red Green Blue and Red Green Near Infrared cameras that allow stereo-photogrammetric derivation of point clouds. We need machine learning on these Big Data 3-D point clouds that also allows us to analyse the characteristics of and interactions among the points in 3-D space, in order to recognise tree species and terrestrial degradation features (classification), to segment the point clouds and identify the meaning of the environmental objects (segmentation), and to develop advanced automated change detection tools for multi-temporal point cloud datasets. We will apply these methods in two use cases: high-latitude biodiversity and permafrost landscape diversity.

 

Full-length publications

-

Conference presentations

  1. V. Döpper, R. Jackisch, J. Gloy, T. Rettelbach, J. Boike, …, G. Grosse, and S. Kruse. Towards an automatic segmentation and classification of multi-source point clouds for Arctic to boreal permafrost ecosystem analysis. (Poster presentation), EGU General Assembly, Vienna, Austria, 24–28 April, 2023. https://doi.org/10.5194/egusphere-egu23-15600
  2. V. Döpper et al. Unlocking the Potential of Arctic to Boreal Multi-Source Point Clouds: Deep and Transfer Learning for Automated Segmentation and Classification. (Poster presentation), SilviLaser 2023, London, UK, 6-8 September, 2023.

Peter Hirsch
MDC

Development and Application of Novel Methods to Analyze Cells and Cell Lineages in a High Throughput Manner

Associated Doctoral Researcher

Supervisors:

Dagmar Kainmueller (MDC)

 

Many experiments in biology require a large number of samples to allow for conclusive statements. It is thus of utmost importance to reduce the cost per sample as much as possible. This cost can be both in terms of money and time. If a single sample takes multiple hours and one needs hundreds or thousands of samples, the undertaking quickly becomes infeasible. Every significant reduction in manual work can thus make an infeasible project feasible.

In this project we want to study the effect of changes (mutations) in the genome, some of them lethal, on the development of C. elegans embryos and their cell lineages. This requires tracking all of their cells over time and through cell divisions. While some automatic methods to do this exist, all require several hours of manual curation per sample to obtain an error-free result.
To overcome this, we are developing new tracking algorithms that apply modern machine learning methods to volumetric time series data (3D+time).

C. elegans provides us with a prime example. Its development is stereotypical: each wild-type organism (one without mutations) exhibits the identical number of cells and division pattern. This makes it possible to automatically pinpoint both errors in the tracking algorithm and true changes in the development due to mutations.

Analyzing these changes will help us to expand our understanding of the gene regulatory networks induced by the genome, and how they are affected by mutations, a key challenge of developmental biology.

 

Doctoral thesis

P.J. Hirsch (2023). Segmentation and Tracking of Cells and Nuclei Using Deep Learning. Humboldt-Universität zu Berlin. https://doi.org/10.18452/26934

 

Full-length publications

  1. A. Krull*, P. Hirsch*, C. Rother, A. Schiffrin, and C. Krull (2020). Artificial-intelligence-driven scanning probe microscopy. (*shared first) Commun Phys 3, 54. https://doi.org/10.1038/s42005-020-0317-3
  2. P. Hirsch, and D. Kainmueller (2020). An auxiliary task for learning nuclei segmentation in 3D microscopy images. Proceedings of Machine Learning Research 121(304), 318.
  3. L. Mais*, P. Hirsch*, and D. Kainmueller (2020). PatchPerPix for instance segmentation. (*shared first) In: Vedaldi A., Bischof H., Brox T., Frahm JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12370. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_18
  4. J.L. Rumberger*, X. Yu*, P. Hirsch*, M. Dohmen*, V.E. Guarino*, A. Mokarian, L. Mais, J. Funke, and D. Kainmueller (2021). How Shift equivariance impacts metric learning for instance segmentation. (*shared first) In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
  5. L. Mais, P. Hirsch, C. Managan, K. Wang, K. Rokicki, R.R. Svirskas, B.J. Dickson, W. Korff, G.M. Rubin, G. Ihrke, G.W. Meissner, and D. Kainmueller (2021). PatchPerPixMatch for Automated 3d Search of Neuronal Morphologies in Light Microscopy. bioRxiv. https://doi.org/10.1101/2021.07.23.453511
  6. P. Hirsch, C. Malin-Mayor, A. Santella, S. Preibisch, D. Kainmueller, and J. Funke (2022). Tracking by Weakly-Supervised Learning and Graph Optimization for Whole-Embryo C. elegans lineages. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_3
  7. J.L. Rumberger, E. Baumann, P. Hirsch, A. Janowczyk, I. Zlobec and D. Kainmueller (2022). Panoptic segmentation with highly imbalanced semantic labels. 2022 IEEE International Symposium on Biomedical Imaging Challenges (ISBIC), p. 1-4. https://doi.org/10.1109/ISBIC56247.2022.9854551
  8. P. Hirsch, L. Epstein, and L. Guignard (2022). Chapter 20 - Mathematical and bioinformatic tools for cell tracking. In: M. Schnoor, L-M. Yin, S.X. Sun (eds) Cell Movement in Health and Disease, Academic Press, p. 341-361, ISBN 9780323901956. https://doi.org/10.1016/B978-0-323-90195-6.00013-9
  9. C. Malin-Mayor, P. Hirsch, L. Guignard, K. McDole, Y. Wan, W.C. Lemon, D. Kainmueller, P.J. Keller, S. Preibisch, and J. Funke (2023).  Automated reconstruction of whole-embryo cell lineages by learning from sparse annotations. Nat Biotechnol 41, 44–49. https://doi.org/10.1038/s41587-022-01427-7

 

Conference presentations

  1. P. Hirsch and D. Kainmueller. An Auxiliary Loss for Learning Nuclei Segmentation in 3D Microscopy Images. (Poster presentation), Frontiers in Imaging Science II, Janelia Research Campus, 1-4 May 2019.
  2. P. Hirsch, J.L. Rumberger, X. Yu, M. Dohmen, V.E. Guarino, A. Mokarian, L. Mais, J. Funke, and D. Kainmueller. What can go wrong with tile&stitch? (Poster presentation), Crick Bioimage Analysis Symposium, London, U.K., 22-23 November 2021.

Thorren Kirschbaum (né Gimm)
HZB - FU Berlin

Data-Driven Time-Dependent Multiphysics Simulation and Optimization of Electron Solvation from Nanodiamonds

Supervisors:

Joachim Dzubiella (HZB)

Frank Noé (FU)

 

The world is facing an ever-increasing demand for energy and resources, while the scarcity of resources and the pollution of the environment are forcing us to redesign the foundations of global economies. New methods of producing "green" energy and chemical base materials are in high demand. Hydrogen generated from environmentally neutral processes has the potential to provide both a zero-emission energy carrier and a chemical feedstock. However, the processes needed for the "clean" production of hydrogen are not yet economically viable on a large scale. This project explores a novel way to generate hydrogen by splitting water into its elements, H2 and O2.

A key goal of modern energy research is to find efficient ways to achieve this splitting. The process relies on the efficient reduction of the hydrogen and oxidation of the oxygen in water. It has long been known that electrons solvated in water are the ideal, most direct agents to induce this reduction, but generating them has typically required harsh reaction conditions that have limited this approach. Very recently, however, a relatively mild production process was achieved experimentally using hydrogen-covered nanodiamonds illuminated by light. The process is conceived as follows: (1) The nanodiamond is excited and an electron moves towards the particle's surface, which permits (2) the electron to transfer into the interfacial water and (3) to move into the solution, where (4) it eventually reacts. Still, we are far from understanding the precise mechanism underlying this effect, which is key to improving and scaling up its performance.

To learn more about the electron-generating processes, we plan to model the electron transfer and solvation dynamics (described in (1)-(3) above) using coupled multi-scale electron and nuclear dynamics methods. Additionally, we will optimize the reaction parameters through a combination of quantum chemistry and machine learning. Steps (1-2) require intricate quantum electron dynamics (ED) calculations, which can be done only for a small number of molecular conformations. Steps (2-3) rely on electron hopping/transfer rates in conjunction with statistical interface physics and simulations of the molecular dynamics (MD) of the diamond/water interface. Deep learning will be used to approximate results from ED to parametrize MD simulations and create a time-dependent multi-physics description of the full process. This should give us a significantly better understanding of the system. Subsequently, we will use methods of optimal control to find the most efficient electron solvation process, in which the control parameters are surface decoration, UV pulse (intensity, duration, shape), and temperature. Furthermore, the nanodiamonds' electronic properties will be optimized for excitation by sunlight through an approach that combines density functional theory (DFT) and supervised machine learning.

 

Doctoral thesis

T. Kirschbaum (2023). On the Electronic Structure of Nanodiamonds for Photocatalysis. Freie Universität Berlin.

 

Full-length publications

  1. J. Ren, L. Lin, K. Lieutenant, C. Schulz, D. Wong, T. Gimm, A. Bande, X. Wang, and T. Petit (2020). Role of dopants on the local electronic structure of polymeric carbon nitride photocatalysts. Small Methods 2000707. https://doi.org/10.1002/smtd.202000707
  2. T. Kirschbaum, T. Petit, J. Dzubiella, and A. Bande (2022). Effects of oxidative adsorbates and cluster formation on the electronic structure of nanodiamonds. J. Comput. Chem., 43, 13, 923-929. https://doi.org/10.1002/jcc.26849
  3. F. Buchner, T. Kirschbaum, A. Venerosy, H. Girard, J-C. Arnault, B. Kiendl, A. Krueger, K. Larsson, A. Bande, T. Petit, and C. Merschjann (2022). Early dynamics of the emission of solvated electrons from nanodiamonds in water. Nanoscale, 14, 17188-17195. https://doi.org/10.1039/D2NR03919B
  4. K. Palczynski, T. Kirschbaum, A. Bande, and J. Dzubiella (2023). Hydration Structure of Diamondoids from Reactive Force Fields. J. Phys. Chem. C, 127, 6, 3217–3227. https://doi.org/10.1021/acs.jpcc.2c07777
  5. T. Kirschbaum, B. von Seggern, J. Dzubiella, A. Bande, and F. Noé (2023). Machine Learning Frontier Orbital Energies of Nanodiamonds. J. Chem. Theory Comput. 19, 14, 4461–4473. https://doi.org/10.1021/acs.jctc.2c01275
  6. T. Kirschbaum, X. Wang, and A. Bande (2023). Ground and excited state charge transfer at aqueous nanodiamonds. J. Comput. Chem. https://doi.org/10.1002/jcc.27279
  7. X. Wang, P. Krause, T. Kirschbaum, K. Palczynski, J. Dzubiella and A. Bande (2024). Photo-excited charge transfer from adamantane to electronic bound states in water. Phys. Chem. Chem. Phys. https://doi.org/10.1039/D3CP04602H

 

Conference presentations

  1. T. Gimm, X. Wang, K. Palczynski, A. Bande, and J. Dzubiella. Nanodiamond-adsorbate interactions studied by DFT. (Poster presentation), Bunsen-Tagung 2021 - Multi-scale modelling & physical chemistry of colloids, Online, 10-12 May 2021.
  2. T. Gimm, X. Wang, K. Palczynski, A. Bande, and J. Dzubiella. Nanodiamond-adsorbate interactions studied by DFT. (Poster presentation), 57th Symposium of Theoretical Chemistry, Online, 20-24 September 2021.
  3. T. Kirschbaum, B. von Seggern, J. Dzubiella, A. Bande and F. Noé. Machine Learning Frontier Orbital Energies of Nanodiamonds. (Poster presentation), 58th Symposium of Theoretical Chemistry, Heidelberg, Germany, 18-22 September 2022.
  4. T. Kirschbaum, B. von Seggern, J. Dzubiella, A. Bande, and F. Noé. Machine Learning Frontier Orbital Energies of Nanodiamonds. (Oral presentation), Asia Pacific Conference of Theoretical and Computational Chemistry, Quy Nhon, Vietnam, 19-23 February 2023.

Henning Lilienkamp
GFZ - TU Berlin

Enhanced Computational Approaches for Seismic Risk Assessment of Infrastructure Networks

Supervisors:

Fabrice Cotton (GFZ)

Giuseppe Caire (TU)

In many regions of the world, earthquakes pose a persistent threat to the built environment, especially to the civil infrastructure that is now fundamental to our society. In the aftermath of recent earthquakes, such as the 2010–2011 Christchurch (New Zealand) events, damage to road, railway and utility/communications networks may be the dominant contributor to economic loss, with socio-economic impacts that can last for a long period after the event and impede recovery. The importance of analysing the seismic risk and vulnerability of spatially distributed infrastructure networks is becoming widely recognized by engineers, insurers and the scientific community at large. Such analyses present a challenge to scientists and engineers due to the complex interactions between interconnected elements within the infrastructure. Detailed statistical models are so computationally demanding that they prohibit real-time assessment of the post-event network state. Conversely, simplified models may fail to capture the correlations and dependencies within a system in its entirety. In this project we introduce novel machine learning techniques into this process to provide statistically robust assessments of network performance, in terms of both connectivity and flow. These would allow rapid evaluation of the impact of an event for use in the immediate aftermath and recovery phase, or as part of a probabilistic assessment of economic loss.

Doctoral thesis

H. Lilienkamp (2024). Enhanced computational approaches for data-driven characterization of earthquake ground motion and rapid earthquake impact assessment. University of Potsdam.

Full-length publications

  1. H. Lilienkamp, S. von Specht, G. Weatherill, G. Caire, and F. Cotton (2022). Ground-motion modeling as an image processing task: Introducing a neural network based, fully data-driven, and nonergodic approach. Bull. Seismol. Soc. Am. 112, 1565–1582. https://doi.org/10.1785/0120220008
  2. H. Lilienkamp, R. Bossu, F. Cotton, F. Finazzi, M. Landès, G. Weatherill, and S. von Specht (2023). Utilization of Crowdsourced Felt Reports to Distinguish High‐Impact from Low‐Impact Earthquakes Globally within Minutes of an Event. The Seismic Record, 3 (1): 29–36. https://doi.org/10.1785/0320220039

 

Conference presentations

  1. H. Lilienkamp, G. Weatherill, F. Cotton, and G. Caire. The role of spatial cross-correlation structures of ground motion fields forseismic risk assessment of spatially distributed assets and infrastructurenetworks.  (Poster presentation), EGU General Assembly, Vienna, Austria, 7–12 April 2019.
  2. H. Lilienkamp, S. von Specht, G. Weatherill, and F. Cotton. Exploring the physics and uncertainties in spatial cross-correlation models for ground motion intensity measures. (Oral presentation), AGU Fall Meeting, San Francisco, USA, 9–13 December 2019.
  3. H. Lilienkamp, F. Cotton, G. Caire, G. Weatherill, and S. von Specht. Fully data-driven, partially non-ergodic ground motion modeling using convolutional neural networks.  (Poster presentation), Taiwan Earthquake Research Center Annual Meeting, Taiwan, 20-22 October 2020.
  4. H. Lilienkamp, R. Bossu, F. Cotton, F. Finazzi, G. Weatherill, and S. von Specht. Utilization of crowdsourced macroseismic observations to distinguish “high-impact” from “low-impact” earthquakes globally within minutes of an event. (Oral presentation), IUGG23 General Assembly, Berlin, Germany, 11-20 July 2023.


     

Jannes Münchmeyer
GFZ - HU Berlin

Contact

Jannes Münchmeyer
Machine learning for fast and accurate assessment of earthquake source parameters

Supervisors:

Frederik Tilmann (GFZ)

Ulf Leser (HU)

Earthquakes count among the largest natural threats to humans. The current state of research suggests that it will not be possible to predict earthquakes reliably anytime in the near future. What is possible, on the other hand, is to provide reliable early warning in the context of ongoing earthquakes. The goal is to provide warnings a few seconds before strong shaking occurs. Such warnings can trigger automatic reactions, like stopping trains, or alert humans early enough to allow them to seek cover.

Usually, early warnings are based on recording the early, relatively weak shaking caused by an earthquake, inferring its size, and then predicting the level of shaking to follow. However, a look at the physics behind earthquakes reveals a crucial issue in attempting to make such predictions. An earthquake emits seismic waves from a rapidly growing rupture between two tectonic plates. These ruptures can traverse distances of tens or even hundreds of kilometers, and consequently, even when a rupture grows quickly, it might take tens of seconds or even minutes for the full rupture to occur.

There is no scientific consensus on how accurately the size of an earthquake can be assessed at what time during an ongoing rupture. There are two basic positions among experts: one holds that the size of an earthquake can be accurately predicted from its onset or during the first few seconds, and the other that accurate assessment is impossible until the rupture is largely finished. Which of these positions is correct will have a profound impact on the potential of early warning methods: if the earthquake's size can only be determined at the end of the rupture, then only short warning times – if any at all – will be possible.

In this PhD project, I am taking a novel, data-driven approach to the question of predictability. Using machine learning, I will build real-time assessment systems to predict the size of an event during an ongoing rupture. If we can design a model that can accurately assess the size of an earthquake from its first seconds, this will be a demonstration that ruptures can be feasibly predicted. A further step will be to integrate our real-time assessment model into earthquake early warning systems, to improve their performance with our state-of-the-art estimation methodology.

 

Doctoral thesis

J. Münchmeyer (2022). Machine learning for fast and accurate assessment of earthquake source parameters. Humboldt Universität Berlin. doi: 10.18452/25174

 

Full-length publications

  1. L. Weber, J. Münchmeyer, T. Rocktäschel, M. Habibi, and U. Leser (2019). HUNER: Improving biomedical NER with pretraining. Bioinformatics, 36(1), 295-302. 10.1093/bioinformatics/btz528
  2. L. Weber, P. Minervini, J. Münchmeyer, U. Leser, and T. Rocktäschel (2019). NLProlog: Reasoning with weak unification for question answering in natural language. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,  6151-6161. 10.18653/v1/P19-1618

  3. J. Münchmeyer, D. Bindi, C. Sippl, U. Leser, and F. Tilmann (2019). Low uncertainty multi-feature magnitude estimation with 3D corrections and boosting tree regression: Application to North Chile. Geophysical Journal International, 220(1), 142-159. https://doi.org/10.1093/gji/ggz416

  4. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann (2020). The transformer earthquake alerting model: A new versatile approach to earthquake early warning. Geophysical Journal International, ggaa609. https://doi.org/10.1093/gji/ggaa609

  5. L. Weber, M. Sänger, J. Münchmeyer, M. Habibi, U. Leser, and A. Akbik (2021). HunFlair: An easy-to-use tool for state-of-the-art biomedical Named Entity Recognition. Bioinformatics, btab042, https://doi.org/10.1093/bioinformatics/btab042

  6. J. Münchmeyer,  D. Bindi, U. Leser, and F. Tilmann (2021). Earthquake magnitude and location estimation from real time seismic waveforms with a Transformer Network. Geophysical Journal International, 226(2), 1086-1104. https://doi.org/10.1093/gji/ggab139

  7. W.J. Foster, G. Ayzel, J. Münchmeyer, T. Rettelbach, N. Kitzmann, T.T. Isson, M. Mutti, and M. Aberhan (2021). Machine learning identifies ecological selectivity patterns across the end-Permian mass extinction. Paleobiology, 1-15.  https://doi.org/10.1017/pab.2022.1 

  8. L. Weber, S. Garda, J. Münchmeyer, and U. Leser (2021). Extend, don’t rebuild: Phrasing conditional graph modification as autoregressive sequence labelling. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1213–1224.

  9. J.* Münchmeyer, J.* Woollam, F. Tilmann, A. Rietbrock, D. Lange, ..., and H. Soto (2021). Which picker fits my data? A quantitative evaluation of deep learning based seismic pickers. Journal of Geophysical Research: Solid Earth, 127, 1, e2021JB023499. https://doi.org/10.1029/2021JB023499 *Equal contribution

  10. K. Singh, J. Münchmeyer, L. Weber, U. Leser and A. Bande (2022). Graph Neural Networks for Learning Molecular Excitation Spectra. J. Chem. Theory Comp., 18, 7, 4408-4417. DOI: 10.1021/acs.jctc.2c00255

  11. J.* Woollam, J.* Münchmeyer, F. Tilmann, A. Rietbrock, D. Lange, ..., and H. Soto (2022). SeisBench - A Toolbox for Machine Learning in Seismology. Seismological Research Letters, 93(3), 1695–1709. https://doi.org/10.1785/0220210324 *Equal contribution

  12. J. Münchmeyer, U. Leser, and F. Tilmann (2022). A probabilistic view on rupture predictability: All earthquakes evolve similarly. Geophysical Research Letters, 49, 13, e2022GL098344. https://doi.org/10.1029/2022GL098344

 

Conference presentations

  1. J. Münchmeyer, D. Bindi, C. Sippl, and F. Tilmann. Increasing magnitude scale consistency by combining multiple waveform features through machine learning. (Oral presentation), EGU General Assembly, Vienna, 7-12 April 2019.
  2. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann. Convolutional event embeddings for fast probabilistic earthquake assessment.  (Poster presentation), AGU Fall Meeting, San Francisco, USA, 9-13 December 2019.
  3. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann. End-to-end PGA estimation for earthquake early warning using transformer networks. (Oral presentation), EGU General Assembly, Online, 4-8 May 2020.
  4. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann. The Transformer Earthquake Alerting Model: Improving Earthquake Early Warning with Deep Learning. (Oral presentation), AGU Fall Meeting, Online, 13-17 December 2020.
  5. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann. Insights into deep learning for earthquake magnitude and location estimation. (PICO presentation), EGU General Assembly, Online, 19-30 April 2021. https://doi.org/10.5194/egusphere-egu21-4718
  6. J. Münchmeyer, D. Bindi, U. Leser, and F. Tilmann. The Transformer Earthquake Alerting Model: A Data Driven Approach to Early Warning. (Oral presentation), Seismological Society of America (SSA) Annual Meeting, Online, 19-23 April 2021.
  7. J. Münchmeyer, J. Woollam, ..., D. Lange, A. Rietbrock, and F. Tilmann. SeisBench: A framework for machine learning in seismology. (Oral presentation), 37th General Assembly of the European Seismological Commission, Online, 19-24 September 2021.
  8. J. Münchmeyer, U. Leser, and F. Tilmann. A probabilistic view of earthquake rupture predictability. (Oral presentation), AGU Fall Meeting, Online & New Orleans, USA, 13-17 December 2021.
  9. J. Münchmeyer, J. Woollam, F. Tilmann, A. Rietbrock, D. Lange, ..., and H. Soto. Which picker fits my data? A quantitative evaluation of deep learning based seismic pickers. EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022. https://meetingorganizer.copernicus.org/EGU22/EGU22-4071.html
  10. J. Münchmeyer, J. Woollam, F. Tilmann, A. Rietbrock, D. Lange, ..., and H. Soto. (2022). SeisBench: A toolbox for machine learning in seismology. Helmholtz AI Conference, Dresden, Germany, 2-3 June 2022.

Peter Tillmann
HZB - FU Berlin

Contact

Peter Tillmann
Optimizing nanotextured solar cells for realistic weather conditions

Supervisors:

Christiane Becker (HZB)

Klaus Jäger (HZB)

Christof Schütte (FU)

 

Currently, perovskite-silicon (pero-Si) tandem solar cells are the most investigated concept to overcome the theoretical limit for the power conversion efficiency of single-junction silicon solar cells, which is 29.4%. Optical simulations are extremely valuable for studying the distribution of light within the solar cells, and make it possible to minimize losses from reflection and parasitic absorption. For monolithic perovskite-silicon solar cells, it is vital that the available light is equally distributed between the two subcells, which is known as current matching. Nanotextures have proven to strongly reduce reflective losses. In this project we investigate how realistic weather conditions affect the performance of pero-Si modules. We study how different light management approaches, such as pyramidal texturing or (sinusoidal) nanotexturing, influence the sensitivity of the solar module to the illumination condition. In contrast to single-junction silicon solar cells, (two-terminal) tandem solar cells are more sensitive to the spectral distribution of the incident light.

 

Doctoral Thesis

P. Tillmann (2023). Optimizing Bifacial Tandem Solar Cells for Realistic Operation Conditions. Freie Universität Berlin. http://dx.doi.org/10.17169/refubium-39571

 

Full-length publications

  1. P. Tillmann, K. Jäger, and C. Becker (2020). Minimising the levelised cost of electricity for bifacial solar panel arrays using Bayesian optimization. Sustainable Energy Fuels, 4, 254-264. https://doi.org/10.1039/C9SE00750D
  2. K. Jäger, P. Tillmann, E.A. Katz, and C. Becker (2020). Perovskite/silicon tandem solar cells: Effect of luminescent coupling and bifaciality. Sol. RRL. https://doi.org/10.1002/solr.202000628
  3. K. Jäger, P. Tillmann, and C. Becker (2020). Detailed illumination model for bifacial solar cells. Opt. Express, 28, 4, 4751-4762. https://doi.org/10.1364/OE.383570
  4. P. Tillmann, B. Bläsi, S. Burger, M. Hammerschmidt, O. Höhn, C. Becker, and K. Jäger (2021). Optimizing metal grating back reflectors for III-V-on-silicon multijunction solar cells. Opt. Express, 29, 22517. https://doi.org/10.1364/OE.426761
  5. P. Tillmann, K. Jäger, A. Karsenti, L. Kreinin, and C. Becker (2022). Model-Chain Validation for Estimating the Energy Yield of Bifacial Perovskite/Silicon Tandem Solar Cells. Sol. RRL, 202200079. https://doi.org/10.1002/solr.202200079

 

Conference presentations

  1. P. Tillmann, C. Becker, and K. Jäger. Analysing the angular reflection losses of bifacial solar cells.  (Poster presentation), European Photovoltaic Solar Energy Conference and Exhibition (EU PVSEC), Online, 7-11 September 2020.
  2. P. Tillmann,  K. Jäger, E.A. Katz, and C. Becker. Relaxed current-matching constraints in perovskite/silicon tandem solar cell by bifacial operation and luminescent coupling. (Oral presentation), IEEE Photovoltaic Specialists Conference (PVSC), Online, 20-25 June 2021.
  3. P. Tillmann, K. Jäger, A. Karsenti, L. Kreinin, and C. Becker. Validation of Energy Yield Model for Bifacial Solar Cells and Prediction of Perovskite/silicon Tandem Solar Cell Performance. (Poster presentation), TandemPV, Freiburg, Germany, 30 May - 1 June 2022.

Anna Vlot
MDC - Uni Tübingen

Contact

Anna Vlot
Identifying markers of cell identity from single-cell omics data

Supervisors:

Uwe Ohler (MDC)

Setareh Maghsudi (Uni Tübingen)

 

Cells are the building blocks of all multicellular organisms. Generally speaking, the DNA in each cell in a single organism is identical. Yet each different type of cell has its specialized function. These functional differences occur because cells of a particular identity transcribe a distinct set of genes into RNA molecules, many of which the cell then translates into proteins that determine cell structure, function, and identity. We do not yet fully understand the mechanisms that determine which genes and proteins a given cell produces. What we do know, however, is that the packing of DNA into a structure called chromatin plays a role. It is this packing that permits a 2-meter-long strand of DNA to fit into a cell nucleus with a diameter of no more than roughly 6 micrometres. If a gene lies in a region of the DNA that is tightly packed, the gene is not accessible for binding by the molecules that govern its transcription into RNA molecules. Thus, genes in inaccessible chromatin regions are not transcribed into RNA. However, protein-encoding regions make up just 2% of the human genome, and the accessibility of genomic regions alone does not explain cell-to-cell differences. Namely, non-protein-coding regions of the DNA, e.g. cis-regulatory regions, regulate gene expression. These regions, too, cannot exert their function if they are not accessible. Ultimately, the abundance of particular RNAs and the accessibility of chromatin together provide a starting point for unravelling the processes underlying cell identity acquisition and cell function.

Recently, researchers have begun measuring RNA abundance, chromatin accessibility, and more, in individual cells using so-called single-cell omics assays. Analysis of the data obtained from these single-cell omics assays may provide novel insights into how cells acquire their identity. However, analysis of this data is complicated by its high-dimensional, sparse, and noisy nature. High dimensionality refers to the fact that tens of thousands of genes or hundreds of thousands of DNA regions are measured in thousands to millions of cells. Sparsity occurs because most genes are not expressed in any given cell, and most regions of chromatin are not accessible. Besides, due to technical limitations, not all genes that are expressed or chromatin regions that are accessible in a given cell are captured. The combination of inherent sparsity and further technical limitations results in noisy data with a poor signal-to-noise ratio. Taken together, these data characteristics complicate the identification of biologically meaningful patterns from the data, especially for genes that are expressed at very low levels, or in only a few cells. This is of particular concern when considering cells at different stages of development since differences between cells may be restricted to the expression of only a few genes or subtle changes in chromatin accessibility.

In this project, we aim to develop methods to identify RNA molecules and cis-regulatory regions that characterize cell types and regulate the acquisition of cell identity. For this, we will adapt existing analytical approaches for the analysis of data representing continuous differentiation processes, without discretizing cell identities into distinct cell states. This criterion is essential if we hope to identify genes and cis-regulatory regions that govern the development of cells in health and disease, where disease occurs due to aberrant cell functions induced by dysregulation of gene expression.

 

Doctoral thesis

A.H.C. Vlot (2023). Identifying markers of cell identity from single cell omics data. Humboldt Universität Berlin. doi:10.18452/27236

 

Full-length publications

  1. P. Rautenstrauch, A.H.C. Vlot, S. Saran, and U. Ohler (2021). Intricacies of single-cell multi-omics data integration. Trends in Genetics. https://doi.org/10.1016/j.tig.2021.08.012
  2. R. Shahan, C.W. Hsu, T.M. Nolan, B.J. Cole, I.W. Taylor, A.H.C. Vlot, P.N. Benfey, and U. Ohler  (2022).  A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants. Developmental Cell 57(4), 543-560.e9. https://doi.org/10.1016/j.devcel.2022.01.008
  3. A.H.C. Vlot, S. Maghsudi, and U. Ohler (2022). Cluster-independent marker feature identification from single-cell omics data using SEMITONES. Nucleic Acids Research, gkac639. https://doi.org/10.1093/nar/gkac639

 

Conference presentations

  1. R. Shahan, C.W. Hsu, T.M. Nolan, B.J. Cole, I.W. Taylor, A.H.C. Vlot, P.N. Benfey, and U. Ohler  (2020).  A single cell Arabidopsis root atlas reveals developmental trajectories in wild type and cell identity mutants. bioRxiv 2020.06.29.178863.  https://doi.org/10.1101/2020.06.29.178863
  2. A.H.C. Vlot, S. Maghsudi, and U. Ohler. Identification of marker genes and cis-regulatory regions using Single-cEll Marker IdentificaTiON by Enrichment Scoring (SEMITONES). (Poster presentation), 13th annual RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges, Online, 16-19 November 2020.

  3. A.H.C. Vlot, S. Maghsudi, and U. Ohler. Single-cEll Marker IdentificaTiON by Enrichment Scoring. (Poster and oral presentation), ISMB/ECCB 2021, Online, 25-30 July 2021.

  4. A.H.C. Vlot, S. Maghsudi, and U. Ohler. Identification of cis-regulatory regions using Single-cEll Marker IdentificaTiON by Enrichment Scoring (SEMITONES). (Poster presentation), EMBO Workshop Enhanceropathies: Understanding enhancer function to understand human disease, 6-9 October 2021.

Leon Weber
HU Berlin - MDC

Contact

Leon Weber
Corpus-wide inference of gene relationships using semantic word representations

Supervisors:

Ulf Leser (HU)

Jana Wolf (MDC)

 

Current attempts to decipher the molecular basis of cellular processes and human diseases are based on quantitative or qualitative models of the complex interplay between molecules in the cell, for instance in gene regulation, cellular signaling, or metabolism. Obtaining such models in sufficient quality and breadth is a laborious task which today is predominantly based on human experts manually searching and reading the scientific literature with the aim of collecting the many dispersed pieces of knowledge necessary to arrive at a comprehensive picture. This work can be supported by using Text Mining; however, current research in this area focuses on extracting information from isolated sentences, which often produces unsatisfactory results as important contextual information is ignored (such as the experimental evidence of a reported fact, the precise species in which a finding was experimentally observed, the strength of the observed effects, possible previous treatments (with certain drugs) of the experimental system etc.). In this PhD project, we follow a radically different approach. We use the entire corpus of available scientific publications (roughly 30 Million abstracts, 1.5 Million full texts, possibly patents) as the source of inference for single relationships. To this end, a machine learning setup will be designed, where models of valid relationships are learned from all mentions of their constituents trained on a set of proven relationships. We use that approach to significantly expand the molecular network of several clinically relevant molecular pathways of which the PIs have comprehensive background knowledge, such as the NF-kB signaling pathway, a pathway that is critically involved in cell fate decisions and perturbed in a number of diseases including cancer and inflammatory diseases, and the p53 pathway, which is strongly perturbed in cancer.
The central aim of the PhD project is the extension of the currently available restricted pathway models; however, additional directions of expansion will also be investigated, such as development of cell-type-specific models, or elucidation of cross-talk to other pathways. We also envision using the new method to study connections between signaling pathways and existing targeted cancer therapies, for which patent texts would be extremely useful. Results from such text mining algorithms will be rigorously assessed in terms of their quality and relevance for biomedical research by (a) qualitatively checking the results at the literature level, and (b) quantitatively evaluating the performance of the expanded or improved pathways in typical analysis settings using OMICS data, such as pathway enrichment analysis and predictive power for selected phenotypes. The approach would allow a new way of predicting treatments that ideally can be adapted and specified for subgroups harboring individual combinations of perturbations in the disease-relevant pathways.

 

Doctoral thesis

L. Weber (2023). Text Mining for Pathway Curation. Humboldt Universität Berlin. doi: 10.18452/27520

 

Full-length publications

  1. L. Weber, J. Münchmeyer, T. Rocktäschel, M. Habibi, and U. Leser (2019). HUNER: Improving biomedical NER with pretraining. Bioinformatics, 36(1), 295-302. 10.1093/bioinformatics/btz528
  2. L. Weber, P. Minervini, J. Münchmeyer, U. Leser, and T. Rocktäschel (2019). NLProlog: Reasoning with weak unification for question answering in Natural Language. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,  6151-6161. 10.18653/v1/P19-1618
  3. L. Weber, K. Thobe, O.A.M. Lozano, J. Wolf, and U. Leser (2020). PEDL: Extracting protein-protein associations using deep language models and distant supervision.  Bioinformatics, 36(1), i490–i498. https://doi.org/10.1093/bioinformatics/btaa430
  4. W.D. Xing, L. Weber, and U. Leser (2020). Biomedical event extraction as multi-turn question answering. In Proceedings of the 11th Int. Workshop on Health Text Mining and Information Analysis, 88-96. 10.18653/v1/2020.louhi-1.10
  5. L. Weber, M. Sänger, J. Münchmeyer, M. Habibi, U. Leser, and A. Akbik (2021). HunFlair: An easy-to-use tool for state-of-the-art biomedical named entity recognition. Bioinformatics, btab042. https://doi.org/10.1093/bioinformatics/btab042
  6. L. Weber, M. Sänger, S. Garda, F. Barth, C. Alt, and U. Leser (2021). Humboldt @ DrugProt: Chemical-protein relation extraction with pretrained transformers and entity descriptions. In Proceedings of the 7th BioCreative Challenge Evaluation Workshop.
  7. L. Weber, S. Garda, J. Münchmeyer, and U. Leser (2021). Extend, don’t rebuild: Phrasing conditional graph modification as autoregressive sequence labelling. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1213–1224.
  8. K. Singh, J. Münchmeyer, L. Weber, U. Leser, and A. Bande (2022). Graph Neural Networks for Learning Molecular Excitation Spectra. J. Chem. Theory Comp., 18, 7, 4408-4417. DOI: 10.1021/acs.jctc.2c00255
  9. J.A. Fries, N. Seelam, G. Altay, L. Weber, M. Kang, D. Datta, R. Su, S. Garda, B. Wang, S. Ott, M. Samwald, and W. Kusa (2022). Dataset Debt in Biomedical Language Modeling. In Proceedings of the Workshop on Challenges & Perspectives in Creating Large Language Models, 137-145. https://doi.org/10.18653/v1/2022.bigscience-1.10
  10. X. Wang, U. Leser, and L. Weber (2022). BEEDS: Large-Scale Biomedical Event Extraction using Distant Supervision and Question Answering. In Proceedings of BioNLP, 298-309. 10.18653/v1/2022.bionlp-1.28
  11. L. Weber, M. Sänger, S. Garda, F. Barth, C. Alt and U. Leser (2022). Chemical-Protein Relation Extraction with Ensembles of Carefully Tuned Pretrained Language Models. Database, 2022, baac098. https://doi.org/10.1093/database/baac098
  12. J. Fries, L. Weber, N. Seelam, G. Altay, D. Datta, S. Garda, ..., M. Sänger, ..., B. Beilharz (2022). Bigbio: a framework for data-centric biomedical natural language processing. Advances in Neural Information Processing Systems, 35, 25792-25806.
  13. H. Laurençon, L. Saulnier, T. Wang, C. Akik, A. V. del Moral, T. Le Scao, ... L. Weber, ... et al. (2022). The BigScience Corpus A 1.6 TB Composite Multilingual Dataset. https://openreview.net/forum?id=UoEw6KigkUn [Preprint]
  14. L. Weber, F. Barth, L. Lorenz, F. Konrath, K. Huska, J. Wolf, and U. Leser (2023). PEDL+: Protein-centered relation extraction from PubMed at your fingertip. Bioinformatics, 39, 11. doi:10.1093/bioinformatics/btad603

 

Conference presentations

  1. L. Weber, P. Minervini, J. Münchmeyer, U. Leser, and T. Rocktäschel. NLProlog: Reasoning with weak unification for question answering in Natural Language. (Poster presentation) 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July - 2 August, 2019.
  2. M. Saenger, L. Weber, and U. Leser. WBI at MEDIQA 2021: Summarizing Consumer Health Questions with Generative Transformers. BioNLP Workshop - MEDIQA, 11 June 2021. https://www.aclweb.org/anthology/2021.bionlp-1.9.pdf