Kanishka Singh

Machine Learning Meets Theoretical Chemistry: Data-driven Analysis of Graphene Oxide

In recent years, data science has increasingly found its way into materials science, usually in the form of regression or classification models that are trained on known properties of certain materials and then applied to predict these properties for less well characterized, yet physically and chemically similar, materials. However, a pertinent problem in this field is the search for materials that are optimal with regard to some property. This problem only partly matches the scenario described above, because it is infeasible to compute the properties of all candidate structures, whose number typically grows exponentially. In such a setting, the methods for calculating the properties of given structures must be complemented by intelligent algorithms that decide which structures are assessed in which order and that use the results of previous assessments to guide the search through the space of all candidates. In this project, we research such methods and apply them to a particularly important problem in the development of solar fuels: the search for an optimal material for photocatalysis, namely water oxidation or even water splitting. The material class to be searched is sustainable graphene oxide quantum dots. The desired properties are: (1) a band gap of more than 1.23 eV that is centered around the energies of the water-to-oxygen oxidation and the water-to-hydrogen reduction, (2) a good representation of its diverse spectra (X-ray absorption and emission (XAS and XES), UV/vis, and infrared), (3) sizes of the quantum dots, and (4) their oxygen and possibly dopant content. Many of these properties can be computed numerically with standard quantum chemistry software such as the ORCA program, but only at the cost of extremely resource-intensive calculations, which currently limits the number of materials that can be characterized.
We study methods that allow a goal-driven search through the space of all candidate materials to find those that optimize an objective function derived from the materials' properties. To this end, we plan to apply reinforcement learning and heuristic search procedures, possibly combined with deep learning networks that model correlations between the target properties and other material-derived features. Particular challenges in this project are the inhomogeneity and incompleteness of the data with respect to the target properties. The expected impact is high, because the project will create a complete workflow, spanning high-throughput data production, database generation, and the learning of structure-property relationships, that can also be applied to many other interesting materials and classes of properties in materials science.
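To illustrate what a goal-driven search under a limited evaluation budget could look like, the following minimal sketch uses a simple best-first search, which is only one of many possible heuristic search procedures and is not claimed to be the method used in this project. Everything in it is an assumption made for illustration: candidates are encoded as hypothetical bit tuples (1 meaning a site carries an oxygen group), `toy_objective` is a cheap stand-in for an expensive quantum chemistry calculation, and `budget` caps the number of such calculations.

```python
import heapq
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical encoding: one bit per site, 1 = site carries an oxygen group.
Candidate = Tuple[int, ...]


def neighbors(c: Candidate) -> Iterator[Candidate]:
    """Yield all candidates that differ from c at exactly one site."""
    for i in range(len(c)):
        yield c[:i] + (1 - c[i],) + c[i + 1:]


def guided_search(
    start: Candidate,
    evaluate: Callable[[Candidate], float],
    budget: int,
) -> Tuple[Candidate, float, Dict[Candidate, float]]:
    """Best-first search that spends at most `budget` evaluations.

    Candidates with the highest score so far are expanded first, so the
    results of earlier evaluations decide what is assessed next.
    """
    scores = {start: evaluate(start)}
    frontier = [(-scores[start], start)]  # max-heap via negated scores
    while frontier and len(scores) < budget:
        _, current = heapq.heappop(frontier)
        for cand in neighbors(current):
            if cand in scores:
                continue  # never pay twice for the same structure
            if len(scores) >= budget:
                break
            scores[cand] = evaluate(cand)
            heapq.heappush(frontier, (-scores[cand], cand))
    best = max(scores, key=scores.get)
    return best, scores[best], scores


def toy_objective(c: Candidate) -> float:
    """Toy stand-in for an expensive calculation: best at 3 oxidized sites."""
    return -abs(sum(c) - 3)


best, best_score, evaluated = guided_search((0,) * 8, toy_objective, budget=40)
print(best, best_score, len(evaluated))
```

In this toy setting the search space has 256 candidates, but the search evaluates at most 40 of them. In a realistic setting, the `evaluate` callback would wrap the actual property calculation, and a learned model could be used to rank candidates before they are evaluated.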

Journal & Conference Publications

-

Oral & Poster Presentations at Conferences

-