Lecture Series:

Learning to Perceive and to Act – Disentangling Tales from (Structured) Latent Space

Wednesday, 18.01.2023 · 16:00

Speaker: Ingmar Posner, Oxford University

Unsupervised learning is experiencing a renaissance. Driven by an abundance of unlabelled data and the advent of deep generative models, machines are now able to synthesise complex images, videos and sounds. In robotics, one of the most promising features of these models, the ability to learn structured latent spaces, is gradually gaining traction. The ability of a deep generative model to disentangle semantic information into individual latent-space dimensions seems naturally suited to state-space estimation. Combining this information with generative world models, which predict the likely sequence of future states given an initial observation, is widely recognised as a promising research direction with applications in perception, planning and control. Yet, to date, designing generative models capable of decomposing and synthesising scenes based on higher-level concepts such as objects remains elusive in all but simple cases. In this talk I will motivate and describe our recent work using deep generative models for unsupervised object-centric scene inference and generation. Furthermore, I will make the case that exploiting correlations encoded in latent space, and learnt through experience, leads to a powerful and intuitive way to disentangle and manipulate task-relevant factors of variation. I will show that this not only casts a novel light on affordance learning, but also that the same framework is capable of generating plans executable on complex real-world robot platforms.