Clustering in Pure-Attention Hardmax Transformers

Next Thursday, May 2, 2024:

Organized by: FAU DCN-AvH, Chair for Dynamics, Control, Machine Learning and Numerics – Alexander von Humboldt Professorship at FAU, Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany)
Title: Clustering in Pure-Attention Hardmax Transformers
Speaker: Albert Alcalde
Affiliation: PhD student at FAU DCN-AvH, Chair for Dynamics, Control, Machine Learning and Numerics – Alexander von Humboldt Professorship.

Abstract. We study the behaviour in the infinite-depth limit of a transformer model with hardmax self-attention and normalization sublayers, viewing it as a discrete-time dynamical system acting on a collection of points. Leveraging a simple geometric interpretation of our transformer, connected with ideas of hyperplane separation, we establish convergence to a clustered equilibrium and prove that the clusters are completely determined by special points called leaders. We then apply this theoretical understanding to design a transformer-based model that solves the sentiment analysis task in an interpretable way: the transformer filters out meaningless words by clustering them towards the leaders, which are identified with words that carry the sentiment of the text, such as ‘amazing’ or ‘terrible’.
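For readers who want a concrete picture of the dynamics described in the abstract, below is a minimal NumPy sketch of a hardmax self-attention update iterated in depth. The function name hardmax_attention_step, the step size alpha, and the choice of an identity attention matrix A are illustrative assumptions, not the exact model analysed in the talk; the sketch only aims to show points collapsing onto a few leader directions.

```python
import numpy as np

def hardmax_attention_step(X, A, alpha=0.5):
    """One toy discrete-time step of hardmax self-attention (assumed form).

    Each point x_i moves toward the average of the points x_j that maximise
    the inner product <A x_i, x_j>, then is renormalised to the unit sphere
    as a stand-in for the normalization sublayer.
    """
    X_new = np.empty_like(X)
    for i in range(X.shape[0]):
        scores = X @ (A @ X[i])                   # <A x_i, x_j> for every j
        winners = np.isclose(scores, scores.max())
        target = X[winners].mean(axis=0)          # hardmax: average of argmax points
        step = X[i] + alpha * (target - X[i])
        X_new[i] = step / np.linalg.norm(step)    # normalization sublayer
    return X_new

# Iterate the dynamics on random points on the circle; after many layers
# they concentrate near a handful of "leader" directions (clustering).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
A = np.eye(2)                                     # illustrative attention matrix
for _ in range(200):
    X = hardmax_attention_step(X, A)
print(np.unique(np.round(X, 3), axis=0))          # a few cluster representatives
```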

WHEN

Thu. May 2, 2024 at 12:00H

WHERE

On-site: Room 03.323
Friedrich-Alexander-Universität Erlangen-Nürnberg
Cauerstraße 11, 91058 Erlangen
GPS coordinates (room): 49.573764N, 11.030028E

See all Seminars at FAU DCN-AvH

Don’t miss our latest news and connect with us!

LinkedIn | Twitter | Instagram