Figure from Google's Blog The adoption of attention mechanisms combined with this constraint led to the creation of the now State of the Art Transformer Models that we know and use today such as BERT ...
Implementation of simples transformers, including BERT for text classification and simple GPT using AliBi as positional encoding.