Advances in Deep Learning: Transformers

30.06.2024

Prof. Yossi Keshet

Sep 22-23, 2024

Please leave your information here, and we will get back to you shortly when registration opens.

The Transformer architecture has emerged as a groundbreaking deep learning model, revolutionizing various domains with its powerful representation of sequential data. This course delves into the fascinating world of Transformers, exploring their fundamental concepts, theoretical underpinnings, and practical applications.

We begin by thoroughly examining the Transformer model itself, unpacking its innovative self-attention mechanism and its ability to capture long-range dependencies efficiently. This theoretical foundation will provide a solid understanding of the architectural principles that have propelled Transformers to the forefront of modern deep learning.
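As a rough illustration of the self-attention mechanism mentioned above, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It is not course material: the function names, the toy input, and the use of identity projections (queries, keys, and values all equal to the input) are assumptions made for brevity, and real models use learned projection matrices, multiple heads, and positional information.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head, without masking.

    Each argument is a list of equal-length vectors, one per position.
    Returns the output vectors and the attention weights.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Compare this position's query with the key of every position.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # The output is the attention-weighted average of all value vectors.
        out = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy input: 4 positions with 2-dimensional embeddings (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

# Assumption: identity projections, so queries, keys, and values are all x.
outputs, weights = self_attention(x, x, x)
```

In this toy input, the first position gives the last position the same weight it gives itself, because their vectors match. Distance in the sequence plays no role in the score, which is one way to see how attention can relate far-apart positions in a single step.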

Building upon this knowledge, we will explore the transformative impact of Transformers across multiple fields. In natural language processing (NLP), we will study how Transformers have enabled the development of large language models like BERT and ChatGPT, pushing the boundaries of text generation, understanding, and analysis.

Furthermore, we will delve into the realm of automatic speech recognition (ASR), where Transformer-based models such as wav2vec 2.0, HuBERT, and Whisper have achieved state-of-the-art performance, revolutionizing the way we interact with spoken language.

Extending our exploration to computer vision, we will examine how Transformers have been adapted to handle visual data, leading to groundbreaking models that have reshaped image and video understanding tasks.

Throughout the course, we will critically analyze the latest research developments, discussing theoretical advancements, practical applications, and the potential future directions of Transformer architectures. By the end of this comprehensive journey, you will possess a deep understanding of Transformers, equipping you with the knowledge and skills to harness their power in your own research or industry projects.

Introduction
The roots of the Transformer architecture and how it was developed: RNN, LSTM, sequence-to-sequence, attention mechanism, LAS (Listen, Attend and Spell)
Transformer structure and the self-attention mechanism
Components of Transformers and their usage
What is the expressive power of Transformers?
Reinforcement learning from human feedback (RLHF) and Direct Preference Optimization (DPO)
Efficient Transformers: Flash Networks; RWKV
In-context learning
Prompts and the limitation of the alignment problem
Transformers implementation in NLP
Transformers implementation in speech
Transformers implementation in vision
