This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head attention layer, which incorporates information from the encoder into the decoder. This layer is also commonly known as the cross-attention layer.
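To make the mechanism concrete, here is a minimal single-head sketch of cross-attention in NumPy: queries come from the decoder, while keys and values come from the encoder. The function and weight names (cross_attention, Wq, Wk, Wv) are illustrative assumptions, not taken from the video or the paper:

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, Wq, Wk, Wv):
    Q = decoder_states @ Wq          # queries from the decoder, (T_dec, d_k)
    K = encoder_states @ Wk          # keys from the encoder,    (T_enc, d_k)
    V = encoder_states @ Wv          # values from the encoder,  (T_enc, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # scaled dot-product scores, (T_dec, T_enc)
    return softmax(scores) @ V       # each decoder position attends over the encoder, (T_dec, d_v)

In the full multi-head layer this computation is repeated for several heads in parallel and the results are concatenated and projected, but the single-head version above captures the core idea.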
This video is part of a series on the transformer architecture ("Attention Is All You Need", https://arxiv.org/abs/1706.03762). You can find the complete series and a longer motivation here:
https://www.youtube.com/playlist?list=PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje
Slides are available here:
https://chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2