The “Self-Attention” mechanism that we learned about in Episode 5 is actually replicated multiple times in the Transformer architecture; this is referred to as “Multi-Headed Attention”. In this video we’ll dig into this in detail, and see how it impacts the compute cost of the overall model.
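
As a rough sketch (not from the video itself), here is what multi-headed self-attention looks like with BERT-base-style dimensions: 12 heads of 64 dimensions each over a 768-dimensional hidden size. The weight matrices and input below are random placeholders, just to show the shapes and the per-head computation:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, num_heads = 10, 768, 12
d_head = d_model // num_heads          # 64 dimensions per head

x = np.random.randn(seq_len, d_model)  # token embeddings for one example sentence

# Each head gets its own (smaller) Query, Key, and Value projections.
W_q = np.random.randn(num_heads, d_model, d_head) * 0.02
W_k = np.random.randn(num_heads, d_model, d_head) * 0.02
W_v = np.random.randn(num_heads, d_model, d_head) * 0.02
W_o = np.random.randn(num_heads * d_head, d_model) * 0.02  # output projection

head_outputs = []
for h in range(num_heads):
    Q, K, V = x @ W_q[h], x @ W_k[h], x @ W_v[h]   # each [seq_len, d_head]
    scores = Q @ K.T / np.sqrt(d_head)             # scaled dot-product attention
    head_outputs.append(softmax(scores) @ V)       # attention-weighted values

# Concatenate the heads and project back to the model dimension.
z = np.concatenate(head_outputs, axis=-1) @ W_o    # [seq_len, d_model]
print(z.shape)  # (10, 768)

Note that because each head projects down to d_model / num_heads dimensions, the combined Q/K/V weights across all 12 heads come to 3 * d_model^2 parameters, roughly what a single full-width attention head would cost.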

We’ll again be relying heavily on the excellent illustrations in Jay Alammar’s post, “The Illustrated Transformer”: http://jalammar.github.io/illustrated-transformer/.

==== Full Series ====
The Bert Research Series is complete! All 8 Episodes are up:
https://www.youtube.com/playlist?list=PLam9sigHPGwOBuH4_4fr-XvDbe5uneaf6

==== Updates ====
Sign up to hear about new content across my blog and channel: https://www.chrismccormick.ai/subscribe
