Update: The BERT eBook is out! You can buy it from my site here: https://www.chrismccormick.ai/bert-ebook?utm_source=youtube&utm_medium=vid_desc&utm_campaign=bert_ebook&utm_content=vid3

In Episode 2 we’ll look at:
– What a word embedding is.
– How BERT’s WordPiece model tokenizes text.
– The contents of BERT’s vocabulary.
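
To give a feel for the tokenization topic above, here is a minimal sketch of greedy longest-match-first subword splitting, the general approach WordPiece tokenizers use at inference time. The function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical and only for illustration. BERT's real vocabulary has roughly 30,000 entries, and its tokenizer also handles lowercasing, punctuation, and whitespace splitting, none of which is shown here.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split a single word into subword pieces using greedy longest-match-first.

    Pieces that do not start the word carry a '##' prefix, following the
    convention used in BERT's vocabulary file.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches at this position: give up on the word.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, not BERT's actual vocabulary file.
toy_vocab = {"em", "##bed", "##ding", "##s", "the", "[UNK]"}

print(wordpiece_tokenize("embeddings", toy_vocab))
# ['em', '##bed', '##ding', '##s']
```

With this toy vocabulary, a word that has no matching pieces (for example `"xyz"`) falls back to the single `[UNK]` token.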

The Colab Notebook from the second half of the video (`Inspect BERT Vocabulary.ipynb`) can be found here: https://colab.research.google.com/drive/1fCKIBJ6fgWQ-f6UKs7wDTpNTL9N-Cq9X

I’ve also uploaded the `vocabulary.txt` file here if you want to peruse it without running the above notebook: https://drive.google.com/open?id=12jxEvIxAmLXsskVzVhsC49sLAgZi-h8Q

==== Updates ====
Sign up to hear about new content across my blog and channel: https://www.chrismccormick.ai/subscribe
