In Episode 2 we’ll look at:
– What a word embedding is.
– How BERT’s WordPiece model tokenizes text.
– The contents of BERT’s vocabulary.
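To make the tokenization idea in the list above concrete, here is a minimal sketch of WordPiece's greedy longest-match-first splitting, using a toy vocabulary I made up for illustration (BERT's real vocabulary has roughly 30,000 entries, which you can browse in the `vocabulary.txt` file linked below):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces, greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a
        # piece is found in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces are prefixed "##"
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary (hypothetical, for illustration only).
vocab = {"em", "##bed", "##ding", "##s", "play", "##ing"}
print(wordpiece_tokenize("embeddings", vocab))
# ['em', '##bed', '##ding', '##s']
```

In practice you wouldn't write this yourself; a library tokenizer (e.g. Hugging Face's `BertTokenizer`) applies the same longest-match-first rule against BERT's actual vocabulary.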
The Colab notebook from the second half of the video (`Inspect BERT Vocabulary.ipynb`) can be found here: https://colab.research.google.com/drive/1fCKIBJ6fgWQ-f6UKs7wDTpNTL9N-Cq9X
I’ve also uploaded the `vocabulary.txt` file here if you want to peruse it without running the above notebook: https://drive.google.com/open?id=12jxEvIxAmLXsskVzVhsC49sLAgZi-h8Q
==== Updates ====
Sign up to hear about new content across my blog and channel: https://www.chrismccormick.ai/subscribe