This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples:
https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/building_tokenizer.ipynb
Related videos:
– Training a new tokenizer: https://youtu.be/DJimQynXZsQ
– Byte Pair Encoding Tokenization: https://youtu.be/HEikzVL-lZU
– Unigram Tokenization: https://youtu.be/TGZfZVuF9Yc
– WordPiece Tokenization: https://youtu.be/qpv6ms_t_1A
Don’t have a Hugging Face account? Join now: http://huggingface.co/join
Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/