The first step when tokenizing texts is called normalization. But what does this mean? This video will tell you all about it.
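
As a rough illustration of what normalization can involve, here is a minimal sketch using only the Python standard library. It is not the Hugging Face implementation: the `normalize` function and the particular steps chosen (Unicode NFD decomposition, accent stripping, lowercasing, whitespace cleanup) are assumptions made for this example, and real tokenizers each apply their own set of rules.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: strip accents, lowercase, tidy whitespace."""
    # Decompose characters so accents become separate combining marks (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn"), which removes accents.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Lowercase, then collapse runs of whitespace into single spaces.
    return " ".join(stripped.lower().split())


print(normalize("Héllò,   hôw are ü?"))  # prints "hello, how are u?"
```

The point of a step like this is that differently written variants of the same text end up in one canonical form before the tokenizer splits it into tokens.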

This video is part of the Hugging Face course: http://huggingface.co/course
Open in Colab to run the code samples:
https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/videos/normalization.ipynb

Related videos:
– What is pre-tokenization? https://youtu.be/grlLV8AIXug
– Training a new tokenizer: https://youtu.be/DJimQynXZsQ

Don’t have a Hugging Face account? Join now: http://huggingface.co/join
Have a question? Check out the forums: https://discuss.huggingface.co/c/course/20
Subscribe to our newsletter: https://huggingface.curated.co/
