1. Defining evaluation metrics (performance metrics such as faithfulness and relevancy, or system metrics such as latency and cost)
2. Creating an evaluation dataset
3. Defining a baseline
4. Trying out different approaches and comparing them against the baseline
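The four steps above can be sketched as a small evaluation loop. This is a minimal, stdlib-only Python sketch under stated assumptions, not the method used in the posts linked below. The dataset, the two pipeline functions, and the keyword-overlap `keyword_relevancy` score are all hypothetical stand-ins. A real setup would use an LLM-based faithfulness or relevancy evaluator and would also track cost. Here, latency is the only system metric measured.

```python
import time
from statistics import mean
from typing import Callable, Dict, List

# Step 2 (hypothetical data): each item pairs a question with keywords
# that a good answer is expected to mention.
DATASET: List[Dict] = [
    {"question": "What is RAG?", "expected_keywords": ["retrieval", "generation"]},
    {"question": "Why evaluate a pipeline?", "expected_keywords": ["metrics"]},
]


def keyword_relevancy(answer: str, expected_keywords: List[str]) -> float:
    """Step 1 (toy metric, an assumption): fraction of expected keywords present.

    Stands in for a real relevancy/faithfulness evaluator.
    """
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def evaluate(pipeline: Callable[[str], str], dataset: List[Dict]) -> Dict[str, float]:
    """Run one pipeline over the dataset; report mean relevancy and mean latency."""
    scores, latencies = [], []
    for item in dataset:
        start = time.perf_counter()
        answer = pipeline(item["question"])
        latencies.append(time.perf_counter() - start)  # system metric: latency
        scores.append(keyword_relevancy(answer, item["expected_keywords"]))
    return {"relevancy": mean(scores), "avg_latency_s": mean(latencies)}


# Hypothetical pipelines; in practice these would be RAG pipelines with
# different chunking, retrieval, or model settings.
def baseline_pipeline(question: str) -> str:
    return "I am not sure."


def candidate_pipeline(question: str) -> str:
    return "Retrieval-augmented generation pairs retrieval with generation; metrics guide tuning."


def compare(approaches: Dict[str, Callable[[str], str]], dataset: List[Dict],
            baseline_name: str = "baseline") -> Dict[str, Dict[str, float]]:
    """Steps 3 and 4: evaluate every approach and report its delta vs. the baseline."""
    results = {name: evaluate(fn, dataset) for name, fn in approaches.items()}
    base = results[baseline_name]["relevancy"]
    for metrics in results.values():
        metrics["relevancy_delta"] = metrics["relevancy"] - base
    return results


if __name__ == "__main__":
    report = compare({"baseline": baseline_pipeline, "candidate": candidate_pipeline}, DATASET)
    for name, metrics in report.items():
        print(name, metrics)
```

Fixing the dataset and metrics before trying variants is what makes the per-approach deltas comparable: every approach is scored on the same questions with the same measurements.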
We’re excited to feature Wenqi Glantz, an open-source evangelist who has written a series of excellent blog posts on this topic:
https://levelup.gitconnected.com/evaluation-driven-development-the-swiss-army-knife-for-rag-pipelines-dba24218d47e
https://levelup.gitconnected.com/exploring-zephyr-7b-alpha-through-the-lens-of-evaluation-driven-development-faf69e9d9ec7