
Continuous improvement of RAG results

3 min read

Ben Colborn, Member of Knowledge Staff

The previous blogs described some of DevRev’s early experiences with building a RAG and how we see what’s going on inside it. The example of an inconsistent response to multiple instances of the same query showed how to narrow down where the failure occurred.

So that’s all well and good, but not at all scalable. As we continue to optimize Turing and even look to tune Turing for different use cases, we can’t manually make such an investment for every query. Rather, we would need to specify a set of queries with success criteria that can be run against various configurations. Then we would need to be able to automatically send all those queries to a particular configuration of Turing and check if what’s returned is correct.
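To make this concrete, here is a minimal sketch (not Turing's actual API) of such a runner: it sends every specified query to one configuration of the pipeline and records whether the response meets that test's success criteria. The `ask_turing` and `is_correct` callables are hypothetical stand-ins for the real query endpoint and checker.

```python
from typing import Callable

def run_suite(tests: list[dict], config: dict,
              ask_turing: Callable, is_correct: Callable) -> list[dict]:
    # Send each specified query to one configuration of the pipeline and
    # record whether the response meets that test's success criteria.
    results = []
    for test in tests:
        answer, sources = ask_turing(test["query"], config)  # hypothetical query endpoint
        results.append({
            "query": test["query"],
            "passed": is_correct(answer, sources, test),      # apply the success criteria
        })
    return results
```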

Parameters

At every step of the RAG pipeline, there are parameters that can be modified to yield different results. Some examples are below.

Stage | Parameters
Indexing | Contents of KB; size of chunks; embedding algorithm
Rephrasing | LLM (ChatGPT, Gemini, Claude, etc.); prompt
Search | Search providers; syntactic + semantic search hybrid score calculation
Validation | Search result ranking; score sensitivity and thresholds
Answer creation | LLM; prompt; number of chunks to use

In addition, we would want to experiment with adding new stages.

That’s a lot of moving parts. It’s necessary to have a set of tests that can be run while modifying one parameter at a time so that we can see the effects of changes.
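One way to keep such experiments comparable is to gather the parameters into a single configuration object and change exactly one field per run. The sketch below is illustrative only; the field names and defaults are assumptions, not Turing's actual settings.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PipelineConfig:
    chunk_size: int = 512             # indexing: size of chunks
    embedding_model: str = "model-a"  # indexing: embedding algorithm
    rephrase_llm: str = "gpt-4o"      # rephrasing: which LLM
    hybrid_weight: float = 0.5        # search: syntactic vs. semantic score mix
    score_threshold: float = 0.7      # validation: score sensitivity
    answer_llm: str = "gpt-4o"        # answer creation: which LLM
    chunks_to_use: int = 5            # answer creation: number of chunks

baseline = PipelineConfig()
variant = replace(baseline, chunk_size=1024)  # change exactly one parameter per experiment
```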

Test specification

The specification and execution of tests are shown in the following diagram.

Here is an example test.

Query | What are articles?
Ground truth answer | Articles are pieces of information about your product or organization that are stored in a knowledge base. They are used to answer customer queries and provide self-service support. Articles are associated with a specific part (product or service) and can be created by internal users. Customers can search across articles to find answers to their questions, reducing the need to wait for support team assistance.
Ground truth sources | ARTICLE-908

The ground truth answer may either be written by a person or be generated by an LLM from the ground truth sources and verified by a person.
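A test specification like the one above can be stored as a small structured record. The sketch below shows one possible shape, with a flag noting whether a person has verified an LLM-generated ground truth answer; the field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class RagTest:
    query: str
    ground_truth_answer: str
    ground_truth_sources: list[str] = field(default_factory=list)
    verified: bool = False  # True once a person has reviewed the ground truth

example = RagTest(
    query="What are articles?",
    ground_truth_answer="Articles are pieces of information about your product "
                        "or organization that are stored in a knowledge base...",
    ground_truth_sources=["ARTICLE-908"],
    verified=True,
)
```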

Test result evaluation

The framework measures various metrics to assess the AI’s performance. These include the answering rate (whether the system provided any answer), correctness (whether the answer was factually correct), and similarity (semantic similarity between the provided answer and the ground truth). The system records detailed data about each step of the query processing pipeline using an observability platform for LLMs, which helps in debugging and further refining the system by pinpointing where improvements are needed.
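As a rough illustration of these metrics (not the framework's actual code): answering rate and correctness reduce to counting, and similarity can be computed as cosine similarity between embedding vectors of the received and ground truth answers, with the embedding step itself left abstract here.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Semantic similarity between two embedding vectors, e.g. of the received
    # answer and the ground truth answer, using whatever embedding model the
    # pipeline already relies on.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def answering_rate(answers: list[str | None]) -> float:
    # Fraction of queries for which the system returned any answer at all.
    return sum(a is not None for a in answers) / len(answers)

def correctness_rate(judgements: list[bool]) -> float:
    # Fraction of answers judged factually correct (by a person or an LLM judge).
    return sum(judgements) / len(judgements)
```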

After running the tests, the framework outputs detailed results including the questions asked, the answers received, the expected answers, and detailed logs from the observability platform. This comprehensive documentation aids in analyzing the system’s performance and making informed decisions about future enhancements.

A run of the example test above could return the following.

Query | What are articles?
Received answer | Articles are pieces of information about a product or organization that are stored in a knowledge base. They are used to answer customer queries through platforms like Turing bot, customer portal, or PLuG. Articles are associated with a specific part (product or service) and can be created by internal users. Customers can search through these articles to find answers to their questions.
Received sources | ARTICLE-293

The received answer is compared to the ground truth answer and the received sources to the ground truth sources. While the answers have a high similarity score, the sources differ, which may indicate a need to examine the contents of the KB or the search parameters.
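A hedged sketch of that comparison step: the answer text is scored by similarity as above, while the cited sources are compared as sets, so an answer that reads well but cites the wrong article is still flagged.

```python
def compare_sources(received: list[str], ground_truth: list[str]) -> dict:
    received_set, truth_set = set(received), set(ground_truth)
    return {
        "sources_match": received_set == truth_set,
        "missing": sorted(truth_set - received_set),     # expected but not cited
        "unexpected": sorted(received_set - truth_set),  # cited but not expected
    }

# For the example run above: the answers are similar, but ARTICLE-293 was
# cited instead of ARTICLE-908, so the test surfaces a KB or search issue.
print(compare_sources(["ARTICLE-293"], ["ARTICLE-908"]))
# {'sources_match': False, 'missing': ['ARTICLE-908'], 'unexpected': ['ARTICLE-293']}
```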

Testing at scale

Using the test database and test environments, DevRev can continuously improve the performance of Turing. At present we have several hundred tests that can be run with various configurations of the RAG pipeline. These test runs generate the same analytics as user queries.

One of the challenges is the manual work to revalidate the test specifications when the contents of the KB change. This means that we can’t always use the latest KB for testing. To fill this gap, we have a smaller set of queries that are more customer-critical. The support team manages and runs this set of tests, which can be updated more frequently and can use the latest KB.

The Turing team continuously optimizes the RAG pipeline based on customer feedback. By continuously running the tests, they ensure that their changes are always improvements, never regressions.
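One simple way to enforce that, sketched below with illustrative metric names and numbers: compare a candidate run’s metrics to the current baseline and flag any metric that drops beyond a small tolerance.

```python
def find_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     tolerance: float = 0.01) -> list[str]:
    # Metrics where the candidate configuration is worse than the baseline
    # by more than the allowed tolerance.
    return [
        metric for metric, base_value in baseline.items()
        if candidate.get(metric, 0.0) < base_value - tolerance
    ]

baseline = {"answering_rate": 0.95, "correctness": 0.88, "similarity": 0.91}
candidate = {"answering_rate": 0.96, "correctness": 0.84, "similarity": 0.92}
print(find_regressions(baseline, candidate))  # ['correctness'] -> investigate before shipping
```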

Ben Colborn, Member of Knowledge Staff

Ben leads the Knowledge team at DevRev, responsible for product documentation. Previously he was at Nutanix as Director of Technical Publications.