Designing AI Evaluation Frameworks: How to Benchmark, Test, and Monitor LLM Performance in Production Workflows | Wicked Smart Data