🛠 Kitchen Rush: A Benchmark for Evaluating LLMs in Real Time
Kitchen Rush is a benchmark that evaluates how well LLMs call tools under time constraints. Unlike static tests, response latency directly affects whether a task succeeds.
🌍 This makes it possible to evaluate models for real-time systems (assistants, agents) where speed is critical.
👤 It helps select models suited to live interaction.
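To make the idea of latency affecting task success concrete, here is a minimal Python sketch. It is not Kitchen Rush's actual scoring code: the `Order`, `ToolCall`, and `score_episode` names, the tool names, and the deadline values are all hypothetical, and latency is simulated rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """A hypothetical task with a deadline, in simulated seconds."""
    name: str
    deadline_s: float


@dataclass
class ToolCall:
    """One model action: the tool chosen and how long the model took to emit it."""
    tool: str
    latency_s: float


def score_episode(order: Order, calls: list[ToolCall], required: list[str]) -> bool:
    """Return True only if the required tools were called in order before the deadline."""
    elapsed = 0.0
    done: list[str] = []
    for call in calls:
        elapsed += call.latency_s  # model latency consumes the time budget
        if elapsed > order.deadline_s:
            return False  # too slow: the order expires even if the plan was right
        done.append(call.tool)
    return done == required


# Illustrative (assumed) task: three tool calls needed within 10 simulated seconds.
order = Order("burger", deadline_s=10.0)
steps = ["grill_patty", "assemble", "serve"]
fast = [ToolCall(t, 2.0) for t in steps]  # finishes at 6 s
slow = [ToolCall(t, 4.0) for t in steps]  # third call lands at 12 s

print(score_episode(order, fast, steps))  # True
print(score_episode(order, slow, steps))  # False
```

Both runs issue the same correct sequence of tool calls; only the slower one fails. A static test would score them identically, which is the gap a time-constrained benchmark is meant to expose.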
Source 1: https://github.com/bassimeledath/kitchen-rush
