Sierra, a customer experience AI startup, has developed a new benchmark for evaluating the performance of AI chatbot agents. Named TAU-bench, the benchmark tests agents by having them hold conversations with LLM-simulated users while completing complex tasks. The results show that AI agents built with simple LLMs are not able to ...