MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models Paper • 2401.16745 • Published Jan 30, 2024
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models Paper • 2310.20410 • Published Oct 31, 2023
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models Paper • 2310.19240 • Published Oct 30, 2023
ToolACE: Winning the Points of LLM Function Calling Paper • 2409.00920 • Published Sep 2, 2024
ACEBench: Who Wins the Match Point in Tool Usage? Paper • 2501.12851 • Published Jan 22, 2025
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction Paper • 2508.12685 • Published Aug 18, 2025
ToolACE-MCP: Generalizing History-Aware Routing from MCP Tools to the Agent Web Paper • 2601.08276 • Published Jan 13
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 8 days ago
Cooperative Retriever and Ranker in Deep Recommenders Paper • 2206.14649 • Published Jun 28, 2022