OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 8 days ago
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation Paper • 2605.12925 • Published 14 days ago
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 24 days ago