「We introduce OdysseyBench, a comprehensive benchmark for evaluating agents on long- horizon workflows across multiple office applications, consisting of OdysseyBench+ and OdysseyBench-Neo. 」、「• We propose HOMERAGENTS, a multi-agent framework that automates the generation of long-horizon tasks, enabling scalable and diverse benchmark creation.」とベンチマーク作成フレームワークを含むベンチマークの提案。