March 24, 2026 – Introducing the latest arXiv papers

CUBE: A Standard for Unifying Agent Benchmarks [139.0]
We propose CUBE (Common Unified Benchmark Environments), a universal protocol standard built on MCP and Gym. CUBE lets any compliant platform access compliant benchmarks for evaluation, RL training, and data generation without custom integration.
Paper / Reference translation (Metadata) (Mon, 16 Mar 2026 18:31:37 GMT)
This is an effort to unify benchmark evaluation infrastructure: "We propose CUBE (Common Unified Benchmark Environments), a protocol standard designed to unify the ML Community by establishing a universal interface between benchmarks and evaluation frameworks. The core insight is simple: if we define a consistent API contract, any CUBE-compliant benchmark becomes immediately usable by any CUBE-compliant platform." The authors' point that "The importance of multi-benchmarking cannot be overstated. There are currently over 300 agentic benchmarks available, many of which are highly innovative but remain largely unknown because they are too difficult to set up." is exactly right, and this is an important effort (though not an easy one...).
The repository is at GitHub – The-AI-Alliance/cube-standard: Standardize benchmark wrapping so the community can wrap various otherwise-incompatible benchmarks uniformly and use them everywhere. · GitHub
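To make the "consistent API contract" idea concrete, here is a minimal sketch under loud assumptions. The `Benchmark` protocol, its `reset`/`step` methods, `StepResult`, `evaluate`, and the two toy benchmarks are all hypothetical names invented for illustration, loosely modeled on a Gym-style reset/step loop. This is not the actual CUBE interface (the repository above defines that). The point it shows is that the platform-side `evaluate` function contains no benchmark-specific code, so any object satisfying the contract can be plugged in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol, Sequence

# Hypothetical sketch: NOT the actual CUBE API. All names are invented for illustration.


@dataclass
class StepResult:
    """Outcome of one agent action (loosely modeled on a Gym-style step return)."""
    observation: str
    reward: float
    done: bool
    info: dict[str, Any] = field(default_factory=dict)


class Benchmark(Protocol):
    """The shared contract: any object with these two methods counts as 'compliant'."""

    def reset(self, task_id: int) -> str: ...

    def step(self, action: str) -> StepResult: ...


class EchoBenchmark:
    """Toy single-step benchmark: the agent must repeat the prompt exactly."""

    def reset(self, task_id: int) -> str:
        self._target = f"hello-{task_id}"
        return self._target

    def step(self, action: str) -> StepResult:
        solved = action == self._target
        return StepResult(observation="", reward=1.0 if solved else 0.0, done=True)


class CountdownBenchmark:
    """Toy multi-step benchmark: the agent must answer 'down' until the counter hits 0."""

    def reset(self, task_id: int) -> str:
        self._n = task_id + 1
        return str(self._n)

    def step(self, action: str) -> StepResult:
        if action != "down":
            # A wrong action ends the episode with no reward.
            return StepResult(observation=str(self._n), reward=0.0, done=True)
        self._n -= 1
        finished = self._n == 0
        return StepResult(observation=str(self._n), reward=1.0 if finished else 0.0, done=finished)


def evaluate(bench: Benchmark, agent: Callable[[str], str],
             task_ids: Sequence[int], max_steps: int = 20) -> float:
    """Platform-side loop: returns the mean final reward over tasks, for ANY compliant benchmark."""
    total = 0.0
    for tid in task_ids:
        obs = bench.reset(tid)
        for _ in range(max_steps):
            result = bench.step(agent(obs))
            obs = result.observation
            if result.done:
                total += result.reward
                break
    return total / len(task_ids)


def echo_agent(obs: str) -> str:
    return obs


def down_agent(obs: str) -> str:
    return "down"


if __name__ == "__main__":
    print(evaluate(EchoBenchmark(), echo_agent, range(3)))       # 1.0
    print(evaluate(CountdownBenchmark(), down_agent, range(3)))  # 1.0
    print(evaluate(CountdownBenchmark(), echo_agent, range(3)))  # 0.0
```

Using `typing.Protocol` means a benchmark does not have to inherit from a shared base class; it is "compliant" simply by implementing the two methods, which mirrors the idea of integrating against a contract rather than against each benchmark's own setup.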

Omnilingual MT: Machine Translation for 1,600 Languages [58.7]
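We present Omnilingual Machine Translation (OMT), the first machine translation system to support more than 1,600 languages. This scale is made possible by a comprehensive data strategy that integrates large public multilingual corpora with newly created datasets. OMT models improve cross-lingual transfer and, in evaluations across 1,600 languages, come close to solving the "understanding" part of the MT puzzle.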
Paper / Reference translation (Metadata) (Wed, 18 Mar 2026 16:25:51 GMT)
A translation model supporting 1,600 languages, going beyond NLLB.
A leaderboard based on the "BOUQuET dataset (a newly created, largest-to-date multilingual evaluation collection built from scratch and manually extended across a wide range of linguistic families)" has been released: Bouquet – a Hugging Face Space by facebook