WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

We present WEAVE, the first suite for in-context interleaved cross-modality comprehension and generation. WEAVE-100k is a large-scale dataset of 100K interleaved samples spanning over 370K dialogue turns and 500K images. WEAVEBench is a human-annotated benchmark with 100 tasks based on 480 images.
Reference translation of the paper (metadata) (Fri, 14 Nov 2025 16:02:38 GMT)
The paper proposes a benchmark for multi-turn generation: "WEAVE-100k is a large-scale dataset of 100K interleaved samples spanning over 370K dialogue turns and 500K images, covering comprehension, editing, and generation tasks that require reasoning over historical context." On the evaluation method: "we employ a key-point-based scoring approach using structured evaluation criteria."
NanoBanana scores very high (though it does not appear to be the latest version).
The project site is Weave.
