January 20, 2025 – Introducing the latest arXiv papers

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax-01: Scaling Foundation Models with Lightning Attention [59.4]
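MiniMax-Text-01 and MiniMax-VL-01 offer superior capabilities for processing longer contexts. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training with 512 billion vision-language tokens.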
Reference translation of the paper (metadata) (Tue, 14 Jan 2025 18:50:05 GMT)
A large, openly released LLM with an MoE configuration: 456B total parameters (32 experts, 45.9B active parameters). Its performance is comparable to commercial models such as GPT-4o, and it handles a very long context of 4M tokens. The authors claim "We demonstrate the first successful large-scale implementation of linear attention." The architecture is a hybrid, as they also state: "After extensive experimentation, we settled on a hybrid architecture mainly using lightning attention (Qin et al., 2024b), an I/O-aware implementation of a linear attention variant (Qin et al., 2022a)."
The repository is GitHub – MiniMax-AI/MiniMax-01
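
To make the "linear attention" idea above concrete, here is a minimal pure-Python sketch of generic causal linear attention. It is not lightning attention and not the MiniMax-01 implementation: the feature map (elu(x) + 1), the normalization term, and all function names are assumptions chosen for illustration, and the I/O-aware tiling that the paper describes is not shown. The sketch only demonstrates that a running-state (linear-time) form gives the same result as the pairwise (quadratic-time) form.

```python
import math

def feature_map(x):
    # elu(x) + 1, a positive feature map often used for linear attention
    # (an assumption here, not taken from the MiniMax-01 paper).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def quadratic_causal_attention(Q, K, V):
    """Reference form: every position t looks back at all positions s <= t."""
    d_v = len(V[0])
    out = []
    for t in range(len(Q)):
        q = feature_map(Q[t])
        num, den = [0.0] * d_v, 0.0
        for s in range(t + 1):
            k = feature_map(K[s])
            w = sum(a * b for a, b in zip(q, k))  # similarity of t and s
            den += w
            for j in range(d_v):
                num[j] += w * V[s][j]
        out.append([x / den for x in num])
    return out

def linear_causal_attention(Q, K, V):
    """Recurrent form: keep a running state instead of revisiting the past."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    out = []
    for q_raw, k_raw, v in zip(Q, K, V):
        q, k = feature_map(q_raw), feature_map(k_raw)
        for i in range(d_k):
            z[i] += k[i]
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        den = sum(q[i] * z[i] for i in range(d_k))
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) / den
                    for j in range(d_v)])
    return out
```

The recurrent form does a fixed amount of work per token, so its cost grows linearly with sequence length, which is why this family of methods is attractive for very long contexts.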

Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains [114.8]
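Large language models (LLMs) have achieved remarkable performance in recent years, but they are fundamentally limited by their underlying training data. This paper proposes a complementary approach to self-improvement in which fine-tuning is applied to a multiagent society of language models.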
Reference translation of the paper (metadata) (Fri, 10 Jan 2025 04:35:46 GMT)
A proposal for a self-improvement approach: "Instead of fine-tuning a single model, our method finetunes a multiagent set of language models from the same base model and then independently specializes each model to capture parts of a task of interest." Generation models and critic models are tuned together, and their outputs are integrated through multiagent debate. The critic models also appear to matter a great deal.
The repository is Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
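
The debate-then-integrate flow described above can be sketched roughly as follows. This is a toy illustration only: the agents are plain Python callables standing in for fine-tuned models, the `debate` function and its signature are hypothetical, and combining the final answers by majority vote is an assumption of this sketch rather than something stated in the summary above. The fine-tuning step itself is not shown.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent types: a generator answers a question, a critic
# revises an answer after seeing everyone's current answers.
Generator = Callable[[str], str]
Critic = Callable[[str, List[str]], str]

def debate(question: str,
           generators: List[Generator],
           critics: List[Critic],
           rounds: int = 1) -> str:
    # Round 0: every generation agent proposes an answer independently.
    answers = [g(question) for g in generators]
    # Debate rounds: every critic agent reads all answers and responds.
    for _ in range(rounds):
        answers = [c(question, answers) for c in critics]
    # Integration step (assumed here): majority vote over final answers.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

# Usage with stub agents in place of real models.
gens = [lambda q: "4", lambda q: "5", lambda q: "4"]

def follow_majority(question: str, answers: List[str]) -> str:
    return Counter(answers).most_common(1)[0][0]

def stubborn(question: str, answers: List[str]) -> str:
    return "5"

result = debate("What is 2 + 2?", gens, [follow_majority, follow_majority, stubborn])
```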

WebWalker: Benchmarking LLMs in Web Traversal [55.4]
WebWalkerQA is a benchmark for evaluating the ability of LLMs to perform web traversal. This paper also proposes WebWalker, a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm.
Reference translation of the paper (metadata) (Mon, 13 Jan 2025 18:58:07 GMT)
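A proposal for WebWalkerQA, a benchmark of whether a model can retrieve the information it needs while moving around a website ("It evaluates the capacity of LLMs to traverse a website’s subpages to extract high-quality data systematically."), together with WebWalker, a multi-agent framework for solving it. The dataset requires agentic behavior and remains hard to solve even with frontier models such as GPT-4o. (Somewhat surprising.)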
The project site is WebWalker, the repository is GitHub – Alibaba-NLP/WebWalker: 🌐 WebWaker: Benchmarking LLMs in Web Traversal, and there is also WebWalkerQALeaderboard – a Hugging Face Space by callanwu
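
As a rough picture of what "traversing a website's subpages" involves, the sketch below walks a tiny in-memory site breadth-first until it finds a page containing a keyword. Everything in it is hypothetical: the `SITE` dictionary, the `traverse` function, and the keyword match are stand-ins for illustration. In WebWalker, LLM agents decide where to go and whether the collected information is sufficient; here a plain breadth-first search and a substring check take their place.

```python
from collections import deque

# A toy website: each page has some text and links to subpages.
SITE = {
    "/": {"text": "Welcome", "links": ["/about", "/events"]},
    "/about": {"text": "About us", "links": ["/"]},
    "/events": {"text": "Events list", "links": ["/events/2025"]},
    "/events/2025": {"text": "Submission deadline: March 1", "links": []},
}

def traverse(site, start, keyword, max_depth=3):
    """Return the click path to the first page containing keyword, or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        url, path = queue.popleft()
        page = site[url]
        if keyword.lower() in page["text"].lower():
            return path
        if len(path) - 1 >= max_depth:
            continue  # depth budget used up; do not follow links from here
        for nxt in page["links"]:
            if nxt in site and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

path = traverse(SITE, "/", "deadline")
# path is ["/", "/events", "/events/2025"]
```

The answer sits two clicks below the root page, which is the kind of depth that a single-page lookup would miss.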

What Limits LLM-based Human Simulation: LLMs or Our Design? [43.5]
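We argue that advancing LLM-based human simulation requires addressing both the inherent limitations of LLMs and the design challenges of simulation frameworks. To support further research in this area, we provide a curated collection of LLM-based human simulation resources.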
Reference translation of the paper (metadata) (Wed, 15 Jan 2025 04:59:49 GMT)
An analysis and organization of the challenges in "LLM-based human simulation". The statement "Compared to tasks in NLP or CV, LLM-based human simulations present a much greater complexity" seems right to me.
The repository is GitHub – Persdre/llm-human-simulation: Collection of papers related to llm human simulation