March 4, 2025 – Latest arXiv papers

Chain of Draft, Tree-of-Debate

Chain of Draft: Thinking Faster by Writing Less [37.5]
Chain of Draft (CoD) is a new paradigm inspired by human cognitive processes. CoD matches Chain-of-Thought (CoT) in accuracy while using only 7.6% of the tokens.
Paper, reference translation (metadata) (Tue, 25 Feb 2025 19:36:06 GMT)
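Another "Chain of X": this paper proposes Chain of Draft, an approach that limits the number of words (tokens) in each reasoning step.
It is interesting that something as simple as "In CoD, we also asked the model to think step by step. However, the model is asked to limit each reasoning step to five words at most." actually works.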
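To make the five-words-per-step constraint concrete, here is a minimal sketch of a CoD-style system prompt plus a helper that splits a response into draft steps and a final answer and flags steps over the word budget. The prompt text is an approximation of the instruction quoted above, and the `####` separator, the helper names (`split_draft`, `over_budget_steps`), and the sample response are assumptions for illustration, not the authors' code.

```python
# Approximation of the CoD instruction quoted above (assumed wording;
# check the paper for the exact prompt before reusing it).
COD_SYSTEM_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with 5 words at most. Return the answer at the end of the response "
    "after a separator ####."
)


def split_draft(response: str, separator: str = "####"):
    """Split a CoD-style response into draft steps (one per line) and the final answer."""
    draft, _, answer = response.partition(separator)
    steps = [line.strip() for line in draft.splitlines() if line.strip()]
    return steps, answer.strip()


def over_budget_steps(steps, max_words: int = 5):
    """Return draft steps whose whitespace-separated word count exceeds max_words."""
    return [step for step in steps if len(step.split()) > max_words]


# Hypothetical model response, written by hand for illustration.
response = "Start with 20 apples\nAte some, 12 left\n20 - 12 = 8\n#### 8"
steps, answer = split_draft(response)
print(steps, answer, over_budget_steps(steps))
```

Counting whitespace-separated tokens is only a rough proxy for "words"; a real check would more likely count model tokens with the relevant tokenizer.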

Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis [27.7]
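This paper introduces Tree-of-Debate (ToD), a framework that turns scientific papers into personas that debate their respective novelty. ToD dynamically builds a debate tree, enabling fine-grained analysis of independent novelty arguments within scholarly papers.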
Paper, reference translation (metadata) (Thu, 20 Feb 2025 17:43:40 GMT)
This one is a "Tree of X": "TREE-OF-DEBATE, a structured approach that models papers as personas engaging in a debate to extract their key similarities and differences."
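The debate-tree idea can be pictured as a recursive structure: each node holds a topic and one argument per paper persona, and a moderator decides which subtopics to expand. The sketch below is a generic tree builder under that assumption only; `DebateNode`, `argue`, and `propose_subtopics` are hypothetical names, and the stub functions stand in for LLM calls. It is not the paper's actual debate protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DebateNode:
    """One node of a debate tree: a topic plus each persona's argument on it."""
    topic: str
    arguments: Dict[str, str] = field(default_factory=dict)
    children: List["DebateNode"] = field(default_factory=list)


def build_debate_tree(
    topic: str,
    personas: List[str],
    argue: Callable[[str, str], str],
    propose_subtopics: Callable[[str, Dict[str, str]], List[str]],
    max_depth: int,
    depth: int = 0,
) -> DebateNode:
    """Collect one argument per persona, then recursively expand chosen subtopics."""
    node = DebateNode(topic)
    for persona in personas:
        node.arguments[persona] = argue(persona, topic)
    if depth < max_depth:
        # A moderator (stubbed here) decides which subtopics deserve deeper debate.
        for sub in propose_subtopics(topic, node.arguments):
            node.children.append(
                build_debate_tree(sub, personas, argue, propose_subtopics, max_depth, depth + 1)
            )
    return node


def count_nodes(node: DebateNode) -> int:
    """Total number of nodes in the tree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children)


# Toy stand-ins for LLM calls (hypothetical).
def toy_argue(persona: str, topic: str) -> str:
    return f"{persona} claims novelty in {topic}"


def toy_subtopics(topic: str, arguments: Dict[str, str]) -> List[str]:
    return [f"{topic} / method", f"{topic} / evaluation"]


tree = build_debate_tree("overall novelty", ["Paper A", "Paper B"], toy_argue, toy_subtopics, max_depth=1)
print(count_nodes(tree), tree.children[0].topic)
```

Passing the LLM-backed steps in as callables keeps the tree logic testable without a model.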

Self-rewarding correction for mathematical reasoning [19.5]
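We study self-rewarding reasoning in large language models (LLMs). Such LLMs generate step-by-step reasoning and, at the same time, evaluate the correctness of their own outputs at inference time without external feedback. We propose a two-stage algorithmic framework for building self-rewarding reasoning models using only self-generated data.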
Paper, reference translation (metadata) (Wed, 26 Feb 2025 23:01:16 GMT)
The paper proposes a "self-rewarding reasoning framework for LLMs, which integrates the generator and reward model into a single LLM, enabling autonomous reasoning, evaluation, and correction." The authors "study self-correction in mathematical reasoning and propose a two-stage framework that relies only on self-generated data."
The repository is GitHub – RLHFlow/Self-rewarding-reasoning-LLM: Recipes to train the self-rewarding reasoning LLMs.
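At inference time, a self-rewarding model of this kind behaves roughly like a generate, self-evaluate, revise loop. The following is a minimal sketch of that loop only; the function names, the retry budget `max_rounds`, and the toy stubs are hypothetical, and the training recipe (the paper's two-stage framework) is not shown.

```python
from typing import Callable, List, Tuple


def self_rewarding_inference(
    problem: str,
    generate: Callable[[str, List[str]], str],
    self_verify: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Generate, self-evaluate, and retry until the model judges its own answer correct.

    In the paper's framing, generation and evaluation come from a single LLM;
    they are separate callables here only to keep the sketch testable.
    """
    attempts: List[str] = []
    for _ in range(max_rounds):
        answer = generate(problem, attempts)  # earlier attempts serve as context for correction
        attempts.append(answer)
        if self_verify(problem, answer):  # self-reward: no external verifier involved
            break
    return attempts[-1], attempts


# Hypothetical stand-ins for the model.
def toy_generate(problem: str, history: List[str]) -> str:
    return "7" if not history else "8"


def toy_verify(problem: str, answer: str) -> bool:
    return answer == "8"


final, history = self_rewarding_inference("20 - 12 = ?", toy_generate, toy_verify)
print(final, history)
```

Capping the number of rounds matters in practice: a model that keeps judging itself wrong would otherwise loop indefinitely.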

PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.6]
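We propose PlanGEN, a model-agnostic and scalable agent framework with three key components: constraints, verification, and selection. Specifically, we propose constraint-guided iterative verification to improve the performance of inference-time algorithms.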
Paper, reference translation (metadata) (Sat, 22 Feb 2025 06:21:56 GMT)
A multi-agent framework: "PlanGEN comprises three specialized LLM agents: a constraint agent, a verification agent, and a selection agent." The authors add, "Further, we introduced a Mixture of Algorithms, an iterative framework that integrates the selection agent (Figure 1) to dynamically choose the best algorithm." The "A" in this MoA is easy to confuse with the "A" for Agents in Mixture-of-Agents.
With each of Gemini-1.5-Pro, Gemini-2.0-Flash, and GPT-4o, the framework appears to outperform using the model on its own, so an ensemble-like effect is evident.
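Constraint-guided iterative verification can be read as: extract constraints once, then repeatedly propose a plan, score it against those constraints, and feed the verifier's feedback into the next proposal. Below is a minimal sketch of that loop under those assumptions. The function names, the score threshold, and the keyword-matching toy verifier are hypothetical, and the selection agent / Mixture of Algorithms step is omitted.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def iterative_verification(
    problem: str,
    extract_constraints: Callable[[str], List[str]],
    propose_plan: Callable[[str, List[str], Optional[Plan], List[str]], Plan],
    verify: Callable[[Plan, List[str]], Tuple[float, List[str]]],
    threshold: float = 1.0,
    max_iters: int = 5,
) -> Tuple[Plan, float]:
    """Propose, verify against constraints, and refine until the score clears the threshold."""
    constraints = extract_constraints(problem)  # role of the "constraint agent"
    plan: Optional[Plan] = None
    feedback: List[str] = []
    best_plan: Plan = []
    best_score = float("-inf")
    for _ in range(max_iters):
        plan = propose_plan(problem, constraints, plan, feedback)
        score, feedback = verify(plan, constraints)  # role of the "verification agent"
        if score > best_score:
            best_plan, best_score = plan, score
        if score >= threshold:
            break
    return best_plan, best_score


# Toy stand-ins for the LLM agents (hypothetical).
def toy_constraints(problem: str) -> List[str]:
    return ["book flight", "reserve hotel", "stay under budget"]


def toy_propose(problem, constraints, previous, missing):
    # First draft covers one constraint; revisions add whatever the verifier flagged.
    return ["book flight"] if previous is None else previous + missing


def toy_verify(plan, constraints):
    missing = [c for c in constraints if c not in plan]
    return 1.0 - len(missing) / len(constraints), missing


plan, score = iterative_verification("plan a trip", toy_constraints, toy_propose, toy_verify)
print(plan, score)
```

Keeping the best-scoring plan, rather than the last one, means a refinement that makes things worse does not override an earlier, better plan.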

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts [65.9]
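TimeTravel is a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. It is designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement.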
Paper, reference translation (metadata) (Thu, 20 Feb 2025 18:59:51 GMT)
An unusual benchmark: "By integrating AI with historical research, TimeTravel fosters AI-powered tools for historians, archaeologists, researchers, and cultural tourists to extract valuable insights while ensuring technology contributes meaningfully to historical discovery and cultural heritage preservation." It even includes Japanese dogū (clay figurines) and magatama (curved beads).
The project site is TimeTravel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

Continuous Diffusion Model for Language Modeling [57.4]
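Existing continuous diffusion models for discrete data have limited performance compared with discrete approaches. This paper proposes a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.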
Paper, reference translation (metadata) (Mon, 17 Feb 2025 08:54:29 GMT)
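Proposes the Riemannian Diffusion Language Model (RDLM), which is competitive with autoregressive (AR) models.
The repository is https://github.com/harryjo97/RDLM
It is interesting that while image generation has seen some movement from diffusion models toward autoregressive models, in language, diffusion-based models such as Inception Labs' Mercury Coder are drawing attention.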
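One standard way to give categorical distributions a geometry is the Fisher-Rao metric: the square-root map sends the probability simplex onto the positive orthant of a unit hypersphere, where geodesics are great-circle arcs. The sketch below only illustrates that geometry (square-root map, spherical interpolation, Fisher-Rao distance). Treating the uniform distribution as the "noise" endpoint is an assumption made for illustration, and this is not RDLM's actual diffusion process or code.

```python
import numpy as np


def to_sphere(p):
    """Map a categorical distribution on the simplex to the unit sphere via sqrt."""
    return np.sqrt(p)


def from_sphere(u):
    """Inverse map: square the coordinates to recover a probability vector."""
    return u ** 2


def slerp(u, v, t):
    """Geodesic (great-circle) interpolation between unit vectors u and v."""
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return u.copy()
    return (np.sin((1 - t) * theta) * u + np.sin(t * theta) * v) / np.sin(theta)


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance: 2 * arccos of the Bhattacharyya coefficient."""
    bc = np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0)
    return 2.0 * np.arccos(bc)


vocab_size, token = 4, 2
one_hot = np.eye(vocab_size)[token]              # clean data token
uniform = np.full(vocab_size, 1.0 / vocab_size)  # assumed noise endpoint (illustrative)

for t in (0.0, 0.5, 1.0):
    p_t = from_sphere(slerp(to_sphere(one_hot), to_sphere(uniform), t))
    print(t, np.round(p_t, 3), round(float(p_t.sum()), 6))
```

Because both endpoints lie in the positive orthant and the slerp coefficients are non-negative for t in [0, 1], every interpolant squares back to a valid probability vector.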

Energy-Based Diffusion Language Models for Text Generation [126.2]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model that operates at the full-sequence level at each diffusion step. Our framework provides a 1.3× sampling speedup over existing diffusion models.
Paper, reference translation (metadata) (Fri, 28 Feb 2025 08:41:03 GMT)
This paper also claims that "Through experiments on both small and large language modeling benchmarks, EDLM demonstrates state-of-the-art performance among diffusion models and approaches the quality of autoregressive models, while offering significant sampling speedup."
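A common way to combine a cheap proposal model with a sequence-level energy is self-normalized importance resampling: draw several candidates in parallel from the proposal, weight each by exp(-energy), and resample one. The sketch below shows only that generic technique, assuming the diffusion model serves as the proposal. `toy_proposal` and `toy_energy` are hypothetical stand-ins, and EDLM's actual energy parameterization and sampling schedule are not reproduced here.

```python
import numpy as np


def importance_resample(candidates, energy_fn, rng):
    """Pick one candidate with probability proportional to exp(-energy)."""
    energies = np.array([energy_fn(c) for c in candidates], dtype=float)
    logits = -energies
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()  # self-normalized importance weights
    index = rng.choice(len(candidates), p=weights)
    return candidates[index], weights


def toy_proposal(rng, num_samples, length=3, vocab=("a", "b")):
    """Stand-in for parallel samples drawn from a diffusion model (hypothetical)."""
    return [" ".join(rng.choice(vocab, size=length)) for _ in range(num_samples)]


def toy_energy(sequence):
    """Hypothetical energy: lower for sequences whose adjacent tokens alternate."""
    tokens = sequence.split()
    return -float(sum(x != y for x, y in zip(tokens, tokens[1:])))


rng = np.random.default_rng(0)
candidates = toy_proposal(rng, num_samples=8)
chosen, weights = importance_resample(candidates, toy_energy, rng)
print(chosen, np.round(weights, 3))
```

The candidates can be generated in parallel, and only the cheap energy evaluation and resampling happen afterwards. This is what makes the scheme attractive when the proposal itself is fast.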