LLM – Page 11 – Introducing the Latest arXiv Papers

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax-01: Scaling Foundation Models with Lightning Attention [59.4]
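MiniMax-Text-01 and MiniMax-VL-01 offer strong capabilities for processing longer contexts. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens at inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training on 512 billion vision-language tokens.
Reference translation of the paper (metadata) (Tue, 14 Jan 2025 18:50:05 GMT)
A large open LLM with a 456B-parameter MoE configuration (32 experts, 45.9B active parameters). Its performance rivals commercial models such as GPT-4o, and its supported context length is very long at 4M tokens. The authors claim "We demonstrate the first successful large-scale implementation of linear attention." As they also note, "After extensive experimentation, we settled on a hybrid architecture mainly using lightning attention (Qin et al., 2024b), an I/O-aware implementation of a linear attention variant (Qin et al., 2022a)." So the model is a hybrid architecture.
The repository is GitHub – MiniMax-AI/MiniMax-01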
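To make the linear-attention idea in this entry more concrete, here is a minimal sketch of generic causal linear attention in plain Python. It is not lightning attention and not MiniMax's implementation: the elu(x)+1 feature map and all function names are assumptions chosen for illustration. The only point is that a running state replaces the n×n score matrix, so the cost grows linearly with sequence length.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumption for this sketch)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Causal attention in O(n) time using a running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                          # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_q, phi_k = feature_map(q), feature_map(k)
        # Fold the current key/value pair into the running state.
        for i in range(d_k):
            norm[i] += phi_k[i]
            for j in range(d_v):
                state[i][j] += phi_k[i] * v[j]
        # Read out with the current query; no n x n matrix is ever built.
        denom = sum(phi_q[i] * norm[i] for i in range(d_k))
        outputs.append([
            sum(phi_q[i] * state[i][j] for i in range(d_k)) / denom
            for j in range(d_v)
        ])
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """The same quantity computed the O(n^2) way, for checking only."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        phi_q = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(phi_q, feature_map(keys[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights)) / total
            for j in range(d_v)
        ])
    return outputs
```

Both functions return the same outputs up to floating-point error, which gives a convenient sanity check when experimenting with other feature maps.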

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs [34.2]
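Large language models (LLMs) can correct their self-generated responses, but a decline in accuracy after self-correction has also been observed. We decompose self-correction capability into confidence (staying confident in correct answers) and critique (turning wrong answers into correct ones). Our strategy outperforms vanilla SFT in both capabilities and achieves much higher accuracy after self-correction.
Reference translation of the paper (metadata) (Fri, 27 Dec 2024 08:09:11 GMT)
An analysis of confidence and critique, together with a proposed method for improving self-correction capability.
The paper states "Confidence prompt/ICL example can lead higher CL and lower CS; critique prompt/ICL example can cause lower CL and higher CS." (Confidence Level (CL) and Critique Score (CS)), so the two are in a trade-off relationship.
To improve both, the authors propose "Critique Improvement Tuning (CCT), which can be divided into Confidence Level Improvement Tuning (CLT) and Critique Score Improvement Tuning (CST)."
The repository is GitHub – Zhe-Young/SelfCorrectDecompose: Code for paper “Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs”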

Large Concept Models: Language Modeling in a Sentence Representation Space

Large Concept Models: Language Modeling in a Sentence Representation Space [62.7]
In this paper, we present an attempt at an architecture that operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher-level idea or action in a flow. The model exhibits impressive zero-shot generalization performance across many languages.
Reference translation of the paper (metadata) (Sun, 15 Dec 2024 21:20:12 GMT)
A proposal for a model that handles language at the level of concepts rather than tokens. The setting is "In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space." Under it, the paper reports the interesting result that "The LCM outperforms Llama-3.1-8B-IT on English and on the average over foreign languages officially supported by the LLM." On the other hand, the authors also note "We acknowledge that there is still a long path to reach the performance of current flagship LLMs."
The repository is GitHub – facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

Deliberation in Latent Space via Differentiable Cache Augmentation

Deliberation in Latent Space via Differentiable Cache Augmentation [48.2]
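We show how to augment a frozen large language model with an offline coprocessor that operates on the model's key-value (kv) cache. This coprocessor augments the cache with a set of latent embeddings designed to improve the fidelity of subsequent decoding. We show that when the cache is augmented, the decoder achieves lower perplexity on numerous subsequent tokens.
Reference translation of the paper (metadata) (Mon, 23 Dec 2024 18:02:25 GMT)
The proposal: "This paper introduces differentiable cache augmentation, a novel method for enhancing frozen decoder-only language models by incorporating a learned coprocessor that operates on the model’s kv-cache." The coprocessor is trainable.
The idea is similar to the Pause Token approach, which the paper also mentions, but this method is reportedly more powerful.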

Knowledge Boundary of Large Language Models: A Survey

Knowledge Boundary of Large Language Models: A Survey [75.7]
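Large language models (LLMs) store vast amounts of knowledge in their parameters, but they remain limited in memorizing and utilizing certain knowledge. This underscores the critical need to understand the knowledge boundary of LLMs. This survey proposes a comprehensive definition of the LLM knowledge boundary and introduces a formalized taxonomy that categorizes knowledge into four distinct types.
Reference translation of the paper (metadata) (Tue, 17 Dec 2024 02:14:02 GMT)
A survey on the knowledge boundary of LLMs.
An interesting perspective.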

DeepSeek v3, QVQ-72B-Preview, YuLan-Mini

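Open models also continue to become more capable. DeepSeek v3 is a very large 671B-parameter model (though it is an MoE with 37B active parameters) and claims to compete with GPT-4o and Claude 3.5 Sonnet. GitHub – deepseek-ai/DeepSeek-V3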
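As a rough illustration of what "active parameters" means for the MoE models on this page, the sketch below computes the share of parameters used per token. The parameter counts are the ones quoted in these entries (DeepSeek v3: 671B total, 37B active; MiniMax-01: 456B total, 45.9B active); the helper name `active_fraction` is ours.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of parameters activated per token, for counts given in billions."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


# (total, active) in billions of parameters, as quoted on this page
MODELS = {
    "DeepSeek v3": (671, 37),
    "MiniMax-01": (456, 45.9),
}

if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        share = active_fraction(total, active)
        print(f"{name}: {share:.1%} of parameters active per token")
```

This says nothing about quality. It only shows why per-token compute for these models is much closer to that of a dense model of tens of billions of parameters than to their headline size.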

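QVQ-72B-Preview strengthens reasoning ability starting from Qwen2 VL (covered in "Qwen 2.5, Qwen 2 VL, GRIN-MoE, Pixtral – Introducing the Latest arXiv Papers") and claims performance competitive not only with GPT-4o but, on some tasks, with OpenAI o1. QVQ: To See the World with Wisdom | Qwen

YuLan-Mini is comparatively small, with 2.42B parameters trained on 1.08T tokens, yet claims to outperform competing open models. YuLan-Mini/README_ja.md at main · RUC-GSAI/YuLan-Mini · GitHub

My impression is that Chinese research institutions are releasing a great many models and methods. This is much appreciated.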

YuLan-Mini: An Open Data-efficient Language Model [111.0]
YuLan-Mini, a highly capable base model with 2.42B parameters, achieves top-tier performance among models of similar parameter scale. Remarkably, YuLan-Mini, trained on 1.08T tokens, achieves performance comparable to industry-leading models that require significantly more data.
Reference translation of the paper (metadata) (Mon, 23 Dec 2024 17:47:53 GMT)
According to the authors: "Our approach includes three major contributions to enhance training efficacy: (1) an elaborately designed data pipeline that combines data cleaning with data schedule strategies; (2) a systematic optimization method that can effectively mitigate training instability; (3) an effective annealing approach that integrate targeted data selection and long context training."

DeepSeek-V3 Technical Report [147.2]
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages. Comprehensive evaluations show that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.
Reference translation of the paper (metadata) (Fri, 27 Dec 2024 04:03:16 GMT)
"During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pretraining stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M." That is very cost-effective. That said, the authors caution: "Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
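The figures in the quoted passage can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from that passage; the constant and function names are ours.

```python
# All figures are taken from the DeepSeek-V3 passage quoted in this entry.
PRETRAIN_GPU_HOURS = 2_664_000
CONTEXT_EXTENSION_GPU_HOURS = 119_000
POST_TRAINING_GPU_HOURS = 5_000
GPU_HOURS_PER_TRILLION_TOKENS = 180_000
CLUSTER_GPUS = 2048
RENTAL_USD_PER_GPU_HOUR = 2  # the paper's assumed H800 rental price


def total_gpu_hours():
    """GPU hours for pre-training, context extension, and post-training."""
    return (PRETRAIN_GPU_HOURS
            + CONTEXT_EXTENSION_GPU_HOURS
            + POST_TRAINING_GPU_HOURS)


def total_cost_usd():
    """Rental cost of the full training run at the assumed hourly price."""
    return total_gpu_hours() * RENTAL_USD_PER_GPU_HOUR


def days_per_trillion_tokens():
    """Wall-clock days to process one trillion tokens on the whole cluster."""
    return GPU_HOURS_PER_TRILLION_TOKENS / CLUSTER_GPUS / 24


def implied_pretraining_tokens_trillions():
    """Pre-training tokens implied by the GPU-hour figures."""
    return PRETRAIN_GPU_HOURS / GPU_HOURS_PER_TRILLION_TOKENS


if __name__ == "__main__":
    print(f"total GPU hours: {total_gpu_hours():,}")
    print(f"total cost: ${total_cost_usd():,}")
    print(f"days per trillion tokens: {days_per_trillion_tokens():.2f}")
    print(f"implied tokens: {implied_pretraining_tokens_trillions():.1f}T")
```

The totals agree with the quoted 2.788M GPU hours and $5.576M, the per-trillion figure rounds to the quoted 3.7 days, and the implied token count matches the 14.8 trillion tokens stated in the abstract.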

Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code [123.7]
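This paper presents Aurora-M, a 15B-parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions. We evaluate Aurora-M across a wide range of tasks and languages, demonstrating robustness against catastrophic forgetting.
Reference translation of the paper (metadata) (Fri, 27 Dec 2024 03:53:21 GMT)
aurora-m/aurora-m-biden-harris-redteamed · Hugging Face: models like this also exist. Japanese is explicitly listed among the supported languages.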

Knowledge Injection via Prompt Distillation

Knowledge Injection via Prompt Distillation [48.7]
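This paper proposes a new fine-tuning technique for learning new knowledge and shows that it can reach the performance of RAG. The proposed method is based on a self-distillation approach called prompt distillation.
Reference translation of the paper (metadata) (Thu, 19 Dec 2024 15:44:01 GMT)
When knowledge that the LLM lacks is needed, RAG is the usual choice. This paper proposes prompt distillation, a fine-tuning method that can achieve comparable performance. It can reportedly also be combined with RAG.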

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation [21.8]
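RetroLLM is a unified framework that integrates retrieval and generation into a single, cohesive process. To mitigate false pruning during constrained evidence generation, it introduces hierarchical FM-Index constraints. Experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks.
Reference translation of the paper (metadata) (Mon, 16 Dec 2024 16:03:25 GMT)
A proposal for a framework that seamlessly connects retrieval and generation.
The repository is GitHub – sunnynexus/RetroLLM: RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation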

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy [88.1]
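CC-OCR comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. CC-OCR aims to comprehensively evaluate the capabilities of LMMs on OCR-centered tasks and to facilitate continued progress in LMMs.
Reference translation of the paper (metadata) (Tue, 03 Dec 2024 07:03:25 GMT)
An OCR benchmark for MLLMs. Overall, Gemini Pro performs well.
The repository is https://github.com/QwenLM/CC-OCR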

From Intention To Implementation: Automating Biomedical Research via LLMs

From Intention To Implementation: Automating Biomedical Research via LLMs [32.0]
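This paper introduces BioResearcher, the first end-to-end automated system designed to streamline the entire biomedical research process. By decomposing complex tasks into logically related sub-tasks, BioResearcher effectively addresses the challenges of multidisciplinary requirements and logical complexity. BioResearcher achieves an average execution success rate of 63.07% across eight previously unmet research objectives.
Reference translation of the paper (metadata) (Thu, 12 Dec 2024 16:35:05 GMT)
According to the paper: "BioResearcher employs a modular multi-agent architecture, integrating specialized agents for search, literature processing, experimental design, and programming."
The figure is hard to interpret, but the success rate strikes me as quite high...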
