Long Context – Introducing the Latest arXiv Papers

Artificial Hippocampus Networks for Efficient Long-Context Modeling

Artificial Hippocampus Networks for Efficient Long-Context Modeling [17.2]
Long-sequence modeling faces a trade-off between the efficiency of the compressed, fixed-size memory in RNN-like models and the fidelity of the growing memory in attention-based Transformers. Inspired by the multi-store model in cognitive science, we introduce a memory framework for artificial neural networks. Experiments on the long-context benchmarks LV-Eval and InfiniteBench show that AHN-augmented models consistently outperform sliding-window baselines.
Paper reference translation (metadata) (Wed, 08 Oct 2025 17:59:55 GMT)
A proposal for an architecture that copes well with long contexts: "AHNs address the efficiency limitation of standard transformers by maintaining a sliding window of KV cache as lossless memory while transforming out-of-window information into a fixed-size compressed memory. This approach enables AHN-augmented models to achieve constant memory and computational complexity per token over long sequences."
Repository: GitHub – ByteDance-Seed/AHN: AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
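To make the "sliding window plus fixed-size compressed memory" idea concrete, here is a minimal sketch under stated assumptions. The class name `WindowedMemory` and the exponential-moving-average update are hypothetical stand-ins chosen for illustration; the paper uses learned recurrent modules for compression, which this toy does not reproduce.

```python
from collections import deque
from typing import Deque, List


class WindowedMemory:
    """Toy memory: an exact sliding window plus one fixed-size compressed vector.

    Hypothetical illustration only. The EMA update is a stand-in for a learned
    compressor and is NOT the module used in the AHN paper.
    """

    def __init__(self, window: int, dim: int, decay: float = 0.9) -> None:
        self.window = window
        self.decay = decay
        self.recent: Deque[List[float]] = deque()   # lossless, most recent entries
        self.compressed: List[float] = [0.0] * dim  # lossy summary of older entries

    def append(self, vec: List[float]) -> None:
        """Store a new vector; fold whatever falls out of the window into the summary."""
        self.recent.append(vec)
        if len(self.recent) > self.window:
            evicted = self.recent.popleft()
            self.compressed = [
                self.decay * c + (1.0 - self.decay) * e
                for c, e in zip(self.compressed, evicted)
            ]

    def size(self) -> int:
        """Number of stored vectors; never exceeds window + 1."""
        return len(self.recent) + 1


mem = WindowedMemory(window=4, dim=2)
for t in range(1000):
    mem.append([float(t), 1.0])
```

However long the input stream gets, the stored state stays at `window + 1` vectors, which is the constant-memory property the quote describes. What is lost is the exact content of everything older than the window.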

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs [64.3]
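Recent Long-Context Language Models can process hundreds of thousands of tokens in a single prompt. We recast reasoning as reusable thought caches derived from prior problem-solving traces. We propose an update strategy that iteratively refines templates derived from training data through natural language feedback.
Paper reference translation (metadata) (Wed, 08 Oct 2025 19:52:35 GMT)
The proposed approach is "Thought Template Augmented LCLMs (TOTAL), that equips long-context models with reusable reasoning patterns and iteratively refines them through natural language feedback." Perhaps the right picture is a kind of memory that makes good use of long context.
The repository is said to be https://github.com/starsuzi/ToTAL, but it currently returns a 404.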

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles

Who Gets Cited Most? Benchmarking Long-Context Language Models on Scientific Articles [81.9]
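SciTrek is a novel question-answering benchmark designed to evaluate the long-context reasoning capabilities of large language models (LLMs) using scientific articles. Our analysis reveals systematic shortcomings in the models' ability to perform basic numerical operations and to accurately locate specific information in long contexts.
Paper reference translation (metadata) (Thu, 25 Sep 2025 11:36:09 GMT)
A multi-document, long-context benchmark for the scientific domain: "This paper introduced SciTrek, a benchmark designed for testing the ability of LLMs to perform multi-document information synthesis and structured reasoning over full-text scientific articles."
Repository: GitHub – oaimli/SciTrek: Benchmarking long-context language models on scientific articles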

A Survey of Long-Document Retrieval in the PLM and LLM Era

A Survey of Long-Document Retrieval in the PLM and LLM Era [19.1]
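This survey provides the first comprehensive treatment of long-document retrieval (LDR). It systematizes the evolution from classical lexical models and early neural models to modern pre-trained language models (PLMs) and large language models (LLMs). We review domain-specific applications and specialized evaluation resources, and outline key open challenges such as efficiency trade-offs, multimodal alignment, and faithfulness.
Paper reference translation (metadata) (Tue, 09 Sep 2025 13:57:53 GMT)
A survey on handling long documents.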

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax-01: Scaling Foundation Models with Lightning Attention [59.4]
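MiniMax-Text-01 and MiniMax-VL-01 offer superior capabilities for processing longer contexts. MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training with 512 billion vision-language tokens.
Paper reference translation (metadata) (Tue, 14 Jan 2025 18:50:05 GMT)
A large, openly released MoE LLM with 456B parameters (32 experts, 45.9B active parameters). Its performance is comparable to commercial models such as GPT-4o, and it handles a very long context of 4M tokens. The authors claim "We demonstrate the first successful large-scale implementation of linear attention." (it is a hybrid architecture, as the paper also states: "After extensive experimentation, we settled on a hybrid architecture mainly using lightning attention (Qin et al., 2024b), an I/O-aware implementation of a linear attention variant (Qin et al., 2022a).").
Repository: GitHub – MiniMax-AI/MiniMax-01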
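For readers unfamiliar with linear attention, the following is a minimal sketch of the generic causal recurrence, checked against the equivalent quadratic form. It is not lightning attention and not MiniMax's implementation. The feature map `elu(x) + 1` and all function names are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def phi(x: Vector) -> Vector:
    """Positive feature map elu(x) + 1 (one common choice, assumed here)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Recurrent form: per-token cost does not depend on sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                          # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = sum(fq[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Same quantity computed token-by-token over the full prefix."""
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(ks[s]))) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / total for j in range(len(vs[0]))]
        )
    return outputs


qs = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
ks = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fast = causal_linear_attention(qs, ks, vs)
ref = causal_quadratic_reference(qs, ks, vs)
```

The two functions return the same values up to floating-point error. The recurrent one only carries a `d_k × d_v` state forward, which is why this family of methods scales to very long contexts.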

A Controlled Study on Long Context Extension and Generalization in LLMs

A Controlled Study on Long Context Extension and Generalization in LLMs [85.5]
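Broad textual understanding and in-context learning require language models that utilize full document contexts. Because of the implementation challenges of training long-context models directly, many methods have been proposed for extending models to handle long contexts. We implement a controlled protocol for extension methods with a standardized evaluation, using consistent base models and extension data.
Paper reference translation (metadata) (Wed, 18 Sep 2024 17:53:17 GMT)
An evaluation of methods for handling long contexts: "Our study underscores the role of perplexity as a crucial, performance indicator at length and highlights the trade-offs inherent in different attention mechanisms."
Repository: GitHub – Leooyii/LCEG: Long Context Extension and Generalization in LLMs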

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices [91.7]
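Large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios. Existing methods typically use the Self-Instruct framework to generate instruction-tuning data for improving long-context capability. In this paper, we propose a multi-agent interactive multi-hop generation framework that incorporates a quality verification agent, a single-hop question generation agent, a multiple question sampling strategy, and a multi-hop question merger agent. Our results suggest that our synthetic high-quality long-context instruction data improves model performance significantly, even beyond models trained on larger amounts of human-annotated data.
Paper reference translation (metadata) (Tue, 03 Sep 2024 13:30:00 GMT)
Multi-hop data synthesis with the Multi-Agent Interactive Multi-hop Generation (MIMG) framework, together with a validation of how effective the resulting data is. Various studies have shown that data synthesis driven by agentic behavior is effective, and this work is also useful as a set of best practices for the field. Several agents cooperate: "a quality verification agent, a single-hop question generation agent, a multiple question sampling strategy, and a multi-hop question merger agent".
Repository: GitHub – WowCZ/LongMIT: LongMIT: Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets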

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations [105.1]
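MMLongBench-Doc is a long-context multimodal benchmark comprising 1,062 expert-annotated questions. It is built on 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Experiments on 14 LVLMs show that long-context document understanding (DU) greatly challenges current models.
Paper reference translation (metadata) (Mon, 01 Jul 2024 17:59:26 GMT)
A multimodal, long-context benchmark. GPT-4o stands out, outperforming OCR + LLM pipelines.
Repository: MMLongBench-Doc (mayubo2333.github.io)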

PINE : Position-INvariant inferencE

Eliminating Position Bias of Language Models: A Mechanistic Approach [119.3]
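Position bias has proven to be a prevalent issue in modern language models (LMs). Causal attention generally causes models to favor distant content, whereas relative positional encodings such as RoPE favor nearby content. In this work, we eliminate position bias caused by different input segment orders (e.g., options in LM-as-a-judge, retrieved documents in QA) in a TRAINING-FREE ZERO-SHOT manner.
Paper reference translation (metadata) (Mon, 01 Jul 2024 09:06:57 GMT)
A proposal for a method that removes position bias. The approach appears (?) to reassign positions using the similarity of attention scores; it is training-free, but the computational cost seems fairly high.
Position bias reportedly affects MLLMs as well: "Further, our empirical study on object detection reveals that position bias is also present in vision-language models (VLMs)."
Repository: GitHub – wzq016/PINE: Offcial Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"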

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems [124.8]
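We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents. The "Summary of a Haystack" (SummHay) task then requires a system to process the Haystack and, given a query, generate a summary that identifies the relevant insights and precisely cites the source documents.
Paper reference translation (metadata) (Mon, 01 Jul 2024 15:23:42 GMT)
The paper builds SummHay, a benchmark (based on synthetic data) that tests whether systems can summarize long, large collections of documents, and compares various LLMs and RAG systems. According to the paper, "achieving strong coverage of key insights in a large corpus of text does not require retrieval, given a sufficiently capable long-context LLM." and "for use-cases where citation quality is important, optimizing retrieval is paramount: it removes irrelevant documents from the summarizer’s context, narrowing and focusing options for citation.", so the usefulness of RAG appears to depend on the use case. It is also interesting that Gemini 1.5 Pro seems to work quite effectively even without RAG. Several retrieval strategies are compared as well, which makes the paper a useful reference.
Repository: GitHub – salesforce/summary-of-a-haystack: Codebase accompanying the Summary of a Haystack paper.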
