October 27, 2025 - Introducing the latest arXiv papers

ChatGPT Atlas, Ring-1T, DeepSeek OCR, olmOCR 2

Last week there was a lot of talk about ChatGPT Atlas. I have high hopes for agents that operate a UI the way a person would, such as GUI agents (or, more precisely, browser agents).

Ring-1T is an LRM (large reasoning model) from Ant Group: a 1T-parameter MoE model that also performs well.

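DeepSeek OCR was also getting a lot of buzz. More than its OCR performance, what is interesting is how effective it is to use image data as context. On the OCR side, v2 of olmOCR is also out, so open-source activity is lively as well.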

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model [100.9]
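Ring-1T is the first open-source, state-of-the-art thinking model at the trillion-parameter scale. It has 1 trillion total parameters and activates approximately 50 billion per token.
Reference translation of the paper (metadata) (Tue, 21 Oct 2025 17:46:14 GMT)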
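A large-scale LRM. Its sheer size is part of the story, but the paper claims performance exceeding existing open models such as DeepSeek V3.1.
The repository is GitHub - inclusionAI/Ring-V2: Ring-V2 is a reasoning MoE LLM provided and open-sourced by InclusionAI. The model is at inclusionAI/Ring-1T · Hugging Face.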
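To put the quoted sizes in perspective, here is a back-of-the-envelope sketch of how sparse the MoE activation is. The two figures come from the summary above ("approximately 50 billion" is rounded, so the result is only approximate), and the function name is my own, not something from the paper or repository.

```python
# Figures quoted in the summary above; the active count is approximate.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 50_000_000_000     # roughly 50 billion activated per token


def active_fraction(total: int, active: int) -> float:
    """Return the share of parameters used for a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {frac:.1%}")
```

In other words, only about one parameter in twenty takes part in any given token, which is what makes a model of this size practical to serve at all.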

DeepSeek-OCR: Contexts Optical Compression [15.6]
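We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M. Experiments show that when the number of text tokens is within 10 times the number of vision tokens, the model can achieve a decoding (OCR) precision of 97%.
Reference translation of the paper (metadata) (Tue, 21 Oct 2025 02:41:44 GMT)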
An LLM set up to treat document images as context, and it appears to be efficient: "In this technical report, we propose DeepSeek-OCR and preliminarily validate the feasibility of contexts optical compression through this model, demonstrating that the model can effectively decode text tokens exceeding 10 times the quantity from a small number of vision tokens. We believe this finding will facilitate the development of VLMs and LLMs in the future."
The repository is GitHub - deepseek-ai/DeepSeek-OCR: Contexts Optical Compression
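The "within 10 times" condition is a ratio of text tokens to vision tokens. The sketch below is only a rough illustration of that ratio: the function names and the example token counts are hypothetical, and the check merely tells you whether a page falls inside the regime where the paper reports about 97% decoding precision. It does not predict accuracy.

```python
# Ratio threshold taken from the summary above (text tokens <= 10x vision tokens).
REPORTED_MAX_RATIO = 10.0


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (hypothetical helper)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def within_reported_regime(text_tokens: int, vision_tokens: int) -> bool:
    """True if the page sits inside the <=10x regime described in the summary."""
    return compression_ratio(text_tokens, vision_tokens) <= REPORTED_MAX_RATIO


# Made-up example pages: (text tokens, vision tokens)
for text_tokens, vision_tokens in [(1000, 100), (1500, 100)]:
    ratio = compression_ratio(text_tokens, vision_tokens)
    inside = within_reported_regime(text_tokens, vision_tokens)
    print(f"{text_tokens} text / {vision_tokens} vision = {ratio:.1f}x, inside regime: {inside}")
```

A page holding 1000 text tokens and encoded as 100 vision tokens sits right at the 10x boundary; one with 1500 text tokens on the same vision budget falls outside it.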

olmOCR 2: Unit Test Rewards for Document OCR [29.5]
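olmOCR 2 is the latest in a family of powerful OCR systems for converting digitized print documents, such as PDFs, into clean, naturally ordered plain text. olmOCR 2 is powered by olmOCR-2-7B-1025, a 7B vision language model (VLM) trained using reinforcement learning. We show that RL training on these unit-test cases results in state-of-the-art performance on olmOCR-Bench, our English-language OCR benchmark.
Reference translation of the paper (metadata) (Wed, 22 Oct 2025 17:53:02 GMT)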
This one is a dedicated OCR system: version 2 of olmOCR. The approach leans on synthetic data: "To scale unit test creation, we develop a pipeline for generating synthetic documents with diverse and challenging layouts, known ground-truth HTML source code, and extracted test cases."
The repository is GitHub - allenai/olmocr: Toolkit for linearizing PDFs for LLM datasets/training
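As a rough picture of what "unit test rewards" could look like, here is a minimal sketch in which each test is a pass/fail check on the OCR output and the reward is the fraction of tests that pass. Everything here is an assumption for illustration: the check types, the helper names, and the averaging rule are mine, and this is not the olmOCR codebase or its API.

```python
from typing import Callable, List

# A unit test is any function that takes OCR output text and returns pass/fail.
UnitTest = Callable[[str], bool]


def contains(snippet: str) -> UnitTest:
    """Pass if the snippet appears in the output (hypothetical check)."""
    return lambda text: snippet in text


def absent(snippet: str) -> UnitTest:
    """Pass if the snippet, e.g. a page footer, does not appear."""
    return lambda text: snippet not in text


def ordered(first: str, second: str) -> UnitTest:
    """Pass if both snippets appear and `first` comes before `second`."""
    def check(text: str) -> bool:
        i, j = text.find(first), text.find(second)
        return i != -1 and j != -1 and i < j
    return check


def unit_test_reward(output: str, tests: List[UnitTest]) -> float:
    """Assumed reward: fraction of binary tests the output passes."""
    if not tests:
        return 0.0
    passed = sum(1 for test in tests if test(output))
    return passed / len(tests)


ocr_output = "Title\nFirst paragraph.\nSecond paragraph."
tests = [
    contains("Title"),
    absent("Page 3 of 10"),
    ordered("First paragraph", "Second paragraph"),
    contains("Footnote"),  # deliberately failing check
]
print(unit_test_reward(ocr_output, tests))
```

Because every check is binary and cheap to run, a reward like this can be computed automatically for each sampled output, which fits the synthetic-document pipeline quoted above, where the ground truth is known by construction.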

A Definition of AGI [208.3]
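The lack of a concrete definition of artificial general intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper proposes a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult.
Reference translation of the paper (metadata) (Tue, 21 Oct 2025 01:28:35 GMT)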
The paper defines AGI as having cognitive versatility and proficiency on par with a well-educated adult, and proposes a framework for quantifying it: "This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains—including reasoning, memory, and perception—and adapts established human psychometric batteries to evaluate AI systems."
Opinions will vary on the definition and on the scores (27% for GPT-4, 58% for GPT-5), but "Long-Term Memory Storage (MS): The capability to continually learn new information (associative, meaningful, and verbatim)." looks like the biggest open challenge, and that part I find convincing.

FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis [110.6]
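HisRubric is a novel evaluation framework with a hierarchical analytical structure and a fine-grained grading rubric. FinDeepResearch is a benchmark comprising 64 listed companies from 8 financial markets across 4 languages. We conduct extensive experiments on FinDeepResearch using 16 representative methods: 6 DR agents, 5 LLMs equipped with both deep reasoning and search capabilities, and 5 LLMs with deep reasoning capabilities only.
Reference translation of the paper (metadata) (Wed, 15 Oct 2025 17:21:56 GMT)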
An evaluation of deep research in the financial domain. o3 deep research performs best, though only by a narrow margin over Grok4 and Gemini 2.5 Pro, and the authors note: "Our experiments suggest that even top-performing DR agents struggle to consistently balance a coherent analytical structure with factual accuracy. This imbalance remains the primary barrier to their deployment in high-stakes applications."