May 26, 2025 – Introducing the Latest arXiv Papers (arXiv最新論文の紹介)

Google I/O, Claude 4 Sonnet / Opus

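The performance of Gemini 2.5 Pro (including Deep Think) announced at Google I/O, the image and video generation models Imagen 4 and Veo 3, and Gemini Diffusion, a diffusion-based model announced around the same time, show that Google is working on generative AI across the board and getting strong results. Just what you would expect from Google.

Google is also actively pursuing "next Transformer" research, such as It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization (covered in an earlier post on this blog), which is very interesting. Again, just what you would expect from Google.

Anthropic has announced Claude 4. It claims strong performance in code generation and in the capabilities that matter for agentic behavior, so expectations are high.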

Introducing Claude 4 \ Anthropic

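It feels like we have moved a step beyond the era of OpenAI dominance. Open efforts are active, but commercial models are also producing one interesting announcement after another, and the competition is intense.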

When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners

When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.5]
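The paper shows that language-specific ablation consistently improves multilingual reasoning performance. Compared with post-training approaches, the training-free ablation achieves comparable or better results with minimal computational overhead.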
Reference translation of the paper (metadata) (Wed, 21 May 2025 08:35:05 GMT)
Starting from the hypothesis "Drawing inspiration from cognitive neuroscience, which suggests that human reasoning functions largely independently of language processing, we hypothesize that LLMs similarly encode reasoning and language as separable components that can be disentangled to enhance multilingual reasoning", the authors report: "Through targeted interventions in the LLMs’ activation space, we demonstrate that removing language-specific information significantly improves reasoning performance across languages."
Both the hypothesis and the experimental results are very interesting. LLMs should be entirely different from the human brain, so why do they end up behaving in a similar way (with a comparable functional decomposition)?
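To make the idea of "removing language-specific information" in activation space more concrete, here is a minimal sketch of one common way to do such an intervention: estimate a low-rank subspace from per-language mean activations and project hidden states onto its orthogonal complement. This is an illustration under stated assumptions, not the paper's exact procedure; the function names, the SVD-based subspace estimate, the `rank` and `strength` parameters, and the random stand-in data are all hypothetical.

```python
import numpy as np


def language_subspace(mean_by_lang: np.ndarray, rank: int) -> np.ndarray:
    """Return an orthonormal basis of shape (d, rank).

    mean_by_lang has shape (num_languages, d): one mean activation per language.
    Assumption: directions along which these means differ are "language-specific".
    """
    centered = mean_by_lang - mean_by_lang.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions in activation space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def ablate_language(hidden: np.ndarray, basis: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Subtract the component of each hidden state that lies in the given subspace."""
    return hidden - strength * (hidden @ basis) @ basis.T


# Random stand-in data: 4 languages, 16-dimensional activations.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 16))
basis = language_subspace(means, rank=2)

hidden = rng.normal(size=(5, 16))
cleaned = ablate_language(hidden, basis)

# With strength=1.0 the remaining component in the subspace is zero
# up to floating-point error.
print(np.abs(cleaned @ basis).max())
```

In a real model this projection would be applied to hidden states at selected layers during inference, which is why no additional training is needed.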

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought [190.9]
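Hunyuan-TurboS is a large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It balances high performance with efficiency and keeps inference costs low.
Reference translation of the paper (metadata) (Wed, 21 May 2025 12:11:53 GMT)
A model from Tencent that packs in everything: a Mamba hybrid, MoE, and adaptive CoT (related to this blog's earlier post "Mistral Small 3.1, Hunyuan-T1").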
- Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern.
It is a Mamba-architecture (hybrid) model, and its benchmark scores are very high: "LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345)". Setting aside the question of whether it should be called an LLM or an LRM, on some individual tasks it surpasses other open-source and commercial models. The open models used for comparison are recent ones: Llama-4-Maverick, DeepSeek-V3, and Qwen3-235B-A22B.
The paper also demonstrates the effectiveness of Mamba: "The inference of the Hunyuan-TurboS model is powered by the AngelHCF Inference Acceleration Framework. For the Mamba Hybrid architecture of the TurboS model, we have implemented optimizations across following three key dimensions, ultimately achieving a 1.8x speedup compared to Hunyuan-Turbo, which is a pure Transformers MoE model". Overall it looks like a very advanced model.
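As a rough illustration of what "switching between rapid responses for simple queries and deep thinking modes for complex problems" means in terms of compute, the toy dispatcher below picks a short or long reasoning budget per query. In Hunyuan-TurboS the switching is described as a mechanism of the model itself, so this external router is only an analogy; the keyword heuristic, the threshold, and the token budgets are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class Route:
    mode: str            # "short" (answer directly) or "long" (think first)
    max_new_tokens: int  # generation budget granted to this query


def estimate_difficulty(query: str) -> float:
    """Hypothetical difficulty score in [0, 1]; a real system would use a learned signal."""
    markers = ("prove", "derive", "step by step", "optimize", "debug")
    hits = sum(marker in query.lower() for marker in markers)
    length_term = min(len(query.split()) / 100.0, 1.0)
    return min(1.0, 0.3 * hits + 0.5 * length_term)


def route(query: str, threshold: float = 0.3,
          short_budget: int = 256, long_budget: int = 4096) -> Route:
    """Grant the long chain-of-thought budget only to queries that look hard."""
    if estimate_difficulty(query) >= threshold:
        return Route("long", long_budget)
    return Route("short", short_budget)


easy = route("What is the capital of France?")
hard = route("Prove that the sum of two even integers is even, step by step.")
print(easy.mode, hard.mode)
```

The point of the sketch is the resource trade-off: simple queries never pay for the long budget, which is where the efficiency gain of an adaptive scheme comes from.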

LLMs unlock new paths to monetizing exploits

LLMs unlock new paths to monetizing exploits [85.6]
Large language models (LLMs) will soon change the economics of cyberattacks. LLMs allow adversaries to launch attacks customized for each individual user.
Reference translation of the paper (metadata) (Fri, 16 May 2025 17:05:25 GMT)
A report on the potential for misuse of LLMs. It seems plausible that they make attacks tailored to each target feasible.
The following passage is also interesting from the viewpoint of large-scale data analysis: "To demonstrate this capability, we divide all emails from the Enron dataset into 150 (potentially overlapping) sets, grouped by the Enron employee who has sent or received that email. We then feed each of these collections of emails into a LLM (Claude 3.5 Sonnet) and ask it to describe everyone who this employee is emailing. Doing this identifies one Enron employee (John G.) who is having an extramarital affair with a coworker."
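The data preparation step in that quotation (grouping emails into potentially overlapping per-person sets before handing each set to an LLM) can be sketched as follows. This is a minimal illustration, not the authors' code: the record fields (`sender`, `recipients`, `body`), the placeholder addresses, and the prompt wording are assumptions, and the actual LLM call is intentionally left out.

```python
from collections import defaultdict


def group_by_participant(emails):
    """Map each person to every email they sent or received.

    One email lands in several groups, so the sets overlap.
    """
    groups = defaultdict(list)
    for email in emails:
        participants = {email["sender"], *email["recipients"]}
        for person in participants:
            groups[person].append(email)
    return dict(groups)


def build_prompt(person, emails):
    """Assemble the text that would be sent to an LLM (hypothetical wording)."""
    joined = "\n---\n".join(
        f"From: {e['sender']}\nTo: {', '.join(e['recipients'])}\n{e['body']}"
        for e in emails
    )
    return f"Describe everyone that {person} is emailing.\n\n{joined}"


# Tiny made-up mailbox with placeholder addresses.
emails = [
    {"sender": "alice", "recipients": ["bob"], "body": "Lunch?"},
    {"sender": "bob", "recipients": ["carol", "alice"], "body": "Q3 numbers attached."},
    {"sender": "carol", "recipients": ["dave"], "body": "Please review."},
]

groups = group_by_participant(emails)
prompt = build_prompt("alice", groups["alice"])
print(len(groups), "groups;", sum(len(v) for v in groups.values()), "email copies in total")
```

Because each email is duplicated across its participants, the total volume sent to the model grows with the number of recipients per message, which matters when estimating the cost of this kind of analysis.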

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution

R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution [60.8]
R&D-Agent is a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, and the Developer agent refines code based on error feedback. Evaluated on MLE-Bench, R&D-Agent emerged as the top-performing machine learning engineering agent.
Reference translation of the paper (metadata) (Tue, 20 May 2025 06:07:00 GMT)
The authors claim SoTA on MLE-bench (GitHub – openai/mle-bench: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering). The setup is described as: "the framework employs two specialized agents – the 'Researcher' and the 'Developer' – which correspond to the two types of feedback provided in each exploration step: solution performance and execution error information." This seems close to how such work is done in practice.
The repository is GitHub – microsoft/RD-Agent: Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through R&D-Agent, which lets AI drive data-driven AI. 🔗https://aka.ms/RD-Agent-Tech-Report
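The two feedback channels described above (solution performance for the Researcher, execution errors for the Developer) can be illustrated with a toy loop. This sketch does not use the RD-Agent API and contains no LLM calls: the integer "idea", the scoring function, the deliberately incomplete first draft, and all function names are assumptions chosen only to show the control flow.

```python
def run(config):
    """Toy 'execution': raises KeyError on an incomplete config, else returns a score."""
    depth = config["depth"]
    _ = config["seed"]  # fails if the Developer forgot this setting
    return float(-(depth - 5) ** 2)  # toy metric that peaks at depth = 5


def researcher(history):
    """Propose the next idea from performance feedback, given (idea, score) pairs."""
    if not history:
        return 1
    best_idea, _ = max(history, key=lambda item: item[1])
    tried = {idea for idea, _ in history}
    for candidate in (best_idea + 1, best_idea - 1):
        if candidate not in tried:
            return candidate
    return best_idea


def developer(idea, max_fixes=2):
    """Implement an idea and repair the implementation from error feedback."""
    config = {"depth": idea}  # first draft is intentionally incomplete
    for _ in range(max_fixes + 1):
        try:
            return run(config), config
        except KeyError as err:
            config[err.args[0]] = 0  # add the setting named in the error
    raise RuntimeError("could not repair the implementation")


def rd_loop(rounds):
    """Alternate the two agents and return the best (idea, score) pair found."""
    history = []
    for _ in range(rounds):
        idea = researcher(history)
        score, _config = developer(idea)
        history.append((idea, score))
    return max(history, key=lambda item: item[1])


best = rd_loop(8)
print(best)
```

Keeping the two roles separate means a failed run never pollutes the performance history: the Researcher only sees scores from implementations that actually executed.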
