July 18, 2025 – Introducing the latest arXiv papers

NeoBabel: A Multilingual Open Tower for Visual Generation

NeoBabel: A Multilingual Open Tower for Visual Generation [32.8]
We introduce NeoBabel, a new multilingual image generation framework. It supports six languages: English, Chinese, Dutch, French, Hindi, and Persian. It achieves state-of-the-art multilingual performance while retaining strong English capability.
Reference translation of the paper (metadata) (Tue, 08 Jul 2025 16:19:45 GMT)
"This paper introduces NeoBabel, a novel multilingual image generation framework that represents the first scalable solution for direct text-to-image synthesis across six languages. Through meticulous curation of high-quality multilingual vision-language datasets and end-to-end training, NeoBabel establishes direct cross-lingual mappings between textual descriptions and visual outputs across all supported languages." In short, a proposal for a multilingual image generation model that does not go through translation. Culture-specific words are hard to translate, so models like this matter.
The repository is NeoBabel: A Multilingual Open Tower for Visual Generation

Robust Multimodal Large Language Models Against Modality Conflict [94.1]
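Multimodal large language models (MLLMs) are prone to hallucination in real-world scenarios. We study the inherent conflicts between inputs from different modalities, which place MLLMs in a dilemma and lead directly to hallucination. Three methods are proposed to mitigate hallucination caused by modality conflict.
Reference translation of the paper (metadata) (Wed, 09 Jul 2025 11:18:38 GMT)
An organized look at countermeasures for hallucination specific to MLLMs (the kind tied to inconsistency between modalities). The authors also built a dataset called "Multimodal Modality Conflict (MMMC)" and evaluated on it. In the evaluation they tried prompt engineering, SFT, and reinforcement learning to reduce hallucination, and report: "Our results show that the reinforcement learning method achieves the best performance in mitigating the hallucination under modality conflict, while the supervised fine-tuning method shows promising and stable performance."
The repository is GitHub – zmzhang2000/MMMC: Official repository for Robust Multimodal Large Language Models Against Modality Conflict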

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs [45.8]
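Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad range of tasks. However, they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute strategies, which aim to improve the computational efficiency of LLM reasoning.
Reference translation of the paper (metadata) (Wed, 02 Jul 2025 18:27:42 GMT)
A survey, described by the authors as follows: "This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a two-tiered taxonomy that distinguishes between L1 controllability—methods that operate under fixed compute budgets—and L2 adaptiveness—methods that dynamically scale inference based on input difficulty or model confidence."
Hybrid approaches are also popular in commercial models, and I suspect this is an area where people are struggling in various ways.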
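To make the two tiers in the quoted taxonomy concrete, here is a minimal Python sketch. It is not taken from the survey: the `Query` type, the `difficulty` score, the function names, and the token limits are all hypothetical, chosen only to contrast a fixed compute budget (L1 controllability) with a budget that scales with estimated input difficulty (L2 adaptiveness).

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    # Hypothetical difficulty estimate in [0, 1]; how to obtain it is out of scope here.
    difficulty: float


def fixed_budget(query: Query, budget_tokens: int = 512) -> int:
    """L1-style control: the caller sets the budget, and the input does not change it."""
    return budget_tokens


def adaptive_budget(query: Query, min_tokens: int = 64, max_tokens: int = 2048) -> int:
    """L2-style adaptiveness: interpolate the budget linearly with estimated difficulty."""
    d = min(max(query.difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return int(min_tokens + d * (max_tokens - min_tokens))


easy = Query("What is 2 + 2?", difficulty=0.0)
hard = Query("Prove the statement for all n.", difficulty=1.0)

print(fixed_budget(easy), fixed_budget(hard))
print(adaptive_budget(easy), adaptive_budget(hard))
```

With a fixed budget both queries receive the same number of thinking tokens, while the adaptive variant gives the easy query the minimum and the hard query the maximum. A hybrid system could combine the two, for example by capping the adaptive budget with a user-specified limit.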

Predicting thinking time in Reasoning models [42.6]
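Reasoning models produce long, hidden chains of thought. Users have little insight into how much time the model will spend reasoning before it returns an answer.
Reference translation of the paper (metadata) (Sun, 29 Jun 2025 15:01:01 GMT)
A report on predicting reasoning time in LRMs.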
"In this paper, we explore methods for online prediction of thinking time in reasoning models. Our experiments demonstrate that current models encode a notion of progress in their internal representations, with an mlp probe achieving 45% accuracy over 10 classes, moreover the errors appear highly local (MAE 1)."
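The quote reports two numbers side by side: exact-class accuracy and mean absolute error (MAE). The sketch below shows how the two can be computed and why a modest accuracy can coexist with a small MAE. It assumes the 10 classes are ordered progress bins numbered 0 to 9, which is my reading rather than something stated in the quote, and the labels are made up for illustration. They are not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that hit the exact class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance, in class indices, between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example: ordered progress bins 0..9 (assumption, not the paper's data).
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 2, 2, 4, 3, 5, 7, 6, 8, 8]

print("accuracy:", accuracy(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
```

In this toy case most predictions miss the exact bin, yet every miss is off by only one bin, so the MAE stays below 1. That is the sense in which errors can be "highly local" even when exact-match accuracy looks low.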