LLM – Page 2 – Introducing the Latest arXiv Papers

In-Context Clustering with Large Language Models

In-Context Clustering with Large Language Models [50.3]
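ICC captures complex relationships among inputs through the attention mechanism. Pretrained LLMs exhibit impressive zero-shot clustering capabilities on text-encoded numeric data. Our work extends in-context learning to an unsupervised setting and demonstrates the effectiveness and flexibility of LLMs for clustering.
Reference translation of the paper (metadata) (Thu, 09 Oct 2025 17:07:55 GMT)
A proposal for a clustering approach that draws on the LLM's internal knowledge. Fine-tuning improves performance substantially. It is great that the axis along which to cluster can be specified so flexibly.
The project site is In-Context Clustering.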
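To make "text-encoded numeric data" concrete, here is a minimal sketch of how points might be serialized into a clustering prompt and how a reply might be parsed. The prompt wording, the helper names `build_clustering_prompt` and `parse_labels`, and the reply format are all assumptions made for illustration; they are not taken from the paper, and no LLM is actually called.

```python
import re

def build_clustering_prompt(points, n_clusters, condition=None):
    """Serialize numeric points as text and ask for one label per point.

    The wording is hypothetical, not the paper's prompt.
    """
    lines = [f"{i}: " + ", ".join(f"{v:.2f}" for v in p)
             for i, p in enumerate(points)]
    header = f"Cluster the following {len(points)} points into {n_clusters} groups."
    if condition:
        # Optional text condition, e.g. the axis to cluster along.
        header += f" Cluster by: {condition}."
    footer = "Answer with one integer label per point, separated by spaces."
    return "\n".join([header, *lines, footer])

def parse_labels(reply, n_points):
    """Extract integer labels from a reply; raise if the count is off."""
    labels = [int(tok) for tok in re.findall(r"-?\d+", reply)]
    if len(labels) != n_points:
        raise ValueError(f"expected {n_points} labels, got {len(labels)}")
    return labels

points = [(0.1, 0.2), (0.15, 0.22), (5.0, 5.1)]
prompt = build_clustering_prompt(points, n_clusters=2)
# "0 0 1" stands in for a model reply; a real reply may need stricter parsing.
labels = parse_labels("0 0 1", n_points=len(points))
```

The optional `condition` argument is one way the clustering axis could be passed as text, under the same assumptions.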

Gemini 2.5 Computer Use, OpenAI Dev Day, RWKV-8, Mamba3

Last week's notable news included Gemini 2.5 Computer Use (Introducing the Gemini 2.5 Computer Use model) and the many service announcements at OpenAI Dev Day (personally, I am watching Apps SDK, Agents – OpenAI API, and OpenAI Guardrails Python). The trend of each company pushing beyond foundation models into business domains continues.

On the architecture side, RWKV-8 appears to be progressing well judging from these posts (BlinkDL on X: "The new mechanism in RWKV-8 “Heron” 🪶 is named ROSA (acronym, note SA ≠ Self-Attention here) 🌹 ROSA is compromise-free: we get efficient, scalable, genuine infinite ctx, by applying some beautiful algorithms. https://t.co/meM1MRtIhI"; BlinkDL on X: "RWKV-8 ROSA 🌹 mechanism: neurosymbolic infinite-range lossless information propagator beyond attention, enabling LLMs to invent their own inner monologue languages. First step towards scalable post-neural methods, for a new era in AI 🌌 https://t.co/kAcc7YfKeo"), and Mamba-3 (authors not disclosed; see Mamba-3: Improved Sequence Modeling using State Space Principles | OpenReview) is also worth watching. A small reasoning model that hybridizes SSM and Transformer layers, ai21labs/AI21-Jamba-Reasoning-3B · Hugging Face, also looks strong, so expectations for further SSM progress are high.

Reading through the annual 🪩 The State of AI Report 2025 🪩 (some of its statements are questionable), I am struck by how fast research is advancing and how far the range of applications is widening. The report that LLMs achieve strong results at the International Astronomy & Astrophysics Olympiad is also interesting.

Large Language Models Achieve Gold Medal Performance at International Astronomy & Astrophysics Olympiad [43.5]
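We benchmark five large language models (LLMs) on exams from the International Olympiad on Astronomy and Astrophysics (IOAA). With average scores of 85.6% and 84.2%, Gemini 2.5 Pro and GPT-5 rank in the top two among 200-300 participants across four IOAA theory exams. GPT-5 scores 88.5% on the exams, placing it in the top 10 among participants in the four most recent IOAAs.
Reference translation of the paper (metadata) (Mon, 06 Oct 2025 16:58:47 GMT)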

MuSLR: Multimodal Symbolic Logical Reasoning

MuSLR: Multimodal Symbolic Logical Reasoning [133.9]
Multimodal logical reasoning is important in advanced applications such as autonomous driving and diagnosis. We introduce MuSLR, the first benchmark for multimodal symbolic logical reasoning grounded in formal logical rules. We propose LogiCAM, a modular framework that improves the Chain-of-Thought performance of GPT-4.1 by 14.13%.
Reference translation of the paper (metadata) (Tue, 30 Sep 2025 06:42:20 GMT)
Construction of MuSLR, a benchmark targeting multimodal symbolic logical reasoning. The authors also propose the modular LogiCAM as a baseline. The benchmark appears to be difficult even for current frontier models.
The suggestions for improvement are interesting: "First, integrating dedicated symbolic modules is essential: the LogiCAM outperforms base VLMs precisely because it extracts multimodalities based on logic and embeds explicit symbolic reasoning steps. Second, existing VLMs struggle to align and fuse visual and textual information when performing formal logic; Future work should explore tighter multimodal integration, such as cross-modal architectures trained with logic-grounded objectives, to bridge this gap." Current models do seem to struggle with formal processing.
The repository is MuSLR: Multimodal Symbolic Logical Reasoning.

Fluid Language Model Benchmarking

Fluid Language Model Benchmarking [126.9]
We introduce Fluid Benchmarking, a new evaluation approach that advances LM benchmarking across multiple dimensions. Inspired by psychometrics, Fluid Benchmarking is based on the insight that the relative value of benchmark items depends on an LM's capability level. Examining four dimensions (efficiency, validity, variance, and saturation), we find that Fluid Benchmarking achieves superior performance in all of them.
Reference translation of the paper (metadata) (Sun, 14 Sep 2025 05:49:42 GMT)
A proposed evaluation method, described as follows: "we introduce FLUID BENCHMARKING, a new evaluation approach that advances LM benchmarking across multiple dimensions. Inspired by psychometrics, FLUID BENCHMARKING is based on the insight that the relative value of benchmark items depends on an LM’s capability level, suggesting that evaluation should adapt to each LM. Methodologically, FLUID BENCHMARKING estimates an item response model based on existing LM evaluation results and uses the inferred quantities to select evaluation items dynamically, similar to computerized adaptive testing in education."
The repository is GitHub – allenai/fluid-benchmarking: Fluid Language Model Benchmarking.
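The idea of choosing items dynamically from an item response model can be illustrated with a small sketch. It assumes a two-parameter logistic (2PL) model and maximum-Fisher-information selection, which are standard choices in computerized adaptive testing; the paper's exact model and selection rule may differ. The item parameters and function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information an item carries about ability theta under the 2PL model."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, items, administered):
    """Pick the unadministered item that is most informative at theta."""
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta, *items[i]))

def estimate_ability(responses, items, grid=None):
    """Grid-search maximum-likelihood estimate of ability.

    responses maps item index -> True/False (correct/incorrect).
    """
    if grid is None:
        grid = [x / 10.0 for x in range(-40, 41)]

    def log_lik(theta):
        total = 0.0
        for idx, correct in responses.items():
            p = p_correct(theta, *items[idx])
            total += math.log(p if correct else 1.0 - p)
        return total

    return max(grid, key=log_lik)

# Hypothetical item parameters: (discrimination a, difficulty b)
ITEMS = [(1.0, -2.0), (1.5, 0.0), (0.8, 1.0), (2.0, 2.0)]

first = select_next_item(0.0, ITEMS, administered=set())
theta_hat = estimate_ability({first: True, 3: False}, ITEMS)
```

An evaluation loop would alternate `select_next_item` and `estimate_ability` until a budget of items is spent, so that each model is scored mostly on items near its own ability level.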

Hunyuan3D-Omni, Qwen3-Omni, LongCat-Flash-Thinking, EmbeddingGemma, Logics-Parsing

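Development of open models is very active, and Qwen3-Omni seemed to be the most frequently discussed release last week. On arXiv, promising models other than Qwen3-Omni are also being announced one after another.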

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets [34.7]
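Hunyuan3D-Omni is a unified framework for fine-grained, controllable 3D asset generation built on Hunyuan3D 2.1. Our model unifies all signals in a single cross-modal architecture. Experiments show that these additional controls improve generation accuracy, enable geometry-aware transformations, and increase robustness in production workflows.
Reference translation of the paper (metadata) (Thu, 25 Sep 2025 14:39:17 GMT)
An implementation focused on 3D.
The repository is GitHub – Tencent-Hunyuan/Hunyuan3D-Omni: Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets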

Qwen3-Omni Technical Report [105.1]
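Qwen3-Omni is a single multimodal model that maintains state-of-the-art performance across text, image, audio, and video. Qwen3-Omni matches the performance of same-sized single-modal models within the Qwen series and excels particularly on audio tasks. It supports text interaction in 119 languages, speech understanding in 19 languages, and speech generation in 10 languages.
Reference translation of the paper (metadata) (Mon, 22 Sep 2025 13:26:24 GMT)
A multimodal model in the Qwen family.
The repository is GitHub – QwenLM/Qwen3-Omni: Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.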

LongCat-Flash-Thinking Technical Report [116.8]
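LongCat-Flash-Thinking is an open-source Mixture-of-Experts (MoE) reasoning model. Its advanced capabilities are cultivated through a meticulously crafted training process. LongCat-Flash-Thinking achieves state-of-the-art performance among open-source models on a suite of complex reasoning tasks.
Reference translation of the paper (metadata) (Tue, 23 Sep 2025 10:25:48 GMT)
An MoE-based large reasoning model (LRM) that claims SoTA among open-source models.
The repository is meituan-longcat/LongCat-Flash-Thinking · Hugging Face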

EmbeddingGemma: Powerful and Lightweight Text Representations [42.4]
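EmbeddingGemma is a new lightweight, open text embedding model based on the Gemma 3 language model family. It uses a spread-out regularizer to improve model robustness and expressiveness. To promote further research, we release EmbeddingGemma to the community.
Reference translation of the paper (metadata) (Wed, 24 Sep 2025 17:56:51 GMT)
A small yet powerful embedding model.
The repository is EmbeddingGemma – a google Collection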

Logics-Parsing Technical Report [9.0]
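We propose Logics-Parsing, an end-to-end LVLM-based model augmented with reinforcement learning. The model incorporates carefully designed reward mechanisms to optimize complex layout analysis and reading-order inference. LogicsParsingBench is a curated set of 1,078 page-level PDF images spanning nine major categories and more than twenty sub-categories.
Reference translation of the paper (metadata) (Wed, 24 Sep 2025 04:54:37 GMT)
An LVLM that is effective for document understanding.
The repository is GitHub – alibaba/Logics-Parsing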

A Systematic Survey on Large Language Models for Evolutionary Optimization: From Modeling to Solving

A Systematic Survey on Large Language Models for Evolutionary Optimization: From Modeling to Solving [26.5]
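Large language models (LLMs) are increasingly being investigated for tackling optimization problems. Despite rapid progress, the field still lacks a unified synthesis and a systematic taxonomy. This survey addresses that gap by comprehensively reviewing recent developments and organizing them within a structured framework.
Reference translation of the paper (metadata) (Wed, 10 Sep 2025 04:05:54 GMT)
A survey of LLM use for optimization problems.
The repository is GitHub – ishmael233/LLM4OPT: A collection of LLMs for optimization, including modeling and solving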

A Survey of Long-Document Retrieval in the PLM and LLM Era

A Survey of Long-Document Retrieval in the PLM and LLM Era [19.1]
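This survey provides the first comprehensive treatment of long-document retrieval (LDR). It systematizes the evolution from classical lexical models and early neural models to modern pre-trained language models (PLMs) and large language models (LLMs). We review domain-specific applications and specialized evaluation resources, and outline key open challenges such as efficiency trade-offs, multimodal alignment, and faithfulness.
Reference translation of the paper (metadata) (Tue, 09 Sep 2025 13:57:53 GMT)
A survey on handling long documents.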

Pre-training under infinite compute

Pre-training under infinite compute [87.0]
We show that existing data-constrained approaches, which increase the number of epochs and the number of parameters, eventually overfit. Ensembles of independently trained models achieve a significantly lower loss asymptote than the regularized recipe. These results suggest that more data-efficient pre-training can be achieved in a compute-rich future.
Reference translation of the paper (metadata) (Thu, 18 Sep 2025 09:36:23 GMT)
"Our best intervention combining epoching, regularization, parameter scaling, and ensemble scaling achieves an asymptote at 200M tokens using 5.17× less data than our baseline, and our data scaling laws predict that this improvement persists at higher token budgets. We find that our data efficiency gains can be realized at much smaller parameter counts as we can distill an ensemble into a student model that is 8× smaller and retains 83% of the ensembling benefit." This result could be an answer to concerns about running out of training data.
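As a rough illustration of why an ensemble of independently trained models can reach lower loss than its members, the sketch below averages per-class probabilities and compares cross-entropy losses. Averaging probabilities is an assumption made here; the quoted text does not state the paper's combination rule, and the distributions are made-up numbers rather than results.

```python
import math

def cross_entropy(prob_of_target):
    """Negative log-likelihood of the correct class."""
    return -math.log(prob_of_target)

def ensemble_probs(member_probs):
    """Average per-class probabilities across ensemble members."""
    k = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / k for c in range(n_classes)]

# Hypothetical next-token distributions from three independently trained models
members = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.10, 0.30],
]
target = 0  # index of the correct token

member_losses = [cross_entropy(m[target]) for m in members]
mean_member_loss = sum(member_losses) / len(member_losses)
ensemble_loss = cross_entropy(ensemble_probs(members)[target])
```

Because the negative logarithm is convex, the loss of the averaged distribution never exceeds the mean loss of the members (Jensen's inequality). A student model could then be trained against the averaged distribution, which is the general shape of the distillation step the quote mentions.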

MobileLLM-R1, APERTUS

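Last week, OpenAI's results at the ICPC (https://x.com/MostafaRohani/status/1968360976379703569) and similar news drew attention. The performance gains of closed models are truly impressive. That said, open-model efforts also remain interesting, including Meta's small model MobileLLM-R1 (facebook/MobileLLM-R1-950M · Hugging Face) and APERTUS, which is open, attentive to rights issues, and achieves performance competitive with other models. This is a space to keep watching closely.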

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments [163.7]
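Apertus is a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem. Apertus models are pretrained exclusively on openly available data, respecting robots.txt exclusions and filtering out non-permissive, toxic, and personally identifiable content. Apertus models are also trained on 15T tokens from over 1800 languages, with 40% of the pretraining data allocated to non-English content.
Reference translation of the paper (metadata) (Wed, 17 Sep 2025 17:59:21 GMT)
An open, multilingual model that also pays considerable attention to rights issues: "The models are trained on 15T tokens from 1811 languages with retroactive respect for robots.txt and related opt outs, and with a Goldfish-style objective to curb verbatim reproduction of training text." Its performance is also quite high, which makes it very interesting.
The model is swiss-ai/Apertus-70B-Instruct-2509 · Hugging Face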
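"Retroactive respect for robots.txt" can be pictured as applying a site's current robots.txt rules to URLs that were crawled earlier. The sketch below does this with Python's standard `urllib.robotparser`; it is an illustration under stated assumptions, not the Apertus data pipeline. The robots.txt content, the URLs, the `ExampleBot` user agent, and the `filter_by_robots` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

def filter_by_robots(urls, robots_txt, user_agent):
    """Keep only URLs that the given robots.txt allows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

# Hypothetical robots.txt as it exists today, applied to previously crawled URLs
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

crawled = [
    "https://example.com/public/page.html",
    "https://example.com/private/notes.html",
]
kept = filter_by_robots(crawled, ROBOTS_TXT, user_agent="ExampleBot")
```

A real pipeline would need one robots.txt per host and a policy for hosts whose file cannot be fetched; both are left out of this sketch.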

Qwen3-Next-80B-A3B, Qwen3-ASR, Hunyuan-MT, MMBERT

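I think last week's big news was the release of Qwen/Qwen3-Next-80B-A3B-Instruct · Hugging Face, which combines a very sparse configuration with high performance. Very sparse MoE structures are currently in fashion, as also seen with DeepSeek and others. Qwen also released Qwen3-ASR, a multilingual speech recognition model. My impression is that they are building out the surrounding areas solidly as well.

Hunyuan-MT is a machine translation model based on Hunyuan. As with 特化型大規模言語モデル『PLaMo翻訳』を公開しました – Preferred Networks Research & Development (the announcement of the specialized translation LLM PLaMo翻訳), LLM-based translation models are very powerful.

Finally, MMBERT, a multilingual encoder-only model, was also announced. Decoder-only LLMs dominate at the moment, but encoder-only models remain an important approach for practical tasks such as classification.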

Hunyuan-MT Technical Report [20.9]
Hunyuan-MT-7B supports bidirectional translation across 33 major languages. Hunyuan-MT-Chimera-7B is a translation model inspired by the slow-thinking mode.
Reference translation of the paper (metadata) (Fri, 05 Sep 2025 16:11:05 GMT)
"The development of our models follows a holistic training process specifically engineered for multilingual translation, which begins with general and MT-oriented pre-training to build foundational capabilities, proceeds to Supervised Fine-Tuning (SFT) for task-specific adaptation, and culminates in advanced alignment through Reinforcement Learning (RL) and weak-to-strong RL." As this describes, each stage of the pipeline is also very elaborate.
The repository is tencent/Hunyuan-MT-7B · Hugging Face

mmBERT: A Modern Multilingual Encoder with Annealed Language Learning [57.6]
mmBERT is an encoder-only language model pretrained on 3T tokens of multilingual text. Over 1700 low-resource languages are added to the data. We show that mmBERT outperforms previous models on classification and retrieval tasks.
Reference translation of the paper (metadata) (Mon, 08 Sep 2025 17:08:42 GMT)
A multilingual BERT: "We do this by pre-training our new model suite, MMBERT, on 3T tokens of multilingual text using an architecture inspired from ModernBERT (Warner et al., 2024)."
The repository is GitHub – JHU-CLSP/mmBERT: A massively multilingual modern encoder language model
