staka – Page 14 – Introducing the Latest arXiv Papers

A Survey on World Models Grounded in Acoustic Physical Information

A Survey on World Models Grounded in Acoustic Physical Information [13.0]
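This survey gives a comprehensive overview of the emerging field of world models grounded in acoustic physical information. It examines the theoretical foundations, key methodological frameworks, and recent technical advances. It also details important applications of acoustic world models in robotics, autonomous driving, healthcare, and finance.
Reference translation of the paper (metadata) (Mon, 16 Jun 2025 04:59:42 GMT)
A survey that focuses on physical acoustics with world models in mind.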

SGIC: A Self-Guided Iterative Calibration Framework for RAG

SGIC: A Self-Guided Iterative Calibration Framework for RAG [45.2]
Large language models (LLMs) draw on robust in-context reasoning. We propose a new framework that uses uncertainty scores as a tool. We also introduce an innovative approach for constructing an iterative self-calibration training set.
Reference translation of the paper (metadata) (Thu, 19 Jun 2025 09:45:13 GMT)
An approach that aims to improve RAG performance using uncertainty scores: "(1) estimating the uncertainty scores of each document and the generated answers (Section 3.1); (2) iteratively utilizing the generated answers and their corresponding uncertainty scores from the validation set to perform the self-calibration process during the inference stage (Section 3.2); and (3) designing a strategy to reconstruct a new training set to fine-tune a self-guided iterative calibration LLM with uncertainty awareness (Section 3.3)." The effect appears to be larger for open models that expose confidence-like values at the token level.
The authors state: "Our framework consistently improves performance for both open-weight and closed-source models by utilizing uncertainty scores of documents and generated answers."
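As a rough illustration of step (1), one common way to turn token-level log-probabilities into an uncertainty score is to take one minus the geometric-mean token probability. This is an assumption made for illustration and not necessarily the scoring used in the paper; `uncertainty_score` and `rank_documents` are hypothetical names.

```python
import math
from typing import Dict, List


def uncertainty_score(token_logprobs: List[float]) -> float:
    """Return 1 - geometric-mean token probability (0 = certain, 1 = uncertain).

    Assumption: the model exposes per-token log-probabilities, as many
    open-weight models do. This is not necessarily SGIC's definition.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(mean_logprob)


def rank_documents(doc_logprobs: Dict[str, List[float]]) -> List[str]:
    """Order document ids from least to most uncertain."""
    return sorted(doc_logprobs, key=lambda d: uncertainty_score(doc_logprobs[d]))
```

A score like this could be attached to each retrieved document and each generated answer before the answer is fed back into the next calibration round; how the paper actually combines them is described in its Sections 3.1 and 3.2.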

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention

Distilling On-device Language Models for Robot Planning with Minimal Human Intervention [117.9]
PRISM is a framework for distilling SLM (small language model)-enabled robot planners. PRISM is applied to three LLM-enabled planners covering mapping and exploration, manipulation, and household assistance. The authors show that PRISM improves the performance of Llama-3.2-3B from 10-20% of GPT-4o's performance to over 93%.
Reference translation of the paper (metadata) (Fri, 20 Jun 2025 21:44:27 GMT)
A proposed distillation framework targeting robot planning: "Given a source LLM-enabled planner, PRISM synthesizes tasks and environments, elicits plans from the LLM-enabled planner in these synthesized environments, and then uses the resulting data to train an SLM-enabled planner that serves as a drop-in replacement for the source model." It seems intuitively effective, and the reported results are indeed promising.
The project site is PRISM.
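The data-collection loop in the quoted description can be sketched as follows. This is only a hedged illustration: `synthesize_task`, `teacher_plan`, and `collect_distillation_data` are hypothetical stand-ins rather than PRISM's actual API, and the step that fine-tunes the SLM on the collected pairs is omitted.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (task/environment description, teacher plan)


def collect_distillation_data(
    synthesize_task: Callable[[random.Random], str],
    teacher_plan: Callable[[str], str],
    num_examples: int,
    seed: int = 0,
) -> List[Example]:
    """Synthesize tasks, query the teacher planner, and keep (task, plan) pairs."""
    rng = random.Random(seed)
    dataset: List[Example] = []
    for _ in range(num_examples):
        task = synthesize_task(rng)  # stand-in for task/environment synthesis
        plan = teacher_plan(task)    # stand-in for the LLM-enabled planner
        dataset.append((task, plan))
    return dataset


# Toy stand-ins so the sketch runs without any model.
def toy_task(rng: random.Random) -> str:
    room = rng.choice(["kitchen", "hallway", "lab"])
    return f"find the mug in the {room}"


def toy_teacher(task: str) -> str:
    return f"1. explore 2. locate target 3. report ({task})"


data = collect_distillation_data(toy_task, toy_teacher, num_examples=3)
```

The resulting `(task, plan)` pairs are what a student model would be trained on so that it can replace the source planner.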

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.7]
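Large language models (LLMs) equipped with reasoning and agentic capabilities are ushering in a new paradigm called Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches but is also poised to become the dominant paradigm for future information seeking.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 17:18:00 GMT)
A survey on DeepResearch. Papers are appearing at a remarkable pace, and surveys are arriving just as quickly.
The repository is GitHub – DavidZWZ/Awesome-Deep-Research: [Up-to-date] Awesome Agentic Deep Research Resources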

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models [45.1]
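In the web context, leveraging AI agents (WebAgents) to assist people with tedious daily tasks dramatically improves productivity and efficiency. To fully explore the potential of large foundation models (LFMs), extensive research has emerged on WebAgents designed to complete daily web tasks according to user instructions.
Reference translation of the paper (metadata) (Mon, 26 May 2025 07:05:18 GMT)
A survey of WebAgents, whose use is becoming widespread.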

Early Stopping Tabular In-Context Learning

Early Stopping Tabular In-Context Learning [40.6]
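We propose early stopping of in-context learning. This is achieved by dynamically evaluating, after each layer of the transformer encoder, whether to stop in-context learning. Once stopped, the embedding is decoded using a pre-trained layer-wise decoder.
Reference translation of the paper (metadata) (Thu, 26 Jun 2025 15:36:37 GMT)
Early stopping for tabular foundation models. The effect is confirmed on TabPFN.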
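The layer-by-layer stopping idea can be sketched in plain Python as below. The stopping rule used here (exit once the layer-wise decoder's top class probability reaches a threshold) is an assumption for illustration; the summary above does not say which criterion the paper uses, and `early_exit_forward` is a hypothetical name rather than part of TabPFN.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def early_exit_forward(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    decoders: Sequence[Callable[[Vector], List[float]]],
    threshold: float = 0.9,
) -> Tuple[List[float], int]:
    """Run layers in order; stop once the layer's decoder is confident enough.

    Returns (class probabilities, number of layers actually executed).
    Assumption: stopping criterion = max predicted probability >= threshold.
    """
    if not layers or len(layers) != len(decoders):
        raise ValueError("need one decoder per layer and at least one layer")
    hidden = x
    probs: List[float] = []
    for depth, (layer, decoder) in enumerate(zip(layers, decoders), start=1):
        hidden = layer(hidden)
        probs = decoder(hidden)      # layer-wise decoder for this depth
        if max(probs) >= threshold:  # confident: skip the remaining layers
            return probs, depth
    return probs, len(layers)        # no early exit: every layer was used
```

The saving comes from the layers that are skipped; the price is one extra decoder call per executed layer.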

What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning

What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning [22.4]
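The paper shows how the diversity level of LLM-generated data affects downstream model performance. It also examines the performance of models trained on data that mixes in different proportions of LLM-generated data.
Reference translation of the paper (metadata) (Tue, 24 Jun 2025 02:44:58 GMT)
A report on the impact of synthetic data, focusing in particular on the degree of diversity.
The authors state: "Our experimental results show that, with minimal distribution shift, moderately diverse LLM-generated data can enhance model performance in scenarios with insufficient labeled data, whereas highly diverse generated data has a negative impact."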
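Mixing LLM-generated data into a training set at a chosen proportion can be sketched as follows. The function name `mix_datasets` and its parameters are hypothetical and only illustrate the experimental setup described above, not the authors' code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_datasets(
    real: Sequence[T],
    synthetic: Sequence[T],
    synthetic_ratio: float,
    total: int,
    seed: int = 0,
) -> List[T]:
    """Sample `total` examples, of which round(total * synthetic_ratio) are synthetic."""
    if not 0.0 <= synthetic_ratio <= 1.0:
        raise ValueError("synthetic_ratio must be in [0, 1]")
    n_syn = round(total * synthetic_ratio)
    n_real = total - n_syn
    if n_syn > len(synthetic) or n_real > len(real):
        raise ValueError("not enough examples to sample without replacement")
    rng = random.Random(seed)
    mixed = rng.sample(list(synthetic), n_syn) + rng.sample(list(real), n_real)
    rng.shuffle(mixed)  # avoid ordering the data by source
    return mixed
```

Sweeping `synthetic_ratio` over a small grid while holding `total` fixed is one way to compare mixtures at equal training-set size.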

Deep Research API, Gemini CLI, Mistral-Small-3.2-24B, Hunyuan-A13B, OpusLM

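There was a lot of news, but last week's highlights seem to be the arrival of the Deep Research API (Introduction to deep research in the OpenAI API) and the release of Gemini CLI (Gemini CLI: an open-source AI agent | Google Cloud Official Blog). It is characteristic of the generative AI space that vendors providing foundation models such as LLMs and LRMs also move into application areas. This is a natural way for them to capture more added value, but it means continued hard times for vendors and startups that compete by building on those APIs.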

Mistral released mistralai/Mistral-Small-3.2-24B-Instruct-2506 · Hugging Face. Tencent also announced Hunyuan-A13B, an MoE model with hybrid reasoning that has 80B total parameters, of which 13B are active (GitHub – Tencent-Hunyuan/Hunyuan-A13B: Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture.).

On a different front, an open SpeechLM has also been announced. Open efforts like this deserve attention as well.

OpusLM: A Family of Open Unified Speech Language Models [56.1]
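OpusLM is continually pre-trained on 213K hours of speech-text pairs and 292B text-only tokens. This paper describes the SpeechLM design in terms of tokenization, multi-stream language models, and multi-stage training strategies.
Reference translation of the paper (metadata) (Sat, 21 Jun 2025 06:30:59 GMT)
OpusLMs stands for Open Unified Speech Language Models.
The model is espnet/OpusLM_7B_Anneal · Hugging Face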

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas / Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas [90.3]
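A good idea should not merely be novel; it should lead to better research once executed. To test whether AI-generated ideas lead to better research outcomes, we conduct an execution study. Comparing review scores of the same ideas before and after execution, the scores of LLM-generated ideas drop significantly more than those of expert-written ideas.
Reference translation of the paper (metadata) (Wed, 25 Jun 2025 19:47:23 GMT)
LLM-generated ideas and expert ideas were evaluated as follows: "Our execution participants spend an average of 103 hours executing the assigned idea and then submit the codebase and paper to document their experiments. All projects are then reviewed blindly by our recruited expert reviewers". The paper's finding is that "Average scores of AI ideas drop significantly more than Human ideas in the execution study across all the evaluation metrics."
An interesting result suggesting that human experts do think more deeply. At the same time, the fact that AI ideas score highly at the idea-only stage may mean AI is useful for idea generation, and the final scores could also be read as a reasonably good showing. As the paper below suggests, the feasibility of AI scientists appears to be increasing.
The repository is GitHub – NoviScl/AI-Researcher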

Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI [98.2]
We propose the Intelligent Science Laboratory (ISL) paradigm. ISL is a multi-layered closed-loop framework that deeply integrates cognitive and embodied intelligence. We argue that such systems are essential for overcoming the current limitations of scientific discovery.
Reference translation of the paper (metadata) (Tue, 24 Jun 2025 13:31:44 GMT)
A proposal for an AI scientist and AI lab framework consisting of: "1) Foundation Models provide multi-modal scientific knowledge representation and closed-loop learning capabilities, supporting complex reasoning and domain adaptation; (2) Agent Layer dynamically orchestrates scientific workflows—including hypothesis generation, literature review, experimental planning, execution, and analysis—while integrating model/toolkit via MCP integration; (3) Embodied Layer realizes robust physical interaction through advanced perception, navigation, and manipulation modules, enabling precise, adaptive operations in real-world laboratory environments."
The discussion of the current state and open challenges is very informative.

Language Modeling by Language Models

Language Modeling by Language Models [28.8]
This paper proposes a multi-agent language model (LM) approach that simulates the conventional stages of research. New designs are proposed, adversarially reviewed, implemented, and selectively verified. We report experiments on 1,162 newly discovered designs.
Reference translation of the paper (metadata) (Wed, 25 Jun 2025 08:46:10 GMT)
Research on AI conducted by AI: "We introduce Genesys, an autonomous system for discovering novel LM designs, featuring a novel unit-based design agent and cost-effective distributed evolution. We also present LMADE, a resource environment to support further research in this field."
"Genesys produced highly competitive designs; some outperformed human baselines such as the GPT and Mamba2 models in common downstream tasks. These results show the feasibility and lay the groundwork for autonomous evolutionary systems in scientifically complex and costly domains." It is interesting that the approach already shows concrete results and apparent feasibility.
The project site is Genesys, and the repository is GitHub – allenai/genesys: Source code and utilities for the Genesys distributed language model architecture discovery system.
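The propose, review, and verify cycle described in the abstract can be sketched as a very small evolutionary loop. This is a toy illustration under stated assumptions and not the Genesys algorithm: `discovery_loop`, `propose`, `review`, and `verify` are hypothetical names, designs are arbitrary Python values, and the implementation stage and the distributed execution are left out.

```python
import random
from typing import Callable, List, Tuple, TypeVar

D = TypeVar("D")


def discovery_loop(
    seed_design: D,
    propose: Callable[[D, random.Random], D],
    review: Callable[[D], bool],
    verify: Callable[[D], float],
    steps: int,
    seed: int = 0,
) -> Tuple[D, float]:
    """Greedy evolutionary search: mutate the best design, review it, then score it."""
    rng = random.Random(seed)
    population: List[Tuple[D, float]] = [(seed_design, verify(seed_design))]
    for _ in range(steps):
        parent, _score = max(population, key=lambda item: item[1])
        child = propose(parent, rng)  # stand-in for the design agent
        if not review(child):         # stand-in for adversarial review
            continue                  # rejected designs are never verified
        population.append((child, verify(child)))  # stand-in for verification
    return max(population, key=lambda item: item[1])
```

For example, with integer "designs", `propose=lambda d, rng: d + rng.choice([-1, 1])`, `review=lambda d: d >= 0`, and `verify=lambda d: -abs(d - 5)`, the loop moves from a seed of 0 toward 5. Skipping verification for rejected designs is what keeps the expensive step selective.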
