October 2025 – Page 8 – Introducing the Latest arXiv Papers

Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings

Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings [39.4]
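This research program characterizes current prompting practices, challenges, and influencing factors in software engineering. We surveyed 74 software professionals across six countries about their current prompt practices and challenges. Prompts are refined through trial and error, rarely reused, and shaped more by individual practitioners than by standardized practices.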
Reference translation of the paper (metadata) (Mon, 22 Sep 2025 09:08:29 GMT)
An overview of prompts from a software engineering perspective. The statement "The findings reveal that prompt usage in SE is largely ad-hoc: prompts are often refined through trial-and-error, rarely reused, and shaped more by individual heuristics than standardized practices." matches intuition. That said, it points to a serious problem.
Data and related materials are available at Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings.

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset [99.1]
CS-FLEURS consists of four test sets covering 113 unique language pairs across 52 languages. CS-FLEURS also provides a training set of 128 hours of generative text-to-speech data across 16 X-English language pairs.
Reference translation of the paper (metadata) (Wed, 17 Sep 2025 16:45:22 GMT)
The repository is byan/cs-fleurs · Datasets at Hugging Face.

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines [112.8]
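We propose a scientific reasoning foundation model that aligns natural language with heterogeneous scientific representations. The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence-text pairs, then aligned via SFT on 40 million instructions. Its capabilities include (i) faithful translation between text and scientific formats, (ii) text/knowledge extraction, (iii) property prediction, (iv) property classification, and (v) unconditional and conditional sequence generation and design.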
Reference translation of the paper (metadata) (Thu, 25 Sep 2025 17:52:06 GMT)
An effort to unify natural-language LLMs with scientific representations: "By mapping natural language, DNA/RNA/protein sequences, molecular strings, and materials representations into a shared backbone via task-aware tokenization and consistent input–output schemas, the model moves beyond narrow, discipline-specific solutions and limited task menus." It tackles the problem head-on: "The model is pretrained on a 206B-token corpus spanning scientific text, pure sequences, and sequence–text pairs, then aligned via SFT on 40M instructions, annealed cold-start bootstrapping to elicit long-form chain-of-thought, and reinforcement learning with task-specific reward shaping, which instills deliberate scientific reasoning."
The repositories are GitHub – open-sciencelab/SciReason and SciReason (SciReason).

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents [15.0]
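This paper introduces MAS-Bench, a pioneering benchmark for evaluating GUI-shortcut hybrid agents. It comprises 139 complex tasks across 11 real-world applications, a knowledge base of 88 shortcuts, RPA scripts, and seven evaluation metrics. Experiments show that hybrid agents achieve significantly higher success rates and efficiency than GUI-only agents.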
Reference translation of the paper (metadata) (Mon, 08 Sep 2025 09:43:48 GMT)
A proposed benchmark that also covers shortcutting GUI operations (for example, making an API call instead of operating the screen).
The project site is MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents.
