Software Engineering – Introducing the Latest arXiv Papers

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.4]
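Large language models (LLMs) have transformed automated software development by translating natural-language descriptions directly into functional code. The paper provides a comprehensive synthesis and practical guide to code LLMs, backed by a series of analytic and probing experiments. It analyzes the coding capabilities of general-purpose LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, QwenCoder).
Reference translation of the paper (metadata) (Tue, 02 Dec 2025 17:14:33 GMT)
A comprehensive survey of AI use in software development.
The figure on the first page is bold, yet it is also convincing.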

An Empirical Study of Agent Developer Practices in AI Agent Frameworks

An Empirical Study of Agent Developer Practices in AI Agent Frameworks [59.9]
The rise of large language models (LLMs) has sparked growing interest in agents, which in turn has led to rapid growth in agent frameworks. Although agent frameworks are widely used, their practical application and their effect on the agent development process remain unexplored. More than 80% of developers report that they struggle to identify the framework that best fits their specific development requirements.
Reference translation of the paper (metadata) (Mon, 01 Dec 2025 17:52:15 GMT)
A survey of agent frameworks.
The paper reports: "Specifically, we find that (i) Langchain and CrewAI lower the technical threshold for beginners. (ii) AutoGen and LangChain excel at rapid prototyping. (iii) In terms of functional encapsulation, AutoGen and LangChain are leading in task decomposition and multi-agent collaboration. (iv) Performance optimization is a common shortcoming across all frameworks. (v) Despite their mature ecosystems, AutoGen and LangChain face the highest maintenance complexity."
On maintenance, "6.2.5 Maintainability." is harshly critical of most of the frameworks...

A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI

A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI [54.3]
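As large language models are integrated into software engineering tasks, understanding and mitigating code hallucinations becomes essential. The paper systematically examines hallucination phenomena in code-oriented LLMs from four key perspectives.
Reference translation of the paper (metadata) (Sun, 02 Nov 2025 02:58:41 GMT)
A survey centered on "(1) NLP surveys that summarize hallucination research in natural language generation, and (2) software engineering papers that directly investigate hallucinations in code."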

The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents [46.3]
This paper introduces the OpenHands Software Agent SDK, a toolkit for implementing software development agents. For flexibility, the authors design a simple interface that lets an agent be implemented in only a few lines of code in the default case. For security and reliability, the SDK provides seamless local-to-remote execution portability and integrated REST/WebSocket services.
Reference translation of the paper (metadata) (Wed, 05 Nov 2025 18:16:44 GMT)
The OpenHands paper. As it states, "Unlike prior library-only SDKs (Anthropic, 2025a; OpenAI, 2024), OpenHands includes a built-in REST/WebSocket server for remote execution and a suite of interactive workspace interfaces—a browser-based VSCode IDE, VNC desktop, and persistent Chromium browser—for human inspection and control.", so it is also strong as an integrated environment.
The repository is GitHub – OpenHands/software-agent-sdk: A clean, modular SDK for building AI agents with OpenHands V1.

A Survey of Vibe Coding with Large Language Models

A Survey of Vibe Coding with Large Language Models [93.9]
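Vibe Coding is a development methodology in which developers validate AI-generated implementations by observing outcomes. Despite its transformative potential, the effectiveness of this emerging paradigm remains unexplored. This survey provides the first comprehensive and systematic review of Vibe Coding with large language models.
Reference translation of the paper (metadata) (Tue, 14 Oct 2025 11:26:56 GMT)
A survey of Vibe Coding, described as "a novel development methodology termed “Vibe Coding” where developers validate AI-generated implementations through outcome observation rather than line-by-line code comprehension."
The repository is GitHub – YuyaoGe/Awesome-Vibe-Coding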

A survey of conventional (?) software engineering has also come out.

A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [54.9]
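This survey presents the first comprehensive analysis of LLM-empowered software engineering. The authors analyze more than 150 recent papers and organize them into a comprehensive taxonomy spanning two major dimensions. Their analysis shows how the field has evolved from simple prompt engineering to complex agentic systems.
Reference translation of the paper (metadata) (Fri, 10 Oct 2025 06:56:50 GMT)
A survey of software engineering + LLM-based agents.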

Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings

Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings [39.4]
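This research program characterizes current prompting practices, challenges, and influencing factors in software engineering. The authors surveyed 74 software professionals from six countries about current prompting practices and challenges. Prompts are often refined through trial and error, rarely reused, and shaped more by individual heuristics than by standardized practices.
Reference translation of the paper (metadata) (Mon, 22 Sep 2025 09:08:29 GMT)
An organization of prompts from a software engineering perspective. The finding that "The findings reveal that prompt usage in SE is largely ad-hoc: prompts are often refined through trial-and-error, rarely reused, and shaped more by individual heuristics than standardized practices." matches intuition. Even so, it leaves plenty of problems.
The data and related materials are available at Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings.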

Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol

Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.8]
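Large language models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool calling, and multi-turn interaction. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. The paper decomposes LLM applications into a three-layer architecture: a system shell layer, a prompt orchestration layer, and an LLM inference core.
Reference translation of the paper (metadata) (Thu, 28 Aug 2025 13:00:28 GMT)
A survey of testing for LLM-based software.
As the conclusion puts it, "A key insight is that LLM application testing is neither a mere extension of traditional software testing nor a straightforward application of AI-security techniques." Software built on LLMs inevitably behaves dynamically and probabilistically, so testing methods appear to change considerably.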
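To make the three-layer split concrete, here is a minimal Python sketch. It is not the paper's protocol: `fake_llm_core`, `build_prompt`, `answer`, and `pass_rate` are hypothetical names, and the 90% success rate of the stand-in model is an assumption chosen for illustration. The idea it shows is that the deterministic layers (shell and orchestration) can be tested with exact assertions, while the non-deterministic core is better checked against a pass-rate threshold over repeated trials.

```python
import random
from typing import Callable


def fake_llm_core(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for the LLM inference core (non-deterministic)."""
    # Assumed behavior: correct about 90% of the time, wrong otherwise.
    return "4" if rng.random() < 0.9 else "5"


def build_prompt(question: str) -> str:
    """Prompt orchestration layer: deterministic prompt construction."""
    return f"Answer with a single number only.\nQ: {question}\nA:"


def answer(question: str, core: Callable[[str], str]) -> int:
    """System shell layer: validates input and parses the core's output."""
    if not question.strip():
        raise ValueError("empty question")
    return int(core(build_prompt(question)).strip())


def pass_rate(question: str, expected: int, trials: int, seed: int) -> float:
    """Run the full stack repeatedly and return the fraction of correct answers."""
    rng = random.Random(seed)  # fixed seed keeps the test itself reproducible
    hits = sum(
        answer(question, lambda p: fake_llm_core(p, rng)) == expected
        for _ in range(trials)
    )
    return hits / trials
```

In a test suite, the shell and orchestration layers can be exercised with a stubbed core such as `lambda p: "4"`, and the end-to-end check becomes something like `pass_rate("2+2", 4, trials=200, seed=0) >= 0.8` instead of an equality assertion on a single run.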

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? [32.7]
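SWE-Perf is the first benchmark designed to evaluate large language models (LLMs) on code performance optimization tasks within authentic repository contexts. SWE-Perf consists of 140 carefully curated instances, each derived from a performance-improving pull request in a popular GitHub repository.
Reference translation of the paper (metadata) (Wed, 16 Jul 2025 17:05:17 GMT)
A proposed benchmark for measuring performance-optimization ability. The ranking is Claude-4-sonnet > Gemini-2.5-pro > OpenAI-o3, but the results are poor across the board.
The project site is SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?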

Training Software Engineering Agents and Verifiers with SWE-Gym

Training Software Engineering Agents and Verifiers with SWE-Gym [89.6]
SWE-Gym is the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances.
Reference translation of the paper (metadata) (Mon, 30 Dec 2024 18:15:39 GMT)
A proposed environment for developing software engineering agents, together with the development of a high-performing agent. This comes after seeing o3's overwhelming results, but as the paper says, "Through extensive experiments, we demonstrate that SWE-Gym enables both agent and verifier models to achieve significant improvements in resolving complex software tasks. Our findings highlight the scalability of these approaches, revealing potential for continuous performance gains with increased compute.", so agentic operation is highly effective.
The repository is GitHub – SWE-Gym/SWE-Gym
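One common way to pair an agent with a verifier is best-of-n selection: the agent produces several candidate patches and the verifier's score picks one. The sketch below is only an illustration of that general pattern, not SWE-Gym's implementation. `best_of_n` and `toy_verifier` are hypothetical names, and the scoring rule is a made-up heuristic standing in for a trained verifier model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score."""
    if not candidates:
        raise ValueError("no candidates")
    # max() keeps the first candidate on ties, so ordering is deterministic.
    return max(candidates, key=verifier)


def toy_verifier(patch: str) -> float:
    """Hypothetical scorer: rewards patches that include a check, penalizes length."""
    score = 1.0 if "assert" in patch else 0.0
    return score - 0.01 * len(patch)
```

Under this pattern, spending more compute means sampling more candidates, which is one plausible reading of the quoted "continuous performance gains with increased compute."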

Agents in Software Engineering: Survey, Landscape, and Vision

Agents in Software Engineering: Survey, Landscape, and Vision [46.0]
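Large language models (LLMs) have achieved remarkable success and are widely used in a variety of downstream tasks. Many studies that combine LLMs with software engineering (SE) adopt the concept of agents, either explicitly or implicitly. The paper proposes a framework for LLM-based agents in SE that comprises three key modules: perception, memory, and action.
Reference translation of the paper (metadata) (Fri, 13 Sep 2024 17:55:58 GMT)
A survey of agent use in software engineering by a different team from Large Language Model-Based Agents for Software Engineering: A Survey – Introducing the Latest arXiv Papers (devneko.jp). It focuses on the agent-side technology.
The repository is GitHub – DeepSoftwareAnalytics/Awesome-Agent4SE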
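The perception, memory, and action split can be pictured with a toy loop like the one below. This is a sketch under heavy assumptions rather than the framework from the survey: `Memory`, `perceive`, `act`, and `step` are hypothetical names, the action vocabulary is invented, and a rule-based policy stands in for the LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Memory module: keeps the history of actions taken so far."""
    actions: list[str] = field(default_factory=list)


def perceive(raw: str) -> str:
    """Perception module: normalizes a raw observation from the environment."""
    return raw.strip().lower()


def act(observation: str, memory: Memory) -> str:
    """Action module: a toy policy that consults memory before deciding."""
    if "pass" in observation:
        return "stop"
    if "fail" in observation:
        # After two edit attempts, escalate instead of looping forever.
        if memory.actions.count("edit_file") >= 2:
            return "ask_human"
        return "edit_file"
    return "run_tests"


def step(raw: str, memory: Memory) -> str:
    """One agent step: perceive, decide, then record the decision in memory."""
    action = act(perceive(raw), memory)
    memory.actions.append(action)
    return action
```

Feeding the same failing observation three times yields two edit attempts and then an escalation, which shows why the action module needs memory and not just the current observation.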
