{"id":7655,"date":"2025-10-31T05:34:00","date_gmt":"2025-10-30T20:34:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7655"},"modified":"2025-10-25T21:39:29","modified_gmt":"2025-10-25T12:39:29","slug":"vagen-reinforcing-world-model-reasoning-for-multi-turn-vlm-agents","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7655","title":{"rendered":"VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents\u00a0<\/strong>[130.7]<br>Compared with language model (LLM) agents, a key challenge in training vision-language model (VLM) agents is the shift from textual states to complex visual observations. Can VLM agents build internal world models through explicit visual state reasoning? 
We architecturally enforce and reward the agent\u2019s reasoning process through reinforcement learning (RL). We find that the agent\u2019s reasoning about state estimation and transition modeling is essential for success.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.16907v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.16907v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Sun, 19 Oct 2025 16:05:07 GMT)<\/li>\n\n\n\n<li>Motivated by \u300cHow can we effectively teach VLMs to build internal world models through explicit visual state reasoning?\u300d and \u300cVision-language Model (VLM) agentic tasks are inherently complex due to the challenges in understanding visual states, which often are partial and noisy Observations, fundamentally reframing the problem from an Markov Decision Process (MDP) to a more challenging Partially Observable Markov Decision Process (POMDP).\u300d, the paper proposes a framework for advancing world model construction. \u300cTo optimize an agent\u2019s world model reasoning, we propose turn-level WorldModeling Reward for a dense turn-level reward to evaluate the accuracy of the agent\u2019s internal state simulation against ground-truth; to solve the critical challenge of long-horizon credit assignment, we propose Bi-Level GAE to first computes the value of an entire turn\u2019s reasoning before propagating that credit precisely to the individual tokens. 
Our VAGEN framework significantly enhances task performance and visual reasoning quality for VLM in agentic tasks.\u300d<\/li>\n\n\n\n<li>Project site: <a href=\"https:\/\/vagen-ai.github.io\/\">VAGEN &#8211; VLM Agent Training<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[251,524],"class_list":["post-7655","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-mllm","tag-524"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7655"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7655\/revisions"}],"predecessor-version":[{"id":7656,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7655\/revisions\/7656"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}