{"id":7688,"date":"2025-11-07T05:38:00","date_gmt":"2025-11-06T20:38:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7688"},"modified":"2025-11-02T10:43:12","modified_gmt":"2025-11-02T01:43:12","slug":"roboomni-proactive-robot-manipulation-in-omni-modal-context","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7688","title":{"rendered":"RoboOmni: Proactive Robot Manipulation in Omni-modal Context\u00a0"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>RoboOmni: Proactive Robot Manipulation in Omni-modal Context\u00a0<\/strong>[165.1]<br>We introduce cross-modal contextual instructions, in which intent is derived from spoken dialogue, environmental sounds, and visual cues. We propose RoboOmni, a framework based on end-to-end omni-modal LLMs that unifies intention recognition, interaction confirmation, and action execution. In experiments in both simulation and real-world settings, RoboOmni surpasses text-based and ASR-based baselines in success rate, inference speed, intention recognition, and proactive assistance.<br><a href=\"http:\/\/arxiv.org\/abs\/2510.23763v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2510.23763v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Mon, 27 Oct 2025 18:49:03 GMT)<\/li>\n\n\n\n<li>\u300cThere arises a key research question: Can a robot 
integrate cross-modal context, including speech, environmental audio, and visual observations, to proactively infer and verify user intent?\u300d In response to this question, the authors present a multimodal model: \u300cwe propose RoboOmni, an end-to-end omni-modal framework for manipulation that closes the loop of intent recognition, interaction confirmation, and action execution. Unlike prior approaches, RoboOmni supports direct speech interaction without ASR, infers latent commands by fusing human speech, environmental audio, and vision through spatiotemporal modeling, and verifies intent via interaction.\u300d<\/li>\n\n\n\n<li>Project site: <a href=\"https:\/\/openmoss.github.io\/RoboOmni\/\">RoboOmni: Proactive Robot Manipulation in Omni-modal Context<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[8,342],"class_list":["post-7688","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-action","tag-robot"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7688","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7688"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7688\/revisions"}],"predecessor-version":[{"id":7689,"href"
:"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7688\/revisions\/7689"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}