{"id":7382,"date":"2025-09-09T03:33:00","date_gmt":"2025-09-08T18:33:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=7382"},"modified":"2025-09-06T20:36:48","modified_gmt":"2025-09-06T11:36:48","slug":"strefer-empowering-video-llms-with-space-time-referring-and-reasoning-via-synthetic-instruction-data","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=7382","title":{"rendered":"Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data\u00a0<\/strong>[100.5]<br>Strefer is a synthetic data generation framework designed to equip Video LLMs with referring and reasoning capabilities. Strefer produces diverse instruction data using a data engine that pseudo-annotates temporally dense, fine-grained video metadata. Our approach enhances the ability of Video LLMs to interpret spatial and temporal references, fostering the more versatile, space-time-aware reasoning essential for real-world AI companions.<br><a href=\"http:\/\/arxiv.org\/abs\/2509.03501v1\">Paper<\/a>\u00a0\u00a0<a 
href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2509.03501v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Wed, 03 Sep 2025 17:33:20 GMT)<\/li>\n\n\n\n<li>\u201cOur approach begins with a modular framework that orchestrates multiple agents\u2014including pretrained Large Language Models (LLMs), Video LLMs, and Pixel-Level Multimodal Vision Foundation Models (e.g., RexSeek [20], GroundingDINO [32] and SAM2 [44])\u2014to pseudo-annotate video metadata with temporally dense and object-centric space-time information. This metadata captures detailed spatial and temporal structures, such as subjects, objects, their locations as masklets (segmentation masks tracked over time), and action timelines. Building on this structured metadata, we leverage in-context learning and well-defined task schemas to guide LLMs in generating high-utility instruction data for tuning Video LLMs.\u201d An elaborately composed synthetic data framework for video is proposed along these lines.<\/li>\n\n\n\n<li>Project site: <a href=\"https:\/\/strefer.github.io\/\">Strefer: Data Engine for Video 
LLMs<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[390],"class_list":["post-7382","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-synthetic-data"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7382"}],"version-history":[{"count":1,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7382\/revisions"}],"predecessor-version":[{"id":7383,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/7382\/revisions\/7383"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}