{"id":5902,"date":"2024-12-16T06:07:00","date_gmt":"2024-12-15T21:07:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5902"},"modified":"2024-12-16T06:07:00","modified_gmt":"2024-12-15T21:07:00","slug":"expanding-performance-boundaries-of-open-source-multimodal-models-with-model-data-and-test-time-scaling","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5902","title":{"rendered":"Phi4, InternVL 2.5, EXAONE 3.5"},"content":{"rendered":"\n<p>While Gemini 2.0 and OpenAI's 12 days of announcements are drawing the most attention, a variety of OSS and openly released models have also been announced.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Phi-4 Technical Report\u00a0<\/strong>[72.1]<br>This work presents phi-4, a 14-billion parameter language model with a focus on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.08905v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.08905v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 12 Dec 2024 03:37:41 
GMT)<\/li>\n\n\n\n<li>The latest version of Phi, a family of small, high-performance models. The report states that \u201cphi-4 strategically incorporates synthetic data throughout the training process.\u201d In other words, the approach makes effective use of synthetic data. It is a strong model that surpasses Phi3 and approaches GPT-4o mini.<\/li>\n\n\n\n<li>There is also an announcement on the official blog: <a href=\"https:\/\/techcommunity.microsoft.com\/blog\/aiplatformblog\/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple\/4357090\">Introducing Phi-4: Microsoft\u2019s Newest Small Language Model Specializing in Complex Reasoning | Microsoft Community Hub<\/a><\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>EXAONE 3.5: Series of Large Language Models for Real-world Use Cases\u00a0<\/strong>[35.0]<br>The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. For commercial use, please refer to the official contact point of LG AI Research.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.04862v2\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.04862v2\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Mon, 09 Dec 2024 09:31:10 GMT)<\/li>\n\n\n\n<li>Openly released models from LG, with performance competitive with Qwen2.5 at the same size.<\/li>\n\n\n\n<li>The repository is <a href=\"https:\/\/huggingface.co\/LGAI-EXAONE\">LGAI-EXAONE (LG AI Research)<\/a><\/li>\n<\/ul>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling&nbsp;<\/strong>[121.1]<br>InternVL 2.5 is an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0. InternVL 2.5 is competitive with leading commercial models such as GPT-4o and Claude-3.5-Sonnet. The authors hope that this model contributes to the open-source community by setting new standards for developing and applying multimodal AI systems.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.05271v1\">Paper<\/a>&nbsp;&nbsp;<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.05271v1\">Reference translation (metadata)<\/a>&nbsp; &nbsp;(Fri, 06 Dec 2024 18:57:08 GMT)<\/li>\n\n\n\n<li>An OSS MLLM whose performance is reported to be competitive with commercial models. The architecture is described as follows: \u201cwe integrate a newly incrementally pre-trained InternViT with various pre-trained LLMs, including InternLM 2.5 and Qwen 2.5, using a randomly initialized MLP projector.\u201d In other words, the approach connects a ViT to an LLM through a projector.<\/li>\n\n\n\n<li>The repositories are <a href=\"https:\/\/huggingface.co\/OpenGVLab\/InternVL2_5-78B\">OpenGVLab\/InternVL2_5-78B \u00b7 Hugging Face<\/a> and <a href=\"https:\/\/github.com\/OpenGVLab\/InternVL\">GitHub 
&#8211; OpenGVLab\/InternVL: [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. \u63a5\u8fd1GPT-4o\u8868\u73b0\u7684\u5f00\u6e90\u591a\u6a21\u6001\u5bf9\u8bdd\u6a21\u578b<\/a><\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions\u00a0<\/strong>[104.9]<br>This work introduces disentangled streaming perception, reasoning, and memory mechanisms to enable real-time interaction with streaming video and audio input. The project simulates human-like cognition, enabling multimodal large language models to provide continuous and adaptive service over time.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.09596v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.09596v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 12 Dec 2024 18:58:30 GMT)<\/li>\n\n\n\n<li>A framework that offers not only real-time streaming but also capabilities such as memory.<\/li>\n\n\n\n<li>The repository is <a href=\"https:\/\/github.com\/InternLM\/InternLM-XComposer\">GitHub &#8211; InternLM\/InternLM-XComposer: InternLM-XComposer2.5-OmniLive: A 
Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions<\/a><\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Owl-1: Omni World Model for Consistent Long Video Generation\u00a0<\/strong>[75.5]<br>This work proposes the Omni World ModeL (Owl-1). Owl-1 achieves performance comparable to SOTA methods on VBench-I2V and VBench-Long.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.09600v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.09600v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 12 Dec 2024 18:59:01 GMT)<\/li>\n\n\n\n<li>A video generation model. The repository is <a href=\"https:\/\/github.com\/huang-yh\/Owl\">GitHub &#8211; huang-yh\/Owl<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>While Gemini 2.0 and OpenAI's 12 days of announcements are drawing the most attention, a variety of OSS and openly released models have also been announced.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[251,293,365,390],"class_list":["post-5902","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-mllm","tag-oss","tag-slm","tag-synthetic-data"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5902","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:
\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5902"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5902\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5902"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5902"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5902"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}