{"id":5870,"date":"2024-12-10T04:59:00","date_gmt":"2024-12-09T19:59:00","guid":{"rendered":"https:\/\/devneko.jp\/wordpress\/?p=5870"},"modified":"2024-12-10T04:59:00","modified_gmt":"2024-12-09T19:59:00","slug":"liquid-language-models-are-scalable-multi-modal-generators","status":"publish","type":"post","link":"https:\/\/devneko.jp\/wordpress\/?p=5870","title":{"rendered":"Liquid: Language Models are Scalable Multi-modal Generators"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><strong>Liquid: Language Models are Scalable Multi-modal Generators\u00a0<\/strong>[112.7]<br>Liquid is an auto-regressive generation paradigm that seamlessly integrates visual understanding and generation. Unlike conventional multimodal large language models (MLLMs), Liquid achieves this integration with a single large language model. For the first time, Liquid reveals a scaling law showing that the performance drop inevitably caused by unified training on visual and language tasks diminishes as model size increases.<br><a href=\"http:\/\/arxiv.org\/abs\/2412.04332v1\">Paper<\/a>\u00a0\u00a0<a href=\"https:\/\/fugumt.com\/fugumt\/paper_check\/2412.04332v1\">Reference translation (metadata)<\/a>\u00a0 \u00a0(Thu, 05 Dec 2024 16:48:16 GMT)<\/li>\n\n\n\n<li>This work lets an existing LLM handle images through a single change: \u201cThe only modification is the addition of 8192 new learnable embeddings for discrete image tokens. Correspondingly, we extend the original LM head by 8192 dimensions to enable the model to predict both text and image tokens within the same embedding space.\u201d<\/li>\n\n\n\n<li>The image-generation result is surprising: \u201cFor image generation, Liquid outperforms other auto-regressive based models, as well as some diffusion models like SD-XL and achieve FID of 5.47 on MJHQ-30K, demonstrating that LLMs can acquire excellent imagery capabilities efficiently with a limited amount of data.\u201d On top of that, the authors report: \u201cFor visual understanding, Liquid surpasses Chameleon and achieved results comparable to those of well-established MLLMs. In text-only tasks, Liquid achieves comparable performance with Chameleon, which used mix pre-training on a very large scale, and surpasses the performance of LLAMA2, demonstrating undegraded linguistic capabilities.\u201d<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[251],"class_list":["post-5870","post","type-post","status-publish","format-standard","hentry","category-arxiv","tag-mllm"],"_links":{"self":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5870","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5870"}],"version-history":[{"count":0,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/5870\/revisions"}],"wp:attachment":[{"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5870"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5870"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devneko.jp\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5870"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}