December 27, 2023 – Introducing the latest arXiv papers

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.2]
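This paper introduces SmartEdit, a new approach to instruction-based image editing. It uses MLLMs (Multimodal Large Language Models) to strengthen its understanding and reasoning capabilities. We show that a small amount of complex instruction-editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.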
Paper / reference translation (metadata) (Mon, 11 Dec 2023 17:54:11 GMT)
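Image editing driven by text instructions. The model understands the target before editing it, and the resulting images are clean and look natural.
The project site is SmartEdit (yuzhou914.github.io) and the repository is GitHub – TencentARC/SmartEdit; a demo appears to be in preparation.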

How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation [90.9]
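GPT-4V serves as the most advanced multimodal foundation model. This study rigorously evaluates GPT-4V's adaptability and generalization capabilities in dynamic environments.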
Paper / reference translation (metadata) (Wed, 13 Dec 2023 13:00:57 GMT)
A paper that tests how well GPT-4V handles distribution shifts, with comparisons against CLIP and LLaVA. It concludes that "Our findings reveal that while GPT-4V demonstrates notable adaptability and zero-shot generalization capabilities, its performance varies significantly across different scenarios of distribution shifts." and that "our journey toward creating truly robust and versatile AI foundation models is ongoing".
The repository is GitHub – jameszhou-gl/gpt-4v-distribution-shift: Code for "How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation"