TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

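Vision-language foundation models such as CLIP have demonstrated their capability in transfer learning thanks to large-scale image-text pre-training. This paper proposes TransAgent, a general and concise framework that transfers the knowledge of isolated agents in a unified manner. TransAgent achieves state-of-the-art performance on 11 visual recognition datasets.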
Reference translation of the paper (metadata) (Wed, 16 Oct 2024 03:01:44 GMT)
The paper is about integrating agentic models. In the authors' words: "By adaptively integrating the external knowledge of agents from different modalities via MoA gating mechanism, TransAgent achieves state-of-the-art performance on 11 datasets under the low-shot scenarios."
The repository is GitHub – markywg/transagent: [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
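
To give a rough feel for what "adaptively integrating" agent knowledge through a gate can mean, here is a minimal sketch of a generic softmax-gated mixture. It is not the authors' implementation and is not taken from the repository: the function names `softmax` and `gate_agents`, the use of plain lists, and the idea of feeding in raw gate scores are all assumptions made for illustration. In a real system the gate scores would presumably be learned parameters.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_agents(agent_outputs, gate_scores):
    """Blend per-agent vectors using softmax weights (hypothetical helper).

    agent_outputs: list of equal-length vectors, one per agent.
    gate_scores:   one score per agent; higher means more influence.
    Returns the blended vector and the weights that were used.
    """
    if len(agent_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per agent")
    weights = softmax(gate_scores)
    dim = len(agent_outputs[0])
    mixed = [
        sum(w * out[i] for w, out in zip(weights, agent_outputs))
        for i in range(dim)
    ]
    return mixed, weights


if __name__ == "__main__":
    # Two toy "agents" that each produce a 2-dimensional vector.
    outputs = [[1.0, 0.0], [0.0, 1.0]]
    mixed, weights = gate_agents(outputs, gate_scores=[0.0, 0.0])
    print(mixed, weights)
```

With equal gate scores the agents contribute equally; raising one score shifts the blend toward that agent, which is the sense in which such a gate is adaptive.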
