SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement [100.9]
ThinkLite-VLはQwen2.5-VL-7Bインストラクションの平均性能を7%向上させる。私たちのコード、データ、モデルはhttps://github.com/si0wang/ThinkLite-VL.orgで公開されています。
論文参考訳（メタデータ） (Thu, 10 Apr 2025 17:49:05 GMT)
効率のよいVision-Languageモデルの推論強化方法の提案。「Our model achieves SoTA performance using only 11k data, and without any additional knowledge distillation.」と使用データが少ない。カギはデータ品質とのこと「Our key insight highlights the critical importance of selecting genuinely challenging examples for Reinforcement Fine-Tuning (RFT).」
リポジトリはGitHub – si0wang/ThinkLite-VL

コメントを残す

コメントを残す コメントをキャンセル