September 19, 2024 – Introducing the latest arXiv papers

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.4]
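We propose MMEvol, a novel framework for curating image-text instruction data. MMEvol combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution. The proposed method achieves an average accuracy improvement of 3.1 points and reaches state-of-the-art (SOTA) performance on 9 of 13 vision-language tasks.
Reference translation of the paper (metadata) (Mon, 9 Sep 2024 17:44:00 GMT)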
The paper describes "a novel multimodal instruction data evolution framework that combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution." Its multimodal focus is the distinguishing feature. On effectiveness, the authors state: "The data evolved through three rounds of evolution is used to train a new model, demonstrating state-of-the-art (SOTA) performance across a comprehensive set of benchmarks."
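It is interesting that the approach has been shown to work in multimodal contexts, beyond text and mathematical problems. The mention of integration with image generation models as future work is also intriguing.
The project site is MMEvol: Welcome (rainbowluocs.github.io)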

Abstractive Text Summarization: State of the Art, Challenges, and Improvements [6.3]
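This review takes a comprehensive approach, charting state-of-the-art methods, challenges, solutions, comparisons, limitations, and future improvements. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics.
Reference translation of the paper (metadata) (Wed, 04 Sep 2024 03:39:23 GMT)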
A survey of abstractive summarization. It covers methods starting from those that predate LLMs.
As future directions, the paper lists: "Enhancing factual consistency, developing cross-lingual and multilingual summarization systems, concentrating on domain-specific summarization, dealing with noisy data, and enhancing long-document summarization are a few of these research directions."