Introducing the Latest arXiv Papers

Configurable Foundation Models: Building LLMs from a Modular Perspective

Configurable Foundation Models: Building LLMs from a Modular Perspective [115.6]
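There is a growing tendency to decompose LLMs into numerous functional modules, which allows inference with only part of the modules and dynamic assembly of modules to tackle complex tasks. We coin the term brick to represent each functional module and define the modularized structure as configurable foundation models. We present four brick-oriented operations: retrieval and routing, merging, updating, and growing. We find that FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
Reference translation of the paper (metadata) (Wed, 4 Sep 2024 17:01:02 GMT)
Configurable Foundation Models: a study and survey of reconfigurable, modularized foundation models.
I can see the usefulness, but I regard this as a difficult problem. Results such as model merging suggest real potential, yet my impression is that identifying which regions serve which function is not easy at this point.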

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.4]
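We propose MMEvol, a novel framework for curating image-text instruction data. MMEvol combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution. The proposed method achieves an average accuracy improvement of 3.1 points and reaches state-of-the-art (SOTA) performance on 9 of 13 vision-language tasks.
Reference translation of the paper (metadata) (Mon, 9 Sep 2024 17:44:00 GMT)
Described as "a novel multimodal instruction data evolution framework that combines fine-grained perception evolution, cognitive reasoning evolution, and interaction evolution.", its distinctive point is that it is multimodal. On effectiveness, the authors state: "The data evolved through three rounds of evolution is used to train a new model, demonstrating state-of-the-art (SOTA) performance across a comprehensive set of benchmarks."
It is interesting that effectiveness has been confirmed in multimodal contexts, beyond text and mathematical problems, and also that the future work mentions integration with image generation models.
The project site is MMEvol: Welcome (rainbowluocs.github.io)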

Abstractive Text Summarization: State of the Art, Challenges, and Improvements

Abstractive Text Summarization: State of the Art, Challenges, and Improvements [6.3]
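This review takes a comprehensive approach that charts state-of-the-art methods, challenges, solutions, comparisons, limitations, and future improvements. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics.
Reference translation of the paper (metadata) (Wed, 04 Sep 2024 03:39:23 GMT)
A survey of abstractive summarization. It covers methods starting from before LLMs.
As future directions, it lists: "Enhancing factual consistency, developing cross-lingual and multilingual summarization systems, concentrating on domain-specific summarization, dealing with noisy data, and enhancing long-document summarization are a few of these research directions."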

Paper Copilot, TravelAgent

Papers that sit close to LLM-based applications are also a useful reference for looking at internal behavior and design.

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance [14.5]
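This paper introduces Paper Copilot, a self-evolving, efficient LLM system that assists researchers. Paper Copilot offers personalized research services and maintains a database that is updated in real time. The paper details the design and implementation of Paper Copilot and discusses its contribution to personalized academic support and its potential to streamline the research process.
Reference translation of the paper (metadata) (Fri, 06 Sep 2024 20:04:04 GMT)
An assistant for keeping up with papers.
The demo system is ArxivCopilot – a Hugging Face Space by ulab-ai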

TravelAgent: An AI Assistant for Personalized Travel Planning [36.0]
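We introduce TravelAgent, a travel planning system powered by large language models (LLMs). TravelAgent consists of four modules: Tool-usage, Recommendation, Planning, and Memory. We evaluate TravelAgent's performance with human and simulated users, demonstrating its overall effectiveness on three criteria and confirming the accuracy of its personalized recommendations.
Reference translation of the paper (metadata) (Thu, 12 Sep 2024 14:24:45 GMT)
An agent for travel planning. The way it is built, among other details, is a useful reference.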

A Survey on Emergent Language

A Survey on Emergent Language [9.8]
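This paper provides a comprehensive review of 181 scientific publications on emergent language in artificial intelligence. Its aim is to serve as a reference for researchers who are interested in, or already experienced in, the field.
Reference translation of the paper (metadata) (Wed, 04 Sep 2024 12:22:05 GMT)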

Agent Workflow Memory

Agent Workflow Memory [71.8]
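This paper introduces Agent Workflow Memory (AWM), a method for inducing commonly reused routines. AWM substantially improves baseline results, by 24.6% and 51.1% in relative success rate. Online AWM generalizes robustly across cross-task, cross-website, and cross-domain evaluations.
Reference translation of the paper (metadata) (Wed, 11 Sep 2024 17:21:00 GMT)
A proposal for a framework in which "AWM induces workflows from agent trajectories by extracting reusable routines, and then integrates these workflows into agent memory to guide future task-solving processes." The picture is of a dynamic memory that generalizes and accumulates past experience, and it is reported to be effective not only in the offline scenario but also online.
The repository is GitHub – zorazrw/agent-workflow-memory: AWM: Agent Workflow Memory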
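To make the idea of "extracting reusable routines" from trajectories concrete, here is a minimal sketch. It is not the authors' implementation: it swaps in simple frequency counting over fixed-length action sub-sequences as a stand-in for whatever induction procedure AWM actually uses, and `WorkflowMemory`, `min_support`, `length`, and the action names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WorkflowMemory:
    """Toy memory: keeps action sub-sequences shared by several trajectories."""
    min_support: int = 2  # hypothetical: trajectories that must share a routine
    length: int = 2       # hypothetical: fixed routine length (in actions)
    trajectories: list = field(default_factory=list)
    workflows: list = field(default_factory=list)

    def induce(self):
        """Recompute workflows from all stored trajectories."""
        counts = Counter()
        for actions in self.trajectories:
            # Count each sub-sequence at most once per trajectory.
            seen = {
                tuple(actions[i:i + self.length])
                for i in range(len(actions) - self.length + 1)
            }
            counts.update(seen)
        self.workflows = sorted(
            seq for seq, n in counts.items() if n >= self.min_support
        )
        return self.workflows

    def update(self, actions):
        """Online-style use: add one new trajectory, then re-induce."""
        self.trajectories.append(list(actions))
        return self.induce()

    def as_prompt(self):
        """Render workflows as text that could be prepended to an agent prompt."""
        return "\n".join(" -> ".join(seq) for seq in self.workflows)


memory = WorkflowMemory()
memory.update(["open_site", "search", "click_result", "add_to_cart"])
memory.update(["open_site", "search", "click_result", "read_reviews"])
print(memory.as_prompt())
```

Calling `update` after every finished task mimics the online setting described above, where the memory grows as the agent works; an offline variant would load all trajectories first and call `induce` once.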

Can LLMs Generate Novel Research Ideas? / Can Large Language Models Unlock Novel Scientific Research Ideas?

Two papers have come out on whether LLMs can generate research ideas.

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [90.3]
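Large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery. By having researchers write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation. We identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and a lack of diversity in generation.
Reference translation of the paper (metadata) (Fri, 06 Sep 2024 08:25:03 GMT)
Researchers compared LLM ideas with human ideas, and the paper reports: "we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility." The result is interesting, and so are the discussions in "7 Limitations of LLMs" and "11 Ethical Considerations".
The repository is GitHub – NoviScl/AI-Researcher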

Can Large Language Models Unlock Novel Scientific Research Ideas? [21.2]
Large language models (LLMs) and the publicly available ChatGPT have marked a major turning point in bringing artificial intelligence into people's everyday lives. This study examines the ability of LLMs to generate novel research ideas based on information from research papers.
Reference translation of the paper (metadata) (Tue, 10 Sep 2024 03:26:42 GMT)
The title is close to the one above, but this paper works from past papers, following the approach: "To address this task, we create a dataset of papers published after the year 2022 from these five domains. We annotate the papers with future research ideas. To evaluate the novelty and relevance of ideas generated by the LLMs, we propose an Idea Alignment Score (IAScore). This score reflects how well the generated ideas align with those proposed by the authors." Leakage is a concern here.
The repository is GitHub – sandeep82945/Future-Idea-Generation
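The quote only says that IAScore "reflects how well the generated ideas align with those proposed by the authors"; the exact definition is in the paper and is not reproduced here. Purely as an illustration of what scoring generated ideas against author-annotated ideas could look like, the sketch below uses token-level Jaccard overlap as a hypothetical matcher. The function names and the choice of Jaccard are assumptions, not the paper's metric.

```python
def token_set(text):
    """Lowercased whitespace tokens; a deliberately crude normalization."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity in [0, 1] between two idea strings."""
    ta, tb = token_set(a), token_set(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


def alignment_score(generated_ideas, author_ideas):
    """Average, over generated ideas, of the best match among author ideas.

    Hypothetical stand-in for an alignment metric: 1.0 means every generated
    idea matches some author idea exactly (as a token set), 0.0 means no overlap.
    """
    if not generated_ideas or not author_ideas:
        return 0.0
    best_matches = [
        max(jaccard(g, a) for a in author_ideas) for g in generated_ideas
    ]
    return sum(best_matches) / len(best_matches)


author = ["extend the method to multilingual data", "study robustness to noise"]
generated = ["extend the method to multilingual data", "build a larger benchmark"]
print(alignment_score(generated, author))
```

A score of this shape rewards reproducing the authors' own future-work ideas, which is also why leakage matters: if the evaluated papers are in the model's training data, a high score may reflect memorization.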

Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking

Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking [6.9]
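We show that, with the ability to jailbreak a GenAI model, attackers can escalate the outcome of attacks against RAG-based applications. The first part of the paper shows that attackers can escalate RAG membership inference attacks into RAG document extraction attacks. The second part shows that attackers can escalate the scale of RAG data poisoning attacks from compromising a single application to compromising the entire GenAI ecosystem.
Reference translation of the paper (metadata) (Thu, 12 Sep 2024 13:50:22 GMT)
Attacks on RAG: from RAG membership inference attacks and RAG entity extraction attacks to RAG documents extraction attacks.
The idea of "Adversarial Self-Replicating Prompts" is interesting.
The repository is GitHub – StavC/UnleashingWorms-ExtractingData: Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking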

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Towards a Unified View of Preference Learning for Large Language Models: A Survey [89.7]
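Large language models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors in their success is aligning the LLM's output with human preferences. We decompose all preference learning strategies into four components: model, data, feedback, and algorithm.
Reference translation of the paper (metadata) (Wed, 04 Sep 2024 15:11:55 GMT)
A survey of preference learning, which is important in building LLMs.
The repository is GitHub – KbsdJames/Awesome-LLM-Preference-Learning: The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"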

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources [38.3]
We propose Source2Synth, a new method that can be used to teach LLMs new skills without relying on costly human annotation. Source2Synth takes a custom data source as input and produces synthetic data points with intermediate reasoning steps grounded in real-world sources. We test reasoning abilities in multi-hop question answering (MHQA) and tool use in tabular question answering (TQA).
Reference translation of the paper (metadata) (Thu, 12 Sep 2024 17:39:08 GMT)
Starting from "we propose Source2Synth, a general approach to generate synthetic data grounded in external real-world sources.", the flow proceeds through Dataset generation → Dataset Curation → Fine tuning.
For the curation phase, the paper says "This is achieved by slicing the dataset in two and using one slice to fine-tune the LLM (LLMSynth)." and "Data filtering During filtering, LLMSynth is used to predict the output of the given synthetic example using k tries. If the output cannot be predicted at least once, it is assumed the example is low quality and is not included in the final curated dataset." Perhaps the intent is to filter out only the extreme data. (With "at least once" this may be fine, but I wonder whether, depending on the threshold, it could lead to model collapse.)
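The curation step quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's code: `predict` is a hypothetical callable standing in for the fine-tuned LLMSynth, examples are assumed to be dicts with `"question"` and `"answer"` keys, and exact string equality is assumed as the match criterion.

```python
def split_in_two(examples):
    """Slice the dataset in two: one half to fine-tune on, one half to filter."""
    half = len(examples) // 2
    return examples[:half], examples[half:]


def filter_examples(examples, predict, k=3):
    """Keep an example if the model reproduces its answer in at least one of k tries.

    `predict` is a hypothetical stand-in for LLMSynth: it takes a question
    string and returns an answer string (possibly different on each call).
    """
    kept = []
    for example in examples:
        # any() stops at the first successful try, so at most k calls are made.
        if any(predict(example["question"]) == example["answer"] for _ in range(k)):
            kept.append(example)
    return kept


def make_stub_model(known_answers):
    """Deterministic stub used only to exercise the filter."""
    def predict(question):
        return known_answers.get(question, "")
    return predict


data = [
    {"question": "q1", "answer": "a1"},
    {"question": "q2", "answer": "a2"},
]
model = make_stub_model({"q1": "a1", "q2": "something else"})
print(filter_examples(data, model, k=3))
```

Raising the bar from "at least once" to "at least m of k tries" would be a one-line change, and it is that kind of stricter threshold that prompts the question above about how aggressive filtering interacts with model collapse.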
