Introducing the Latest arXiv Papers (arXiv最新論文の紹介)

Editing Conceptual Knowledge for Large Language Models

Editing Conceptual Knowledge for Large Language Models [67.8]
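This paper pioneers the editing of conceptual knowledge in Large Language Models (LLMs). It constructs a new benchmark dataset, ConceptEdit, and establishes a new set of evaluation metrics. Experiments show that existing editing methods can modify concept-level definitions reasonably efficiently, but they may also distort the related instantial knowledge.
Reference translation of the paper (metadata) (Sun, 10 Mar 2024 16:57:10 GMT)
A paper that tests whether knowledge editing methods can be applied to concepts. The target is broader than in fact editing, and existing methods seem to work to some extent but also have limits. The benchmark data is public, and the authors say: "To maintain the quality of our data, we manually review all the descriptions we gathered, replacing any unclear or ambiguous."
The repository is Editing Conceptual Knowledge for Large Language Models (zjukg.org), and the data is at zjunlp/ConceptEdit · Datasets at Hugging Face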

WikiTableEdit

WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction [56.2]
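This paper investigates the performance of Large Language Models (LLMs) on table editing tasks. It leverages 26,531 tables from the Wiki dataset to generate natural language instructions for six distinct basic operations. Several representative large language models are evaluated on the WikiTableEdit dataset to demonstrate how challenging the task is.
Reference translation of the paper (metadata) (Tue, 5 Mar 2024 13:33:12 GMT)
Proposes a table editing task (We select six commonly-employed fundamental operations for our dataset: (1) Adding a new row or column, (2) Removing a row or column, (3) Swapping two rows, (4) Reordering based on a certain column, (5) Merging adjacent cells with identical values, and (6) Splitting the merged cells.) and builds a dataset for it. GPT3.5-turbo struggles, and judging from the scores, the task looks easy but is actually hard. (I would also like to see results for larger models.)
The repository is Anonymized Repository – Anonymous GitHub (4open.science)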
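To make the six operations quoted above concrete, here is a minimal Python sketch of what each edit does to a table. It is not the benchmark's code: the list-of-rows representation, the use of None to mark a cell merged into the one above it, and all function names are assumptions made for illustration.

```python
from typing import List, Optional

# Assumed representation: row 0 is the header; None marks a cell that has
# been merged into the nearest non-None cell above it in the same column.
Table = List[List[Optional[str]]]


def add_row(table: Table, row: List[Optional[str]], index: int) -> Table:
    """(1) Insert a new row at the given position."""
    return [list(r) for r in table[:index]] + [list(row)] + [list(r) for r in table[index:]]


def remove_column(table: Table, col: int) -> Table:
    """(2) Drop one column from every row."""
    return [r[:col] + r[col + 1:] for r in table]


def swap_rows(table: Table, i: int, j: int) -> Table:
    """(3) Exchange two rows."""
    out = [list(r) for r in table]
    out[i], out[j] = out[j], out[i]
    return out


def reorder_by_column(table: Table, col: int, reverse: bool = False) -> Table:
    """(4) Sort the body rows by one column, keeping the header first."""
    body = sorted((list(r) for r in table[1:]), key=lambda r: r[col], reverse=reverse)
    return [list(table[0])] + body


def merge_column(table: Table, col: int) -> Table:
    """(5) Merge vertically adjacent cells that hold identical values."""
    out = [list(r) for r in table]
    prev = object()  # sentinel that equals no cell value
    for r in out[1:]:
        if r[col] is None:
            continue  # already merged
        if r[col] == prev:
            r[col] = None
        else:
            prev = r[col]
    return out


def split_column(table: Table, col: int) -> Table:
    """(6) Undo a merge by copying the value back into each merged cell."""
    out = [list(r) for r in table]
    prev = None
    for r in out[1:]:
        if r[col] is None:
            r[col] = prev
        else:
            prev = r[col]
    return out
```

Each function is a few lines of ordinary code, which is part of why the low scores are surprising: the model has to perform the same edit on serialized table text from a natural language instruction.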

Large Language Models are Parallel Multilingual Learners

Large Language Models are Parallel Multilingual Learners [50.1]
This study reveals an in-context learning capability of multilingual large language models (LLMs). By translating the input into multiple languages, it provides the LLM with parallel input in multiple languages (PIM), which significantly improves its comprehension.
Reference translation of the paper (metadata) (Thu, 14 Mar 2024 03:33:46 GMT)
Proposes PIM, a new ICL strategy that supplies text with the same meaning in multiple languages as context. It is reported to improve performance especially for multilingual models. It is interesting that the effect holds even for machine-translated text.
The explanation "Considering knowledge learnt from different languages memorized in separate neurons of LLMs, a straightforward explanation for the superiority of PIM is that it leads to the increasing number of activated neurons, utilizing more knowledge during the inference stage." makes sense to me, but I can't help being a little skeptical of "This finding is similar to the synaptic pruning happening in brains, which prunes less-used neural connections and makes frequently-used neural pathways more powerful and efficient (Huttenlocher et al., 1979; Huttenlocher, 1990)."
The repository is takagi97/LLMs-are-parallel-multilingual-learners: The implementation of Large Language Models are Parallel Multilingual Learners. (github.com)
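As a rough illustration of what "parallel input in multiple languages" could look like in practice, the sketch below assembles one prompt from a question and its translations. The prompt wording, the language tags, and the function name build_pim_prompt are assumptions, not the authors' template; the translations are assumed to come from any translation step, including machine translation.

```python
from typing import Dict


def build_pim_prompt(question: str, translations: Dict[str, str]) -> str:
    """Return one prompt holding the original question plus its translations.

    `translations` maps a language tag (hypothetical, e.g. "de") to the same
    question rendered in that language.
    """
    lines = ["The same question is given in several languages. Answer it once.", ""]
    lines.append(f"[original] {question}")
    # Sort by language tag so the prompt is deterministic.
    for lang in sorted(translations):
        lines.append(f"[{lang}] {translations[lang]}")
    lines.extend(["", "Answer:"])
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_pim_prompt(
        "What is the capital of France?",
        {
            "de": "Was ist die Hauptstadt von Frankreich?",
            "fr": "Quelle est la capitale de la France ?",
        },
    )
    print(prompt)
```

The resulting string would then be sent to the model as a single input, so the only change from ordinary prompting is the extra parallel lines.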

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models [77.9]
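Big models have achieved revolutionary breakthroughs in the field of AI, but they may also raise potential concerns. To address such concerns, alignment techniques have been introduced to make these models conform to human preferences and values. Despite considerable progress over the past year, various challenges remain in establishing the optimal alignment strategy.
Reference translation of the paper (metadata) (Thu, 7 Mar 2024 04:19:13 GMT)
A survey of alignment that also conveys how rapidly LLMs are evolving.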

TIVE: Task-level and Instance-level Value Estimation

Less is More: Data Value Estimation for Visual Instruction Tuning [127.4]
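Proposes a new data selection approach that removes redundancy in visual instruction data. Experiments on LLaVA-1.5 show that the approach, using only about 7.5% of the data, can achieve performance comparable to the full-data fine-tuned model.
Reference translation of the paper (metadata) (Thu, 14 Mar 2024 16:47:25 GMT)
Visual instruction datasets contain a lot of unnecessary or redundant data, and the paper proposes a method that estimates the importance of the data and prunes it accordingly. The authors report that "using only about 7.5% data can achieve comparable performance as the full-data fine-tuned model across seven benchmarks, even surpassing it on four of the benchmarks.", which looks very effective.
Apparently, "Our code and data will be publicly released."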
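The selection step itself can be pictured as keeping only the highest-valued slice of the dataset. The sketch below shows just that generic step; how TIVE actually computes task-level and instance-level values is not described here, so the scores are treated as given, and the function name select_top_fraction is hypothetical.

```python
import math
from typing import List, Sequence


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Return the indices of the highest-scoring items, in dataset order.

    `scores` holds one precomputed value estimate per training instance
    (assumed to be supplied by some separate scoring method).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.4, 0.8]
    print(select_top_fraction(example_scores, 0.5))
```

With a fraction of 0.075, this would correspond to the "about 7.5%" budget mentioned in the paper, given suitable scores.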

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision [99.0]
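Current AI alignment methods rely on human-provided demonstrations or judgments. How can such systems be improved once their capabilities exceed human levels?
Reference translation of the paper (metadata) (Thu, 14 Mar 2024 15:12:38 GMT)
This topic was also covered in The Unreasonable Effectiveness of Easy Training Data for Hard Tasks – arXiv最新論文の紹介 (devneko.jp), but using PRMs (process reward models) or OPRMs (Outcome & Process Reward Models) is reported to be even more effective.
With all the talk about AGI and ASI, methods like this seem to be growing in importance (though I have mixed feelings about humans being placed on the "easy" side, as in the conclusion's "This approach presents a promising direction for developing AI systems capable of surpassing human problem-solving capabilities").
The repository is Edward-Sun/easy-to-hard (github.com)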

Grok, Gemini 1.5, and Gemma

As pre-announced (?) on X (formerly Twitter) in "Elon Musk on X: "This week, @xAI will open source Grok" / X (twitter.com)", Grok has been released. It is a 314B MoE model, and only the base model, which has not been fine-tuned, was released.

Open Release of Grok-1 (x.ai)
xai-org/grok: Grok open release (github.com)

Model Details
・Base model trained on a large amount of text data, not fine-tuned for any particular task.
・314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
・Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
Open Release of Grok-1 (x.ai)
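As a quick back-of-the-envelope check on the model details above, the "25% of the weights active" figure translates into the following per-token count. This uses only the two numbers quoted above; the variable and function names are just for illustration.

```python
TOTAL_PARAMS = 314_000_000_000  # 314B parameters, as quoted above
ACTIVE_PERCENT = 25             # share of weights active on a given token


def active_parameters(total: int, active_percent: int) -> int:
    """Number of parameters used for a single token (integer arithmetic)."""
    return total * active_percent // 100


if __name__ == "__main__":
    active = active_parameters(TOTAL_PARAMS, ACTIVE_PERCENT)
    print(f"{active / 1e9:.1f}B parameters active per token")  # prints 78.5B ...
```

So although all 314B weights have to be stored, the compute per token is closer to that of a dense model of roughly 78.5B parameters.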

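According to "The code and associated Grok-1 weights in this release are licensed under the Apache 2.0 license. The license only applies to the source files in this repository and the model weights of Grok-1.", both the code and the model (which appears to be distributed via torrent) are released under the Apache 2.0 license. It is fully open source, which seems significant.

Last week, the Gemini 1.5 and Gemma papers were posted on arXiv. With Claude 3 in the mix as well, we are moving into an era in which GPT-4 is no longer the sole leader, and I hope the trend toward openness accelerates too. I am curious about what Mistral will do.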

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context [379.4]
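Gemini 1.5 Pro is a compute-efficient multimodal mixture-of-experts model. It achieves near-perfect recall on long-context retrieval tasks across modalities. It matches or surpasses the state-of-the-art performance of Gemini 1.0 Ultra across a broad set of benchmarks.
Reference translation of the paper (metadata) (Fri, 8 Mar 2024 18:54:20 GMT)
Same content as SORAとGemini-1.5 – arXiv最新論文の紹介 (devneko.jp)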

Gemma: Open Models Based on Gemini Research and Technology [126.0]
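Gemma is a family of lightweight, state-of-the-art open models built from the research and technology used to create the Gemini models. Gemma models demonstrate strong performance on academic benchmarks for language understanding, reasoning, and safety.
Reference translation of the paper (metadata) (Wed, 13 Mar 2024 06:59:16 GMT)
The open-model counterpart to Gemini. Available from Gemma release – a google Collection (huggingface.co) and elsewhere.
The license looks permissive, but it is a custom license (Gemma Terms of Use | Google AI for Developers), and the ethically problematic uses listed in Gemma Prohibited Use Policy | Google AI for Developers are explicitly prohibited. (Derivatives must comply as well.)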

TRAD: Thought Retrieval and Aligned Decision

TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision [32.2]
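Large language model (LLM) agents have been built for a variety of tasks such as web navigation and online shopping. This paper proposes a new framework, TRAD, to address the issues these agents face. TRAD first performs Thought Retrieval, which achieves step-level demonstration selection through thought matching. It then introduces Aligned Decision, which complements the retrieved demonstration steps with their previous or subsequent steps.
Reference translation of the paper (metadata) (Sun, 10 Mar 2024 13:58:38 GMT)
A framework that appears to pack in many of the approaches that currently look effective. The statement that "Furthermore, TRAD has been deployed in real-world scenarios of a global business insurance company and improves the success rate of robotic process automation." is impressive.
The repository is SkyRiver-2000/TRAD-Official: TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision (github.com)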
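The two stages can be pictured with a toy sketch: match the agent's current thought against the thoughts stored in demonstration trajectories, then return the best-matching step together with its neighbors. This is not the authors' implementation; the Step type, the function names, and the word-overlap (Jaccard) similarity are assumptions chosen to keep the example dependency-free.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    thought: str  # the reasoning recorded for this step
    action: str   # the action taken at this step


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for whatever matcher is really used."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_aligned_steps(
    query_thought: str, demos: List[List[Step]], before: int = 1, after: int = 1
) -> List[Step]:
    """Find the demo step whose thought best matches the query, then return
    it with up to `before` previous and `after` subsequent steps."""
    best = None  # (score, demo index, step index)
    for d, trajectory in enumerate(demos):
        for s, step in enumerate(trajectory):
            score = jaccard(query_thought, step.thought)
            if best is None or score > best[0]:
                best = (score, d, s)
    if best is None:
        return []
    _, d, s = best
    return demos[d][max(0, s - before): s + after + 1]
```

The returned window of steps would then be placed in the agent's prompt as the demonstration for its next decision.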

MM1

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [105.4]
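This work discusses building performant Multimodal Large Language Models (MLLMs). In particular, it studies the importance of various architecture components and data choices. The paper describes large-scale multimodal pre-training that combines image-caption, interleaved image-text, and text-only data.
Reference translation of the paper (metadata) (Thu, 14 Mar 2024 17:51:32 GMT)
Apple's Multimodal Large Language Model. It feels unusual for Apple to publish results of this kind.
It reportedly uses apple/axlearn (github.com).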

GaLore: Low-Rank Projection

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection [139.2]
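Training Large Language Models (LLMs) presents significant memory challenges, mainly due to the growing size of weights and optimizer states. This work proposes Gradient Low-Rank Projection (GaLore) as a memory-efficient training strategy. The authors' 8-bit GaLore reduces optimizer memory by up to 82.5% and total training memory by 63.3% compared with a BF16 baseline.
Reference translation of the paper (metadata) (Wed, 6 Mar 2024 07:29:57 GMT)
Proposes a training method with improved memory efficiency, which is a major problem when working with LLMs. Pre-training a 7B model is reportedly feasible on an NVIDIA RTX 4090 with 24GB of memory.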
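To get a feel for why projecting gradients to a low rank saves optimizer memory, here is a rough element count for a single weight matrix. It is a sketch under stated assumptions rather than the paper's accounting: an Adam-style optimizer with two moment buffers, one m x r projection matrix, and moments stored on the r x n projected gradient. The function names and the example sizes are hypothetical, and the 82.5% figure above (which also involves 8-bit states) is not reproduced here.

```python
def adam_state_elements(m: int, n: int) -> int:
    """Two moment buffers, each the same shape as the m x n weight."""
    return 2 * m * n


def projected_adam_state_elements(m: int, n: int, r: int) -> int:
    """One m x r projector plus two moment buffers of shape r x n."""
    if not 0 < r <= min(m, n):
        raise ValueError("rank must satisfy 0 < r <= min(m, n)")
    return m * r + 2 * r * n


if __name__ == "__main__":
    m, n, r = 4096, 4096, 128  # example layer size and rank (assumed values)
    full = adam_state_elements(m, n)
    low_rank = projected_adam_state_elements(m, n, r)
    print(f"full: {full:,} elements, projected: {low_rank:,} elements")
    print(f"ratio: {low_rank / full:.4f}")
```

Under these assumptions the optimizer state for that one layer shrinks to a few percent of its original size, while the weights themselves stay full rank.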
