Introducing the Latest arXiv Papers

KAN: Kolmogorov-Arnold Networks

KAN: Kolmogorov-Arnold Networks [16.8]
We propose Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs). KANs have learnable activation functions on edges ("weights"). This seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability.
Reference translation of the paper (metadata) (Tue, 30 Apr 2024 17:58:29 GMT)
A proposal for an architecture that the authors claim beats MLPs in both performance and interpretability. In their words: "KANs and MLPs are dual: KANs have activation functions on edges, while MLPs have activation functions on nodes. This simple change makes KANs better (sometimes much better!) than MLPs in terms of both model accuracy and interpretability." The paper also notes that "Currently, the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters.", so I wonder whether the claims really hold up and whether KANs will be widely adopted.
The repository is GitHub – KindXiaoming/pykan: Kolmogorov Arnold Networks
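To make the "activation functions on edges versus nodes" contrast concrete, here is a minimal sketch in plain Python. It is not the pykan API and not the paper's parameterization: the Gaussian-bump basis in `edge_function` and all function names are assumptions chosen only for illustration.

```python
import math

def mlp_layer(x, weights, activation=math.tanh):
    # MLP: scalar weights sit on the edges, and one fixed activation
    # is applied at each output node after summing.
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def edge_function(coeffs, centers, width=1.0):
    # A learnable 1-D function: a weighted sum of Gaussian bumps.
    # The coefficients play the role of trainable parameters
    # (an illustrative stand-in, not the paper's basis).
    def phi(t):
        return sum(c * math.exp(-((t - m) / width) ** 2)
                   for c, m in zip(coeffs, centers))
    return phi

def kan_layer(x, edge_functions):
    # KAN: every edge (output j, input i) carries its own 1-D function;
    # the output node only sums the edge outputs.
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in edge_functions]

# One output node fed by two inputs, each edge with its own function.
row = [edge_function([1.0, -0.5], [0.0, 1.0]), edge_function([0.3], [2.0])]
print(kan_layer([0.2, 1.5], [row]))
```

In this sketch the only trainable quantities of the KAN layer are the per-edge coefficients, which is where the interpretability argument comes from: each edge can be inspected as a one-dimensional curve.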

Capabilities of Gemini Models in Medicine

Capabilities of Gemini Models in Medicine [100.6]
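We introduce Med-Gemini, a multimodal model specialized for medicine. We evaluate Med-Gemini on 14 medical benchmarks and establish new state-of-the-art (SoTA) performance on 10 of them. Our results offer evidence of Med-Gemini's potential, although more rigorous evaluation is important before real-world deployment.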
Reference translation of the paper (metadata) (Mon, 29 Apr 2024 04:11:28 GMT)
A report on Med-Gemini, a Gemini specialized for medicine. It outperforms GPT-4. In addition: "Finally, Med-Gemini’s performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization and referral letter generation, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education."
I expected roughly this from fine-tuning for medical use, but the pace of progress is still fast.

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

A Careful Examination of Large Language Model Performance on Grade School Arithmetic [4.7]
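Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. There is growing concern that part of this performance actually reflects dataset contamination.
Reference translation of the paper (metadata) (Thu, 02 May 2024 17:18:51 GMT)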
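The striking part of this paper is "Figure 1: Notable models arranged by their drop in performance between GSM8k and GSM1k (lower is worse). We notice that Mistral and Phi top the list of overfit models, with almost 10% drops on GSM1k compared to GSM8k, while models such as Gemini, GPT, and Claude show little to no signs of overfitting.", which shows that potential leakage of benchmark data has become a real problem.
As illustrated by Fugu-MT paper translation (abstract): Pretraining on the Test Set Is All You Need (fugumt.com) and Commercially usable 13-billion-parameter Japanese LLM "Tanuki-ZeRo" released to the public [6th in the world on representative Japanese benchmarks: equivalent to 1st among open models, partially surpassing GPT3.5 and Claude v2, 23｜Kan Hatakeyama (note.com), scores can be inflated deliberately, but even without that, this is a deep-rooted problem with no easy countermeasure.
From my own testing as well, I suspect that Phi-3 is not as good as its benchmark results suggest.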

Better & Faster Large Language Models via Multi-token Prediction

Better & Faster Large Language Models via Multi-token Prediction [29.1]
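Large language models such as GPT and Llama are trained with a next-token prediction loss. We propose that training language models to predict multiple future tokens at once results in higher sample efficiency.
Reference translation of the paper (metadata) (Tue, 30 Apr 2024 17:33:57 GMT)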
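Frankly, predicting several targets at once is an idea one hears often. The paper reports that "Our experiments (up to 7B parameters and 1T tokens) show that this is increasingly useful for larger models and in particular show strong improvements for code tasks." Demonstrating this experimentally is, I think, an important result.
The interpretation of the results is also worth reading.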
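As a rough illustration of what "predicting multiple future tokens at once" means for the training signal, the sketch below builds the targets for each position and sums one negative log-likelihood term per future offset. The function names, the dictionary representation of each head's output distribution, and the unweighted sum are assumptions for illustration, not the paper's implementation.

```python
import math

def multi_token_targets(tokens, n_future):
    # For each position t, pair the prefix tokens[:t+1] with the next
    # n_future tokens. Positions without n_future tokens left are skipped.
    last = len(tokens) - n_future
    return [(tokens[: t + 1], tokens[t + 1 : t + 1 + n_future])
            for t in range(last)]

def multi_token_loss(head_probs, targets):
    # head_probs[k] maps token -> probability predicted by head k,
    # which is responsible for the token k+1 steps ahead.
    # The loss is the sum of the per-head negative log-likelihoods.
    return -sum(math.log(probs[tok]) for probs, tok in zip(head_probs, targets))

# With n_future=1 this reduces to ordinary next-token prediction.
for prefix, future in multi_token_targets(["a", "b", "c", "d"], 2):
    print(prefix, "->", future)
```

Setting `n_future` to 1 recovers the usual next-token objective, which makes it easy to see that the multi-token variant only adds extra prediction targets per position.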

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models [92.7]
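Prometheus 2 is a more powerful evaluator LM that closely mirrors human and GPT-4 judgments. It can process both direct assessment and pairwise ranking formats, grouped with user-defined evaluation criteria. On four direct assessment benchmarks and four pairwise ranking benchmarks, Prometheus 2 scores the highest correlation and agreement with humans and proprietary LM judges.
Reference translation of the paper (metadata) (Thu, 02 May 2024 17:59:35 GMT)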
A proposal for an LM dedicated to evaluation. The training data is built with GPT-4, and "We choose Mistral-7B (Jiang et al., 2023a) and Mixtral8x7B (Jiang et al., 2024) as our base models, and merge the weights of evaluator LMs separately trained on the FEEDBACK COLLECTION and the PREFERENCE COLLECTION to obtain our resulting models, PROMETHEUS 2 (7B & 8x7B)."
The repository is GitHub – prometheus-eval/prometheus-eval: Evaluate your LLM’s response with Prometheus 💯

SCORE: Self-COrrection ability in REasoning tasks

Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.9]
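Self-correction has emerged as a promising solution for boosting the reasoning performance of large language models (LLMs). This work examines whether smaller (≤ 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs.
Reference translation of the paper (metadata) (Fri, 26 Apr 2024 03:41:28 GMT)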
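A proposal for a fine-tuning process that strengthens self-correction. The approach has the model generate its own training data, so it gives the impression of the model strengthening itself.
The repository is said to be https://github.com/yunx-z/SCORE, but it currently returns NotFound.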

Weak-to-Strong Extrapolation Expedites Alignment

Weak-to-Strong Extrapolation Expedites Alignment [135.1]
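We propose ExPO, a simple method for improving the alignment of LLMs with human preferences. On the AlpacaEval 2.0 benchmark, ExPO is shown to push models trained with less preference data to reach and even surpass the fully trained one. This work demonstrates the efficacy of model extrapolation in exploiting the capabilities of LLMs.
Reference translation of the paper (metadata) (Thu, 25 Apr 2024 17:39:50 GMT)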
A proposal for the following method: "By extrapolating from the weights of an SFT model Mw and a further trained one M, EXPO enables directly obtaining a better-aligned model without any additional training." It looks like a very simple extrapolation, and I honestly wonder why something like this works at all.
The repository is GitHub – chujiezheng/LLM-Extrapolation: Official repository for paper “Weak-to-Strong Extrapolation Expedites Alignment”
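The extrapolation in the quote can be sketched in a few lines: move from the SFT weights past the further-trained weights along the same direction. The dictionary-of-lists weight format, the function name, and the coefficient name `alpha` are assumptions for illustration; this is not the code from the LLM-Extrapolation repository.

```python
def extrapolate_weights(sft, aligned, alpha):
    # For every parameter: new = aligned + alpha * (aligned - sft).
    # alpha = 0 returns the aligned model unchanged; larger alpha moves
    # further along the direction from the SFT model to the aligned one.
    return {
        name: [a + alpha * (a - s) for s, a in zip(sft[name], aligned[name])]
        for name in aligned
    }

sft_model = {"layer.weight": [1.0, 2.0]}
aligned_model = {"layer.weight": [2.0, 2.0]}
print(extrapolate_weights(sft_model, aligned_model, alpha=0.5))
# {'layer.weight': [2.5, 2.0]}
```

No gradient step is involved, which matches the "without any additional training" claim: the only cost is a pass over the two checkpoints and a choice of `alpha`.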

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering [35.9]
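Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. In this paper, we propose a novel knowledge selection method for large language models (KS-LLM), which aims to identify valuable information in evidence documents. The method first generates triples based on the input question, then selects the evidence sentences most similar to the triples from the evidence document, and finally combines the evidence sentences and triples so that the large language model can generate the answer.
Reference translation of the paper (metadata) (Wed, 24 Apr 2024 05:32:41 GMT)
A knowledge selection method of the kind that uses triples. It seems to have some effect?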
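The three steps in the abstract (generate triples, select similar evidence sentences, combine both into the input) can be sketched as below. The triples are passed in by hand because generating them requires an LLM call; the word-overlap (Jaccard) similarity, the prompt layout, and every function name are assumptions standing in for whatever the paper actually uses.

```python
def jaccard(a, b):
    # Word-overlap similarity between two strings (illustrative stand-in
    # for a learned similarity measure).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def select_evidence(triples, sentences, top_k=1):
    # Flatten the triples into one query string and keep the top_k
    # sentences of the evidence document that are most similar to it.
    query = " ".join(" ".join(triple) for triple in triples)
    ranked = sorted(sentences, key=lambda s: jaccard(query, s), reverse=True)
    return ranked[:top_k]

def build_prompt(question, triples, evidence):
    # Combine the selected evidence sentences and the triples with the question.
    triple_lines = "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)
    evidence_lines = "\n".join(evidence)
    return (f"Evidence:\n{evidence_lines}\n\n"
            f"Triples:\n{triple_lines}\n\n"
            f"Question: {question}")

triples = [("Paris", "capital of", "France")]
document = ["Paris is the capital of France.", "Berlin is a city in Germany."]
evidence = select_evidence(triples, document)
print(build_prompt("What is the capital of France?", triples, evidence))
```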

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.7]
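We argue that bringing together visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. We propose an innovative multimodal CoT framework called Cantor, characterized by a perception-decision architecture. We demonstrate the effectiveness of the proposed method and show significant improvements in multimodal CoT performance.
Reference translation of the paper (metadata) (Wed, 24 Apr 2024 17:59:48 GMT)
A proposal for a multimodal CoT framework that is effective across a variety of MLLMs.
The repository is Cantor (ggg0919.github.io)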

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs [160.6]
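"List items one by one" asks the model to enumerate and describe all visual tags placed on the image, following the numeric order of the tags. Even at a relatively small size (10k-30k tagged images), this new dataset significantly improves visual reasoning capabilities and reduces hallucinations in MLLMs.
Reference translation of the paper (metadata) (Thu, 25 Apr 2024 07:29:17 GMT)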
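Proposes a method that makes SoM (Set-of-Mark Prompting – Introducing the Latest arXiv Papers (devneko.jp)), which is known to be effective with GPT-4V, effective on open models as well. The model is trained to solve the List Items One by One task (the dataset is created with GPT-4V).
The repository is GitHub – zzxslp/SoM-LLaVA: Empowering Multimodal LLMs with Set-of-Mark Prompting and Improved Visual Reasoning Ability.
(I was a little surprised that SoM is not very effective on open models.)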
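To picture the List Items One by One task, the sketch below turns a set of numbered tags and their descriptions into the text a model would be asked to produce, in numeric tag order. The output format, the function name, and the example descriptions are hypothetical; the actual dataset format may differ.

```python
def build_listing_target(tag_descriptions):
    # tag_descriptions maps the numeric tag drawn on the image to a short
    # description of the tagged object. Tags are listed in numeric order,
    # so tag 10 comes after tag 2 (not lexicographic order).
    return "\n".join(
        f"{tag}. {tag_descriptions[tag]}" for tag in sorted(tag_descriptions)
    )

print(build_listing_target({2: "a dog", 1: "a cat", 10: "a tree"}))
```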
