LLM – Page 10 – Introducing the Latest arXiv Papers

Enhancing LLM Character-Level Manipulation via Divide and Conquer

Enhancing LLM Character-Level Manipulation via Divide and Conquer [108.7]
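Large language models (LLMs) have demonstrated strong generalization across a wide range of natural language processing (NLP) tasks. However, they exhibit notable weaknesses in character-level string manipulation, struggling with basic operations such as character deletion, insertion, and substitution. In this paper, we propose Character-Level Manipulation via Divide and Conquer, a novel approach that bridges the gap between token-level processing and character-level manipulation.
Reference translation of the paper (metadata) (Wed, 12 Feb 2025 07:37:39 GMT)
A proposal for handling character operations that are surprisingly hard because of the mismatch between token-level and character-level units, as in: 「For example, when prompting models to insert ‘a’ after every ‘e’ in the word “intelligence”, even one of the state-of-the-art LLMs, ChatGPT-4o, returns a wrong answer: “intellaigenca”.」 The method has three stages: 「We first decompose the token into an atomized character sequence. Then, we perform character-wise manipulations on the individual characters. Finally, we reconstruct the token from the modified sequence.」
The repository is said to be https://github.com/Eric2i/CharDC, but at the time of writing it returns a 404.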
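The three stages quoted above (decompose, manipulate, reconstruct) can be mirrored in ordinary string code. The sketch below is only an illustration of the task, not the paper's prompting method: it uses plain Python rather than an LLM, and the function names are hypothetical. It can serve as a ground-truth checker when testing a model's answer.

```python
def decompose(token: str) -> list[str]:
    """Stage 1: split a token into an atomized character sequence."""
    return list(token)


def insert_after(chars: list[str], target: str, new: str) -> list[str]:
    """Stage 2: character-wise manipulation (insert `new` after every `target`)."""
    out = []
    for ch in chars:
        out.append(ch)
        if ch == target:
            out.append(new)
    return out


def reconstruct(chars: list[str]) -> str:
    """Stage 3: rebuild the token from the modified sequence."""
    return "".join(chars)


def insert_after_every(word: str, target: str, new: str) -> str:
    """Run the three stages in order."""
    return reconstruct(insert_after(decompose(word), target, new))


# The example from the paper: insert 'a' after every 'e' in "intelligence".
print(insert_after_every("intelligence", "e", "a"))  # intealligeancea
```

The correct answer, "intealligeancea", differs from the "intellaigenca" that the quoted passage attributes to ChatGPT-4o.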

Human Decision-making is Susceptible to AI-driven Manipulation

Human Decision-making is Susceptible to AI-driven Manipulation [71.2]
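AI systems can exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. This study examines human susceptibility to such manipulation in financial and emotional decision-making contexts.
Reference translation of the paper (metadata) (Tue, 11 Feb 2025 15:56:22 GMT)
A striking result: 「Our randomized control trial with 233 participants demonstrated that human decision-making is highly susceptible to AI-driven manipulation, with participants significantly shifting preferences toward harmful options and away from beneficial choices when interacting with manipulative AI agents.」 One wonders whether the reason the 「strategy-enhanced manipulative agent (SEMA) employing established psychological tactics to reach its hidden objectives.」 added so little is that the AI was already powerful enough without such tactics.
Given that reliance on AI will keep growing and that AI performance itself will keep improving, this is a frightening result. The authors argue for regulation, but that alone hardly seems sufficient.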

LM2: Large Memory Models

LM2: Large Memory Models [11.3]
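This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module. Experimental results on the BABILong benchmark show that LM2 outperforms the memory-augmented RMT model by 37.1% and the baseline Llama-3.2 model by 86.3% on average across tasks.
Reference translation of the paper (metadata) (Sun, 09 Feb 2025 22:11:42 GMT)
Proposes the Large Memory Model (LM2), a 「decoder-only Transformer architecture enhanced with an auxiliary memory module」. This is an extension many people have been waiting for, and it will be very interesting to see whether it holds up when validated at a practical (large) scale.
Repository: GitHub – convergence-ai/lm2: Official repo of paper LM2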

SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model

SmolLM2: When Smol Goes Big — Data-Centric Training of a Small Language Model [33.9]
SmolLM2 is a state-of-the-art "small" (1.7 billion parameter) language model. We overtrain SmolLM2 on roughly 11 trillion tokens using a multi-stage training process that mixes web text with specialized math, code, and instruction-following data. We show that SmolLM2 outperforms other recent small LMs, including Qwen2.5-1.5B and Llama3.2-1B.
Reference translation of the paper (metadata) (Tue, 04 Feb 2025 21:43:16 GMT)
An SLM from Hugging Face: 「SmolLM2 advances the state-of-the-art for open small LMs through a combination of careful dataset curation and multistage training.」 The authors claim that 「SmolLM2 outperforms other recent small LMs including Qwen2.5-1.5B and Llama3.2-1B.」
Repository: SmolLM2 – a HuggingFaceTB Collection

Gemini 2.0: Flash, Flash-Lite and Pro, OpenAI deep research

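News arrives every week, but last week the biggest story was Google's Gemini 2.0 series. Flash-Lite in particular is an API priced competitively with DeepSeek, which made it major news on the price-competition front as well. Gemini 2.0: Flash, Flash-Lite and Pro – Google Developers Blog; X user swyx 🔜 @aidotEngineer NYC: 「With Gemini 2.0 GA pricing/benchs, it’s official: @GoogleDeepMind has the Mandate of Heaven. https://t.co/pfOlxb57Yx」 / X

OpenAI announced deep research. Competing services such as Perplexity already exist, but it drew a great deal of attention because OpenAI itself released it and because its performance is high. Introducing deep research | OpenAI

API pricing is under fierce competition. Whether OpenAI is being forced to compete at the application layer or whether this is a necessary step toward a larger goal is unclear, but better LLM cost-performance and the arrival of useful applications are welcome from the user's side. (For startups, on the other hand...)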

Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes

Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes [135.7]
This paper proposes a simple, lightweight approach for fusing large language models with gradient-boosted decision trees. We name the fusion methods LLM-Boost and PFN-Boost. They achieve state-of-the-art performance against numerous baselines and ensembling algorithms.
Reference translation of the paper (metadata) (Thu, 06 Feb 2025 02:39:35 GMT)
An approach that fuses LLMs or Transformers with GBDTs: 「We propose LLM-Boost: a novel yet simple and easy-to-implement boosting mechanism that combines LLMs, which ingest semantic column headers, with GBDTs that can scale to massive datasets.」 and 「We further propose PFN-Boost, where we instead fuse TabPFN and GBDTs for performance gains over GBDTs alone across dataset sizes without using column headers.」 That the benefit varies with dataset size seems plausible.
Repository: GitHub – MayukaJ/LLM-Boost

LLMs can be Fooled into Labelling a Document as Relevant (best café near me; this paper is perfectly relevant)

LLMs can be Fooled into Labelling a Document as Relevant (best café near me; this paper is perfectly relevant) [27.0]
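This study reports on experiments that use multiple open-source and proprietary LLMs to label short texts (passages) for relevance. While overall agreement between LLMs and human judgments is comparable to the human-to-human agreement measured in previous research, LLMs are more likely than human judges to label passages as relevant.
Reference translation of the paper (metadata) (Wed, 29 Jan 2025 20:11:35 GMT)
An interesting observation: 「This tendency of LLMs to be fooled by the mere presence of query words demonstrates a weakness in our current measures of LLM labelling: relying on overall agreement misses important patterns of failures.」 The authors also warn: 「In production environments, LLMs might be vulnerable to keyword stuffing and other SEO strategies.」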
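The keyword-stuffing failure mode quoted above can be illustrated with a toy labeler. The sketch below is not an LLM and not the paper's experimental setup: it is a deliberately naive, hypothetical labeler that marks a passage as relevant when enough query words appear in it, which is the extreme case of being fooled by the mere presence of query words. All function names and the threshold are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> float:
    """Fraction of distinct query words that also occur in the passage."""
    query_words = set(tokenize(query))
    if not query_words:
        return 0.0
    passage_words = set(tokenize(passage))
    return len(query_words & passage_words) / len(query_words)


def label_relevant(query: str, passage: str, threshold: float = 0.75) -> bool:
    """Naive labeler: relevant if enough query words are present."""
    return overlap_score(query, passage) >= threshold


query = "best cafe near me"
honest = "This paper studies how language models judge passage relevance."
# Keyword stuffing: append the query words to an unrelated passage.
stuffed = honest + " best cafe near me"

print(label_relevant(query, honest))   # False
print(label_relevant(query, stuffed))  # True
```

The passage content is unchanged apart from the appended query words, yet the label flips. An aggregate agreement score over many passages would not necessarily reveal this pattern, which is the point the quoted passage makes.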

Qwen2.5-Max, Janus-Pro, o3-mini, Mistral Small, Tulu 3 405B, Open R1, BAICHUAN-OMNI-1.5

Amid the excitement around DeepSeek V3/R1, last week again brought a variety of news. DeepSeek released the multimodal model Janus-Pro (GitHub – deepseek-ai/Janus: Janus-Series: Unified Multimodal Understanding and Generation Models), and Alibaba announced Qwen's largest model, seemingly positioned against DeepSeek (Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model | Qwen). A paper on extending Qwen to long contexts is also worth noting.

OpenAI released o3-mini (OpenAI o3-mini | OpenAI), and its surpassing o1 and R1 on Humanity’s Last Exam was big news (though parts of the announcement lean toward hype).

Mistral released Mistral Small, a small, high-performing model (Mistral Small 3 | Mistral AI | Frontier AI in your hands, mistralai/Mistral-Small-24B-Instruct-2501 · Hugging Face), under the Apache 2.0 license.

Ai2 released Tulu 3, a large, high-performing LLM (Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3 | Ai2) (Llama 3.1-based, 405B), and Hugging Face announced Open R1, which aims to reproduce DeepSeek R1 (Open-R1: a fully open reproduction of DeepSeek-R1), so open efforts are gaining momentum as well.

In the multimodal trend too, the technical report for BAICHUAN-OMNI-1.5, an 「Open-source Omni-modal Foundation Model Supporting Text, Image, Video, and Audio Inputs as Well as Text and Audio Outputs」, has been released, and competition is extremely intense in both the closed and open camps.

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling [27.1]
We introduce Janus-Pro, an advanced version of the previous work Janus. Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size.
Reference translation of the paper (metadata) (Wed, 29 Jan 2025 18:00:19 GMT)
「We apply independent encoding methods to convert the raw inputs into features, which are then processed by an unified autoregressive transformer.」 An autoregressive transformer design, with higher parameter efficiency than LLaVA.

Qwen2.5-1M Technical Report [72.1]
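We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup.
Reference translation of the paper (metadata) (Sun, 26 Jan 2025 03:47:25 GMT)
Long-context extension of Qwen.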

Baichuan-Omni-1.5 Technical Report [78.5]
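Baichuan-Omni-1.5 is an omni-modal model that has not only omni-modal understanding capabilities but also end-to-end audio generation capabilities. First, we build a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B of high-quality data. Second, an audio tokenizer is designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with the MLLM.
Reference translation of the paper (metadata) (Sun, 26 Jan 2025 02:19:03 GMT)
An open MLLM.
Repository: GitHub – baichuan-inc/Baichuan-Omni-1.5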

Humanity’s Last Exam [244.6]
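Humanity’s Last Exam (HLE) is a multimodal benchmark at the frontier of human knowledge. It consists of 3,000 questions across dozens of subjects, including mathematics, the humanities, and the natural sciences. Each question has a known solution that is unambiguous and easily verifiable, but that cannot be quickly answered via internet search.
Reference translation of the paper (metadata) (Fri, 24 Jan 2025 05:27:46 GMT)
A benchmark that current AI finds difficult to solve. Project site: Humanity’s Last Exam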

Tulu 3: Pushing Frontiers in Open Language Model Post-Training [94.1]
Tulu 3 is a family of state-of-the-art post-trained models. Tulu 3 builds on the Llama 3.1 base models and surpasses the instruct versions of Llama 3.1, Qwen 2.5, and Mistral, as well as closed models such as GPT-4o-mini and Claude 3.5-Haiku.
Reference translation of the paper (metadata) (Wed, 29 Jan 2025 18:46:59 GMT)
The paper originally came out in November. The 405B version performs very well.
According to the Ai2 site mentioned above: 「Interestingly, we found that our Reinforcement Learning from Verifiable Rewards (RLVR) framework improved the MATH performance more significantly at a larger scale, i.e., 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report. Overall, our results show a consistent edge over DeepSeek V3, especially with the inclusion of safety benchmarks.」

Harnessing Large Language Models for Disaster Management: A Survey

Harnessing Large Language Models for Disaster Management: A Survey [57.0]
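Large language models (LLMs) have revolutionized scientific research with their exceptional capabilities and have transformed various fields. This survey aims to guide the professional community in developing advanced LLMs for disaster management and to enhance resilience against natural disasters.
Reference translation of the paper (metadata) (Sun, 12 Jan 2025 21:00:50 GMT)
A survey on applying LLMs to disasters, organized along the axes of Mitigation, Preparedness, Response, and Recovery.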

Foundations of Large Language Models

Foundations of Large Language Models [50.0]
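This book is organized into four main chapters, each exploring a key area: pre-training, generative models, prompting techniques, and alignment methods. It is intended for college students, professionals, and practitioners in natural language processing and related fields.
Reference translation of the paper (metadata) (Thu, 16 Jan 2025 01:03:56 GMT)
More than 200 pages long; effectively a textbook on LLMs.
Note that the license is Deed – Attribution-NonCommercial 4.0 International – Creative Commons, so commercial use is not permitted.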
