LLM – Page 7 – Introducing the Latest arXiv Papers

Knowledge Injection via Prompt Distillation

Knowledge Injection via Prompt Distillation [48.7]
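This paper proposes a new fine-tuning technique for learning new knowledge and shows that it can reach the performance of RAG. The proposed method is based on a self-distillation approach called prompt distillation.
Reference translation of the paper (metadata) (Thu, 19 Dec 2024 15:44:01 GMT)
RAG is the usual choice when a task needs knowledge the LLM lacks; this paper proposes prompt distillation, a fine-tuning method that achieves comparable performance. The authors say it can also be combined with RAG.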
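As a rough illustration of what a self-distillation objective can look like, the sketch below computes a per-token KL divergence between a "teacher" distribution and a "student" distribution. It assumes the teacher is the model given the new knowledge in its prompt and the student is the same model without it; the entry above only states that the method is a form of self-distillation, so this setup, the function names, and the toy logits are all illustrative assumptions rather than the paper's implementation.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student) over one sequence.

    teacher_logits: logits per position from the model WITH the knowledge
                    in its prompt (assumed setup, not taken from the paper).
    student_logits: logits per position from the model WITHOUT it.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two positions over a three-token vocabulary.
teacher = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = distillation_loss(teacher, student)
```

Minimizing such a loss with respect to the student's parameters pushes the student toward answering as if the knowledge were in its prompt; in practice this would be done with an autodiff framework rather than plain Python lists.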

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation [21.8]
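RetroLLM is a unified framework that integrates retrieval and generation into a single cohesive process. To mitigate false pruning during constrained evidence generation, it introduces hierarchical FM-Index constraints. Experiments on five open-domain QA datasets show RetroLLM's superior performance on both in-domain and out-of-domain tasks.
Reference translation of the paper (metadata) (Mon, 16 Dec 2024 16:03:25 GMT)
A proposal for a framework that seamlessly connects retrieval and generation.
The repository is GitHub – sunnynexus/RetroLLM: RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation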
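To make "constrained evidence generation" concrete, here is a minimal sketch of corpus-constrained decoding: at each step, only tokens that keep the output a contiguous span of some corpus document are allowed. It deliberately uses a naive linear scan in place of an FM-Index, and the `score` callback is a hypothetical stand-in for a language model, so this is a toy illustration of the general idea and not RetroLLM's actual algorithm.

```python
def allowed_next_tokens(corpus, prefix):
    """Return tokens that can follow `prefix` as a contiguous span in some document.

    corpus: list of documents, each a list of tokens.
    A real system would answer this query with an index (e.g. an FM-Index);
    the linear scan here is only for illustration.
    """
    prefix = list(prefix)
    n = len(prefix)
    allowed = set()
    for doc in corpus:
        for i in range(len(doc) - n):
            if doc[i:i + n] == prefix:
                allowed.add(doc[i + n])
    return allowed


def constrained_greedy_decode(corpus, score, max_len):
    """Greedily build an evidence span that is guaranteed to occur in the corpus.

    score(prefix, token) is a hypothetical scoring function standing in
    for the language model's next-token preference.
    """
    evidence = []
    for _ in range(max_len):
        candidates = allowed_next_tokens(corpus, evidence)
        if not candidates:
            break  # no document continues this span
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=lambda tok: score(evidence, tok))
        evidence.append(best)
    return evidence


corpus = [
    ["paris", "is", "the", "capital", "of", "france"],
    ["berlin", "is", "the", "capital", "of", "germany"],
]
wanted = {"berlin", "is", "the", "capital", "of", "germany"}


def toy_score(prefix, token):
    """Toy preference: favor tokens from the sentence about Berlin."""
    return 1.0 if token in wanted else 0.0


evidence = constrained_greedy_decode(corpus, toy_score, max_len=10)
```

Because each step commits to a prefix, an early choice can rule out spans that would have been better evidence; this kind of greedy commitment is the setting in which the pruning problem mentioned above arises.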

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy [88.1]
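CC-OCR consists of four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It aims to comprehensively evaluate the capabilities of LMMs on OCR-centric tasks and to drive progress in LMMs.
Reference translation of the paper (metadata) (Tue, 03 Dec 2024 07:03:25 GMT)
An OCR benchmark for MLLMs; Gemini Pro performs strongly across the board.
The repository is https://github.com/QwenLM/CC-OCR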

From Intention To Implementation: Automating Biomedical Research via LLMs

From Intention To Implementation: Automating Biomedical Research via LLMs [32.0]
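This paper introduces BioResearcher, the first end-to-end automated system designed to streamline the entire biomedical research process. By decomposing complex tasks into logically related subtasks, BioResearcher effectively addresses the challenges of multidisciplinary requirements and logical complexity. BioResearcher achieves an average execution success rate of 63.07% across eight previously unmet research objectives.
Reference translation of the paper (metadata) (Thu, 12 Dec 2024 16:35:05 GMT)
According to the paper: "BioResearcher employs a modular multi-agent architecture, integrating specialized agents for search, literature processing, experimental design, and programming."
The figure is hard to interpret, but the success rate seems quite high.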

Political-LLM: Large Language Models in Political Science

Political-LLM: Large Language Models in Political Science [160.0]
Large language models (LLMs) have been widely adopted for political science tasks. Political-LLM aims to advance a comprehensive understanding of integrating LLMs into computational political science.
Reference translation of the paper (metadata) (Mon, 09 Dec 2024 08:47:50 GMT)
The paper states: "In this work, we—a multidisciplinary team of researchers spanning computer science and political science—present the first principled framework termed Political-LLM to advance the comprehensive understanding of integrating LLMs into computational political science." and "The intended audience of this survey includes (1) computer science researchers and practitioners who seek a structured understanding of how LLMs are applied in political science, aiming to bridge interdisciplinary gaps; and (2) political science researchers and practitioners who seek to leverage LLMs in ways that are sensitive to the unique requirements of their field, such as nuanced interpretation and contextual accuracy [57]." In short, it is a survey of LLM applications in politics. Although the topic is politics, it also offers many useful pointers on how LLMs can be put to use in society more broadly. It is nice that the project site is licensed under CC BY-SA.
The project site is Political-LLM: Large Language Models in Political Science

Amazon Nova, OpenAI o1 pro, Gemini-Exp-1206, Llama 3.3

Last week was especially busy for LLM news. Amazon, OpenAI, Google, and Meta all shipped major releases, and OpenAI says it will keep making announcements, which is something to look forward to.

Introducing-Amazon-Nova-A-New-Generation-of-Foundation-Models – US Press Center
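- A high-performance LLM family announced by Amazon, available in several versions:
  - Amazon Nova Micro (fast text-to-text)
  - Amazon Nova Lite (fast multimodal)
  - Amazon Nova Pro (high-performance multimodal)
  - Amazon Nova Premier (a model specialized for complex reasoning?)
  - Amazon Nova Canvas (image generation)
  - Amazon Nova Reel (video generation)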
Introducing ChatGPT Pro | OpenAI
- Announcement of ChatGPT Pro; OpenAI o1 pro mode further improves on o1's performance.
https://aistudio.google.com/app/prompts/new_chat?model=gemini-exp-1206
- As of 2024-12-05, the top model on Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots
Llama 3.3 | Model Cards and Prompt formats
- An openly released model from Meta, for which Meta claims: "Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B–and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B."

Competition among the companies is extremely fierce.

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier [72.6]
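This paper introduces the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual models. By leveraging several years of research at Cohere For AI and Cohere, Aya Expanse sets a new state of the art in multilingual performance. The authors evaluate on the Arena-Hard-Auto dataset translated into 23 languages to show that Aya Expanse 8B and 32B outperform leading open-weight models.
Reference translation of the paper (metadata) (Thu, 05 Dec 2024 15:41:06 GMT)
A paper on Cohere's multilingual LLMs, which are released as open models. It claims higher performance than other open models.
The repositories are CohereForAI/aya-expanse-8b · Hugging Face and CohereForAI/aya-expanse-32b · Hugging Face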

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment [42.2]
This paper introduces BayLing 2, which efficiently transfers generative capabilities and knowledge from high-resource languages to low-resource languages. In multilingual translation across more than 100 languages, BayLing outperforms open-source models of similar scale. BayLing's demo, homepage, code, and models are available.
Reference translation of the paper (metadata) (Mon, 25 Nov 2024 11:35:08 GMT)
The paper builds a multilingual model on top of fine-tuning: "By fine-tuning on high-resource language instructions and cross-lingual instructions, LLM can transfer knowledge and generative capabilities from high-resource languages to low-resource languages, thereby facilitating multilingual interaction." and "Cross-lingual instructions, such as interactive translation and multilingual translation, can efficiently enhance the language alignment within LLM, thereby improving translation performance." That said, the results are rather hard to interpret.
The repository is GitHub – ictnlp/BayLing: “百聆”是一个基于LLaMA的语言对齐增强的英语/中文大语言模型，具有优越的英语/中文能力，在多语言和通用任务等多项测试中取得ChatGPT 90%的性能。BayLing is an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction. The project site is http://nlp.ict.ac.cn/bayling, but it appears to be down at the time of writing (?).

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages [73.9]
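ALM-bench is the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. It challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages. The benchmark offers a robust and nuanced evaluation framework with a variety of question formats, including true/false, multiple-choice, and open-ended questions.
Reference translation of the paper (metadata) (Mon, 25 Nov 2024 15:44:42 GMT)
An LLM evaluation benchmark covering a very large number of languages. The task is VQA.
The repository is All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages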

Multilingual Large Language Models: A Systematic Survey

Multilingual Large Language Models: A Systematic Survey [39.0]
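This paper presents a comprehensive survey of recent research on multilingual large language models (MLLMs). It first discusses MLLM architectures and pre-training objectives, highlighting the key components and methodologies that contribute to multilingual capabilities. It then presents a detailed taxonomy and roadmap covering MLLMs' cross-lingual knowledge, reasoning, alignment with human values, safety, interpretability, and specialized applications.
Reference translation of the paper (metadata) (Sun, 17 Nov 2024 13:21:26 GMT)
A survey of multilingual LLMs. These days the first M in MLLM usually stands for multimodal, so the abbreviation is slightly confusing.
The repository is GitHub – tjunlp-lab/Awesome-Multilingual-LLMs-Papers: Awesome-Multilingual-LLMs-Papers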
