July 1, 2024 – Introducing the latest arXiv papers

Gemma 2, CriticGPT

Google has released Gemma 2, an LLM that is large for an open model. The 9B and 27B models are available now, and they reportedly outperform competing open models such as Llama 3. The technical report (gemma-2-report.pdf (storage.googleapis.com)) states: "The 9 billion and 27 billion parameter models are available today, with a 2 billion parameter model to be released shortly." It also says: "We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3× bigger." The effective use of distillation is an interesting point, and judging from Section 5 (Ablations), its effect appears to be large.
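To make the contrast between next token prediction and knowledge distillation concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which the student minimizes the cross-entropy against the teacher's full token distribution rather than against a single correct token. The function names and toy logits are hypothetical and are not taken from the Gemma 2 report or code.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(student_logits, target_index):
    """Standard next token prediction: negative log-likelihood of one target token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def distillation_loss(student_logits, teacher_logits):
    """Assumed distillation objective: cross-entropy between the teacher's
    distribution (soft labels) and the student's distribution."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Toy vocabulary of three tokens (hypothetical values).
student = [1.0, 2.0, 0.5]
teacher = [0.5, 2.5, 1.5]
print(next_token_loss(student, target_index=1))
print(distillation_loss(student, teacher))
```

The point of the soft labels is that each training position carries information about every token in the vocabulary, not only the observed one, which is one plausible reason a small student can benefit from a larger teacher.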

News release: Google launches Gemma 2, its next generation of open models (blog.google)
Repositories
- google/gemma-2-9b-it · Hugging Face
- google/gemma-2-27b-it · Hugging Face

On my usual translation benchmark it showed very high performance, which is very promising. Gemma 2 9Bの機械翻訳性能 | ぷるーふおぶこんせぷと (staka.jp)

OpenAI has released CriticGPT, which finds GPT-4's mistakes and suggests fixes. For now it targets code. It appears to have limitations, but research like this is important. Finding GPT-4’s mistakes with GPT-4 | OpenAI

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources [100.2]
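Foundation model development is attracting a rapidly growing body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet.
Reference translation of the paper (metadata) (Wed, 26 Jun 2024 02:19:01 GMT)
A cheatsheet for responsible foundation model development. Although it is called a cheatsheet, it covers a wide range of material.
The project site is Resources for Foundation Models – Foundation Model Development Cheatsheet (fmcheatsheet.org)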

SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation [45.4]
This paper introduces Self-Aware Knowledge Retrieval (SeaKR). SeaKR is an adaptive RAG model that extracts the LLM's self-aware uncertainty from its internal states. Experiments on both complex and simple question answering datasets show that SeaKR outperforms existing adaptive RAG methods.
Reference translation of the paper (metadata) (Thu, 27 Jun 2024 14:38:33 GMT)
A RAG approach whose strategy is "SEAKR activates retrieval when the LLMs present high self-aware uncertainty for generation." Its behavior is agentic and fairly complex, and it outperforms FLARE (Fugu-MT 論文翻訳(概要): Active Retrieval Augmented Generation (fugumt.com)) and DRAGIN (Fugu-MT 論文翻訳(概要): DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models (fugumt.com)).
The repository is GitHub – THU-KEG/SeaKR
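The idea of retrieving only when the model is uncertain can be sketched as follows. Note the substitution: SeaKR reads uncertainty from the LLM's internal states, whereas this sketch uses the mean entropy of the output token distributions as a simple stand-in. The `generate` and `retrieve` callables, the threshold value, and all names are hypothetical and are not the THU-KEG/SeaKR API.

```python
import math

def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) over per-token probability distributions."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0)
        for dist in token_distributions
    ]
    return sum(entropies) / len(entropies)

def answer(question, generate, retrieve, threshold=0.5):
    """Answer without retrieval first; retrieve only if uncertainty is high.

    generate(question, context) -> (text, token_distributions)  # hypothetical
    retrieve(question) -> list of passages                       # hypothetical
    Returns (answer_text, used_retrieval).
    """
    draft, dists = generate(question, context=None)
    if mean_token_entropy(dists) <= threshold:
        return draft, False  # confident enough, skip retrieval
    passages = retrieve(question)
    final, _ = generate(question, context=passages)
    return final, True
```

A caller would pass in its own LLM wrapper and retriever; the threshold is a tuning parameter that trades retrieval cost against the risk of answering from uncertain parametric knowledge.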

Themis: Towards Flexible and Interpretable NLG Evaluation

Themis: Towards Flexible and Interpretable NLG Evaluation [39.1]
We construct NLG-Eval, a large-scale NLG evaluation corpus annotated by both humans and GPT-4, to alleviate the lack of relevant data in this field. We propose Themis, an LLM dedicated to NLG evaluation.
Reference translation of the paper (metadata) (Wed, 26 Jun 2024 14:04:29 GMT)
Construction of a dataset for evaluation (0.5 million samples and 58 datasets across 9 NLG tasks) and a proposal for a fine-tuned model. It outperforms UniEval and G-Eval.
The repository is GitHub – PKU-ONELab/Themis: The official repository for our NLG evaluation LLM Themis and the paper Themis: Towards Flexible and Interpretable NLG Evaluation.

On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey

On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey [26.7]
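Large language models (LLMs) offer a data-centric solution for alleviating real-world data limitations through synthetic data generation. This paper organizes the relevant studies based on a generic workflow of synthetic data generation.
Reference translation of the paper (metadata) (Fri, 14 Jun 2024 07:47:09 GMT)
A survey organized around a generic workflow for synthetic data generation.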
