arXiv – Page 5 – Introducing the latest arXiv papers

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

JetMoE: Reaching Llama2 Performance with 0.1M Dollars [25.3]
This report introduces JetMoE-8B, a new large language model. Despite its low cost, JetMoE-8B outperforms the Llama2-7B model, and JetMoE-8B-Chat outperforms the Llama2-13B-Chat model. The report details all training parameters and data mixtures to facilitate future efforts in developing open foundation models.
Reference translation of the paper (metadata) (Thu, 11 Apr 2024 00:52:39 GMT)
A proposed recipe for building an LLM cheaply (although "cheap" here still means "$0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours.").
Repository: myshell-ai/JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars (github.com)
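
As a rough sanity check on the quoted budget, the implied price per GPU hour can be worked out from the figures in the quote. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes the whole $0.1 million went to GPU time, which the quote does not state.

```python
# Figures taken from the quote above.
budget_usd = 100_000   # "$0.1 million"
gpu_hours = 30_000     # "30,000 H100 GPU hours"
tokens = 1.25e12       # "1.25T tokens"

# Assumption (not stated in the paper quote): the entire budget is GPU cost.
usd_per_gpu_hour = budget_usd / gpu_hours
tokens_per_gpu_hour = tokens / gpu_hours

print(f"implied price: {usd_per_gpu_hour:.2f} USD per GPU hour")
print(f"throughput: {tokens_per_gpu_hour:,.0f} tokens per GPU hour")
```

Under that assumption the budget works out to roughly 3.33 USD per H100 GPU hour.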

Many-Shot In-Context Learning

Many-Shot In-Context Learning [57.6]
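Large language models (LLMs) excel at in-context learning (ICL). We observe significant performance gains across a wide variety of generative and discriminative tasks. Reinforced and Unsupervised ICL turn out to be highly effective in the many-shot regime.
Reference translation of the paper (metadata) (Wed, 17 Apr 2024 02:49:26 GMT)
An analysis of the effect of many-shot prompting (500 shots, for example), made possible by models such as Gemini 1.5. Performance improves in many cases, but the authors note: "On some tasks (e.g., code verifier, planning), we did observe slight performance deterioration beyond a certain number of shots." The paper also examines Reinforced ICL and Unsupervised ICL, two forms of ICL that do not rely on human-written data, and reports: "We found that, for problem-solving domains where human-generated rationales are expensive to obtain, Reinforced and Unsupervised ICL can obtain strong performance when compared to ICL with human data."
A paper that showcases the benefits of long context. I am curious how this would play out with SSMs.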
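
To make the difference between the prompt styles concrete, here is a minimal sketch of how a many-shot prompt and an Unsupervised ICL prompt (problems only, no solutions) might be assembled. The `build_prompt` helper, the "Problem:"/"Solution:" template, and the toy arithmetic data are all assumptions for illustration; the paper's actual prompt formats are not reproduced here.

```python
from typing import List, Optional, Tuple

# (problem, solution) pair; solution is None for Unsupervised ICL shots.
Example = Tuple[str, Optional[str]]


def build_prompt(examples: List[Example], query: str) -> str:
    """Concatenate all shots, then append the query to be solved."""
    parts = []
    for problem, solution in examples:
        if solution is None:
            # Unsupervised ICL: show the problem only.
            parts.append(f"Problem: {problem}")
        else:
            # Standard ICL: show the problem with its solution.
            parts.append(f"Problem: {problem}\nSolution: {solution}")
    parts.append(f"Problem: {query}\nSolution:")
    return "\n\n".join(parts)


# Hypothetical toy data: 500 shots of trivial arithmetic.
shots = [(f"{i} + {i}", str(2 * i)) for i in range(1, 501)]

many_shot = build_prompt(shots, "7 + 8")
unsupervised = build_prompt([(p, None) for p, _ in shots], "7 + 8")

print(len(many_shot), len(unsupervised))
```

In this sketch, Reinforced ICL would reuse the same template and only change where the solutions come from (model-generated rather than human-written).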

Which questions should I answer? Salience Prediction of Inquisitive Questions

Which questions should I answer? Salience Prediction of Inquisitive Questions [118.1]
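We show that highly salient questions are empirically more likely to be answered in the same article. We further validate our findings by showing that answering such questions is an indicator of summarization quality in news.
Reference translation of the paper (metadata) (Tue, 16 Apr 2024 21:33:05 GMT)
Dataset construction and a model for predicting how good a question is. "Our work connects two ideas: a theoretical idea of which questions are useful for understanding and likely to be answered later in a text, and an empirical notion of what questions are useful."
As the paper itself points out, this also matters for quality evaluation. The fine-tuned models reportedly outperform GPT-4, but (as noted in the Limitations section) I would also like to know how much the domain affects the results.
Repository: GitHub – ritikamangla/QSalience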

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.5]
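This paper presents a unified perspective that summarizes recent progress and emerging trends in the multilingual large language model (MLLM) literature. We hope our work gives the community quick access and spurs breakthrough research in MLLMs.
Reference translation of the paper (metadata) (Sun, 07 Apr 2024 11:52:44 GMT)
A survey of multilingual LLMs. Both the approaches and the results vary widely, so a survey like this is welcome, and it also helps that the paper list is organized on the project site.
Project site: MLLM (multilingual-llm.net)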

AUG: Aerial Image Urban Scene Graph Generation Dataset

AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation [40.1]
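This paper constructs and releases the aerial image urban scene graph generation (AUG) dataset. Images in the AUG dataset are captured from a low-altitude overhead view. To keep local context from being overwhelmed in complex urban scenes, the paper proposes a new locality-preserving graph convolutional network (LPG).
Reference translation of the paper (metadata) (Thu, 11 Apr 2024 14:29:30 GMT)
Proposes the aerial image urban scene graph generation (AUG) dataset and a model. The task requires understanding objects and their complex relationships from aerial images, which looks very hard.
Repository: LPG-SGG: locality-preserving graph convolutional network (LPG) (gitee.com)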

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Introducing v0.5 of the AI Safety Benchmark from MLCommons [94.1]
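This paper introduces v0.5 of the AI Safety Benchmark, created by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 15:01:00 GMT)
An introduction to the AI Safety Benchmark; the target is chat systems. The taxonomy and several other parts are a useful reference.
Repository: mlcommons/modelbench: Run safety benchmarks against AI models and view detailed reports showing how well they performed. (github.com)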

Llama 3, Mixtral 8x22B, Reka Core, WizardLM2

As this year's HAI AI Index report also highlighted, foundation model development is booming. AI Index Report 2024 – Artificial Intelligence Index (stanford.edu)

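Last week again brought plenty of LLM news, and the momentum behind open models shows no sign of fading, with Llama 3 under a permissive custom license and Mixtral 8x22B under the Apache-2 license. Reka Core, from the recently founded Reka, also deserves attention. The models' performance is very high as well.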

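WizardLM2 also appears to have been released, but its repository is currently inaccessible, perhaps temporarily. @WizardLM on Hugging Face: "🔥🔥🔥 Introducing WizardLM-2! 📙Release Blog:…". Expectations for its performance are high as well.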

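Meta Llama 3: Introducing Meta Llama 3: The most capable openly available LLM to date
- The 8B and 70B models are released. The 8B outperforms Mistral and Gemma models of similar size. Depending on the benchmark, the 70B is competitive with commercial models such as GPT-4, Claude, and Gemini. A 400B model is still in training and reportedly looks set to surpass GPT-4 even at this stage, so its final performance is something to look forward to.
- The model card (llama3/MODEL_CARD.md at main · meta-llama/llama3 (github.com)) is public and discloses the compute used for training: 1.3M GPU hours for the 8B and 6.4M GPU hours for the 70B. Lambda Labs' GPU Cloud charges roughly 3.5 USD per GPU hour, so this is a substantial investment.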
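
The arithmetic behind "a substantial investment" can be sketched as follows. It simply multiplies the GPU hours from the model card by the approximate cloud price mentioned above. The flat 3.5 USD rate is an assumption carried over from the text; Meta's actual cost is not public in this form and may differ considerably.

```python
# GPU hours as reported in the Llama 3 model card (quoted above).
gpu_hours = {"Llama 3 8B": 1.3e6, "Llama 3 70B": 6.4e6}

# Assumption: a flat on-demand cloud rate of about 3.5 USD per GPU hour.
usd_per_gpu_hour = 3.5

cost_usd = {name: hours * usd_per_gpu_hour for name, hours in gpu_hours.items()}

for name, cost in cost_usd.items():
    print(f"{name}: about {cost / 1e6:.2f}M USD")
```

At that rate the estimate comes to about 4.55M USD for the 8B and about 22.4M USD for the 70B.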

Mixtral 8x22B: Cheaper, Better, Faster, Stronger | Mistral AI | Frontier AI in your hands
- A mixture-of-experts (MoE) LLM from Mistral, released as open source under the Apache-2 license. Its performance appears to be on par with Claude Haiku, Gemini Pro, GPT-3.5, and Qwen 1.5 72B.
- Also available on Hugging Face: mistralai/Mixtral-8x22B-v0.1 · Hugging Face, mistralai/Mixtral-8x22B-Instruct-v0.1 · Hugging Face
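
For readers unfamiliar with MoE layers, the following is a generic sketch of sparse top-k expert routing, not Mistral's implementation. The choice of 8 experts with k = 2, the scalar "experts", and the router logits are all illustrative assumptions; real layers route each token's hidden vector through feed-forward networks.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the largest router logits.

    Returns (expert_index, weight) pairs; weights are a softmax taken
    over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]  # numerically stable softmax
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Hypothetical toy experts: expert s multiplies its input by s.
experts = [lambda x, s=s: s * x for s in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]

routes = top_k_route(router_logits, k=2)
y = moe_forward(3.0, router_logits, experts, k=2)
print(routes, y)
```

Only the selected experts are evaluated, which is why an MoE model can have a large total parameter count while using a fraction of it per token.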

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models [69.4]
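Reka models can process and reason over text, image, video, and audio inputs. Reka Edge and Reka Flash are not only state-of-the-art but also outperform many larger models. Reka Core, the largest and most capable model, approaches the best frontier models on both automatic and blind evaluations.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 17:59:48 GMT)
Reka Core: Reka Core: Our Frontier Class Multimodal Language Model — Reka AI. Multimodal and competitive with GPT-4V.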

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length [112.8]
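We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Compared with Llama2, Megalodon is more efficient than the Transformer at the scale of 7 billion parameters and 2 trillion training tokens.
Reference translation of the paper (metadata) (Fri, 12 Apr 2024 20:28:14 GMT)
Proposes an architecture claimed to be more efficient than the Transformer. It inherits from MEGA (exponential moving average with gated attention). Surprisingly, it appears to outperform Llama2 at the same scale.
Repository: XuezheMax/megalodon: Reference implementation of Megalodon 7B model (github.com)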
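
As background for the "exponential moving average" in MEGA's name, here is a minimal sketch of the classical scalar EMA recurrence. It is deliberately not the multi-dimensional damped EMA used in MEGA nor the variant used in Megalodon; the function name `ema` and the smoothing factor are illustrative assumptions. The point it shows is that the running state has a fixed size no matter how long the sequence is.

```python
from typing import List, Sequence


def ema(xs: Sequence[float], alpha: float) -> List[float]:
    """Classical EMA: y_t = alpha * x_t + (1 - alpha) * y_{t-1}, with y_0 = 0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    y = 0.0          # the only state carried between steps
    out = []
    for x in xs:
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out


smoothed = ema([1.0, 1.0, 1.0], alpha=0.5)
print(smoothed)
```

Because each step depends only on the previous state, the cost per token is constant, in contrast to full attention whose cost grows with context length.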

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.8]
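We introduce AlphaLLM for the self-improvement of large language models. It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop. Experiments show that AlphaLLM significantly improves LLM performance without additional annotations.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 15:21:34 GMT)
Monte Carlo Tree Search + LLM. The idea that "we use the term option as a search node and propose option-level MCTS where each option represents a sequence of tokens, which can range from multiple tokens to several sentences." is interesting, and it also contributes to the performance gains.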
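
To illustrate what "an option as a search node" could look like, here is a minimal sketch of a tree whose nodes hold multi-token text spans, with textbook UCT selection and reward backpropagation. This is not AlphaLLM's algorithm: the `OptionNode` class, the plain UCT formula, the exploration constant, and the hand-assigned rewards are all assumptions standing in for the paper's own components.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class OptionNode:
    option: str                               # a span of text, not a single token
    parent: Optional["OptionNode"] = None
    children: List["OptionNode"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def add_child(self, option: str) -> "OptionNode":
        child = OptionNode(option, parent=self)
        self.children.append(child)
        return child


def uct_score(node: OptionNode, c: float = 1.4) -> float:
    """Textbook UCT: mean value plus an exploration bonus."""
    if node.visits == 0:
        return math.inf                        # always try unvisited options first
    mean = node.value_sum / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def select_child(node: OptionNode, c: float = 1.4) -> OptionNode:
    return max(node.children, key=lambda ch: uct_score(ch, c))


def backpropagate(node: Optional[OptionNode], reward: float) -> None:
    """Add the reward to the node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent


def path_text(node: Optional[OptionNode]) -> str:
    """Concatenate the options from the root down to this node."""
    parts = []
    while node is not None:
        parts.append(node.option)
        node = node.parent
    return "".join(reversed(parts))


# Hypothetical usage with hand-assigned rewards in place of a critic model.
root = OptionNode("")
a = root.add_child("Let x be the unknown. ")
b = root.add_child("Guess the answer is 5. ")
backpropagate(a, 1.0)
backpropagate(b, 0.0)
backpropagate(a, 1.0)
best = select_child(root)
print(path_text(best))
```

Searching over spans rather than single tokens keeps the tree shallow, which is one plausible reason the option-level formulation helps.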

On the Causal Nature of Sentiment Analysis

On the Causal Nature of Sentiment Analysis [98.4]
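Sentiment analysis (SA) aims to identify the sentiment expressed in text such as product reviews. This paper formulates SA as a combination of two tasks. For the prediction task, we use the causal mechanisms underlying the samples to improve LLM performance.
Reference translation of the paper (metadata) (Wed, 17 Apr 2024 04:04:34 GMT)
Proposes and validates a causally-aware sentiment analysis method that takes psychology into account. Understanding and decomposing the target problem before having an LLM solve it, as in "we have formulated the task of SA into a prediction problem and a causal discovery problem." (SA = sentiment analysis), seems likely to become important in practice, and the causal prompt is also interesting.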
