Introducing the latest arXiv papers

CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment [38.4]
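English-centric models are usually suboptimal in other languages. This work therefore proposes CrossIn, a new method that uses a mixed composition of cross-lingual instruction tuning data.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 06:20:50 GMT)
An instruction tuning approach for improving multilingual capability. It combines "CrossIn: It comprises cross-lingual instruction tuning datasets, where instruction and output are featured in two different languages" with "Trans: It consists of translation pairs for instructions." The hypothesis behind the latter, "We hypothesize that if the model concurrently learns these translation tasks, it could facilitate the transfer of knowledge between languages.", is an interesting one. The authors also build their own evaluation data.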
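The CrossIn/Trans mixture described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes parallel instruction data (the same instruction/output pair available in several languages), and the `Example` type, the `build_mixture` function, and the "Translate into ..." prompt template are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Example:
    instruction: str
    output: str


def build_mixture(
    parallel: List[Dict[str, Tuple[str, str]]],
    langs: Sequence[str],
    seed: int = 0,
) -> List[Example]:
    """Build a hypothetical CrossIn + Trans training mixture.

    `parallel` holds one dict per sample, mapping a language code to an
    (instruction, output) pair written in that language.
    """
    rng = random.Random(seed)
    data: List[Example] = []
    for item in parallel:
        # Pick two *different* languages for this sample.
        src, tgt = rng.sample(list(langs), 2)
        instr_src, _ = item[src]
        instr_tgt, out_tgt = item[tgt]
        # CrossIn-style: instruction and output are in two different languages.
        data.append(Example(instr_src, out_tgt))
        # Trans-style: a translation pair for the instruction itself
        # (this prompt template is an assumption, not the paper's).
        data.append(Example(f"Translate into {tgt}: {instr_src}", instr_tgt))
    rng.shuffle(data)
    return data


parallel = [
    {"en": (f"Q{i} en", f"A{i} en"), "zh": (f"Q{i} zh", f"A{i} zh")}
    for i in range(3)
]
for ex in build_mixture(parallel, ["en", "zh"]):
    print(ex)
```

Shuffling both kinds of examples into one set mirrors the idea of learning translation concurrently with cross-lingual instructions; the actual ratios and templates used for CrossIn should be taken from the authors' repository.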
The effectiveness of the proposed method is verified using Mistral and other models.
The repository is GitHub – Lingy12/CrossIn

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.7]
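We propose an automatic text animation scheme called Dynamic Typography. It deforms letters to convey semantic meaning and infuses them with vibrant movement based on user prompts. Our method uses a framework based on vector graphics representations and end-to-end optimization.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 06:06:29 GMT)
A method for generating Dynamic Typography, with very impressive demos. The approach, which links the control points of the Bézier curves of the input letters with their vector graphics (SVG) representation, is also interesting.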
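To illustrate the representation this approach builds on: a glyph outline in SVG is a chain of Bézier curves, so moving the control points moves the letter shape. The minimal sketch below evaluates one cubic Bézier segment and applies per-frame control-point offsets. The offsets here are hand-written placeholders (hypothetical); in the paper the motion comes from end-to-end optimization guided by a video diffusion prior, which is not reproduced.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights; they always sum to 1.
    w0, w1, w2, w3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def animate_segment(
    ctrl: List[Point], offsets_per_frame: List[List[Point]], samples: int = 5
) -> List[List[Point]]:
    """Displace the 4 control points for each frame and sample the moved curve."""
    frames = []
    ts = [i / (samples - 1) for i in range(samples)]
    for offsets in offsets_per_frame:
        moved = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(ctrl, offsets)]
        frames.append([cubic_bezier(*moved, t) for t in ts])
    return frames


arch = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]  # one arch-shaped stroke
still = [(0.0, 0.0)] * 4                                  # frame 0: no motion
lift_top = [(0.0, 0.0), (0.0, 0.5), (0.0, 0.5), (0.0, 0.0)]  # frame 1: raise inner points
for frame in animate_segment(arch, [still, lift_top]):
    print(frame)
```

Keeping the endpoint offsets at zero pins the segment to its neighbours while the inner control points bend the stroke, which is one simple way to keep a glyph legible while it moves.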
🪄 animate your word! (animate-your-word.github.io)

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

JetMoE: Reaching Llama2 Performance with 0.1M Dollars [25.3]
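This report introduces JetMoE-8B, a new large language model. Despite its low cost, JetMoE-8B outperforms the Llama2-7B model, and JetMoE-8B-Chat outperforms the Llama2-13B-Chat model. The report details all training parameters and data mixtures to facilitate future efforts in developing open foundation models.
Reference translation of the paper (metadata) (Thu, 11 Apr 2024 00:52:39 GMT)
A recipe for building an LLM cheaply (though "cheap" here still means "$0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours.").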
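A quick back-of-the-envelope check on the quoted figures. It assumes the whole $0.1 million went to the 30,000 H100 GPU hours (the quote does not break the budget down), which gives an implied rate of about 3.3 USD per GPU hour and roughly 42M training tokens per GPU hour on average.

```python
from typing import Tuple


def implied_rates(budget_usd: float, gpu_hours: float, tokens: float) -> Tuple[float, float]:
    """Return (USD per GPU hour, tokens per GPU hour) from the quoted totals."""
    return budget_usd / gpu_hours, tokens / gpu_hours


usd_per_hour, tokens_per_hour = implied_rates(
    budget_usd=0.1e6,  # "$0.1 million"
    gpu_hours=30_000,  # "30,000 H100 GPU hours"
    tokens=1.25e12,    # "1.25T tokens"
)
print(f"{usd_per_hour:.2f} USD/GPU hour, {tokens_per_hour / 1e6:.1f}M tokens/GPU hour")
# 3.33 USD/GPU hour, 41.7M tokens/GPU hour
```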
The repository is myshell-ai/JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars (github.com)

Many-Shot In-Context Learning

Many-Shot In-Context Learning [57.6]
Large language models (LLMs) excel at in-context learning (ICL). We observe significant performance gains across a wide variety of generative and discriminative tasks. We find that Reinforced and Unsupervised ICL can be highly effective in the many-shot regime.
Reference translation of the paper (metadata) (Wed, 17 Apr 2024 02:49:26 GMT)
An analysis of the effect of many-shot prompting (e.g., 500 shots), which has become possible with models such as Gemini 1.5. Performance improves in many cases, but "On some tasks (e.g., code verifier, planning), we did observe slight performance deterioration beyond a certain number of shots." The paper also examines Reinforced ICL and Unsupervised ICL, which require no human-written examples, and reports: "We found that, for problem-solving domains where human-generated rationales are expensive to obtain, Reinforced and Unsupervised ICL can obtain strong performance when compared to ICL with human data."
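The two human-free variants can be sketched as prompt-construction procedures. This is a simplified illustration under assumptions: `generate` stands in for any LLM call returning a (rationale, answer) pair, the `Q:`/`A:` template and all function names are hypothetical, Reinforced ICL is shown as "keep model-generated rationales only when their final answer matches the reference", and Unsupervised ICL as "show only the problems, without solutions".

```python
from typing import Callable, List, Tuple

Shot = Tuple[str, str, str]  # (question, rationale, answer)


def reinforced_icl_shots(
    problems: List[Tuple[str, str]],
    generate: Callable[[str], Tuple[str, str]],
    max_shots: int,
) -> List[Shot]:
    """Keep model-generated rationales only when the final answer is correct."""
    shots: List[Shot] = []
    for question, reference in problems:
        rationale, answer = generate(question)
        if answer == reference:  # correctness filter instead of human rationales
            shots.append((question, rationale, answer))
        if len(shots) >= max_shots:
            break
    return shots


def reinforced_prompt(shots: List[Shot], query: str) -> str:
    blocks = [f"Q: {q}\nA: {r} The answer is {a}." for q, r, a in shots]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def unsupervised_prompt(questions: List[str], query: str) -> str:
    # Unsupervised ICL: domain problems only, no rationales and no answers.
    blocks = [f"Q: {q}" for q in questions]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def toy_generate(question: str) -> Tuple[str, str]:
    # Stand-in for an LLM call: solves strings of the form "a+b".
    a, b = question.split("+")
    return f"Adding {a} and {b}.", str(int(a) + int(b))


# The last reference answer is deliberately wrong, so that shot gets filtered out.
problems = [("1+1", "2"), ("2+2", "4"), ("3+3", "7")]
shots = reinforced_icl_shots(problems, toy_generate, max_shots=500)
print(reinforced_prompt(shots, "5+5"))
```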
A paper that showcases the benefits of long contexts. I am curious how SSMs would fare in the same setting.

Which questions should I answer? Salience Prediction of Inquisitive Questions

Which questions should I answer? Salience Prediction of Inquisitive Questions [118.1]
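We show that highly salient questions are empirically more likely to be answered later in the same article. We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
Reference translation of the paper (metadata) (Tue, 16 Apr 2024 21:33:05 GMT)
A dataset and a model for predicting how useful a question is. "Our work connects two ideas: a theoretical idea of which questions are useful for understanding and likely to be answered later in a text, and an empirical notion of what questions are useful."
As the paper itself points out, this matters for quality evaluation as well. The fine-tuned models reportedly outperform GPT-4, but (as noted in the Limitations section) I would like to know more about the effect of domain and similar factors.
The repository is GitHub – ritikamangla/QSalience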

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers [81.5]
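This paper presents a unified perspective summarizing recent advances and emerging trends in the MLLM (Multilingual Large Language Model) literature. We hope our work gives the community quick access to the field and spurs breakthrough research in MLLMs.
Reference translation of the paper (metadata) (Sun, 07 Apr 2024 11:52:44 GMT)
A survey of multilingual LLMs. Approaches and results vary widely, so a survey like this is very welcome, and it is also helpful that the paper list is organized and published on the project site.
The project site is MLLM (multilingual-llm.net)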

AUG: Aerial Image Urban Scene Graph Generation dataset

AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation [40.1]
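This paper constructs and releases the Aerial Image Urban Scene Graph Generation (AUG) dataset. Images in the AUG dataset are captured from low-altitude overhead views. To keep the local context from being overwhelmed in complex urban scenes, we propose a novel locality-preserving graph convolutional network (LPG).
Reference translation of the paper (metadata) (Thu, 11 Apr 2024 14:29:30 GMT)
A proposed aerial image urban scene graph generation (AUG) dataset and model. The task requires understanding objects and their complex relationships from aerial images, which looks very difficult.
The repository is LPG-SGG: locality-preserving graph convolutional network (LPG) (gitee.com)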

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Introducing v0.5 of the AI Safety Benchmark from MLCommons [94.1]
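This paper introduces v0.5 of the AI Safety Benchmark created by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 15:01:00 GMT)
An introduction to the AI Safety Benchmark, which targets chat use cases. Many parts, such as the taxonomy, are useful as a reference.
The repository is mlcommons/modelbench: Run safety benchmarks against AI models and view detailed reports showing how well they performed. (github.com)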

Llama 3, Mixtral 8x22B, Reka Core, WizardLM2

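As highlighted in this year's HAI AI Index report, foundation model development is booming. AI Index Report 2024 – Artificial Intelligence Index (stanford.edu)

There was plenty of LLM-related news again last week: Llama 3 under a permissive custom license and Mixtral 8x22B under the Apache-2 license show that enthusiasm for open models has not faded. Reka Core from the recently founded Reka is also worth watching. All of these models perform very well.

WizardLM2 also appears to have been released, but its repository has become inaccessible, possibly only temporarily. @WizardLM on Hugging Face: “🔥🔥🔥 Introducing WizardLM-2! 📙Release Blog:…”. Its performance also looks very promising.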

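Meta Llama 3, Introducing Meta Llama 3: The most capable openly available LLM to date
- 8B and 70B models have been released. The 8B model outperforms Mistral and Gemma models of similar size. The 70B model, depending on the benchmark, is competitive with commercial models such as GPT-4, Claude, and Gemini. A 400B model is still being trained and already looks likely to surpass GPT-4 at its current stage, so its final performance is eagerly awaited.
- The model card (llama3/MODEL_CARD.md at main · meta-llama/llama3 (github.com)) is public and discloses the compute used for training: 1.3M GPU hours for 8B and 6.4M GPU hours for 70B. Lambda Labs' GPU Cloud charges roughly 3.5 USD per GPU hour, so this represents a substantial investment.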
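A rough sense of scale for the GPU-hour figures above. This assumes a flat rental price of about 3.5 USD per GPU hour, as quoted; it only estimates what equivalent rented compute would cost and says nothing about Meta's actual spend.

```python
RATE_USD_PER_GPU_HOUR = 3.5  # approximate cloud price quoted above (assumption)

GPU_HOURS = {
    "Llama 3 8B": 1.3e6,   # from the model card
    "Llama 3 70B": 6.4e6,  # from the model card
}


def rental_cost_usd(gpu_hours: float, rate: float = RATE_USD_PER_GPU_HOUR) -> float:
    """Cost of the given GPU hours at a flat rental price."""
    return gpu_hours * rate


for name, hours in GPU_HOURS.items():
    print(f"{name}: {rental_cost_usd(hours) / 1e6:.2f}M USD")
# Llama 3 8B: 4.55M USD
# Llama 3 70B: 22.40M USD
```

Together that comes to roughly 27M USD at this rate, which puts "a considerable amount" into perspective.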

Mixtral 8×22: Cheaper, Better, Faster, Stronger | Mistral AI | Frontier AI in your hands
- An MoE-architecture LLM from Mistral, released as OSS under the Apache-2 license. Its performance appears competitive with Claude Haiku, Gemini Pro, GPT-3.5, and Qwen 1.5 72B.
- It is also available on Hugging Face: mistralai/Mixtral-8x22B-v0.1 · Hugging Face, mistralai/Mixtral-8x22B-Instruct-v0.1 · Hugging Face
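For readers new to MoE, the core routing step can be sketched in plain Python. This is a generic top-k sparse MoE layer under assumptions: it follows the top-2-of-8 routing published for the earlier Mixtral 8x7B and assumes 8x22B uses the same scheme; the experts are toy functions, and `moe_layer` and the other names are hypothetical, not Mistral's code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, router: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    """Route one token vector to its top-k experts and mix their outputs."""
    # One router logit per expert: dot product of x with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gates are a softmax over the selected logits only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only k experts are evaluated for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def make_scaler(c: float) -> Expert:
    return lambda v: [c * vi for vi in v]


experts = [make_scaler(float(i + 1)) for i in range(8)]  # toy experts: scale by 1..8
router = [[w] for w in (0.0, 0.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0)]  # toy 1-d router
print(moe_layer([1.0], router, experts))
# [3.5]: experts 2 and 3 are selected with equal gates
```

Because only k of the experts run per token, the active parameter count per token is much smaller than the total parameter count, which is the main cost advantage of this design.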

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models [69.4]
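Reka models can process and reason over text, image, video, and audio inputs. Reka Edge and Reka Flash are not only state-of-the-art models but also outperform many much larger models. Reka Core, the most capable and largest model, approaches the best frontier models in both automatic and blind evaluations.
Reference translation of the paper (metadata) (Thu, 18 Apr 2024 17:59:48 GMT)
Reka Core: Reka Core: Our Frontier Class Multimodal Language Model — Reka AI. A multimodal model competitive with GPT-4V.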

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length [112.8]
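We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. In a comparison with Llama2, Megalodon is more efficient than the Transformer at the scale of 7 billion parameters and 2 trillion training tokens.
Reference translation of the paper (metadata) (Fri, 12 Apr 2024 20:28:14 GMT)
A proposed architecture claimed to be more efficient than the Transformer. It inherits MEGA (exponential moving average with gated attention). Surprisingly, it appears to outperform Llama2 of the same size.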
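The EMA component that MEGA (and hence Megalodon) builds on can be illustrated with a simplified scalar version of the damped EMA recurrence, y_t = α·x_t + (1 − α·δ)·y_{t−1}. This is a simplification under assumptions: the actual models use learned, multi-dimensional (and in Megalodon's case complex-valued) EMA parameters plus gated attention, none of which is reproduced here. The two functions show the recurrent view and the equivalent convolution view of the same computation.

```python
from typing import List, Sequence


def damped_ema(xs: Sequence[float], alpha: float, delta: float) -> List[float]:
    """Scalar damped EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}, y_{-1} = 0."""
    if not (0.0 < alpha <= 1.0 and 0.0 < delta <= 1.0):
        raise ValueError("alpha and delta must lie in (0, 1]")
    y = 0.0
    out = []
    for x in xs:
        # Constant-size state: O(1) work per step, independent of sequence length.
        y = alpha * x + (1.0 - alpha * delta) * y
        out.append(y)
    return out


def damped_ema_kernel(xs: Sequence[float], alpha: float, delta: float) -> List[float]:
    """Same result as a convolution with kernel alpha * (1 - alpha * delta) ** j."""
    decay = 1.0 - alpha * delta
    return [
        sum(alpha * decay ** j * xs[t - j] for j in range(t + 1))
        for t in range(len(xs))
    ]


print(damped_ema([1.0, 0.0, 0.0], alpha=0.5, delta=1.0))  # [0.5, 0.25, 0.125]
```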
The repository is XuezheMax/megalodon: Reference implementation of Megalodon 7B model (github.com)
