llama-3 – Introducing the latest arXiv papers

Towards Effective and Efficient Continual Pre-training of Large Language Models

Towards Effective and Efficient Continual Pre-training of Large Language Models [163.3]
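Continual pre-training (CPT) is an important approach for adapting language models to specific domains or tasks. This paper presents a technical report on the continual pre-training of Llama-3 (8B). The approach substantially improves the backbone model's Chinese language ability and scientific reasoning ability.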
Reference translation of the paper (metadata) (Fri, 26 Jul 2024 13:55:21 GMT)
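A report on continual training of Llama-3 that strengthens its Chinese language ability and its reasoning ability in science and technology. The use of synthetic data is an interesting point.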
The repository is GitHub – RUC-GSAI/Llama-3-SynE; at the time of writing it is marked as coming soon.

The Llama 3 Herd of Models [345.5]
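This paper describes a new set of foundation models called Llama 3. Llama 3 is a herd of language models that support multilinguality, coding, reasoning, and tool use. Llama 3 delivers quality comparable to leading language models such as GPT-4 on many tasks.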
Reference translation of the paper (metadata) (Wed, 31 Jul 2024 17:54:27 GMT)
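An introduction to the various Llama 3 variants. The paper includes a lot of information on how the models were built, which is very interesting, and multimodal support appears to be in progress. It also contains many statements that raise expectations for the future, such as "The resulting models are not yet being broadly released as they are still under development." and "We note that our multimodal models are still under development and not yet ready for release."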

Is Bigger Edit Batch Size Always Better? — An Empirical Study on Model Editing with Llama-3 [2.6]
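This study presents a targeted model editing analysis focused on the latest large language model, Llama-3. Through evaluations covering up to 4096 edits, it identifies the most effective layers for editing.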
Reference translation of the paper (metadata) (Wed, 01 May 2024 17:50:37 GMT)
Model editing targeting Llama-3; this came out fast...
"Contrary to previous belief, our experiments show that earlier layers may be more optimal intervention points, and that smaller, frequent sequential batch size edits have a superior performance in comparison to larger batch sizes." I wonder whether techniques like this change every time the model is updated...