Numerical Data – Latest arXiv Paper Highlights

Probing for Arithmetic Errors in Language Models

Probing for Arithmetic Errors in Language Models [86.8]
The internal activations of language models can be used to detect arithmetic errors. Simple probes are shown to accurately decode both the model's predicted output and the correct answer from hidden states. Lightweight error detectors trained on these activations predict whether the model's answer is correct with over 90% accuracy.
Paper reference translation (metadata) (Wed, 16 Jul 2025 16:27:50 GMT)
The finding that "Starting with a controlled setting of 3-digit addition, we show that simple probes can accurately decode both the model's predicted output and the correct answer from hidden states, regardless of whether the model's output is correct." is more or less expected. The interesting results are the transfer to a harder setting, "We then extend this analysis to a more complex setting, where the model is asked to solve math word problems only requiring addition (Cobbe et al., 2021) using a structured chain-of-thought (CoT) format (Wei et al., 2022), in which intermediate steps are expressed as equations (e.g., <a+b=c>). Remarkably, we find that the same probes trained on simple arithmetic queries can be applied directly to this setting, maintaining over 80% accuracy in detecting whether the model is producing correct intermediate results.", and the fact that the probes are useful for self-correction.
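Because the structured CoT format writes every intermediate step as an equation such as <a+b=c>, the correctness of each step can also be labeled mechanically. The sketch below is not the paper's probe (which reads hidden states); it is a hypothetical helper, `check_cot_equations`, showing how step-level correct/incorrect labels could be extracted from a generated solution, for example as ground truth when training or evaluating such an error detector. The regex assumes addition-only steps over non-negative integers, and the word problem is made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for addition-only steps such as "<48+24=72>" (whitespace allowed).
# On GSM8K-style "<<48+24=72>>" annotations it matches the inner "<48+24=72>".
STEP_PATTERN = re.compile(r"<\s*(\d+(?:\s*\+\s*\d+)+)\s*=\s*(\d+)\s*>")


@dataclass
class StepCheck:
    expression: str  # left-hand side with whitespace removed, e.g. "48+24"
    claimed: int     # result written in the chain of thought
    expected: int    # result recomputed from the operands

    @property
    def is_correct(self) -> bool:
        return self.claimed == self.expected


def check_cot_equations(cot_text: str) -> list[StepCheck]:
    """Return one StepCheck per '<a+b(+...)=c>' step, in order of appearance."""
    checks = []
    for match in STEP_PATTERN.finditer(cot_text):
        lhs, rhs = match.group(1), match.group(2)
        operands = [int(tok) for tok in lhs.split("+")]  # int() tolerates surrounding spaces
        checks.append(StepCheck(re.sub(r"\s+", "", lhs), int(rhs), sum(operands)))
    return checks


cot = (
    "Ken has 48 marbles and wins 24 more, so <48+24=72>. "
    "After finding 10 more, he has <72+10=83> marbles."
)
for step in check_cot_equations(cot):
    print(step.expression, step.claimed, step.expected, step.is_correct)
# 48+24 72 72 True
# 72+10 83 82 False
```

A rule-based check like this only works because the CoT format exposes each step explicitly; the point of the paper is that the same signal can be read from the model's activations without parsing the output.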

Number Cookbook: Number Understanding of Language Models and How to Improve It [64.0]
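Large language models (LLMs) can solve an increasing range of complex reasoning tasks, yet they make surprising errors in basic numerical understanding and processing. This paper comprehensively investigates the numerical understanding and processing ability (NUPA) of LLMs.
Paper reference translation (metadata) (Wed, 06 Nov 2024 08:59:44 GMT)
An analysis of numerical understanding and processing ability (NUPA) in LLMs and a study of how to improve it. Tool-based approaches, such as routing the computation through code generation, are currently the strongest option, but this work deliberately excludes tool use because "1) we want to study the self-contained NUPA of LLMs, 2) calling external tools whenever encountering numbers increases the inference latency (Xu et al., 2024), and 3) we believe NUPA without tools is a necessary ability of AGI."
The current conclusion is that "We investigate NUPA of LLMs and introduce a comprehensive benchmark, the NUPA test, to reveal that numerical problems remain challenging for modern LLMs." It is still a hard problem. In practice it can be worked around to some extent, e.g., by going through code generation, but...
Repository: GitHub – GraphPKU/number_cookbook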
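For contrast, the tool-based route that the paper deliberately excludes can be as simple as having the model emit an arithmetic expression and delegating its evaluation to code. The following is a minimal, hypothetical sketch (`evaluate_expression` is not from the paper or its repository): it evaluates `+ - * /` expressions with exact rational arithmetic via `fractions.Fraction`, walking the `ast` tree instead of calling `eval` so that arbitrary code is rejected.

```python
import ast
import operator
from fractions import Fraction

# Only these operators are allowed; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> Fraction:
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        # Going through str() keeps decimal literals such as 0.1 exact (1/10).
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate_expression(expr: str) -> Fraction:
    """Exactly evaluate an arithmetic expression emitted by a model (hypothetical tool)."""
    return _eval_node(ast.parse(expr, mode="eval"))


print(evaluate_expression("0.1 + 0.2"))              # 3/10
print(evaluate_expression("1/3 + 1/6"))              # 1/2
print(evaluate_expression("123456789 * 987654321"))  # exact big-integer product
```

The trade-off the paper cites is visible here: every number-bearing step needs a round trip to an external evaluator, which adds latency, whereas the paper is interested in what the model can do on its own.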

How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs [69.6]
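This paper identifies numerical precision as a key factor in how effective Transformer-based large language models are on mathematical tasks. The results show that Transformers with low numerical precision cannot handle arithmetic tasks such as iterated addition and integer multiplication. In contrast, Transformers with standard numerical precision can handle these tasks efficiently with a much smaller model size.
Paper reference translation (metadata) (Thu, 17 Oct 2024 17:59:35 GMT)
The key claim: "Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length."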
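The paper's result is a theoretical statement about model expressiveness under limited versus standard precision, but a small numerical experiment gives some intuition for why a fixed number of bits hurts exact arithmetic. The sketch below is an illustration only, not the paper's construction: it simulates iterated addition where every partial sum is rounded to IEEE 754 half precision (float16) or single precision (float32), using the standard `struct` format codes `e` and `f`.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the binary format of a struct code ('<e' = float16, '<f' = float32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def iterated_add(values, fmt=None) -> float:
    """Sum values left to right; if fmt is given, round every partial sum to that format."""
    total = 0.0
    for v in values:
        total += v
        if fmt is not None:
            total = round_to(fmt, total)
    return total


ones = [1.0] * 3000
exact = iterated_add(ones)       # 3000.0 (float64 is exact here)
fp32 = iterated_add(ones, "<f")  # 3000.0
fp16 = iterated_add(ones, "<e")  # 2048.0: the running sum stalls
print(exact, fp32, fp16)
```

float16 has an 11-bit significand, so not every integer above 2048 is representable: once the partial sum reaches 2048, adding 1 gives 2049, which rounds (ties-to-even) back to 2048. With more bits the same loop stays exact for longer inputs, which loosely mirrors the paper's contrast between low and standard precision.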

NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.9]
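We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in text. Specifically, it uses a prototype-based numeral embedding to encode the mantissa of a number and an individual embedding method to encode its exponent. A numeral-aware loss function is designed to integrate numbers into NumGPT's pre-training objective.
Paper reference translation (metadata) (Tue, 7 Sep 2021 15:06:12 GMT)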
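The exact architecture and hyperparameters of NumGPT are in the paper; the sketch below only illustrates the mantissa/exponent idea from the summary, under several assumptions. A number is split into a base-10 mantissa and exponent; the mantissa is embedded as a weighted average of prototype vectors, with weights from a Gaussian kernel around assumed prototype values 1 to 9; the exponent is looked up in its own table. Ignoring the sign, concatenating the two parts, the kernel width, and the random initialization are all illustrative choices (in a real model these vectors would be learned), and every name here is hypothetical.

```python
import math
import random

# Hypothetical hyperparameters, chosen for illustration only.
DIM = 8                                        # size of each embedding part
PROTOTYPES = [float(k) for k in range(1, 10)]  # assumed prototype mantissa values 1..9
SIGMA = 1.0                                    # width of the Gaussian kernel
EXPONENT_RANGE = range(-8, 9)                  # exponents that get their own vector

rng = random.Random(0)
PROTOTYPE_VECTORS = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in PROTOTYPES]
EXPONENT_TABLE = {e: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for e in EXPONENT_RANGE}


def decompose(x: float) -> tuple[float, int]:
    """Split |x| into a mantissa in [1, 10) and a base-10 exponent (sign ignored)."""
    x = abs(x)
    if x == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(x))
    mantissa = x / 10 ** exponent
    # Guard against floating-point error near exact powers of ten.
    if mantissa >= 10:
        mantissa, exponent = mantissa / 10, exponent + 1
    elif mantissa < 1:
        mantissa, exponent = mantissa * 10, exponent - 1
    return mantissa, exponent


def prototype_weights(mantissa: float) -> list[float]:
    """Normalized Gaussian-kernel similarity of the mantissa to each prototype."""
    scores = [math.exp(-((mantissa - p) ** 2) / (2 * SIGMA ** 2)) for p in PROTOTYPES]
    total = sum(scores)
    return [s / total for s in scores]


def embed_number(x: float) -> list[float]:
    """Mantissa part (prototype mixture) followed by exponent part (table lookup)."""
    mantissa, exponent = decompose(x)
    weights = prototype_weights(mantissa)
    mantissa_vec = [
        sum(w * vec[i] for w, vec in zip(weights, PROTOTYPE_VECTORS)) for i in range(DIM)
    ]
    clipped = min(max(exponent, EXPONENT_RANGE.start), EXPONENT_RANGE.stop - 1)
    return mantissa_vec + EXPONENT_TABLE[clipped]


print(decompose(2500.0))           # (2.5, 3)
print(len(embed_number(2500.0)))   # 16
```

The design intent of a prototype mixture is that nearby mantissas (2.5 and 2.6) receive similar weights and therefore similar vectors, while the exponent carries the order of magnitude separately.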
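- A pre-trained model that handles numbers in text differently from ordinary tokens. Performance reportedly improved on number-related data.
- Machine translation also often struggles with numbers. Since many datasets built by automatically generating parallel sentence pairs have number-related problems, treating numbers separately seems like a good approach.
  - As shown in the earlier post on neural machine translation models and the quality of parallel data, for some parallel corpora (e.g., WikiMatrix), removing pairs whose numbers do not correspond with simple rules can improve accuracy.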
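The rule-based cleanup mentioned above can be sketched as follows. This is a hypothetical minimal filter, not the exact rule used in the earlier post: it extracts digit sequences from both sides after NFKC normalization (which maps full-width digits and commas to ASCII), strips thousands separators, and keeps a pair only if both sides contain the same multiset of numbers. The example sentence pairs are made up for illustration.

```python
import re
import unicodedata
from collections import Counter

# Digits with optional thousands separators and an optional decimal part, e.g. "1,200,000" or "3.5".
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> Counter:
    """Multiset of numbers in text after NFKC normalization (full-width digits -> ASCII)."""
    normalized = unicodedata.normalize("NFKC", text)
    return Counter(tok.replace(",", "") for tok in NUMBER_PATTERN.findall(normalized))


def numbers_match(source: str, target: str) -> bool:
    return extract_numbers(source) == extract_numbers(target)


def filter_parallel_pairs(pairs):
    """Keep only (source, target) pairs whose numbers correspond exactly."""
    return [(src, tgt) for src, tgt in pairs if numbers_match(src, tgt)]


pairs = [
    ("The price rose 3.5% in 2020.", "価格は2020年に3.5%上昇した。"),
    ("He was born in 1990.", "彼は1991年に生まれた。"),
    ("Population: 1,200,000", "人口：１２０００００"),
]
kept = filter_parallel_pairs(pairs)  # the 1990/1991 pair is dropped
print(len(kept))                     # 2
```

A filter this simple will also drop some valid pairs, for example when one side writes a number in words or kanji (三), uses a decimal comma, or converts units, so in practice the rule would need to be tuned per language pair and corpus.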