July 12, 2024 – Introducing the latest arXiv papers

How Does Quantization Affect Multilingual LLMs? [50.9]
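Quantization techniques are widely used to improve the inference speed and deployment of large language models. The authors conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. Using automatic benchmarks, LLM-as-a-Judge methods, and human evaluation, they find that (1) the harmful effects of quantization are apparent in human evaluation: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are affected by quantization to differing degrees, with non-Latin script languages affected worst; and (3) challenging tasks such as mathematical reasoning degrade rapidly.
Reference translation of the paper (metadata) (Wed, 03 Jul 2024 15:39:40 GMT)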
A paper investigating the impact of quantization on LLMs from a multilingual perspective. Its conclusions are: "(1) Damage from quantization is much worse than appears from automatic metrics: even when not observed automatically, human evaluators notice it.", "(2) Quantization affects languages to varying degrees, with non-Latin script languages more severely affected on automatic benchmarks.", and "(3) Challenging tasks degrade fast and severely: math performance is strikingly reduced, as are responses on realistic challenging".
The impact on multilingual use (or rather, on Japanese) matches my own experience. This feels like the kind of study you would expect from Cohere, which also puts effort into supporting languages other than English.
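For readers who want a feel for where quantization error comes from, here is a minimal sketch of symmetric round-to-nearest int8 quantization on a toy weight list. This is only an illustration under my own assumptions, not the quantization schemes evaluated in the paper: the function names `quantize_int8` and `dequantize` and the weight values are hypothetical, and real LLM quantizers typically work per channel or per group on tensors.

```python
def quantize_int8(weights):
    """Symmetric per-tensor round-to-nearest quantization to the int8 range."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


# Toy weights (hypothetical values, chosen only for illustration).
weights = [0.02, -0.51, 0.33, 1.27, -0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The per-weight error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", q)
print("scale:", scale)
print("max abs error:", max_error)
```

Each weight moves by at most half a step (`scale / 2`), which looks small in isolation. The paper's point is that the downstream effect of such errors is not uniform across languages and tasks, and that automatic metrics can understate it.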

CausalScore: An Automatic Reference-Free Metric for Assessing Response Relevance in Open-Domain Dialogue Systems [43.5]
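This paper proposes a new metric called CausalScore, which assesses the relevance of a response by measuring the causal strength between the dialogue history and the response. Experimental results show that CausalScore aligns with human judgements and significantly outperforms existing state-of-the-art metrics.
Reference translation of the paper (metadata) (Tue, 25 Jun 2024 06:08:16 GMT)
The paper proposes a metric, "we propose a novel metric CausalScore to quantify the relevance of responses by estimating the causal strength (Janzing et al., 2013a) between utterances and responses, where causal strength measures the strength of causal relations.", and also constructs an evaluation dataset.
The repository is GitHub – WilliamsToTo/causalscore_dialogue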
